
Comparative study of audio spatializers for dual-loudspeaker mobile phones


Academic year: 2021


Mingsian R. Bai,a) Geng-Yu Shih, and Chih-Chung Lee

Department of Mechanical Engineering, National Chiao-Tung University, 1001 Ta-Hsueh Road, Hsin-Chu 300, Taiwan, Republic of China

(Received 16 May 2006; revised 11 October 2006; accepted 11 October 2006)

MPEG-1 Layer 3 (MP3) handsets equipped with dual loudspeakers and three-dimensional audio modules have received much attention in the consumer electronics market. To create spatial impression during audio reproduction, the head-related transfer function (HRTF) and the crosstalk cancellation system (CCS) are key elements in many audio spatializers. However, many factors must be taken into account during the design and implementation stages of an audio spatializer for handset applications. In this paper, a comprehensive study was undertaken to compare various audio spatializers for use with dual-loudspeaker handsets, in the context of inverse filtering strategies. Two deconvolution approaches, the frequency-domain method and the time-domain method, are employed to design the required inverse filters. Different approaches to designing audio spatializers with the HRTF, the CCS, and their combination are compared. In particular, two modified CCS approaches are suggested. Implementation issues such as regularization, complex smoothing, and the structures of inverse filters are also addressed. Comprehensive objective and subjective tests were conducted to investigate these aspects of audio spatializers. The data obtained from the subjective tests are processed using the multivariate analysis of variance (MANOVA) to assess the statistical significance of the results. © 2007 Acoustical Society of America.

[DOI: 10.1121/1.2387121]

PACS number(s): 43.60.Dh, 43.60.Pt, 43.60.Qv, 43.60.Uv [EJS] Pages: 298–309

I. INTRODUCTION

Thanks to rapid advances in mobile communication technology, handsets have swiftly entered everyone’s daily life. In addition to being a simple phone, today’s handset also has to serve as a camera, a personal digital assistant, an MPEG-1 Layer 3 (MP3) player, and even a video player in third-generation applications. To cater to the ever-increasing demand for high-quality audio, three-dimensional (3D) audio reproduction for use with dual-loudspeaker handsets has emerged. In 3D audio reproduction, the head-related transfer function (HRTF) and the crosstalk cancellation system (CCS) are two core technologies. The HRTF is a mathematical model representing the propagation process from a sound source to the human ears. HRTFs thus contain localization cues resulting from the propagation delay and from diffraction by the head, ears, and even torso. This allows a directional impression to be created by properly synthesizing HRTFs at the prescribed direction.1 Although this is effective in headphone reproduction, a crosstalk problem arises when loudspeakers are used as the rendering transducers.2,3 To overcome this problem, a CCS based on inverse filtering is employed to minimize the crosstalk that can obscure the sound image. In general, two types of deconvolution approaches, the frequency-domain method4 and the time-domain method,5,6 can be used to design the required inverse filters. Since the acoustic systems, or plants, are usually noninvertible, regularization measures have to be taken in these methods to avoid excessive boosts in the inverse filters caused by overcompensating the acoustic system. As an effective alternative, excessive gain of the inverse filters can also be avoided by smoothing the frequency response functions of the acoustic system prior to the inversion process.7

In inverse filter design, Norcross et al. pointed out that the time-domain methods are subjectively more robust but computationally less efficient than the frequency-domain method.8 The main difficulty in the inversion process lies in the fact that the acoustic plants are typically nonminimum phase, meaning that a stable, causal inverse filter does not exist.9 To cope with this problem, a modeling delay was first introduced by Clarkson et al.10 Furthermore, Kirkeby et al.11 used the least-squares method along with a modeling delay to find causal inverse filters. Wang and Pai also applied the time-domain method to determine the optimal modeling delay for the inverse filters.12

Conventional inverse filtering reduces the crosstalk and equalizes the ipsilateral response. However, if the CCS is inadequately designed, the latter effect can produce audible high-frequency artifacts. To address this problem, two modified CCS are proposed in this paper. The idea underlying these modified methods is to cancel the crosstalk of the contralateral paths from the loudspeakers to the listener’s ears without equalizing the ipsilateral paths. The modified CCS methods also have the desirable property of being loudspeaker independent. Extensive tests were conducted in this work to compare different approaches to audio spatializers based on the HRTF, the CCS, and their combination.

a) Author to whom correspondence should be addressed; electronic mail:


Another issue concerning the implementation phase is the structure of the inverse filters. Considering psychoacoustic aspects and computational cost, the CCS can be implemented in a few different ways. Three CCS structures are compared in this paper: the direct filtering method, the filter bank method,13 and the simple lowpass mixing method.14 The direct filtering method can be further divided into the full-band and the band-limited designs.3 The band-limited design restricts crosstalk cancellation to the band 200 Hz–6 kHz.

In this work, comprehensive objective and subjective tests were conducted to investigate the aforementioned aspects of audio spatializers for mobile phones. The data from the subjective tests are processed using the multivariate analysis of variance (MANOVA) to assess the statistical significance of the results.15

II. CROSSTALK CANCELLATION SYSTEMS

A. Problem of crosstalk cancellation

Figure 1 shows a two-channel loudspeaker reproduction scenario, where $H_{11}$ and $H_{22}$ are the ipsilateral transfer functions and $H_{12}$ and $H_{21}$ are the contralateral transfer functions from the loudspeakers to the listener’s ears. The contralateral transfer functions, also known as the crosstalk, interfere with human localization of sound sources when binaural signals are reproduced by loudspeakers. To mitigate the effects of crosstalk, the crosstalk canceller is chosen to be the inverse of the acoustic plants, so that the overall response becomes diagonal and distortionless:

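$$
\begin{bmatrix} \delta(n-m) & 0 \\ 0 & \delta(n-m) \end{bmatrix}
=
\begin{bmatrix} h_{11}(n) & h_{12}(n) \\ h_{21}(n) & h_{22}(n) \end{bmatrix}
\otimes
\begin{bmatrix} c_{11}(n) & c_{12}(n) \\ c_{21}(n) & c_{22}(n) \end{bmatrix}, \qquad (1)
$$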

where $\otimes$ denotes the convolution operation (here, a matrix product in which each scalar product is replaced by a convolution), and $h_{ij}(n)$, $c_{ij}(n)$, and $\delta(n-m)$ represent the impulse responses of the respective acoustic paths, the inverse filters, and the discrete delta function delayed by $m$ samples to ensure a causal inverse filter. On the basis of inverse filtering, two deconvolution schemes along with regularization techniques are described in the following.

B. Multichannel inverse filtering with regularization

1. Frequency-domain deconvolution

The first method to be considered is the frequency-domain method4 suggested by Kirkeby et al. In this method, a cost function $J$ is defined as the sum of the “performance error” $\mathbf{e}^H\mathbf{e}$ and the “input power” $\mathbf{v}^H\mathbf{v}$,

$$
J(e^{j\omega}) = \mathbf{e}^H(e^{j\omega})\,\mathbf{e}(e^{j\omega}) + \beta(\omega)\,\mathbf{v}^H(e^{j\omega})\,\mathbf{v}(e^{j\omega}), \qquad (2)
$$

with $\omega$ being the angular frequency, $\mathbf{e}$ the vector of errors between the desired and reproduced ear signals, and $\mathbf{v}$ the vector of loudspeaker input signals. A regularization parameter $\beta(\omega)$, which can range from zero to infinity, weighs the input power against the performance error. This is the well-known Tikhonov regularization procedure. The optimal inverse filters obtained by minimizing $J$ can be written in terms of the discrete frequency index $k$ as

$$
\mathbf{C}(k) = \left[\mathbf{H}^H(k)\,\mathbf{H}(k) + \beta(k)\,\mathbf{I}\right]^{-1}\mathbf{H}^H(k), \qquad k = 1, 2, \ldots, N_c, \qquad (3)
$$

where an $N_c$-point fast Fourier transform (FFT) is assumed and $\mathbf{H}(k)$ is the transfer matrix of the acoustic plant. The coefficients of the inverse filters can be obtained from the inverse FFT of the frequency response in Eq. (3), with the aid of appropriate windowing. To ensure the causality of the CCS filters, a circular shift (of at most $N_c/2$ samples) of the resulting impulse response is needed to introduce a modeling delay.16
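The frequency-domain design of Eq. (3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors’ implementation: the plant is stored in a hypothetical array `h[i, j, n]` ordered as in Eq. (1), $\beta$ is taken to be frequency independent, and the windowing step mentioned above is omitted. The circular shift by `delay` samples plays the role of the modeling delay.

```python
import numpy as np

def kirkeby_inverse(h, n_fft, beta, delay=None):
    """Regularized 2x2 inverse filters computed bin by bin, following Eq. (3).

    h     : real array of shape (2, 2, N_h) with h[i, j] = h_ij(n) (hypothetical layout)
    n_fft : FFT length N_c, which is also the length of the returned filters
    beta  : frequency-independent regularization parameter (an assumption)
    delay : circular shift used as the modeling delay (defaults to N_c / 2)
    Returns c of shape (2, 2, n_fft) with c[i, j] = c_ij(n).
    """
    if delay is None:
        delay = n_fft // 2
    H = np.moveaxis(np.fft.fft(h, n=n_fft, axis=-1), -1, 0)  # (n_fft, 2, 2): one matrix per bin
    HH = np.conj(np.swapaxes(H, -1, -2))                      # Hermitian transpose H^H per bin
    C = np.linalg.solve(HH @ H + beta * np.eye(2), HH)        # [H^H H + beta I]^{-1} H^H
    c = np.fft.ifft(np.moveaxis(C, 0, -1), axis=-1).real      # back to impulse responses
    return np.roll(c, delay, axis=-1)                         # circular shift -> causal filters
```

With a small $\beta$ the plant and filter spectra multiply to a delayed identity at every bin; raising $\beta$ trades matching accuracy for lower filter gain.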

2. Time-domain deconvolution

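The time-domain method is based on a matrix formalism of Eq. (1). In this method, a single-channel inverse filter can be obtained by solving the following matrix equation:5,6

$$
\begin{bmatrix} d(0) \\ \vdots \\ d(N_h+N_c-2) \\ 0 \\ \vdots \\ 0 \end{bmatrix}
=
\begin{bmatrix}
h(0) & 0 & \cdots & 0 \\
\vdots & h(0) & \ddots & \vdots \\
h(N_h-1) & \vdots & \ddots & 0 \\
0 & h(N_h-1) & \ddots & h(0) \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h(N_h-1) \\
\varepsilon & 0 & \cdots & 0 \\
0 & \varepsilon & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \varepsilon
\end{bmatrix}
\begin{bmatrix} c(0) \\ \vdots \\ c(N_c-1) \end{bmatrix}, \qquad (4)
$$

or simply

$$
\mathbf{d} = \mathbf{h}\,\mathbf{c}. \qquad (5)
$$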

FIG. 1. Schematic diagram showing an audio reproduction system using two-channel stereo loudspeakers. Acoustic transfer functions between the loudspeakers and the listener’s ears are indicated in the figure.

In the preceding two equations, the vector $\mathbf{d}$ represents the desired response, the matrix $\mathbf{h}$ is composed of the impulse response $h(n)$ of the acoustic plant measured a priori, $N_h$ is the length of the plant impulse response $h(n)$, the vector $\mathbf{c}$ represents the impulse response of the inverse filter, and $N_c$ is the length of the inverse filter. The parameter $\varepsilon$ in the lower part of the matrix $\mathbf{h}$ is a small regularization constant. The foregoing single-channel deconvolution technique can be readily extended to the two-channel case, described by the following matching equation:

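$$
\begin{bmatrix} \mathbf{d} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{d} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{h}_{11} & \mathbf{0} & \mathbf{h}_{12} & \mathbf{0} \\
\mathbf{0} & \mathbf{h}_{11} & \mathbf{0} & \mathbf{h}_{12} \\
\mathbf{h}_{21} & \mathbf{0} & \mathbf{h}_{22} & \mathbf{0} \\
\mathbf{0} & \mathbf{h}_{21} & \mathbf{0} & \mathbf{h}_{22}
\end{bmatrix}
\begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{12} \\ \mathbf{c}_{21} \\ \mathbf{c}_{22} \end{bmatrix}, \qquad (6)
$$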

where $\mathbf{h}_{ij}$ denotes the convolution matrix formed from the impulse response $h_{ij}(n)$ and $\mathbf{c}_{ij}$ denotes the coefficient vector of the filter $c_{ij}(n)$.
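To make the block structure of Eq. (6) concrete, the sketch below assembles the stacked convolution matrix with SciPy’s `toeplitz` and solves it with a dense least-squares call rather than the iterative solvers discussed next. It is a sketch under assumptions: the $\varepsilon$ rows are appended for all four filters at once, the desired response $\mathbf{d}$ is a delta delayed by $m$ samples, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, n_c):
    """(N_h + N_c - 1) x N_c convolution (Toeplitz) matrix of h, as in the upper part of Eq. (4)."""
    col = np.concatenate([h, np.zeros(n_c - 1)])
    row = np.zeros(n_c)
    row[0] = h[0]
    return toeplitz(col, row)

def time_domain_ccs(h, n_c, m, eps):
    """Least-squares solution of the stacked two-channel system of Eq. (6).

    h : array (2, 2, N_h) with h[i, j] = h_ij(n) (hypothetical layout)
    Returns an array (4, n_c) whose rows are c11, c12, c21, c22.
    """
    n_h = h.shape[-1]
    L = n_h + n_c - 1
    T = [[conv_matrix(h[i, j], n_c) for j in range(2)] for i in range(2)]
    Z = np.zeros((L, n_c))
    # Block layout of Eq. (6); the unknown vector is [c11; c12; c21; c22]
    A = np.block([
        [T[0][0], Z, T[0][1], Z],
        [Z, T[0][0], Z, T[0][1]],
        [T[1][0], Z, T[1][1], Z],
        [Z, T[1][0], Z, T[1][1]],
    ])
    d = np.zeros(L)
    d[m] = 1.0                                   # desired response: delta delayed by m samples
    zeros = np.zeros(L)
    b = np.concatenate([d, zeros, zeros, d])
    # Appending eps * I rows minimizes ||A c - b||^2 + eps^2 ||c||^2
    A_reg = np.vstack([A, eps * np.eye(4 * n_c)])
    b_reg = np.concatenate([b, np.zeros(4 * n_c)])
    x, *_ = np.linalg.lstsq(A_reg, b_reg, rcond=None)
    return x.reshape(4, n_c)
```

The dense solve is fine for short filters; for the matrix sizes mentioned in the text, an iterative solver that only needs products with $\mathbf{A}$ and $\mathbf{A}^T$ scales better.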

The matrix in Eq. (6) can be quite large. Instead of brute-force inversion, more efficient iterative techniques are employed in this work. Exploiting the structure of this matrix, which consists largely of zero and Toeplitz blocks, one may use iterative algorithms such as the steepest descent and conjugate-gradient (CG) methods to calculate the solution.6 In both methods, a residual vector $\mathbf{R}$ is defined as

$$
\mathbf{R} = \mathbf{D}_t - \mathbf{H}_t\,\mathbf{C}_t, \qquad (7)
$$

where $\mathbf{D}_t$, $\mathbf{H}_t$, and $\mathbf{C}_t$ represent the matrices in Eq. (6). In the steepest descent algorithm, the recursive relation for updating the coefficients of the inverse filters can be written as

$$
\mathbf{C}_t(i+1) = \mathbf{C}_t(i) + \mu\,\mathbf{g}(i), \qquad (8)
$$

where $i$ is the iteration index, $\mathbf{g}$ is the gradient vector of the cost function, and $\mu$ is the step size. Unlike the steepest descent algorithm, the CG algorithm uses a plane search strategy based on a linear combination of the gradient vectors of consecutive iterations. Specifically, the coefficient update equation is given as

$$
\mathbf{C}_t(i+1) = \mathbf{C}_t(i) + \mu\,\mathbf{g}(i) + \alpha\,\mathbf{s}(i), \qquad (9)
$$

where $\mathbf{s}$ is the gradient vector of the last iteration and $\alpha$ is another step-size parameter. In general, the convergence behavior of the CG method is superior to that of the steepest descent method owing to the plane search nature of the former approach.
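The difference between the updates of Eqs. (8) and (9) can be illustrated on a generic least-squares problem $\min\|\mathbf{b} - \mathbf{A}\mathbf{c}\|^2$, standing in for the stacked system of Eq. (6). The sketch uses steepest descent with an exact line search and the textbook CGLS form of conjugate gradients. Both are standard variants assumed here; the paper does not state its rules for choosing $\mu$ and $\alpha$. In CGLS, the update $\mathbf{c} + \mu\mathbf{s}$ with $\mathbf{s} = \mathbf{g} + \beta\,\mathbf{s}_{\text{prev}}$ expands to the same three-term form as Eq. (9), with the previous search direction playing the role of $\mathbf{s}$.

```python
import numpy as np

def steepest_descent_ls(A, b, n_iter):
    """Minimize ||b - A c||^2 by steepest descent with an exact line search (cf. Eq. 8)."""
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (b - A @ c)          # descent direction: negative gradient up to a factor of 2
        Ag = A @ g
        denom = Ag @ Ag
        if denom == 0.0:
            break
        c = c + (g @ g) / denom * g    # optimal step size mu along g
    return c

def cg_ls(A, b, n_iter):
    """CGLS: conjugate gradients on the normal equations A^T A c = A^T b (cf. Eq. 9)."""
    c = np.zeros(A.shape[1])
    r = b - A @ c                      # data-space residual, the analog of R in Eq. (7)
    g = A.T @ r                        # current negative gradient
    s = g.copy()                       # search direction
    gg = g @ g
    for _ in range(n_iter):
        As = A @ s
        denom = As @ As
        if denom == 0.0:
            break
        mu = gg / denom
        c = c + mu * s                 # equals c + mu*g + (mu*beta_prev)*s_prev
        r = r - mu * As
        g = A.T @ r
        gg_new = g @ g
        s = g + (gg_new / gg) * s      # combine new gradient with previous direction
        gg = gg_new
    return c
```

Started from zero, both iterates lie in the same Krylov subspace, and CG minimizes the residual over it, so after the same number of iterations its residual is never larger than that of steepest descent.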

3. Generalized complex smoothing techniques

Due to the ill-conditioned nature of the acoustical system, properly limiting the gain of the inverse filter is a critical issue in designing the CCS. One way to deal with this problem is the regularization method, as already mentioned in the previous section. Another simple but elegant way is to smooth the peaks and dips of the acoustic plant using the generalized complex smoothing technique suggested by Hatziantoniou and Mourjopoulos.7 There are two alternative methods for implementing complex smoothing. The first method, uniform smoothing, calculates the impulse response from the inverse FFT of the frequency response. A time-domain window is then applied to truncate and taper the impulse response, which in effect smooths the frequency response. Finally, the frequency response is recovered by the FFT of the modified impulse response. Alternatively, a nonuniform smoothing method can be used. This method performs smoothing directly in the frequency domain: the frequency response is circularly convolved with a window whose bandwidth increases with frequency. The choice of window follows the psychoacoustic fact that the bandwidth of the auditory filters of human hearing increases with frequency (i.e., frequency resolution becomes coarser at high frequencies). The nonuniformly smoothed frequency response is therefore

$$
H_{ns}(m,k) = \sum_{i=0}^{N-1} H[(k-i) \bmod N]\, W_{sm}(m,i), \qquad (10)
$$

where $k$, $0 \le k \le N-1$, is the frequency index and $m$ is the smoothing index corresponding to the length of the smoothing window. The smoothing window $W_{sm}(m,k)$ is given by

$$
W_{sm}(m,k) =
\begin{cases}
\dfrac{b-(b-1)\cos[(\pi/m)k]}{2b(m+1)-1}, & k = 0, 1, \ldots, m, \\[2ex]
\dfrac{b-(b-1)\cos[(\pi/m)(k-N)]}{2b(m+1)-1}, & k = N-m, N-(m-1), \ldots, N-1, \\[2ex]
0, & k = m+1, \ldots, N-(m+1).
\end{cases} \qquad (11)
$$

The integer $m = m(k)$ can be regarded as a bandwidth function, by which fractional-octave or any other nonuniform frequency smoothing scheme can be implemented. The variable $b$ determines the roll-off rate of the smoothing window. In the special case $b = 1$, the window reduces to a rectangular window, while $b = 0.5$ yields a Hann window.
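Both smoothing variants can be sketched with NumPy. The nonuniform version evaluates Eqs. (10) and (11) directly, as an O(N²) loop kept for clarity. The bandwidth function `fractional_octave_m` is a hypothetical choice approximating 1/α-octave smoothing, and the half-Hann taper in `uniform_smooth` assumes the impulse response starts near n = 0. Neither detail is taken from the paper.

```python
import numpy as np

def smoothing_window(m, b, N):
    """W_sm(m, k) of Eq. (11) for k = 0, ..., N-1 (needs m >= 1 and 2m + 1 <= N)."""
    k = np.arange(N)
    w = np.zeros(N)
    denom = 2 * b * (m + 1) - 1
    lo = k <= m                      # k = 0, ..., m
    hi = k >= N - m                  # k = N - m, ..., N - 1 (the negative-lag side)
    w[lo] = (b - (b - 1) * np.cos(np.pi * k[lo] / m)) / denom
    w[hi] = (b - (b - 1) * np.cos(np.pi * (k[hi] - N) / m)) / denom
    return w                         # sums to 1; b = 1 gives a rectangular window

def fractional_octave_m(alpha):
    """Hypothetical bandwidth function m(k) giving roughly 1/alpha-octave smoothing."""
    f = 2 ** (1 / (2 * alpha)) - 2 ** (-1 / (2 * alpha))
    # distance from DC is measured on both halves of the FFT spectrum
    return lambda k, N: int(np.floor(0.5 * f * min(k, N - k)))

def nonuniform_smooth(H, m_of_k, b=0.5):
    """Eq. (10): H_ns(m, k) = sum_i H[(k - i) mod N] W_sm(m(k), i)."""
    H = np.asarray(H, dtype=complex)
    N = len(H)
    i = np.arange(N)
    Hs = H.copy()                    # bins with m(k) < 1 are left unsmoothed
    for k in range(N):
        m = m_of_k(k, N)
        if m >= 1:
            Hs[k] = np.sum(H[(k - i) % N] * smoothing_window(m, b, N))
    return Hs

def uniform_smooth(H, n_keep):
    """Uniform smoothing: inverse FFT, truncate and taper in time, FFT back."""
    h = np.fft.ifft(H)
    taper = np.zeros(len(H))
    taper[:n_keep] = np.hanning(2 * n_keep + 1)[n_keep:2 * n_keep]  # half-Hann fade-out
    return np.fft.fft(h * taper)
```

Because each window sums to one, a flat response passes through either smoother unchanged, while narrow peaks and dips are spread over the local window width.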

C. Structures of inverse filters

There are a number of different ways to implement the inverse filters of the CCS. The direct filtering method, the filter bank method, and the simple lowpass mixing method are the three major filtering structures discussed in this section.


1. Direct filtering method

In this structure, crosstalk cancellation is carried out by direct filtering with the inverse filters. In the design stage, crosstalk cancellation can be required either over the full band (200 Hz–24 kHz) or only over a limited band (200 Hz–6 kHz). The reason for the latter design is twofold. First, the sweet spot in which the CCS is effective becomes impractically small at high frequencies. Second, the listener’s head provides natural shadowing at high frequencies, so the need for cancellation becomes less important. The matching equation appropriate for the band-limited design is written as3

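$$
\begin{bmatrix} \delta(n-m) & 0 \\ 0 & \delta(n-m) \end{bmatrix}
=
\begin{bmatrix} h_{11}(n) & h_{12}(n) \otimes f_{LP}(n) \\ h_{21}(n) \otimes f_{LP}(n) & h_{22}(n) \end{bmatrix}
\otimes
\begin{bmatrix} c_{11}(n) & c_{12}(n) \\ c_{21}(n) & c_{22}(n) \end{bmatrix}, \qquad (12)
$$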

where $f_{LP}(n)$ denotes the impulse response of a lowpass filter. Thus, in principle, the inverse filters should yield a flat response within the intended band after compensation.
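A sketch of how the band-limited plant of Eq. (12) might be formed before inversion, assuming $f_{LP}(n)$ is a linear-phase FIR designed with `scipy.signal.firwin` (cutoff 6 kHz at 48 kHz, 31 taps; all three values are illustrative assumptions). Eq. (12) is followed literally: only the crosstalk paths are filtered, and the ipsilateral paths are zero-padded to the same length. A linear-phase $f_{LP}$ also adds (numtaps − 1)/2 samples of delay to the crosstalk paths; whether to compensate for this delay is a design choice the section does not specify.

```python
import numpy as np
from scipy.signal import firwin

def band_limited_plant(h, fs=48000.0, cutoff=6000.0, numtaps=31):
    """Return the plant of Eq. (12): h12 and h21 convolved with a lowpass f_LP(n).

    h : array (2, 2, N_h) with h[i, j] = h_ij(n), ordered as in Eq. (1) (hypothetical layout).
    """
    f_lp = firwin(numtaps, cutoff, fs=fs)      # lowpass FIR, scaled to unity gain at DC
    n_out = h.shape[-1] + numtaps - 1
    hb = np.zeros((2, 2, n_out))
    for i in range(2):
        for j in range(2):
            if i == j:
                hb[i, j, :h.shape[-1]] = h[i, j]          # ipsilateral path: unchanged
            else:
                hb[i, j] = np.convolve(h[i, j], f_lp)     # crosstalk path: lowpass filtered
    return hb
```

The returned array can then be inverted with any of the deconvolution methods above, in place of the original plant.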

2. Filter bank method

In the direct filtering approach, even if the inverse filters are designed for band-limited performance, the filtering process is still carried out at a sampling rate of 48 kHz. To take advantage of the band-limited design, a subband filtering approach is exploited to simplify the computation. Specifically, a four-channel quadrature mirror filter (QMF) bank13 is used to implement the CCS. For further processing efficiency, the polyphase representation is employed to implement the QMF bank, as shown in Fig. 2. The block $\mathbf{E}(z)$ is the type 1 polyphase matrix for the analysis bank, and the block $\mathbf{R}(z)$ is the type 2 polyphase matrix for the synthesis bank. $\nu_i(n)$ represents the $i$th subband signal. With four uniform subbands at a 48 kHz sampling rate, each subband spans 6 kHz, so the first subband coincides with the band targeted by the band-limited design. The first subband signal is processed by the CCS, while the other subband signals are simply delayed by the delay block $D(z)$ and transmitted to the synthesis filter bank, as shown in Fig. 3.

3. Simple lowpass mixing method

For reference, an alternative way of implementing the band-limited design, originally proposed by Elliott et al., is briefly reviewed (Fig. 4).14 In this simple lowpass mixing approach, the input signal is lowpass filtered and downsampled before being sent to the CCS. Sufficient modeling delays must be inserted in the path. The CCS filters are adaptively updated by comparing the lowpass-filtered, delayed input with the lowpass-filtered plant output at the control points (ears). Finally, the output of the CCS is upsampled and remixed into the original full-band signal. The major difference between this method and the preceding filter bank method is that, in the simple mixing approach, the CCS-processed signal is mixed with the unprocessed full-band input, whereas this is not the case in the filter bank method. This difference could affect the localization performance of the spatializers.

D. Implementation issues

To facilitate the inverse filter design, the aforementioned smoothing techniques are employed to modify the impulse responses. In addition, the regularization parameters $\beta$ and $\varepsilon$ are set to 0.01 and 0.1 in the frequency-domain and time-domain deconvolutions, respectively, to limit the gain of the inverse filter to a maximum of 10 dB.

FIG. 2. The block diagram of a four-channel QMF bank using the polyphase representation.

FIG. 3. Block diagram depicting the filter bank implementation of CCS.

FIG. 4. Block diagram depicting the simple lowpass mixing implementation of CCS.

An objective index, channel separation, is employed to assess the cancellation performance:

$$
\mathrm{Sep}(j\Omega) = \frac{H_c(j\Omega)}{H_i(j\Omega)}, \qquad (13)
$$

where $H_c(j\Omega)$ and $H_i(j\Omega)$ represent the contralateral ($H_{12}$, $H_{21}$) and the ipsilateral ($H_{11}$, $H_{22}$) frequency responses, respectively. According to this definition, a small value of channel separation (a large negative value when expressed in dB) indicates good cancellation performance.
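Equation (13) is straightforward to evaluate from measured or simulated impulse responses. The minimal NumPy sketch below returns the separation in dB, i.e., $20\log_{10}|\mathrm{Sep}|$, together with its mean over 200 Hz–20 kHz, the band used for the Ave-Sep figures in Sec. IV. Averaging the dB values over linearly spaced FFT bins is an assumption, since the averaging rule is not stated, and the function name is hypothetical.

```python
import numpy as np

def channel_separation_db(h_contra, h_ipsi, fs, n_fft=4096, f_lo=200.0, f_hi=20000.0):
    """Eq. (13) in dB and its band average.

    h_contra, h_ipsi : impulse responses of a contralateral and an ipsilateral path
                       (e.g., entries of the overall response after the CCS).
    Returns (frequencies, separation in dB, average separation in dB over [f_lo, f_hi]).
    """
    Hc = np.fft.rfft(h_contra, n_fft)
    Hi = np.fft.rfft(h_ipsi, n_fft)
    f = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    sep_db = 20.0 * np.log10(np.abs(Hc) / np.abs(Hi))   # more negative = better cancellation
    band = (f >= f_lo) & (f <= f_hi)
    return f, sep_db, float(np.mean(sep_db[band]))
```

For example, a contralateral path that is a delayed copy of the ipsilateral path attenuated by a factor of 10 gives −20 dB at every frequency.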

III. DESIGN OF AUDIO SPATIALIZERS

A brief description of the various approaches based on the HRTF and the CCS is given in this section. For clarity, the audio spatializer experiments are summarized in Table I.

A. HRTF

As mentioned previously, a directional impression can be created by electronically synthesizing the HRTF at the desired angle. This is especially important for mobile phones, where the loudspeakers are closely spaced. In this study, the HRTF database available on the website of the MIT Media Lab1 was employed to “widen” the sound image. Each impulse response was originally measured on a Knowles Electronics Manikin for Acoustic Research (KEMAR) at a sampling frequency of 44.1 kHz. The HRTFs at azimuths ±30° are implemented as 128-tap finite impulse response (FIR) filters, with which the audio input signals are filtered before being sent to the loudspeakers. The processing can be written in matrix form as follows:

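$$
\begin{bmatrix} y_1(n) \\ y_2(n) \end{bmatrix}
=
\begin{bmatrix} h_{30}^{\mathrm{ipsi}}(n) & h_{30}^{\mathrm{contra}}(n) \\ h_{30}^{\mathrm{contra}}(n) & h_{30}^{\mathrm{ipsi}}(n) \end{bmatrix}
\otimes
\begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix}, \qquad (14)
$$

where $x_1(n)$ and $x_2(n)$ are the audio input signals, $y_1(n)$ and $y_2(n)$ denote the resulting loudspeaker signals, and $h_{30}^{\mathrm{ipsi}}(n)$ and $h_{30}^{\mathrm{contra}}(n)$ denote the ipsilateral and contralateral HRTFs, respectively, at azimuths ±30°.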
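The widening of Eq. (14) amounts to four FIR filtering operations and two sums. A minimal sketch with `scipy.signal.lfilter`, where `h_ipsi` and `h_contra` stand for the 128-tap ±30° HRTFs (any FIR coefficient arrays work) and `y1`, `y2` are the loudspeaker feeds. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import lfilter

def hrtf_widen(x1, x2, h_ipsi, h_contra):
    """Eq. (14): each loudspeaker feed sums both inputs through ipsi/contra HRTF FIR filters."""
    y1 = lfilter(h_ipsi, [1.0], x1) + lfilter(h_contra, [1.0], x2)
    y2 = lfilter(h_contra, [1.0], x1) + lfilter(h_ipsi, [1.0], x2)
    return y1, y2
```

Because the same ipsilateral/contralateral pair is used for both channels, the structure is symmetric, which matches the symmetric ±30° virtual source placement.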

B. CCS

The objective of the CCS is to minimize the effect of crosstalk. A generic inverse filter of a two-channel CCS can be factored into the following expression:

$$
\mathbf{C} = \frac{1}{1-\mathrm{ITF}_1\,\mathrm{ITF}_2}
\begin{bmatrix} 1/H_{11} & 0 \\ 0 & 1/H_{22} \end{bmatrix}
\begin{bmatrix} 1 & -\mathrm{ITF}_2 \\ -\mathrm{ITF}_1 & 1 \end{bmatrix}, \qquad (15)
$$

where $\mathrm{ITF}_1 = H_{12}/H_{11}$ and $\mathrm{ITF}_2 = H_{21}/H_{22}$ are interaural transfer functions, and the ipsilateral transfer functions $H_{11}$, $H_{22}$ and the contralateral transfer functions $H_{12}$, $H_{21}$ are defined as in Fig. 1. This expression reveals that the inverse filters attempt not only to cancel the crosstalk with delays (the third term on the right-hand side) but also to equalize the ipsilateral responses (the second term on the right-hand side). The poles of the comb filter in the first term on the right-hand side determine the ringing frequencies.17
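The ringing can be made explicit under a simplifying assumption that the paper does not make: a symmetric plant whose interaural transfer functions reduce to a gain $g < 1$ and a pure delay $\tau$, i.e., $\mathrm{ITF}_1 = \mathrm{ITF}_2 = g\,e^{-j\omega\tau}$. The first factor of Eq. (15) then becomes a feedback comb filter:

$$
\frac{1}{1-\mathrm{ITF}_1\,\mathrm{ITF}_2} = \frac{1}{1-g^2 e^{-j2\omega\tau}},
\qquad
\left|1-g^2 e^{-j2\omega\tau}\right| \in \left[\,1-g^2,\; 1+g^2\,\right].
$$

The magnitude of the factor peaks where $2\omega\tau = 2\pi k$, i.e., at

$$
f_k = \frac{k}{2\tau}, \quad k = 0, 1, 2, \ldots,
\qquad
\max_{\omega}\left|\frac{1}{1-g^2 e^{-j2\omega\tau}}\right| = \frac{1}{1-g^2},
$$

and it falls to $1/(1+g^2)$ midway between peaks, at $f = (2k+1)/(4\tau)$. With the purely illustrative values $\tau = 0.1$ ms and $g = 0.9$, the first nonzero peak lies at 5 kHz with a boost of $20\log_{10}(1/0.19) \approx 14.4$ dB. The $k = 0$ peak at DC is the same boost, which is consistent with the bass-boost role of this factor.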

The ipsilateral equalization (the second term) in the inverse filters may not always be desirable in practical applications. For example, a coloration problem may arise at around 10 kHz when the inverse filters strive to compensate for the concha dip in the ipsilateral responses, which is largely independent of loudspeaker span. In addition, other dips and roll-offs in the ipsilateral responses, particularly at very low and very high frequencies, further aggravate this situation. Consequently, an unnatural change in sound quality is often audible during reproduction because the ipsilateral responses are overcompensated. To address this problem, two modified CCS techniques are suggested in the following.

1. The modified CCS-1

In this method, the diagonal terms of the matching model on the left-hand side of Eq. (1) are replaced with delayed ipsilateral impulse responses:

$$
\begin{bmatrix} h_{11}(n-m) & \gamma \\ \gamma & h_{22}(n-m) \end{bmatrix}
=
\begin{bmatrix} h_{11}(n) & h_{12}(n) \\ h_{21}(n) & h_{22}(n) \end{bmatrix}
\otimes
\begin{bmatrix} c_{11}(n) & c_{12}(n) \\ c_{21}(n) & c_{22}(n) \end{bmatrix}, \qquad (16)
$$

where $\gamma$ is a small constant, e.g., 0.0001, and $m$ is the modeling delay. This in effect modifies the transfer functions of the inverse filters in Eq. (15) into

$$
\mathbf{C} \approx \frac{1}{1-\mathrm{ITF}_1\,\mathrm{ITF}_2}
\begin{bmatrix} 1 & -\mathrm{ITF}_2 \\ -\mathrm{ITF}_1 & 1 \end{bmatrix}. \qquad (17)
$$

The modified CCS makes no attempt to compensate for the ipsilateral responses when canceling the crosstalk. It follows that the sound quality can be better preserved by using this method.

TABLE I. The test items used in the subjective evaluation.

Experiment 1
  Test 1  Full-band frequency-domain CCS with uniform smoothing
  Test 2  Full-band time-domain CCS with uniform smoothing
Experiment 2
  Test 1  Full-band conventional CCS with uniform smoothing
  Test 2  Full-band modified CCS-1 with uniform smoothing
  Test 3  Full-band modified CCS-2 with uniform smoothing
  Test 4  Commercial spatializer: DiMAGIC VX™ virtual sound imaging system
Experiment 3
  Test 1  Full-band conventional CCS with uniform smoothing
  Test 2  Band-limited conventional CCS with uniform smoothing
  Test 3  Filter bank conventional CCS with uniform smoothing
  Test 4  Simple lowpass mixing conventional CCS with uniform smoothing
Experiment 4
  Test 1  HRTF widening
  Test 2  Full-band conventional CCS with uniform smoothing
  Test 3  Full-band modified CCS-1 with uniform smoothing
  Test 4  HRTF + full-band conventional CCS with uniform smoothing
  Test 5  HRTF + full-band modified CCS-1 with uniform smoothing

There is another potential benefit in the use of this method. Assume that both loudspeaker responses contribute a common factor $S$, so that every acoustic path is multiplied by $S$. Neglecting the parameter $\gamma$, the $z$-domain version of Eq. (16) can be written as

$$
\begin{bmatrix} z^{-m}\tilde{H}_{11}(z)S & 0 \\ 0 & z^{-m}\tilde{H}_{22}(z)S \end{bmatrix}
=
\begin{bmatrix} \tilde{H}_{11}(z)S & \tilde{H}_{12}(z)S \\ \tilde{H}_{21}(z)S & \tilde{H}_{22}(z)S \end{bmatrix}
\begin{bmatrix} C_{11}(z) & C_{12}(z) \\ C_{21}(z) & C_{22}(z) \end{bmatrix}, \qquad (18)
$$

where $\tilde{H}_{ij}$ represents the transfer function without the loudspeaker response. Thus, the factor $S$ cancels out on both sides. The implication is that the CCS is loudspeaker independent as long as the characteristics of the two loudspeakers are well matched. This could be a desirable property in practical applications, in that a CCS designed off-line is applicable to systems with different (but matched) loudspeaker characteristics.
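The loudspeaker-independence argument can be checked numerically. The sketch below solves a frequency-domain counterpart of Eq. (16) bin by bin, with $\gamma$ neglected as in Eq. (18): $\mathbf{C}(k) = \mathbf{H}(k)^{-1} e^{-j\omega_k m}\,\mathrm{diag}(H_{11}(k), H_{22}(k))$. It then multiplies every plant entry by a common, hypothetical loudspeaker response $S$ and confirms that the filters do not change. This illustrates the algebra only; it is not the time-domain design procedure used in the paper, and the plant values and function name are made up.

```python
import numpy as np

def modified_ccs1(H, m):
    """Per-bin solution of Eq. (16) with gamma neglected: H(k) C(k) = e^{-j w_k m} diag(H11, H22).

    H : complex array (n_fft, 2, 2) of plant frequency responses, ordered as in Eq. (1).
    """
    n_fft = H.shape[0]
    z_m = np.exp(-2j * np.pi * np.arange(n_fft) * m / n_fft)   # modeling delay on the FFT grid
    D = np.zeros_like(H)
    D[:, 0, 0] = z_m * H[:, 0, 0]
    D[:, 1, 1] = z_m * H[:, 1, 1]
    return np.linalg.solve(H, D)                                 # C(k) = H(k)^{-1} D(k)

# Hypothetical symmetric plant and common loudspeaker response S(z) = 1 - 0.6 z^-1 + 0.2 z^-2
n_fft = 128
h = np.zeros((2, 2, 16))
h[0, 0, 0] = h[1, 1, 0] = 1.0
h[0, 1, 3] = h[1, 0, 3] = 0.5
H = np.moveaxis(np.fft.fft(h, n_fft, axis=-1), -1, 0)
S = np.fft.fft([1.0, -0.6, 0.2], n_fft)
C_plain = modified_ccs1(H, m=32)
C_with_S = modified_ccs1(H * S[:, None, None], m=32)
print(np.allclose(C_plain, C_with_S))   # prints True: the common factor S cancels
```

The check relies on $S$ having no zeros on the unit circle. A loudspeaker response with deep notches would make the per-bin solve ill-conditioned even though $S$ cancels algebraically.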

2. The modified CCS-2

Along the same lines, another modified CCS is developed to avoid equalizing the ipsilateral response while canceling the crosstalk. In this approach, the ipsilateral inverse filters are set to a delayed discrete delta function, i.e., $c_{11} = c_{22} = \delta(n-m)$, so that the sound quality is preserved through direct transmission along the ipsilateral paths. In this setting, the matching equation becomes

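$$
\begin{bmatrix} d_L(n) & 0 \\ 0 & d_R(n) \end{bmatrix}
=
\begin{bmatrix} h_{11}(n) & h_{12}(n) \\ h_{21}(n) & h_{22}(n) \end{bmatrix}
\otimes
\begin{bmatrix} \delta(n-m) & c_{12}(n) \\ c_{21}(n) & \delta(n-m) \end{bmatrix}, \qquad (19)
$$

where the diagonal terms $d_L$ and $d_R$ are the resulting ipsilateral responses. Expanding this equation for the off-diagonal terms only leads to two equations: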

$$
-\left(h_{12}(n)\otimes\delta(n-m)\right) = -h_{12}(n-m) = h_{11}(n)\otimes c_{12}(n), \qquad (20)
$$

$$
-\left(h_{21}(n)\otimes\delta(n-m)\right) = -h_{21}(n-m) = h_{22}(n)\otimes c_{21}(n). \qquad (21)
$$

The contralateral inverse filters can be obtained by solving these inverse problems. By the same token, it can be shown that this modified CCS is also loudspeaker independent. However, this approach may lead to poor bass response, because the crosstalk canceller no longer contains the factor $1/(1-\mathrm{ITF}_1\mathrm{ITF}_2)$, which essentially acts as a bass boost since the interaural transfer functions approach unity at low frequencies.

IV. EXPERIMENTAL INVESTIGATIONS

A. Experimental arrangement

The experiments were conducted using a dummy head system (KEMAR) inside a 4 m × 4 m × 3 m anechoic chamber, as shown in Fig. 5. An MP3 handset equipped with dual loudspeakers is mounted on a stand. The distance between the handset and the dummy head is 40 cm. Binaural transfer functions from the loudspeakers to the microphones embedded in the dummy head’s ears were measured using a spectrum analyzer. The algorithms were implemented on a fixed-point DSP platform, the ADI BF-533, operating at 48 kHz. The inverse filters were realized as 128-tap FIR filters in the experiments.

B. Objective experiment

For simplicity, a symmetrical acoustic plant is assumed. The head-related impulse responses measured using the dummy head are shown in Fig. 6. Complex smoothing is applied prior to the design of the CCS, so that the resulting CCS is more robust against misalignment of the listener’s head than one designed for unsmoothed frequency responses.10,18 Figure 7 shows the frequency responses obtained using uniform and nonuniform smoothing. The frequency responses are effectively smoothed by both methods. However, an informal subjective test indicated that the difference between the two smoothing techniques is hardly detectable. The uniform smoothing method is therefore used exclusively in the following experiments.

FIG. 5. Experimental arrangement for the dual speaker handset with a dummy head system inside an anechoic chamber.


Another issue concerning the CCS design is the modeling delay that is necessary to ensure the causality of the inverse filters. This is of fundamental importance whether the frequency-domain or the time-domain method is used. A simple experiment was conducted to examine the effect of different modeling delays on a 128-tap filter and a 512-tap filter obtained using the time-domain method. The average channel separation (Ave-Sep, dB) between 200 Hz and 20 kHz is calculated to assess the cancellation performance. The results summarized in Table II reveal that the optimal modeling delay is approximately half the length of the inverse filter.
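The role of the modeling delay can be reproduced on a toy problem. The sketch below solves the single-channel regularized system of Eq. (4) for a range of delays $m$ and reports the matching error $\|h \ast c - \delta(n-m)\|$. The two-tap plant $h = [0.5, 1.0]$ is a hypothetical nonminimum-phase example (its zero lies at $z = -2$), not the measured handset plant, so its best delay need not equal the half-length found in Table II.

```python
import numpy as np
from scipy.linalg import toeplitz

def inverse_error(h, n_c, m, eps=1e-6):
    """Solve Eq. (4) for one channel and return ||h * c - delta(n - m)||."""
    col = np.concatenate([h, np.zeros(n_c - 1)])
    row = np.zeros(n_c)
    row[0] = h[0]
    A = np.vstack([toeplitz(col, row), eps * np.eye(n_c)])   # [convolution matrix; eps * I]
    d = np.zeros(A.shape[0])
    d[m] = 1.0                                                # delayed delta, zeros below
    c, *_ = np.linalg.lstsq(A, d, rcond=None)
    L = len(h) + n_c - 1
    return np.linalg.norm(A[:L] @ c - d[:L])                  # error of the convolution part only

h = np.array([0.5, 1.0])          # hypothetical nonminimum-phase plant
n_c = 32
errors = {m: inverse_error(h, n_c, m) for m in range(0, n_c, 4)}
for m, e in errors.items():
    print(f"m = {m:2d}: error = {e:.2e}")
```

With $m = 0$ the causal filter cannot represent the anticausal inverse of this plant, and the error stays large. Once the delay leaves room for that anticausal part, the error drops by orders of magnitude.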

The length of the inverse filter also affects the performance of the CCS. The performance of inverse filters of different lengths is compared in Table III. As expected, the performance of the CCS improves as the filter length is increased for both deconvolution methods. However, it is worth noting that the time-domain method outperforms the frequency-domain method for short filters such as 128 taps. The frequency-domain method performs well only when a long filter is used. Another drawback of the frequency-domain method can be clearly seen by plotting the magnitudes of the equalized time responses on the dB scale, as suggested by Fielder.19 In Fig. 8(a), preringing artifacts are visible (at 1–3 ms) in the equalized time responses when the frequency-domain method is used, whereas no such artifacts are found in the result of the time-domain method in Fig. 8(b).

Next, a useful variation of inverse filter design that enhances CCS performance is examined. Figure 9(a) shows the measured unprocessed and processed frequency responses with the conventional CCS. While a flat spectrum is attained as expected in the compensated ipsilateral response, the contralateral response is not totally eliminated but is amplified at frequencies above 10 kHz. This incurs some audible coloration at high frequencies. To overcome the problem, the aforementioned modified approaches were employed to suppress the crosstalk while preserving the ipsilateral response. Figures 9(b) and 9(c) refer to the implementation of the modified CCS-1 and CCS-2, respectively. It is observed that not only does the ipsilateral response remain largely unchanged, but the contralateral response is also effectively attenuated without undesired amplification at high frequencies. To explore the modified CCS further, the time responses of the inverse filters of the modified methods are compared with those obtained using the conventional identity matching model. Figure 10(a) refers to the implementation of the conventional CCS. Figures 10(b) and 10(c) refer to the implementation of the modified CCS-1 and CCS-2, respectively. The impulse responses of inverse filters designed using the modified methods are significantly shorter than those of the conventional method. This offers a computational saving that benefits real-time implementation.

TABLE II. The average separation obtained using the time-domain method with different delays.

Filter length Nc = 128 taps              Filter length Nc = 512 taps
Delay m   Average separation (dB)        Delay m   Average separation (dB)
16        −20.583                        32        −20.799
32        −20.717                        128       −21.592
48        −20.833                        256       −21.701
64        −21.050                        288       −21.706
80        −21.007                        320       −21.692
96        −20.282                        448       −20.771

FIG. 7. Comparison between the original and the complex smoothed magnitude spectra. The thick line represents the complex smoothed magnitude response spectrum. (a) Result obtained using uniform smoothing. (b) Result obtained using nonuniform smoothing.

FIG. 6. Head-related impulse responses measured by using the dummy head system.

The inverse filters were also implemented using the band-limited design detailed in the preceding section. The experimental result in Fig. 11 shows that the CCS maintains wideband equalization of the ipsilateral response, resulting in a flat spectrum, while cancellation of crosstalk is attained only in the low-frequency range, with some unwanted amplification in the high-frequency range. Cancellation performance is likewise confined to the low-frequency range for the filter bank method and the simple lowpass mixing method, as it should be, since they are essentially band-limited designs.

C. Subjective experiment

To assess the perceptual performance of the spatializers, subjective listening tests were conducted according to the double-blind triple-stimulus with hidden reference method suggested in the standard ITU-R BS.1116-1.20 The listening tests were carried out inside the anechoic chamber. The program material consists of various instruments with significant dynamic variations between the two stereo channels. Both timbre-related and space-related qualities are considered. The reproduced signals were adjusted to equal power so that their loudness was matched. The nine subjective indices employed in the subjective tests are summarized as follows:

(1) Fullness: Dominance of low-frequency sound;

(2) Brightness: Dominance of high-frequency sound;

(3) Noise and distortion: Any extraneous disturbances to the signal are considered noise. Effects on the signal that produce new sounds or timbre changes are considered distortion;

(4) Width of stage: Perceived angular width from the extreme left to the extreme right edge of the stage;

(5) Depth perception: Ability to hear that performers are appropriately localized from the front to the rear of the sound stage;

(6) Spaciousness: Perceived quality of listening within a reverberant environment. The sound is perceived as open, not constrained to the locations of the loudspeakers. This perception is an important part of the “you are there” sensation;

(7) Localization: Determination by a subject of the apparent direction or distance, or both, of a sound source;

(8) Robustness: Stability of performance with normal listener movements and listening locations. This index is assessed by moving the listener’s head laterally by 5 and 10 cm and averaging the grades; and

(9) Fidelity: The clarity of the reproduced signals.

Twenty experienced subjects participated in the tests and were instructed on the definitions of the preceding subjective indices and on the test procedure before the listening tests. After listening, the subjects responded in a questionnaire, grading each subjective index on a scale from −4 to 4. Positive, zero, and negative scores indicate perceptual improvement, no difference, and degradation, respectively, of the signals after processing by the spatializers. To assess statistical significance, the scores were further processed using the MANOVA.15 Significance levels below 0.05 indicate that a statistically significant difference exists among the methods. The experiments are summarized in Table I.
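As a rough illustration of the statistics step, the sketch below computes mean grades with t-based 95% confidence intervals and a one-way ANOVA p value with SciPy. The t-based interval is an assumed construction for error bars like those in the figures. The grades are made-up numbers on the −4 to 4 scale, and the test is a univariate ANOVA on a single index. It is a simpler stand-in for the MANOVA used in the paper, which treats the subjective indices jointly.

```python
import numpy as np
from scipy import stats

def mean_ci(scores, level=0.95):
    """Mean grade and a two-sided t-based confidence interval (an assumed CI method)."""
    x = np.asarray(scores, dtype=float)
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(len(x))
    half = stats.t.ppf(0.5 + level / 2, df=len(x) - 1) * sem
    return mean, (mean - half, mean + half)

# Made-up grades on the -4..4 scale for one index and three hypothetical methods
grades = {
    "A": [1, 2, 1, 0, 2, 1, 3, 1],
    "B": [0, 1, -1, 0, 1, 0, 1, -1],
    "C": [2, 3, 2, 1, 3, 2, 2, 3],
}
for name, g in grades.items():
    mean, (lo, hi) = mean_ci(g)
    print(f"{name}: mean {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

# Univariate one-way ANOVA on this single index (the paper uses MANOVA over all indices)
F, p = stats.f_oneway(*grades.values())
print(f"F = {F:.2f}, p = {p:.3g}, significant at the 0.05 level: {p < 0.05}")
```

A p value below 0.05 would be read in the same way as the significance levels reported in the following paragraphs.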

TABLE III. The average separation obtained using inverse filtering with different filter lengths.

Filter length (Nc)    Frequency domain (dB)    Time domain (dB)
128                   −18.203                  −21.050
256                   −18.361                  −21.608
512                   −21.535                  −21.705
1024                  −22.329                  −21.760
2048                  −22.375                  −21.870

FIG. 8. Equalized time responses plotted on the dB scale. (a) Frequency-domain method. (b) Time-domain method.

The first listening test was carried out to compare the frequency-domain and the time-domain methods. The total grades are plotted in Fig. 12, where the vertical bars denote 0.95 confidence intervals. The small significance level (s = 0.041523) in the MANOVA output indicates that the difference between the methods is statistically significant. In particular, the time-domain method appeared to significantly outperform the frequency-domain method for inverse filters of this length (128 taps), in agreement with the observation in the preceding objective tests.
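To make the comparison concrete, the two deconvolution approaches can be sketched for a single-channel (SISO) equalization problem. This is a simplified illustration, not the authors' multichannel design. The frequency-domain filter follows the regularized fast-deconvolution form of Kirkeby et al.4 with an FFT size equal to the filter length Nc and a circular shift of Nc/2 as the modeling delay. The time-domain filter is an unregularized least-squares solution of the convolution equations for the same length and delay. The plant `c`, the regularization value `beta`, and the delay are assumptions chosen for illustration.

```python
import numpy as np

def freq_domain_inverse(c, nc, beta=1e-3):
    """Regularized frequency-domain inverse filter of length nc (FFT size nc).

    Uses conj(C) / (|C|^2 + beta) per bin; the circular shift by nc // 2 acts
    as the modeling delay that makes the filter causal.
    """
    C = np.fft.fft(c, nc)
    H = np.conj(C) / (np.abs(C) ** 2 + beta)
    return np.roll(np.real(np.fft.ifft(H)), nc // 2)

def time_domain_inverse(c, nc, delay):
    """Least-squares inverse of length nc minimizing ||conv(c, h) - delta(n - delay)||^2."""
    L = len(c)
    A = np.zeros((L + nc - 1, nc))
    for j in range(nc):
        A[j:j + L, j] = c               # column j holds c delayed by j samples
    d = np.zeros(L + nc - 1)
    d[delay] = 1.0                      # target: unit pulse after the modeling delay
    h, *_ = np.linalg.lstsq(A, d, rcond=None)
    return h

def equalization_error(c, h, delay):
    """Energy of the linear convolution c * h minus a unit pulse at `delay`."""
    e = np.convolve(c, h)
    e[delay] -= 1.0
    return float(np.sum(e ** 2))

# Usage with a hypothetical 32-tap plant (decaying random impulse response).
rng = np.random.default_rng(0)
c = rng.standard_normal(32) * np.exp(-np.arange(32) / 6.0)
nc, delay = 128, 64
h_fd = freq_domain_inverse(c, nc)
h_td = time_domain_inverse(c, nc, delay)
err_fd = equalization_error(c, h_fd, delay)
err_td = equalization_error(c, h_td, delay)
print(f"frequency-domain error: {err_fd:.2e}, time-domain error: {err_td:.2e}")
```

Because the least-squares filter minimizes this exact error over all filters of length Nc, its error can never exceed that of the frequency-domain filter under this measure. The frequency-domain design, by contrast, inverts a circular (Nc-point) problem, and the wrap-around it ignores grows as Nc shrinks. How either filter fares perceptually is what the listening test addresses.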

Next, the second experiment compares the modified CCS methods with a commercial spatializer,21 which serves as the benchmark. The results shown in Fig. 13 reveal that modified method-1 received the highest score among all approaches, with strong statistical significance (s = 0.000001). Modified method-1 is particularly advantageous when sound quality, in addition to cancellation performance, is used as the performance index.
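The difference between identity matching and a modified matching model can be illustrated per frequency bin with a regularized 2×2 inverse. This is an illustrative assumption, not the paper's formulation. The sketch designs C = (H^H H + beta*I)^-1 H^H M, where H = [[H11, H12], [H21, H22]] is the loudspeaker-to-ear plant and M is the matching model. M = I corresponds to the conventional CCS with identity matching. M = diag(H11, H22) is a hypothetical stand-in for a modified matching model that keeps the ipsilateral responses instead of inverting them. The plant, `beta`, and the separation measure `average_separation_db` are invented for illustration, and the paper's own definition of average separation may differ.

```python
import numpy as np

def ccs_filters(H, M, beta=1e-4):
    """Per-bin CCS matrices C = (H^H H + beta I)^-1 H^H M for arrays of shape (K, 2, 2)."""
    Hh = np.conj(np.swapaxes(H, -1, -2))            # Hermitian transpose in each bin
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh @ M)

def average_separation_db(G, floor=1e-12):
    """Mean over bins of 20*log10(|G[1,0]| / |G[0,0]|) for a left-channel input.

    Hypothetical definition: crosstalk reaching the right ear relative to the
    direct signal at the left ear; `floor` avoids log10(0).
    """
    ratio = np.maximum(np.abs(G[:, 1, 0]), floor) / np.abs(G[:, 0, 0])
    return float(np.mean(20 * np.log10(ratio)))

# Hypothetical symmetric plant for two closely spaced loudspeakers.
K = 256
theta = np.pi * np.arange(K) / K                    # normalized frequency grid
H = np.empty((K, 2, 2), dtype=complex)
H[:, 0, 0] = H[:, 1, 1] = 1 + 0.3 * np.exp(-1j * theta)   # ipsilateral H11, H22
H[:, 0, 1] = H[:, 1, 0] = 0.5 * np.exp(-2j * theta)       # contralateral H12, H21

identity = np.broadcast_to(np.eye(2, dtype=complex), (K, 2, 2))
M_mod = np.zeros((K, 2, 2), dtype=complex)          # assumed modified matching model
M_mod[:, 0, 0] = H[:, 0, 0]
M_mod[:, 1, 1] = H[:, 1, 1]

G_conv = H @ ccs_filters(H, identity)               # overall response, conventional CCS
G_mod = H @ ccs_filters(H, M_mod)                   # overall response, modified matching
print(f"unprocessed: {average_separation_db(H):.1f} dB")
print(f"conventional CCS: {average_separation_db(G_conv):.1f} dB")
print(f"modified matching: {average_separation_db(G_mod):.1f} dB")
```

In this idealized sketch both designs cancel crosstalk equally, because H C_mod equals (H C_conv) M. They differ only in the diagonal (ipsilateral) response: identity matching flattens it, while the assumed modified model leaves it in place. This is consistent with the remark that the conventional method tends to over-compensate the ipsilateral responses.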

In the third listening test, different structures of CCS implementation are compared. The total grades are summarized in Fig. 14. The MANOVA output reveals a significant difference in performance (s = 0.019207) among the methods. The direct filtering method attained the highest grade, while the simple lowpass mixing method received the lowest. Within the direct filtering approach, there is no significant difference between the full-band and the band-limited designs. It is worth noting that the filter bank approach and the simple lowpass mixing approach did not attain grades as high as the two direct filtering approaches. Possible explanations are that the crossovers in the filter bank are not adequately handled, and that part of the low-frequency signal is contaminated by crosstalk in the simple lowpass mixing method.
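One generic way to avoid crossover errors is to split the signal with complementary filters whose sum is a pure delay, so that unprocessed bands recombine without magnitude or phase error. The sketch below is an illustration of that idea only and not the filter bank used in the paper, which is a four-channel QMF bank. It builds a linear-phase windowed-sinc lowpass of odd length and derives the highpass as a delayed unit pulse minus the lowpass. The cutoff, length, and Hamming window are arbitrary assumptions.

```python
import numpy as np

def complementary_crossover(num_taps=101, cutoff=0.1):
    """Linear-phase lowpass/highpass pair with h_lp + h_hp = delta(n - (N - 1) / 2).

    cutoff is in cycles/sample (0 < cutoff < 0.5); num_taps must be odd so the
    group delay is an integer number of samples.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for an integer group delay")
    n = np.arange(num_taps) - (num_taps - 1) // 2
    h_lp = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h_lp /= h_lp.sum()                      # unity gain at DC
    h_hp = -h_lp
    h_hp[(num_taps - 1) // 2] += 1.0        # highpass = delta(n - D) - lowpass
    return h_lp, h_hp

# Usage: the two bands of a random signal sum back to the input delayed by D samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h_lp, h_hp = complementary_crossover()
D = (len(h_lp) - 1) // 2
y = np.convolve(x, h_lp) + np.convolve(x, h_hp)
reference = np.concatenate([np.zeros(D), x, np.zeros(D)])
print(f"max reconstruction error: {np.max(np.abs(y - reference)):.1e}")
```

The complementary property holds for any lowpass of this form. Once a CCS is applied in only one band, however, the recombination is no longer transparent, which is one place where crossover handling can degrade a filter bank design.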

FIG. 9. Comparison between the unprocessed and the processed frequency responses. (a) The conventional CCS. (b) The modified CCS-1. (c) The modified CCS-2.

In the fourth listening test, various audio spatializers utilizing the HRTF, the conventional CCS method, the modified CCS method-1, and their combinations are compared. The total grades are summarized in Fig. 15. The MANOVA output reveals a significant difference in performance (s = 0.000001) among the methods. The HRTF approach receives the lowest grade: the “widening” effect provided by the HRTF alone is clearly insufficient to spatialize the sound image because of the severe crosstalk between the closely spaced loudspeakers. In contrast to the HRTF approach, there is a leap in performance when the CCS comes into play. In particular, the spatializer combining the HRTF and the conventional CCS method achieved the highest grade in both spatializing performance and sound quality. Surprisingly, when the modified CCS method is used in combination with the HRTF, performance drops sharply. It is suspected that a double HRTF filtering effect contributed to this result. That is, because the sound quality has already been preserved by plugging the HRTF into the matching model of the modified CCS, the additional HRTF filtering becomes superfluous and may adversely affect the sound quality of the processed signal.
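The suspected double-filtering effect can be illustrated with per-bin 2×2 matrices. In this hypothetical sketch, an HRTF widening stage B is cascaded with a CCS designed for a matching model M, so the overall input-to-ear response is approximately M B. With identity matching (conventional CCS), the ears receive the HRTF cues of B once. If M itself contains an HRTF-like response, as assumed here for the modified CCS, that response is applied on top of B. The plant, B, the response `m`, and `beta` are arbitrary illustrative values, not the paper's measured HRTFs.

```python
import numpy as np

def ccs_filters(H, M, beta=1e-6):
    """Per-bin CCS matrices C = (H^H H + beta I)^-1 H^H M, arrays of shape (K, 2, 2)."""
    Hh = np.conj(np.swapaxes(H, -1, -2))
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh @ M)

def symmetric_2x2(diag, off):
    """Stack of symmetric 2x2 matrices [[diag, off], [off, diag]], one per bin."""
    out = np.empty((len(diag), 2, 2), dtype=complex)
    out[:, 0, 0] = out[:, 1, 1] = diag
    out[:, 0, 1] = out[:, 1, 0] = off
    return out

K = 256
theta = np.pi * np.arange(K) / K
H = symmetric_2x2(1 + 0.3 * np.exp(-1j * theta), 0.5 * np.exp(-2j * theta))  # hypothetical plant
B = symmetric_2x2(np.ones(K), -0.3 * np.exp(-1j * theta))  # hypothetical HRTF widening stage
m = 0.8 + 0.4 * np.exp(-1j * theta)                         # hypothetical HRTF-like response
M_hrtf = symmetric_2x2(m, np.zeros(K))                      # matching model containing it
identity = symmetric_2x2(np.ones(K), np.zeros(K))

# Overall input-to-ear response: HRTF stage B first, then CCS C, then plant H.
T_conv = H @ ccs_filters(H, identity) @ B   # approximately B: HRTF cues applied once
T_mod = H @ ccs_filters(H, M_hrtf) @ B      # approximately M_hrtf @ B: extra HRTF filtering
print(f"max |T_conv - B|: {np.max(np.abs(T_conv - B)):.1e}")
print(f"max |T_mod - B|:  {np.max(np.abs(T_mod - B)):.1e}")
```

The algebra only shows where the extra filtering enters. Whether it degrades sound quality is a perceptual question, which the listening test above answers for the paper's actual filters.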

FIG. 10. Impulse responses of the inverse filters. (a) The conventional CCS method. (b) The modified CCS-1. (c) The modified CCS-2.

FIG. 11. Comparison between the unprocessed frequency response and that processed by using the band-limited CCS.

FIG. 12. Total grades summarized for the first listening test, in which the frequency-domain and the time-domain deconvolution methods are compared. The significance level in the MANOVA output is s = 0.041523.

V. CONCLUSIONS

A comprehensive study has been undertaken to compare various implementation approaches of audio spatializers for handsets fitted with two closely spaced loudspeakers. The HRTF and the CCS techniques were exploited to implement the audio spatializer. Two deconvolution methods were applied to calculate the inverse filters for the CCS design. Objective and subjective experiments reveal that the time-domain approach is superior to the frequency-domain approach when the inverse filter is short. An additional benefit of the time-domain method is that it is less prone to the preringing artifact that frequently appears in the frequency-domain method.

Different structures of CCS were examined in this study. The experimental results indicate that the direct filtering approaches outperform the filter bank method and the simple lowpass mixing method. In addition, two modified CCS techniques were proposed in the present paper. Unlike the conventional method, which tends to overcompensate the ipsilateral responses, the modified methods are capable of delivering better spaciousness without compromising sound quality. Two additional features of the modified CCS that are attractive in practical applications are the shorter impulse responses of its inverse filters and its loudspeaker-independent property.

Listening tests were also carried out to compare various ways of implementing a spatializer based on the HRTF, the CCS, and their combination. The experimental results suggest that the widening effect provided by the HRTF alone is insufficient to spatialize the sound image because of the severe crosstalk between the closely spaced loudspeakers. In contrast to the HRTF approach, there is a leap in performance when the CCS is used. In particular, the spatializer combining the HRTF and the conventional CCS method achieved the best performance in both spatialization and sound quality.

ACKNOWLEDGMENT

The work was supported by the National Science Council in Taiwan, Republic of China, under Project No. NSC94-2212-E-009-019.

1. B. Gardner and K. Martin, “HRTF measurements of KEMAR dummy-head microphone,” MIT Media Lab, 1994; http://sound.media.mit.edu/KEMAR.html. Last accessed 11/10/06.

2. A. Sibbald, “Transaural acoustical crosstalk cancellation,” Sensaura White Paper, 1999; http://www.sensaura.co.uk. Last accessed 11/10/06.

3. W. G. Gardner, 3-D Audio Using Loudspeakers (Kluwer Academic, Dordrecht, 1998).

4. O. Kirkeby, P. A. Nelson, and H. Hamada, “Fast deconvolution of multichannel systems using regularization,” IEEE Trans. Speech Audio Process. 6, 189–195 (1998).

5. O. Kirkeby and P. A. Nelson, “Digital filter design for inversion problems in sound reproduction,” J. Audio Eng. Soc. 47, 583–595 (1999).

6. J. F. Claerbout, Earth Soundings Analysis: Processing Versus Inversion (PVI), 1992; http://sep.stanford.edu/sep/prof/toc_html/index.html. Last accessed 11/10/06.

7. P. D. Hatziantoniou and J. N. Mourjopoulos, “Generalized fractional-octave smoothing of audio and acoustic responses,” J. Audio Eng. Soc. 48, 259–280 (2000).

8. S. G. Norcross, G. A. Soulodre, and M. C. Lavoie, “Subjective investigations of inverse filtering,” J. Audio Eng. Soc. 52, 1003–1028 (2004).

9. S. Neely and J. B. Allen, “Invertibility of a room impulse response,” J. Acoust. Soc. Am. 66, 165–169 (1979).

10. P. M. Clarkson, J. Mourjopoulos, and J. K. Hammond, “Spectral, phase, and transient equalization for audio systems,” J. Audio Eng. Soc. 33, 127–132 (1985).

11. O. Kirkeby, P. A. Nelson, F. Orduna-Bustamante, and H. Hamada, “Local sound field reproduction using digital signal processing,” J. Acoust. Soc. Am. 100, 1584–1593 (1996).

12. J. H. Wang and C. S. Pai, “Subjective and objective verifications of the inverse functions of binaural room impulse response,” Appl. Acoust. 64, 1141–1158 (2003).

13. P. P. Vaidyanathan, Multirate Systems and Filter Banks (Prentice-Hall, Englewood Cliffs, NJ, 1993).

14. S. J. Elliott, P. A. Nelson, and I. M. Stothers, “Sound reproduction systems,” U.S. Patent No. 5,727,066 (1998).

15. G. Keppel and S. Zedeck, Data Analysis for Research Designs (W. H. Freeman, New York, 1989).

16. H. Hamada, “Construction of orthostereophonic system for the purposes of quasi-in-situ recording and reproduction,” J. Acoust. Soc. Jpn. 39, 337–348 (1983).

17. J. Rose, P. A. Nelson, B. Rafaely, and T. Takeuchi, “Sweet spot size of virtual acoustic imaging systems at asymmetric listener locations,” J. Acoust. Soc. Am. 112(5), 1992–2002 (2002).

18. S. Salamouris, K. Politopoulos, V. Tsakiris, and J. Mourjopoulos, “Digital system for loudspeaker and room equalization,” J. Audio Eng. Soc. 43, 396 (1995).

19. L. D. Fielder, “Analysis of traditional and reverberation-reducing method of room equalization,” J. Audio Eng. Soc. 51, 3–26 (2003).

20. ITU-R BS.1116, “Methods for the subjective assessment of small impairments in audio system including multichannel sound systems,” Geneva, Switzerland, 1994.

21. DiMAGIC, “DiMAGIC VX™ virtual sound imaging system,” White Paper, 2000; http://www.dimagic.com/pdf/DiMAGIC_Virtualizer_X_White_paper.pdf. Last accessed 11/10/06.

FIG. 13. Total grades summarized for the second listening test, in which various CCS approaches including a commercial spatializer are compared. (Conv CCS: conventional CCS with identity matching; modified-1: modified CCS-1; modified-2: modified CCS-2; com spatializer: DiMAGIC VX™ virtual sound imaging system.) The significance level in the MANOVA output is s = 0.000001.

FIG. 14. Total grades summarized for the third listening test, in which different structures of CCS implementation are compared. (full-band: full-band CCS; band-limited: band-limited CCS; filter bank: filter bank CCS; sim low mix: simple lowpass mixing CCS.) The significance level in the MANOVA output is s = 0.019207.

FIG. 15. Total grades summarized for the fourth listening test, in which various audio spatializers utilizing the HRTF, the conventional CCS method, the modified CCS-1, and their combinations are compared. (HRTF: HRTF widening; conv CCS: conventional CCS; modified-1: modified CCS-1; HRTF + conv CCS: HRTF combined with conventional CCS; HRTF + modified-1: HRTF combined with modified CCS-1.) The significance level in the MANOVA output is s = 0.000001.

