Robust 3D Sound based on Common-pole Crosstalk Canceller

National Chiao Tung University

Department of Communication Engineering, Master's Program

Master's Thesis

Student: C. R. Huang (黃俊榮)

Advisor: Dr. S. F. Hsieh (謝世福)

A Thesis Submitted to the Department of Communication Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical Engineering

September, 2006

Hsinchu, Taiwan, Republic of China

Robust 3D Sound based on Common-pole Crosstalk Canceller

Student: C. R. Huang Advisor: S. F. Hsieh

Department of Communication Engineering

National Chiao Tung University

Abstract

Virtual reality applications are increasingly common in modern multimedia systems, so 3D sound techniques are becoming more important. We use a loudspeaker pair to generate 3D sound. The most critical problem is crosstalk. To overcome it, we investigate both FIR and IIR forms of crosstalk cancellers. In both forms, we propose matrix-inverse and direct least square error (LSE) methods to design the filters. In the matrix-inverse method, we propose minimizing the ratio error to avoid instability. In the direct LSE IIR design, we propose the common-pole structure to avoid the difficulty caused by nonlinearity. We then consider another problem, the robustness of the crosstalk canceller. If the crosstalk canceller is fixed, the received signals may differ greatly from the desired signals when the head moves. Therefore, we propose a design method based on the region-equalization concept to reduce the effect of head movements. Finally, error energy (EN), crosstalk suppression factor (CSF), and equalization improvement factor (EIF) are used to quantify the performance of the crosstalk cancellers.

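Acknowledgments

I thank Professor S. F. Hsieh (謝世福) for his patient guidance, which made it possible to complete this thesis. The physical intuition he emphasized benefited me greatly and deepened my understanding of how to analyze problems. I also thank my parents and family, whose support and care gave me confidence. I especially thank my eldest sister, who eased my stress when I was tired and gave me the energy to keep going. I thank my friends 懷嘉, 俊煒, 威諭, and many others, whose encouragement helped me finish my studies. Finally, I thank my lab mates for two happy years together; our shared effort and mutual encouragement made this thesis possible.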

Contents

Abstract (Chinese)

Abstract (English)

Acknowledgments

Contents

List of Tables

List of Figures

Chapter 1 Introduction

Chapter 2 3D Sound System

2.1 Sound Localization Cues

2.2 Creation of Virtual Sounds

2.3 Virtual Sounds over Loudspeakers

2.3.1 Crosstalk Phenomenon

2.3.2 Crosstalk Cancellation

2.3.3 Robustness

Chapter 3 FIR Crosstalk Canceller

3.1 Matrix Inverse Design

3.1.1 Design in Time Domain

3.1.2 Design in Frequency Domain

3.2 Direct LSE FIR Design

3.2.3 Comparison between FIR Designs in Time and Frequency Domains

Chapter 4 IIR Crosstalk Canceller

4.1 Matrix Inverse Design

4.2 Common-pole Structure

4.2.3 Comparison between IIR Designs in Time and Frequency Domain

Chapter 5 Robust Crosstalk Canceller

5.1 Delay Compensations

5.2 Robust Common Pole Design

Chapter 6 Computer Simulations

6.1 Figure of Merit

6.2 FIR Crosstalk Canceller

6.2.1 Matrix-Inverse FIR Design

6.2.1.1 Design in Time Domain

6.2.2.2 Design in Frequency Domain

6.3 IIR Crosstalk Canceller

6.3.1 Filters Designed from Matrix-Inverse

6.3.2 Common-Pole IIR Design

6.4 Robust Crosstalk Canceller

Chapter 7 Conclusions

List of Tables

Table 6.1: FOM at ±5° of FIR form designed from $G^{-1}$ in the time domain

Table 6.2: FOM at ±30° of FIR form designed from $G^{-1}$ in the time domain

Table 6.3: FOM at ±5° of FIR form from $G^{-1}$ in the frequency domain

Table 6.4: FOM at ±30° of FIR form designed from $G^{-1}$ in the frequency domain

Table 6.5: FOM at ±5° of FIR form designed using direct LSE in the time domain

Table 6.6: FOM at ±30° of FIR form designed using LSE in the time domain

Table 6.7: FOM at ±30° of FIR form designed using LSE in the frequency domain

Table 6.8: FOM at ±30° of FIR form designed using LSE in the frequency domain

Table 6.9: FOM at ±5° of IIR form $G^{-1}$ in the time domain

Table 6.10: FOM at ±30° of IIR form $G^{-1}$ in the time domain

Table 6.11: FOM at ±5° of IIR form $G^{-1}$ in the frequency domain

Table 6.12: FOM at ±30° of IIR form $G^{-1}$ in the frequency domain

Table 6.13: FOM at ±5° of common pole model

Table 6.14: FOM at ±30° of common pole model

Table 6.15: Comparison Common-pole IIR at ±5° in time and frequency domains

Table 6.16: Comparison Common-pole IIR at ±30° in time and frequency domains

Table 6.17: FOM at ±30° of non-robust and robust FIR crosstalk cancellers

Table 6.18: Total EN with different length at ±30°

Table 6.19: FOM at ±5° of non-robust and robust FIR crosstalk cancellers

Table 6.20: Total EN with different length at ±5°

Table 6.21: Comparison between robust direct LSE FIR and robust common pole IIR

List of Figures

Figure 1.1: Virtual 5-channel system

Figure 2.1: A listener with a sound source oriented on azimuth 30° and elevation 0°

Figure 2.2: Impulse responses of two HRTFs

Figure 2.3: Magnitude responses of two HRTFs

Figure 2.4: Binaural synthesis using headphones

Figure 2.5: Binaural sound reproduced with crosstalk canceller

Figure 2.6: Crosstalk canceller described by Schroeder and Atal

Figure 2.7: Geometry of head and loudspeakers

Figure 2.8: Design flow of the crosstalk canceller

Figure 3.1: Crosstalk canceller in FIR form

Figure 3.2: The impulse response of $h_x(z) \cdot z^{-\tau} / h_y(z)$

Figure 3.3: Block diagram of the filtered error $error_{FIR\_filtered}(z)$

Figure 3.4: Magnitude responses of $H_x(e^{j\omega}) / H_y(e^{j\omega})$ and $c(e^{j\omega})$

Figure 3.5: Magnitude response of the direct error $error_{FIR}(\omega)$

Figure 3.6: Block diagram of the ratio error $error_{FIR\_ratio}(\omega)$

Figure 3.7: Magnitude responses of $H_y(e^{j\omega}) / H_x(e^{j\omega})$ and $c(e^{j\omega})$

Figure 3.8: Magnitude response of the ratio error $error_{ratio}(\omega)$

Figure 3.9: Magnitude responses of $error_{FIR}(\omega)$ and $error_{FIR\_ratio}(\omega) \cdot H_x(e^{j\omega}) / H_y(e^{j\omega})$

Figure 4.2: Block diagram of the filtered error $error_{IIR\_filtered\_2}(z)$

Figure 4.3: Block diagram of the ratio filtered error $error_{IIR\_ratio\_filtered}(\omega)$

Figure 4.4: Common-pole model of the crosstalk canceller

Figure 5.1: Head movements

Figure 5.2: Head-centered coordinate model

Figure 6.1: The impulse response at ±30° without the crosstalk canceller

Figure 6.2: The frequency response at ±30° without the crosstalk canceller

Figure 6.3: The impulse response at ±5° without the crosstalk canceller

Figure 6.4: The frequency response at ±5° without the crosstalk canceller

Figure 6.5: The time response at ±5° with 50-tap FIR designed from $G^{-1}$

Figure 6.6: The frequency response at ±5° with 50-tap FIR designed from $G^{-1}$

Figure 6.7: Impulse response of $c_{11}$ at ±5° designed from $G^{-1}$

Figure 6.8: Impulse response of $c_{21}$ at ±5° designed from $G^{-1}$

Figure 6.9: The impulse response at ±5° with 200-tap FIR designed from $G^{-1}$

Figure 6.10: The frequency response at ±5° with 200-tap FIR designed from $G^{-1}$

Figure 6.11: Impulse response of $c_{11}$ at ±5° designed from $G^{-1}$

Figure 6.12: Impulse response of $c_{21}$ at ±5° designed from $G^{-1}$

Figure 6.13: The impulse response at ±30° with 50-tap FIR designed from $G^{-1}$

Figure 6.14: The frequency response at ±30° with 50-tap FIR designed from $G^{-1}$

Figure 6.15: Impulse response of $c_{11}$ at ±30° designed from $G^{-1}$

Figure 6.16: Impulse response of $c_{21}$ at ±30° designed from $G^{-1}$

Figure 6.17: The impulse response at ±30° with 200-tap FIR designed from $G^{-1}$

Figure 6.18: The frequency response at ±30° with 200-tap FIR designed from $G^{-1}$

Figure 6.19: Impulse response of $c_{11}$ at ±30° designed from $G^{-1}$

Figure 6.20: Impulse response of $c_{21}$ at ±30° designed from $G^{-1}$

Figure 6.21: Comparison of design from $G^{-1}$ at ±5° in time and frequency domains

Figure 6.22: Comparison of design from $G^{-1}$ at ±30° in time and frequency domains

Figure 6.23: EN with different taps at ±5° using direct LSE in time domain

Figure 6.24: EN with different taps at ±30° using LSE in time domain

Figure 6.25: Comparison between the direct FIR LSE and matrix-inverse at ±5°

Figure 6.26: Comparison between the direct FIR LSE and matrix-inverse at ±30°

Figure 6.27: Comparison with FIR and IIR designs from $G^{-1}$ at ±5° in the time and frequency domains

Figure 6.28: Comparison with FIR and IIR designs from $G^{-1}$ at ±30° in the time and frequency domains

Figure 6.29: Comparison between common-pole and IIR design from $G^{-1}$ at ±5°

Figure 6.30: Comparison between common-pole and IIR design from $G^{-1}$ at ±30°

Figure 6.31: Comparison of EN with direct LSE FIR and Common-pole IIR at ±5°

Figure 6.32: Comparison with direct LSE FIR and Common-pole IIR at ±30°

Figure 6.33: Comparison between common pole and direct LSE FIR using same total taps at ±5°

Figure 6.34: Comparison between common pole and direct LSE FIR using same total taps at ±30°

Figure 6.35: The frequency response at ±30° with 200-tap FIR designed using LSE

Figure 6.36: The frequency response with head rotated +5° at ±30°

Figure 6.37: The frequency response with head rotated −5° at ±30°

Figure 6.38: The frequency response at fixed head without compensation

Figure 6.39: The frequency response at rotated head without compensation

Figure 6.40: The frequency response at fixed head with compensation

Figure 6.41: The frequency response at rotated +5° head with compensation

Figure 6.42: The frequency response at rotated −5° head with compensation

Figure 6.43: EN between robust design and non-robust design at ±30°

Figure 6.44: Total EN between robust design and non-robust design at ±30°

Figure 6.45: The frequency response at ±5° with 200-tap FIR designed using LSE

Figure 6.46: The frequency response with head rotated +5°

Figure 6.47: The frequency response with head rotated −5°

Figure 6.48: The frequency response at fixed head with compensation

Figure 6.49: The frequency response at rotated +5° head with compensation

Figure 6.50: The frequency response at rotated −5° head with compensation

Figure 6.51: EN between robust design and non-robust design at ±5°

Figure 6.52: Total EN between robust design and non-robust design at ±5°

Chapter 1 Introduction

Virtual reality techniques can be used to render virtual sound sources in three-dimensional (3-D) space around a listener [1]. Applications of this technique include entertainment, communication, and simulation. For example, a traditional 5-channel system requires five speakers (Left, Center, Right, Surround Left, and Surround Right). However, the room must be large enough to position each speaker properly, and the system is expensive. By using the virtual reality technique, we can realize the effect of the 5-channel system with only two loudspeakers. This saves cost and removes the constraint on the size of the room where the speakers are placed. Figure 1.1 shows a virtual 5-channel system, in which the five loudspeakers are not real but are created by the virtual reality technique.

Figure 1.1: Virtual 5-channel system

To realize the 3-D sound system, we must have a sufficient database of directional cues. It is well known that the principal cues for sound localization are the Interaural Time Difference (ITD) and the Interaural Intensity Difference (IID). However, localization using only these two interaural cues is ambiguous, a problem known as the cone of confusion [2]. Therefore, we use head-related transfer functions (HRTFs) from the MIT Media Lab database [3]. HRTFs contain spectral cues, so the direction of a sound can be identified [4]. Based on these transfer functions, a virtual sound can be synthesized at any 3-D direction.

The thesis is organized as follows. Chapter 2 explains the directional cues and how to create a 3-D sound system, and considers in detail the problems of sound reproduction over headphones and loudspeakers. Chapters 3, 4, and 5 are the main parts of this thesis; they focus on the structures and robustness of crosstalk cancellers. In Chapter 6, we use computer simulations to compare the performance of different crosstalk cancellers. Chapter 7 concludes by summarizing the simulation results.


Chapter 2 3-D Sound System

In Chapter 1, we mentioned that a 3-D sound system can be realized by using HRTFs. Therefore, the localization cues of HRTFs will be introduced in Section 2.1 first. We will show the ITD, IID, and spectral cues in a pair of measured HRTFs. In Section 2.2, the synthesis of a directional sound is presented. Section 2.3 will investigate the problems of the 3-D sound system over loudspeakers and the robustness of crosstalk cancellers.

2.1 Sound Localization Cues

HRTFs are frequency-domain functions which have corresponding time-domain functions called head-related impulse responses (HRIRs). HRIRs are the impulse responses measured from some specific position to left and right ears.

Figure 2.1: A listener with a sound source oriented on azimuth 30° and elevation 0°

A pair of HRTFs is measured as shown in Figure 2.1, where $H_R$ is the transfer function from the source S to the right ear and $H_L$ is the transfer function to the left ear. Their impulse and frequency responses are plotted in Figure 2.2 and Figure 2.3.

Figure 2.2: Impulse responses of two HRTFs

Figure 2.3: Magnitude responses of two HRTFs

In Figure 2.2, the amplitude of $H_R$ is much larger than that of $H_L$, and $H_L$ has more delay than $H_R$ because of the length difference between the two transmission paths. In other words, the observations in Figure 2.2 are the interaural cues IID and ITD. In Figure 2.3, the high-frequency responses at 8-10 kHz have notches caused by the concha reflection [5], and the peaks at 2-3 kHz are caused by the ear canal resonance [6]. These notches and peaks depend on the location of the sound source. This result suggests that spectral notches and peaks in HRTFs determine the location of sounds.
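As a hedged illustration of the interaural cues described above, the following Python sketch estimates an ITD and an IID from a pair of impulse responses. The function name `interaural_cues`, the toy impulse responses, and the 44.1 kHz sampling rate are assumptions made for illustration; the thesis uses measured HRIRs. The ITD is taken as the lag of the cross-correlation peak and the IID as an energy ratio in dB, which is one common way to estimate these cues and is not claimed to be the author's procedure.

```python
import numpy as np

def interaural_cues(h_near, h_far, fs):
    """Estimate ITD (seconds) and IID (dB) from a pair of equal-length HRIRs."""
    # ITD: lag at which the far-ear response best aligns with the near-ear response.
    xcorr = np.correlate(h_far, h_near, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_near) - 1)
    itd = lag / fs
    # IID: energy of the near-ear response relative to the far-ear response.
    iid = 10.0 * np.log10(np.sum(h_near ** 2) / np.sum(h_far ** 2))
    return itd, iid

# Toy HRIRs (assumed): the far ear receives a later, weaker impulse.
fs = 44100
h_r = np.zeros(128)
h_r[10] = 1.0
h_l = np.zeros(128)
h_l[35] = 0.3
itd, iid = interaural_cues(h_r, h_l, fs)
print(f"ITD = {itd * 1e3:.3f} ms, IID = {iid:.2f} dB")
```

A positive ITD here means the second argument (the far ear) lags the first, matching the observation that $H_L$ is both delayed and attenuated relative to $H_R$ for a source on the right.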


2.2 Creation of Virtual Sounds

We can create a virtual loudspeaker by using a pair of HRTFs. The binaural synthesis process is diagrammed in Figure 2.4. When a sound signal $s$ is processed by the digital filters ($H_\phi^R$ and $H_\phi^L$, a pair of HRTFs) and played over headphones, the sound localization cues are reproduced and the listener should perceive the sound at the location specified by the pair of HRTFs.

Figure 2.4: Binaural synthesis using headphones

Headphones are often used for 3-D audio because they have good channel separation. The directional signals ($s_R$ and $s_L$) can be received directly. However, headphone reproduction has some drawbacks. It often suffers from in-head localization, and it is also cumbersome and inconvenient.
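The binaural synthesis step amounts to filtering one source signal with the two impulse responses of an HRTF pair. The sketch below shows this with NumPy convolution; the function name `binaural_synthesis` and the short toy impulse responses are hypothetical and stand in for measured HRIRs.

```python
import numpy as np

def binaural_synthesis(s, hrir_right, hrir_left):
    """Filter a mono signal with a pair of HRIRs to obtain the binaural signals."""
    s_r = np.convolve(s, hrir_right)  # signal intended for the right ear
    s_l = np.convolve(s, hrir_left)   # signal intended for the left ear
    return s_r, s_l

# Toy data (assumed, not measured): the left-ear response is later and weaker.
hrir_right = np.array([0.0, 1.0, 0.4, 0.1])
hrir_left = np.array([0.0, 0.0, 0.0, 0.3, 0.1])
s = np.array([1.0, -0.5, 0.25])
s_r, s_l = binaural_synthesis(s, hrir_right, hrir_left)
```

Over headphones, `s_r` and `s_l` reach the ears directly; over loudspeakers they are mixed by the acoustic paths, which is the crosstalk problem treated next.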


2.3 Virtual Sounds over Loudspeakers

2.3.1 Crosstalk Phenomenon

To avoid the drawbacks of headphones, we would replace the headphones with a pair of loudspeakers. However, the left and right loudspeakers are not coupled directly to the left and right ears. The emitted sound from the right loudspeaker goes to the left ear as well as to the right ear of a listener, and vice versa. This phenomenon is called crosstalk. If two-channel binaural sound is reproduced through a pair of loudspeakers, sound received by the listener can be severely changed from the original sound due to the crosstalk effect.

The effect of crosstalk can be cancelled if the binaural signals are filtered before they are sent to the loudspeakers. The process is diagrammed in Figure 2.5.

Figure 2.5: Binaural sound reproduced with crosstalk canceller

2.3.2 Crosstalk Cancellation

Crosstalk cancellation is a technique involving cancelling the crosstalk which transits the head from each speaker to the opposite ear.

The goal of crosstalk cancellation is that the ear signals $e_R$ and $e_L$ should be the same as the binaural signals $s_R$ and $s_L$. Essentially, the transfer functions from the loudspeakers to the ears form a system transfer function matrix. Using matrix notation and referring to Figure 2.5, we can write:

$$ \begin{bmatrix} e_R(z) \\ e_L(z) \end{bmatrix} = G(z) \begin{bmatrix} y_R(z) \\ y_L(z) \end{bmatrix}, \tag{2.3.1} $$

where

$$ G(z) = \begin{bmatrix} g_{rr}(z) & g_{lr}(z) \\ g_{rl}(z) & g_{ll}(z) \end{bmatrix} $$

is the system transfer function matrix and $g_{xy}$ represents the channel impulse response between the $x$-side loudspeaker and the $y$-side ear, with $x, y \in \{r, l\}$; $y_R$ and $y_L$ are the input signals of the loudspeakers, and

$$ \begin{bmatrix} y_R(z) \\ y_L(z) \end{bmatrix} = C(z) \begin{bmatrix} s_R(z) \\ s_L(z) \end{bmatrix} \tag{2.3.2} $$

Therefore, substituting Equation (2.3.2) into Equation (2.3.1), we get

$$ \begin{bmatrix} e_R(z) \\ e_L(z) \end{bmatrix} = G(z) \cdot C(z) \begin{bmatrix} s_R(z) \\ s_L(z) \end{bmatrix} \tag{2.3.3} $$

Obviously, the signals at the ears are the same as the binaural signals if:

$$ G(z) \cdot C(z) = I_{2 \times 2} \tag{2.3.4} $$

Thus, our work is to find the inverse of the system transfer function matrix $G(z)$. $C(z)$ can be found by inverting $G(z)$ directly as follows:

$$ C(z) = G^{-1}(z) = \frac{1}{D(z)} \begin{bmatrix} g_{ll}(z) & -g_{lr}(z) \\ -g_{rl}(z) & g_{rr}(z) \end{bmatrix}, \tag{2.3.5} $$

where $D(z) = g_{rr}(z) g_{ll}(z) - g_{rl}(z) g_{lr}(z)$.

Equation (2.3.5) raises some problems. First, each element of $C(z)$ is in fractional form, and the denominator $D(z)$ may be non-minimum phase, so that some poles of each term may lie outside the unit circle in the z-plane. In other words, the elements of $C(z)$ may be unstable. Second, because of the transmission delay, these elements would be non-causal; we should add some delay to make them causal.
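The stability concern can be checked numerically: the zeros of $D(z)$ become the poles of every element of $C(z)$. The following sketch, with hypothetical helper names and two-tap toy channels that are not from the thesis, forms $D(z)$ by polynomial multiplication and tests whether any zero lies on or outside the unit circle.

```python
import numpy as np

def determinant_polynomial(g_rr, g_rl, g_lr, g_ll):
    """Coefficients of D(z) = g_rr(z) g_ll(z) - g_rl(z) g_lr(z) in powers of z^-1."""
    return np.convolve(g_rr, g_ll) - np.convolve(g_rl, g_lr)

def has_unstable_inverse(d, tol=1e-9):
    """True if 1/D(z) has a pole on or outside the unit circle."""
    # With d = [d0, d1, ..., dK], np.roots(d) returns the zeros of D(z) in z.
    zeros = np.roots(d)
    return bool(np.any(np.abs(zeros) >= 1.0 - tol))

# Toy symmetric channels (assumed): direct path g_rr = g_ll, cross path g_rl = g_lr.
cross = np.array([0.0, 0.4])
d_min = determinant_polynomial(np.array([1.0, 0.5]), cross, cross, np.array([1.0, 0.5]))
d_nonmin = determinant_polynomial(np.array([0.5, 1.0]), cross, cross, np.array([0.5, 1.0]))
print(has_unstable_inverse(d_min), has_unstable_inverse(d_nonmin))
```

In the first toy case the zeros of $D(z)$ are at $-0.1$ and $-0.9$, so the direct inverse is stable; in the second they are at $-1.2$ and $-2.8$, so a causal direct inverse diverges.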

There are many structures for implementing the crosstalk canceller. The crosstalk canceller was first put into practice by Schroeder and Atal in 1963 [7], and its structure is shown in Figure 2.6.

Figure 2.6: Crosstalk canceller described by Schroeder and Atal (block diagram: the inputs $s_R(z)$ and $s_L(z)$ are filtered by $g_{ll}(z)/D(z)$, $-g_{rl}(z)/D(z)$, $-g_{lr}(z)/D(z)$, and $g_{rr}(z)/D(z)$ to form $y_R(z)$ and $y_L(z)$, which then pass through the channels $g_{rr}$, $g_{rl}$, $g_{lr}$, and $g_{ll}$)

When the listening condition is symmetric, i.e. $g_{rr} = g_{ll}$ and $g_{rl} = g_{lr}$, the shuffler structure proposed by Cooper and Bauck in 1989 can be used [2]. More structures and detailed discussions can be found in [7]. In the next chapter, we will use the least square error method to design the crosstalk canceller.

2.3.3 Robustness

The crosstalk cancellers discussed so far assume that the position of the listener does not change. In practice, this is impossible. If the listener's head moves and the crosstalk canceller does not change, the signals at the ears may be very different from the original signals.

The question now is how to design a crosstalk canceller that reduces the effect of head movements. There is prior research and analysis on the robustness of crosstalk cancellers. Ward and Elko [9] evaluated the robustness of crosstalk cancellers for various loudspeaker spacings. They noted a rule of thumb for the optimum loudspeaker spacing, given by $d_s = 2 \cdot \lambda \cdot d_H$, where $d_H$ is the distance of the head from the loudspeaker center-line (see Figure 2.7), and $\lambda$ is the wavelength of the operating tone. Therefore, the optimum loudspeaker spacing varies with the frequency of the sound.


Figure 2.7: Geometry of head and loudspeakers

Later, they presented another method of analysis using the condition number [10], because the difference $\Delta G$ of the transfer function matrix between the fixed head and the moved head can be considered a perturbation of the matrix $G$. The result indicates that a small incidence angle between the two loudspeakers and the head is better in the high-frequency band (above about 4 kHz), and a larger incidence angle is better in the lower-frequency band. Because different incidence angles are robust in different frequency bands, they analyzed the asymmetric listening condition. A crosstalk cancellation system with three loudspeakers was proposed [11]. An analysis of three loudspeakers using the condition number also shows that the robustness of three loudspeakers is better than that of two loudspeakers [12].
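To make the condition-number analysis concrete, the sketch below computes the condition number of a $2 \times 2$ transfer matrix $G$ over frequency. It rests on strong assumptions that are not from the thesis or from [10]: a free-field point-source model (pure delay with $1/r$ attenuation, no head shadow), ears 0.18 m apart, and a sound speed of 343 m/s. The function name and geometry arguments are hypothetical.

```python
import numpy as np

def free_field_condition_numbers(freqs_hz, spacing, distance,
                                 ear_spacing=0.18, speed_of_sound=343.0):
    """Condition number of a 2x2 free-field transfer matrix G at each frequency."""
    # Positions in metres: x across the head, y towards the loudspeakers.
    speakers = np.array([[spacing / 2, distance], [-spacing / 2, distance]])  # right, left
    ears = np.array([[ear_spacing / 2, 0.0], [-ear_spacing / 2, 0.0]])        # right, left
    # r[i, j] is the path length from loudspeaker j to ear i.
    r = np.linalg.norm(ears[:, None, :] - speakers[None, :, :], axis=-1)
    conds = []
    for f in np.asarray(freqs_hz, dtype=float):
        # Point-source model: a pure delay with 1/r attenuation, no head shadow.
        G = np.exp(-2j * np.pi * f * r / speed_of_sound) / r
        conds.append(np.linalg.cond(G))
    return np.array(conds)

freqs = np.array([500.0, 2000.0, 8000.0])
narrow = free_field_condition_numbers(freqs, spacing=0.2, distance=1.0)
wide = free_field_condition_numbers(freqs, spacing=1.0, distance=1.0)
```

A large condition number at some frequency means that a small perturbation $\Delta G$, such as one caused by head movement, can produce a large error after inversion at that frequency. Comparing `narrow` and `wide` shows how the sensitive frequencies shift with loudspeaker spacing under this simplified model.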

So far, the robustness was increased by increasing the number of loudspeakers. We will propose a new method to increase the robustness with only two loudspeakers.

2.3.4 Crosstalk Canceller Structures

In this section, we will provide an overview for the design flow of the crosstalk canceller in the thesis, and Figure 2.8 is the design flow chart.

Figure 2.8: Design flow of the crosstalk canceller (starting from $GC = I$; the matrix-inverse branch uses $C = G^{-1}$ and $C \approx G^{-1}$ with the direct error $E_D = C - G^{-1}$, the filtered error $E_f = \Delta \cdot E_D$, and the ratio error $E_r = \frac{\Delta}{h_x} \cdot E_D$; the direct LSE branch uses $GC \approx I$ with $E_{LSE} = GC - I$)

Our starting point is Equation (2.3.4), and the design then separates into two approaches, the matrix-inverse and direct LSE methods. In the matrix-inverse design, the direct error is hard to handle, so we handle the filtered error instead. The filtered error comes in two types: one filtered by $\Delta$ and one filtered by the ratio term $\Delta / h_x$. The type filtered by the ratio term $\Delta / h_x$ is used in Section 3.1.2 and Section 4.1.2.

In the direct LSE design, the FIR form is implemented in Section 3.2, while the IIR form is hard to handle. Therefore, we handle the filtered error instead of the direct LSE IIR error and propose the common-pole model structure in Section 4.2.


Chapter 3 FIR Crosstalk Canceller

As we have discussed in Chapter 2, in order to generate 3-D sound, $C(z)$ must be the inverse of the channel transfer function matrix $G(z)$. If $G(z)$ is inverted directly, the stability and causality must be considered.

To avoid these problems, we use the matrix inverse criterion and the direct least square error (LSE) criterion to find the filter coefficients of the crosstalk canceller.

3.1 Matrix Inverse Design

3.1.1 Design in Time Domain

From Equation (2.3.5), the theoretical solution of the crosstalk canceller is given as follows:

$$ c_{11}(z) = \frac{g_{ll}(z)}{g_{rr}(z) g_{ll}(z) - g_{rl}(z) g_{lr}(z)}; \quad c_{21}(z) = \frac{-g_{rl}(z)}{g_{rr}(z) g_{ll}(z) - g_{rl}(z) g_{lr}(z)}; $$

$$ c_{12}(z) = \frac{-g_{lr}(z)}{g_{rr}(z) g_{ll}(z) - g_{rl}(z) g_{lr}(z)}; \quad c_{22}(z) = \frac{g_{rr}(z)}{g_{rr}(z) g_{ll}(z) - g_{rl}(z) g_{lr}(z)}; $$

Therefore, we want to find the filters of the crosstalk canceller by using these theoretical solutions, and the block diagram is shown in Figure 3.1.


Figure 3.1: Crosstalk canceller in FIR form

From Equation (2.3.5), we know each term is in the same form, so we can estimate each term in FIR form by using the same algorithm. The following method is proposed to find each filter. We want to find an FIR filter $c(z)$ that approximates each theoretical solution of the crosstalk canceller, i.e.

$$ c(z) \approx \frac{h_x(z)}{h_y(z)} \cdot z^{-\tau}, \tag{3.1.1} $$

where $h_x(z) \in \{g_{rl}(z), g_{lr}(z), g_{ll}(z), g_{rr}(z)\}$, $h_y(z) = g_{rr}(z) g_{ll}(z) - g_{lr}(z) g_{rl}(z)$, and $\tau$ is a delay to guarantee causality. Therefore, the criterion is to minimize the direct error $error_{FIR}(z)$ as follows:

$$ error_{FIR}(z) = c(z) - \frac{h_x(z)}{h_y(z)} \cdot z^{-\tau} \tag{3.1.2} $$

$$ c(z) = \arg\min \left\{ \| error_{FIR}(z) \|^2 \right\} \tag{3.1.3} $$

The impulse response of $h_x(z) \cdot z^{-\tau} / h_y(z)$ is shown in the upper panel of Figure 3.2, and the lower panel zooms in on taps 20 to 40. From Figure 3.2, we know that it is hard to find an FIR filter to approximate the IIR system $h_x(z) \cdot z^{-\tau} / h_y(z)$ because it diverges too fast.

Figure 3.2: The impulse response of $h_x(z) \cdot z^{-\tau} / h_y(z)$

Therefore, Mochtaris proposes to minimize a filtered error $error_{FIR\_filtered}(z)$, expressed in Equation (3.1.4); its block diagram is plotted in Figure 3.3 [13].

$$ error_{FIR\_filtered}(z) = h_y(z) \cdot error_{FIR}(z) \tag{3.1.4} $$

Figure 3.3: Block diagram of the filtered error $error_{FIR\_filtered}(z)$

According to Figure 3.3, the filtered error can be formulated as follows:

$$ error_{FIR\_filtered}(z) = c(z) \cdot h_y(z) - h_x(z) \cdot z^{-\tau} \tag{3.1.5} $$

Therefore, the criterion is as follows:

$$ c(z) = \arg\min \left\{ \| error_{FIR\_filtered}(z) \|^2 \right\} \tag{3.1.6} $$

Equation (3.1.5) can be written in convolution matrix and vector form, as expressed in Equation (3.1.7).

$$ \mathbf{error}_{FIR\_filtered} = \mathbf{H}_y \cdot \mathbf{c} - \mathbf{h}_x, \tag{3.1.7} $$

where $\mathbf{c} = [c(0) \; c(1) \; \cdots \; c(N-1)]^T$ is an FIR filter with $N$ taps; $\mathbf{H}_y$ is a convolution matrix;

$$ \mathbf{h}_x = [\underbrace{0 \; \cdots \; 0}_{\tau} \; h_x(0) \; h_x(1) \; \cdots \; h_x(M-1) \; \underbrace{0 \; \cdots \; 0}_{M+N-2-\tau}]^T, $$

and $M$ is the channel length.

$$ \mathbf{H}_y = \begin{bmatrix} h_y(0) & 0 & \cdots & 0 \\ h_y(1) & h_y(0) & \ddots & \vdots \\ \vdots & h_y(1) & \ddots & 0 \\ h_y(2M-2) & \vdots & \ddots & h_y(0) \\ 0 & h_y(2M-2) & & h_y(1) \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_y(2M-2) \end{bmatrix}_{(2M+N-2) \times N} \tag{3.1.8} $$

One of the filters in the crosstalk canceller, $\mathbf{c}$, is found as follows:

$$ \mathbf{c} = \left( \mathbf{H}_y^T \mathbf{H}_y \right)^{-1} \mathbf{H}_y^T \mathbf{h}_x \tag{3.1.9} $$

Applying Equation (3.1.9) with each different $h_x(z)$, we can find each filter of the crosstalk canceller. Note that the delay in each term of the crosstalk canceller must be the same.
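The filtered-error design of Equations (3.1.7) to (3.1.9) can be sketched in a few lines of NumPy. The helper names, the two-tap toy channels, and the choice of 64 taps with zero delay are assumptions made for illustration. The sketch calls `np.linalg.lstsq` rather than forming $(\mathbf{H}_y^T \mathbf{H}_y)^{-1}$ explicitly; for a full-column-rank $\mathbf{H}_y$ both give the same least-squares solution.

```python
import numpy as np

def convolution_matrix(h, n_taps):
    """(len(h) + n_taps - 1) x n_taps matrix H with H @ c == np.convolve(h, c)."""
    H = np.zeros((len(h) + n_taps - 1, n_taps))
    for k in range(n_taps):
        H[k:k + len(h), k] = h
    return H

def design_filtered_error_fir(h_x, h_y, n_taps, delay):
    """Least-squares FIR c minimizing ||H_y c - h_x delayed by `delay` samples||."""
    H_y = convolution_matrix(h_y, n_taps)
    if delay + len(h_x) > H_y.shape[0]:
        raise ValueError("delay is too large for the chosen filter length")
    target = np.zeros(H_y.shape[0])
    target[delay:delay + len(h_x)] = h_x  # h_x shifted by tau and zero padded
    c, *_ = np.linalg.lstsq(H_y, target, rcond=None)
    return c, H_y, target

# Toy channels (assumed): g_rr = g_ll = [1, 0.5], g_rl = g_lr = [0, 0.4].
g_direct = np.array([1.0, 0.5])
g_cross = np.array([0.0, 0.4])
h_y = np.convolve(g_direct, g_direct) - np.convolve(g_cross, g_cross)
c11, H_y, target = design_filtered_error_fir(g_direct, h_y, n_taps=64, delay=0)
residual = np.linalg.norm(H_y @ c11 - target)
```

Repeating the call with $h_x$ set to each of $g_{ll}$, $g_{rl}$, $g_{lr}$, and $g_{rr}$ (with the signs of Equation (2.3.5)) and the same `delay` yields the four filters.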

3.1.2 Design in Frequency Domain

In the previous section, the direct error was designed in the time domain and met a divergence problem. Now we propose a method designed in the frequency domain to avoid the problem. Equation (3.1.1) is rewritten in the frequency domain as follows:

$$ error_{FIR}(\omega) = c(e^{j\omega}) - \frac{H_x(e^{j\omega})}{H_y(e^{j\omega})} e^{-j\omega\tau}, \tag{3.1.10} $$

where $c(e^{j\omega})$, $H_x(e^{j\omega})$, and $H_y(e^{j\omega})$ are the Fourier transforms of $c(n)$, $h_x(n)$, and $h_y(n)$. The criterion is as follows:

$$ c(n) = \arg\min \left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} | error_{FIR}(\omega) |^2 \, d\omega \right\} \tag{3.1.11} $$

In order to find the filter coefficients, we can rewrite $c(e^{j\omega})$ in vector form as follows [14].

$$ c(e^{j\omega}) = \sum_{n=0}^{N-1} c(n) e^{-jn\omega} = \mathbf{c}^T \cdot \mathbf{ex}(\omega), \tag{3.1.12} $$

where $\mathbf{ex}(\omega) = [e^{-j \cdot 0 \cdot \omega}, e^{-j \cdot 1 \cdot \omega}, \cdots, e^{-j(N-1)\omega}]^T$.

Referring to Equation (3.1.12), the error energy can be rewritten as follows:

$$ J_{FIR} = \frac{1}{2\pi} \int_{-\pi}^{\pi} | error_{FIR}(\omega) |^2 \, d\omega = \mathbf{c}^T \mathbf{c} - 2 \mathbf{c}^T \mathbf{b} + \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{H_x(e^{j\omega})}{H_y(e^{j\omega})} \right|^2 d\omega, \tag{3.1.13} $$

where

$$ \mathbf{b} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \mathrm{Re} \left\{ \left( \frac{H_x(e^{j\omega})}{H_y(e^{j\omega})} e^{-j\tau\omega} \right)^{*} \mathbf{ex}(\omega) \right\} d\omega. $$

In order to minimize $J_{FIR}$, let $\partial J_{FIR} / \partial \mathbf{c} = \mathbf{0}$. We get

$$ \mathbf{c} = \mathbf{b} \tag{3.1.14} $$

However, we find the performance is bad. The reason is that it is difficult to approximate the high-frequency band of $H_x(e^{j\omega}) / H_y(e^{j\omega})$. We know that $h_y(n)$ is the convolution of two HRTFs, so $H_x(e^{j\omega}) / H_y(e^{j\omega})$ can be viewed as the inverse of a first-order HRTF. From Figure 2.3, we know that the high frequencies of an HRTF are strongly attenuated, so the high-frequency magnitude responses of $H_x(e^{j\omega}) / H_y(e^{j\omega})$ are very large. The magnitude responses of $H_x(e^{j\omega}) / H_y(e^{j\omega})$ and $c(e^{j\omega})$ are plotted in Figure 3.4.

Figure 3.4: Magnitude responses of $H_x(e^{j\omega}) / H_y(e^{j\omega})$ and $c(e^{j\omega})$

From Figure 3.4, we know that the filter $c(e^{j\omega})$ seeks the high gain in the high-frequency band and sacrifices the low band in order to compromise over the full band. The following figure is the magnitude response of the direct error.

Figure 3.5: Magnitude response of the direct error $error_{FIR}(\omega)$ (magnitude in dB versus normalized frequency, ×π rad/sample)

From Figure 3.5, we can see that the error in the high-frequency band is large, so the performance is bad. Therefore, we propose a method to minimize the ratio error $error_{FIR\_ratio}(\omega)$ instead of the direct error $error_{FIR}(\omega)$. The ratio error is defined as follows:

$$ error_{FIR\_ratio}(\omega) = error_{FIR}(\omega) \cdot \frac{H_y(e^{j\omega})}{H_x(e^{j\omega})} = c(e^{j\omega}) \cdot \frac{H_y(e^{j\omega})}{H_x(e^{j\omega})} - e^{-j\tau\omega} \tag{3.1.15} $$

The block diagram of the ratio error is as follows:

Figure 3.6: Block diagram of the ratio error $error_{FIR\_ratio}(\omega)$

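Therefore, the criterion is as follows:

$$ c(n) = \arg\min \left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} | error_{FIR\_ratio}(\omega) |^2 \, d\omega \right\} \tag{3.1.16} $$

In the same way, we express $c(e^{j\omega})$ in the vector form of Equation (3.1.12), and the error energy can be rewritten as follows:

$$ J_{FIR\_ratio} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| c(e^{j\omega}) \cdot \frac{H_y(e^{j\omega})}{H_x(e^{j\omega})} - e^{-j\tau\omega} \right|^2 d\omega = \mathbf{c}^T \mathbf{A} \, \mathbf{c} - 2 \mathbf{c}^T \mathbf{b} + 1, \tag{3.1.17} $$

where

$$ \mathbf{A} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{H_y(e^{j\omega})}{H_x(e^{j\omega})} \right|^2 \mathbf{ex}(\omega) \, \mathbf{ex}^H(\omega) \, d\omega, \qquad \mathbf{b} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \mathrm{Re} \left\{ \frac{H_y(e^{j\omega})}{H_x(e^{j\omega})} e^{j\tau\omega} \, \mathbf{ex}(\omega) \right\} d\omega. $$

In order to minimize $J_{FIR\_ratio}$, let $\partial J_{FIR\_ratio} / \partial \mathbf{c} = \mathbf{0}$. We get

$$ \mathbf{A} \mathbf{c} = \mathbf{b} \tag{3.1.18} $$

Therefore, the filter can be found as follows:

$$ \mathbf{c} = \mathbf{A}^{-1} \mathbf{b} \tag{3.1.19} $$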
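A discretized version of Equations (3.1.17) to (3.1.19) is sketched below. The integrals are approximated by averages over a uniform FFT grid, which is an assumption of this sketch, as are the function name, the grid size of 1024, and the toy channels. The sketch also assumes $H_x(e^{j\omega})$ has no zeros on the grid, since the ratio $H_y / H_x$ is formed directly.

```python
import numpy as np

def design_ratio_error_fir(h_x, h_y, n_taps, delay, n_freq=1024):
    """Solve A c = b for the ratio-error criterion on a uniform frequency grid."""
    w = 2.0 * np.pi * np.arange(n_freq) / n_freq
    H_x = np.fft.fft(h_x, n_freq)
    H_y = np.fft.fft(h_y, n_freq)
    Q = H_y / H_x                                      # ratio term H_y / H_x
    E = np.exp(-1j * np.outer(w, np.arange(n_taps)))   # row k holds ex(w_k)^T
    weights = np.abs(Q) ** 2
    # A ~ mean over the grid of |Q|^2 Re{ex ex^H}; b ~ mean of Re{Q e^{j tau w} ex}.
    A = np.real((E * weights[:, None]).conj().T @ E) / n_freq
    b = np.real(E.T @ (Q * np.exp(1j * delay * w))) / n_freq
    return np.linalg.solve(A, b)

# Toy channels (assumed): g_rr = g_ll = [1, 0.5], g_rl = g_lr = [0, 0.4].
g_direct = np.array([1.0, 0.5])
g_cross = np.array([0.0, 0.4])
h_y = np.convolve(g_direct, g_direct) - np.convolve(g_cross, g_cross)
c = design_ratio_error_fir(g_direct, h_y, n_taps=64, delay=0)

# Ratio error of Equation (3.1.15) evaluated on the same grid (delay = 0).
ratio_error = np.fft.fft(c, 1024) * np.fft.fft(h_y, 1024) / np.fft.fft(g_direct, 1024) - 1.0
rms_error = np.sqrt(np.mean(np.abs(ratio_error) ** 2))
```

Because the criterion weights the error by $H_y / H_x$, frequencies where $H_x / H_y$ is very large no longer dominate the fit, which is the motivation given in the text.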

The magnitude responses of $H_y(e^{j\omega}) / H_x(e^{j\omega})$ and $c(e^{j\omega})$ are plotted in Figure 3.7.

Figure 3.7: Magnitude responses of $H_y(e^{j\omega}) / H_x(e^{j\omega})$ and $c(e^{j\omega})$

The high-frequency band of $H_y(e^{j\omega}) / H_x(e^{j\omega})$ is very small because it can be approximated as a first-order HRTF. Therefore, the high-frequency band of $c(e^{j\omega})$ must be very large. From Figure 3.7, although the high-frequency band of $c(e^{j\omega})$ is not large enough, the error is smaller than the direct error. Therefore, the errors in the high band are reduced, and the low-frequency band can be fitted better. Figure 3.8 shows the magnitude response of the ratio error.

Figure 3.8: Magnitude response of the ratio error $error_{ratio}(\omega)$

From Equation (3.1.15), we know the relation between the direct error and the ratio error. Therefore, Figure 3.9 shows the direct error together with the ratio error filtered by $H_x(e^{j\omega}) / H_y(e^{j\omega})$ in order to compare these two errors.

Figure 3.9: Magnitude responses of $error_{FIR}(\omega)$ and $error_{FIR\_ratio}(\omega) \cdot H_x(e^{j\omega}) / H_y(e^{j\omega})$

From Figure 3.9, it is obvious that the ratio error filtered by $H_x(e^{j\omega}) / H_y(e^{j\omega})$ is smaller than the direct error.

From Equation (3.1.15), we can see that this is the same as the direct LSE FIR method, which will be proposed in the next section.

3.2 Direct LSE FIR Design

3.2.1 Design in Time Domain

In this section, we design the crosstalk canceller by using the direct least square error (LSE) method instead of the matrix-inverse design. The structure of the crosstalk canceller is plotted in Figure 3.1, and $C(z)$ can be represented in matrix form as follows:

$$ C(z) = \begin{bmatrix} c_{11}(z) & c_{12}(z) \\ c_{21}(z) & c_{22}(z) \end{bmatrix} \tag{3.2.1} $$

In order to show how the crosstalk canceller works, let $s_R$ be an impulse signal and let $s_L$ be a zero signal. Because of causality, we want $e_R$ to be the same as $s_R$ with some extra delay, and $e_L$ to be a zero signal. The signals at the ears can be expressed as follows:

$$e_R(z) \approx d(z) \tag{3.2.2}$$

$$e_L(z) \approx 0, \tag{3.2.3}$$

where $d(z)$ is the desired signal, which is a delayed impulse. Therefore, the LSE criterion to find the filters $c_{11}(z)$ and $c_{21}(z)$ can be written as follows:

$$\begin{bmatrix} c_{11}(z) \\ c_{21}(z) \end{bmatrix} = \arg\min \left\{ \left\| G(z) \begin{bmatrix} c_{11}(z) \\ c_{21}(z) \end{bmatrix} - \begin{bmatrix} d(z) \\ 0 \end{bmatrix} \right\|^2 \right\} \tag{3.2.4}$$

In the same way, if $s_L$ is an impulse signal and $s_R$ is a zero signal, then:

$$\begin{bmatrix} c_{12}(z) \\ c_{22}(z) \end{bmatrix} = \arg\min \left\{ \left\| G(z) \begin{bmatrix} c_{12}(z) \\ c_{22}(z) \end{bmatrix} - \begin{bmatrix} 0 \\ d(z) \end{bmatrix} \right\|^2 \right\} \tag{3.2.5}$$

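Equation (3.2.4) and Equation (3.2.5) can be expressed with convolution matrices as follows:

$$\begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{21} \end{bmatrix} = \arg\min \left\{ \left\| \begin{bmatrix} \mathbf{G}_{rr} & \mathbf{G}_{lr} \\ \mathbf{G}_{rl} & \mathbf{G}_{ll} \end{bmatrix} \begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{21} \end{bmatrix} - \begin{bmatrix} \mathbf{d} \\ \mathbf{0} \end{bmatrix} \right\|^2 \right\} \tag{3.2.6}$$

$$\begin{bmatrix} \mathbf{c}_{12} \\ \mathbf{c}_{22} \end{bmatrix} = \arg\min \left\{ \left\| \begin{bmatrix} \mathbf{G}_{rr} & \mathbf{G}_{lr} \\ \mathbf{G}_{rl} & \mathbf{G}_{ll} \end{bmatrix} \begin{bmatrix} \mathbf{c}_{12} \\ \mathbf{c}_{22} \end{bmatrix} - \begin{bmatrix} \mathbf{0} \\ \mathbf{d} \end{bmatrix} \right\|^2 \right\}, \tag{3.2.7}$$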

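where $\mathbf{G}_{rr}$, $\mathbf{G}_{lr}$, $\mathbf{G}_{rl}$, and $\mathbf{G}_{ll}$ are the convolution matrices of $g_{rr}(z)$, $g_{lr}(z)$, $g_{rl}(z)$, and $g_{ll}(z)$ in the time domain; $\mathbf{c}_{11} = [c_{11}(0)\ \ c_{11}(1)\ \cdots\ c_{11}(N-1)]^T$ and $\mathbf{c}_{21} = [c_{21}(0)\ \ c_{21}(1)\ \cdots\ c_{21}(N-1)]^T$; the desired signal $\mathbf{d} = [0\ \ 0\ \cdots\ 1\ \ 0\ \cdots\ 0]^T$ and the zero vector $\mathbf{0}$ both have $L\ (= M + N - 1)$ taps.

Written out in detail, $\mathbf{G}_{ij}$ with $i, j \in \{r, l\}$ is as follows:

$$\mathbf{G}_{ij} = \begin{bmatrix}
g_{ij}(0) & 0 & \cdots & 0 \\
g_{ij}(1) & g_{ij}(0) & \ddots & \vdots \\
\vdots & g_{ij}(1) & \ddots & 0 \\
g_{ij}(M-1) & \vdots & \ddots & g_{ij}(0) \\
0 & g_{ij}(M-1) & \ddots & g_{ij}(1) \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & g_{ij}(M-1)
\end{bmatrix}_{L \times N} \tag{3.2.8}$$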

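Referring to Equation (3.2.6), let

$$\mathbf{G} = \begin{bmatrix} \mathbf{G}_{rr} & \mathbf{G}_{lr} \\ \mathbf{G}_{rl} & \mathbf{G}_{ll} \end{bmatrix}, \quad \mathbf{c}_1 = \begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{21} \end{bmatrix}, \quad \mathbf{q}_1 = \begin{bmatrix} \mathbf{d} \\ \mathbf{0} \end{bmatrix} \tag{3.2.9}$$

The error is then

$$\mathbf{error} = \mathbf{G}\mathbf{c}_1 - \mathbf{q}_1 \tag{3.2.10}$$

The least square error criterion is

$$\mathbf{c}_1 = \arg\min \left\{ \|\mathbf{error}\|^2 \right\} \tag{3.2.11}$$

The solution can be easily shown to be

$$\mathbf{c}_1 = \left(\mathbf{G}^T\mathbf{G}\right)^{-1}\mathbf{G}^T\mathbf{q}_1 \tag{3.2.12}$$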

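Therefore,

$$\mathbf{c}_{11} = [c_1(0),\ c_1(1),\ \cdots,\ c_1(N-1)]^T \tag{3.2.13}$$

$$\mathbf{c}_{21} = [c_1(N),\ c_1(N+1),\ \cdots,\ c_1(2N-1)]^T \tag{3.2.14}$$

Referring to Equation (3.2.7), let

$$\mathbf{c}_2 = \begin{bmatrix} \mathbf{c}_{12} \\ \mathbf{c}_{22} \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} \mathbf{0} \\ \mathbf{d} \end{bmatrix}, \tag{3.2.15}$$

where $\mathbf{c}_{12}$ and $\mathbf{c}_{22}$ also have $N$ taps. In the same way, we can get

$$\mathbf{c}_2 = \left(\mathbf{G}^T\mathbf{G}\right)^{-1}\mathbf{G}^T\mathbf{q}_2 \tag{3.2.16}$$

Therefore,

$$\mathbf{c}_{12} = [c_2(0),\ c_2(1),\ \cdots,\ c_2(N-1)]^T \tag{3.2.17}$$

$$\mathbf{c}_{22} = [c_2(N),\ c_2(N+1),\ \cdots,\ c_2(2N-1)]^T \tag{3.2.18}$$

If the listening situation is symmetric, the second problem in Equation (3.2.15) need not be solved. The other two filters follow from Equation (3.2.13) and Equation (3.2.14):

$$\mathbf{c}_{12} = \mathbf{c}_{21} \quad \text{and} \quad \mathbf{c}_{22} = \mathbf{c}_{11} \tag{3.2.19}$$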
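The time-domain design of Equations (3.2.6) to (3.2.19) can be illustrated with a minimal NumPy sketch. The impulse responses, the tap count `N`, and the delay `tau` below are hypothetical toy values chosen only for illustration, not measured HRTFs, and `numpy.linalg.lstsq` is used in place of the explicit inverse $(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T$, which gives the same solution when $\mathbf{G}$ has full column rank.

```python
import numpy as np

def conv_matrix(g, n_taps):
    """Return the L x N convolution matrix of g, L = M + N - 1 (Eq. 3.2.8)."""
    g = np.asarray(g, dtype=float)
    mat = np.zeros((len(g) + n_taps - 1, n_taps))
    for col in range(n_taps):
        mat[col:col + len(g), col] = g  # column k holds g delayed by k samples
    return mat

def lse_fir(G, q, n_taps):
    """Least-squares solution of G c = q, split into its two N-tap filters."""
    c, *_ = np.linalg.lstsq(G, q, rcond=None)
    return c[:n_taps], c[n_taps:]

# Hypothetical symmetric acoustic paths (M = 3 taps), for illustration only.
g_rr = g_ll = [1.0, 0.5, 0.2]
g_lr = g_rl = [0.0, 0.4, 0.3]
N, tau = 8, 4
L = len(g_rr) + N - 1

G = np.block([[conv_matrix(g_rr, N), conv_matrix(g_lr, N)],
              [conv_matrix(g_rl, N), conv_matrix(g_ll, N)]])
d = np.zeros(L)
d[tau] = 1.0                      # desired signal: delayed impulse
zero = np.zeros(L)
q1 = np.concatenate([d, zero])    # Eq. (3.2.9)
q2 = np.concatenate([zero, d])    # Eq. (3.2.15)

c11, c21 = lse_fir(G, q1, N)      # Eq. (3.2.12) to (3.2.14)
c12, c22 = lse_fir(G, q2, N)      # Eq. (3.2.16) to (3.2.18)
```

Because the toy paths are symmetric, the two solutions can be compared directly to see the relation of Equation (3.2.19).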

3.2.2 Design in Frequency Domain

In this section, we implement the design in the frequency domain. Equation (3.2.4) can be written in the frequency domain as follows:

$$\begin{bmatrix} c_{11}(e^{j\omega}) \\ c_{21}(e^{j\omega}) \end{bmatrix} = \arg\min \left\{ \frac{1}{2\pi}\int_{-\pi}^{\pi} \left\| \begin{bmatrix} g_{rr}(e^{j\omega}) & g_{lr}(e^{j\omega}) \\ g_{rl}(e^{j\omega}) & g_{ll}(e^{j\omega}) \end{bmatrix} \begin{bmatrix} c_{11}(e^{j\omega}) \\ c_{21}(e^{j\omega}) \end{bmatrix} - \begin{bmatrix} e^{-j\omega\tau} \\ 0 \end{bmatrix} \right\|^2 d\omega \right\} \tag{3.2.20}$$

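As described in Section 3.1.1.2, we can rewrite $c_{11}(e^{j\omega})$ and $c_{21}(e^{j\omega})$ as follows:

$$c_{11}(e^{j\omega}) = \mathbf{ex}_1^T \cdot \mathbf{c}_{11}, \tag{3.2.21}$$

where $\mathbf{ex}_1(\omega) = [e^{-j \times 0 \times \omega},\ e^{-j \times 1 \times \omega},\ \cdots,\ e^{-j(N-1) \times \omega}]^T$, and

$$c_{21}(e^{j\omega}) = \mathbf{ex}_2^T \cdot \mathbf{c}_{21}, \tag{3.2.22}$$

where $\mathbf{ex}_2(\omega) = [e^{-j \times 0 \times \omega},\ e^{-j \times 1 \times \omega},\ \cdots,\ e^{-j(N-1) \times \omega}]^T$.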

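Therefore, Equation (3.2.20) can be rewritten as follows:

$$\begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{21} \end{bmatrix} = \arg\min \left\{ \int_{-\pi}^{\pi} \left\| \mathbf{P} \cdot \begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{21} \end{bmatrix} - \begin{bmatrix} e^{-j\omega\tau} \\ 0 \end{bmatrix} \right\|^2 d\omega \right\}, \tag{3.2.23}$$

where

$$\mathbf{P} = \begin{bmatrix} g_{rr}(e^{j\omega}) \cdot \mathbf{ex}_1^T(\omega) & g_{lr}(e^{j\omega}) \cdot \mathbf{ex}_2^T(\omega) \\ g_{rl}(e^{j\omega}) \cdot \mathbf{ex}_1^T(\omega) & g_{ll}(e^{j\omega}) \cdot \mathbf{ex}_2^T(\omega) \end{bmatrix}.$$

The error $J_{LSE\_FIR}$ can be calculated as follows:

$$J_{LSE\_FIR} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left| \mathbf{P} \cdot \begin{bmatrix} \mathbf{c}_{11} \\ \mathbf{c}_{21} \end{bmatrix} - \begin{bmatrix} e^{-j\omega\tau} \\ 0 \end{bmatrix} \right|^2 d\omega \tag{3.2.24}$$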

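In the same way, we can rewrite the above equation as follows:

$$J_{LSE\_FIR} = \mathbf{c}_1^T \cdot \mathbf{O} \cdot \mathbf{c}_1 - 2 \cdot \mathbf{c}_1^T \cdot \mathbf{m} + 1, \tag{3.2.25}$$

where $\mathbf{c}_1 = [\mathbf{c}_{11}^T\ \ \mathbf{c}_{21}^T]^T$,

$$\mathbf{O} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\mathbf{P}^H \mathbf{P}\right) d\omega, \quad \text{and} \quad \mathbf{m} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Re}\left\{ \begin{bmatrix} e^{j\omega\tau} \cdot g_{rr}(e^{j\omega}) \cdot \mathbf{ex}_1(\omega) \\ e^{j\omega\tau} \cdot g_{lr}(e^{j\omega}) \cdot \mathbf{ex}_2(\omega) \end{bmatrix} \right\} d\omega.$$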

In order to minimize the error $J_{LSE\_FIR}$, let $\partial J_{LSE\_FIR} / \partial \mathbf{c}_1 = 0$. We can get

$$\mathbf{O}\mathbf{c}_1 = \mathbf{m} \tag{3.2.26}$$

Therefore, the filter can be found as follows:

$$\mathbf{c}_1 = \mathbf{O}^{-1}\mathbf{m} \tag{3.2.27}$$

We can find the filter coefficients as follows:

$$\mathbf{c}_{11} = [c_1(0),\ c_1(1),\ \cdots,\ c_1(N-1)]^T \tag{3.2.28}$$

$$\mathbf{c}_{21} = [c_1(N),\ c_1(N+1),\ \cdots,\ c_1(2N-1)]^T \tag{3.2.29}$$

$\mathbf{c}_{12}$ and $\mathbf{c}_{22}$ can be found by using the same method.
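A minimal sketch of the frequency-domain design is given below. It is not taken from the thesis: the integrals defining $\mathbf{O}$ and $\mathbf{m}$ are approximated by an average over a uniform frequency grid (an assumption, exact here because the integrands are low-order trigonometric polynomials), and the impulse responses, `n_taps`, `tau`, and `n_freq` are hypothetical toy values.

```python
import numpy as np

def freq_response(h, w):
    """h(e^{jw}) = sum_n h(n) e^{-jwn}, evaluated on the frequency grid w."""
    h = np.asarray(h, dtype=float)
    return np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h

def lse_fir_freq(g_rr, g_lr, g_rl, g_ll, n_taps, tau, n_freq=256):
    w = 2 * np.pi * np.arange(n_freq) / n_freq           # uniform grid on [0, 2*pi)
    ex = np.exp(-1j * np.outer(w, np.arange(n_taps)))    # row k is ex(w_k)^T
    # Rows of P(w_k): [g_rr ex^T, g_lr ex^T] and [g_rl ex^T, g_ll ex^T].
    top = np.hstack([freq_response(g_rr, w)[:, None] * ex,
                     freq_response(g_lr, w)[:, None] * ex])
    bot = np.hstack([freq_response(g_rl, w)[:, None] * ex,
                     freq_response(g_ll, w)[:, None] * ex])
    # Grid averages stand in for (1/2pi) * integral over [-pi, pi].
    O = (top.conj().T @ top + bot.conj().T @ bot).real / n_freq
    m = (np.exp(1j * w * tau)[:, None] * top).real.mean(axis=0)
    c1 = np.linalg.solve(O, m)                            # Eq. (3.2.27)
    return c1[:n_taps], c1[n_taps:], O, m

# Hypothetical toy paths, for illustration only.
c11, c21, O, m = lse_fir_freq([1.0, 0.5, 0.2], [0.0, 0.4, 0.3],
                              [0.0, 0.4, 0.3], [1.0, 0.5, 0.2],
                              n_taps=8, tau=4)
c1 = np.concatenate([c11, c21])
J = c1 @ O @ c1 - 2 * c1 @ m + 1                         # Eq. (3.2.25)
```

The value `J` is the residual error of Equation (3.2.25) at the optimum; it lies between 0 and 1 because $\mathbf{c}_1 = \mathbf{0}$ already gives $J = 1$.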

3.2.3 Comparison between FIR Designs in Time and Frequency Domains

In Section 3.2.1 and Section 3.2.2, we found the crosstalk canceller in the LSE FIR model in the time domain and in the frequency domain. The two methods differ only in whether the error is minimized in the time domain or in the frequency domain. According to Parseval's theorem [14], the two errors are equal, so intuitively the two methods should give the same result. We now prove that the crosstalk cancellers designed by the two methods are the same. The proof is as follows.

First, let $\mathbf{O}' = \mathbf{G}^T\mathbf{G}$ and $\mathbf{m}' = \mathbf{G}^T\mathbf{q}_1$. Equation (3.2.12) can be rewritten as follows:

$$\mathbf{c}_1 = \left(\mathbf{O}'\right)^{-1}\mathbf{m}' \tag{3.2.30}$$

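We can write $\mathbf{O}'$ in more detail as follows:

$$\mathbf{O}' = \begin{bmatrix} \mathbf{O}_1' & \mathbf{O}_2' \\ \mathbf{O}_3' & \mathbf{O}_4' \end{bmatrix}, \tag{3.2.31}$$

where

$$\begin{aligned}
\mathbf{O}_1' &= \mathbf{G}_{rr}^T\mathbf{G}_{rr} + \mathbf{G}_{rl}^T\mathbf{G}_{rl} \\
\mathbf{O}_2' &= \mathbf{G}_{rr}^T\mathbf{G}_{lr} + \mathbf{G}_{rl}^T\mathbf{G}_{ll} \\
\mathbf{O}_3' &= \mathbf{G}_{lr}^T\mathbf{G}_{rr} + \mathbf{G}_{ll}^T\mathbf{G}_{rl} \\
\mathbf{O}_4' &= \mathbf{G}_{lr}^T\mathbf{G}_{lr} + \mathbf{G}_{ll}^T\mathbf{G}_{ll}
\end{aligned}$$

In order to compare $\mathbf{O}'$ with $\mathbf{O}$ in Equation (3.2.27), $\mathbf{O}$ is also written in detail as follows:

$$\mathbf{O} = \begin{bmatrix} \mathbf{O}_1 & \mathbf{O}_2 \\ \mathbf{O}_3 & \mathbf{O}_4 \end{bmatrix}, \tag{3.2.32}$$

where

$$\begin{aligned}
\mathbf{O}_1 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|g_{rr}(e^{j\omega})\right|^2 \left(\mathbf{ex}_1\mathbf{ex}_1^H\right)^* d\omega + \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|g_{rl}(e^{j\omega})\right|^2 \left(\mathbf{ex}_1\mathbf{ex}_1^H\right)^* d\omega \\
\mathbf{O}_2 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}(e^{j\omega})\, g_{rr}^*(e^{j\omega}) \left(\mathbf{ex}_1\mathbf{ex}_2^H\right)^* d\omega + \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{ll}(e^{j\omega})\, g_{rl}^*(e^{j\omega}) \left(\mathbf{ex}_1\mathbf{ex}_2^H\right)^* d\omega \\
\mathbf{O}_3 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}^*(e^{j\omega})\, g_{rr}(e^{j\omega}) \left(\mathbf{ex}_2\mathbf{ex}_1^H\right)^* d\omega + \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{ll}^*(e^{j\omega})\, g_{rl}(e^{j\omega}) \left(\mathbf{ex}_2\mathbf{ex}_1^H\right)^* d\omega \\
\mathbf{O}_4 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|g_{lr}(e^{j\omega})\right|^2 \left(\mathbf{ex}_2\mathbf{ex}_2^H\right)^* d\omega + \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|g_{ll}(e^{j\omega})\right|^2 \left(\mathbf{ex}_2\mathbf{ex}_2^H\right)^* d\omega
\end{aligned}$$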

Comparing the elements of $\mathbf{O}$ and $\mathbf{O}'$, we find that they are the same. The proof is as follows.

Take $\mathbf{O}_2$ and $\mathbf{O}_2'$ for example; the other elements are proved in the same way.

$$\mathbf{G}_{rr}^T\mathbf{G}_{lr} = \begin{bmatrix}
\sum_{k=0}^{L-1} g_{rr}(k)\, g_{lr}(k) & \sum_{k=0}^{L-1} g_{rr}(k)\, g_{lr}(k-1) & \cdots & \sum_{k=0}^{L-1} g_{rr}(k)\, g_{lr}(k-N+1) \\
\sum_{k=0}^{L-1} g_{rr}(k-1)\, g_{lr}(k) & \ddots & & \vdots \\
\vdots & & \ddots & \vdots \\
\sum_{k=0}^{L-1} g_{rr}(k-N+1)\, g_{lr}(k) & \cdots & \cdots & \sum_{k=0}^{L-1} g_{rr}(k-N+1)\, g_{lr}(k-N+1)
\end{bmatrix}$$

From the above equation, each element of $\mathbf{G}_{rr}^T\mathbf{G}_{lr}$ is

$$\mathbf{G}_{rr}^T\mathbf{G}_{lr}(p, q) = \sum_{k=0}^{L-1} g_{rr}(k-p)\, g_{lr}(k-q), \quad \text{where } p = 0 \sim N-1 \text{ and } q = 0 \sim N-1 \tag{3.2.33}$$

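From $\mathbf{O}_2$, the first term on the right-hand side, $\frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}(e^{j\omega})\, g_{rr}^*(e^{j\omega}) \left(\mathbf{ex}_1\mathbf{ex}_2^H\right)^* d\omega$, is considered. Let

$$\mathbf{T} = \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}(e^{j\omega})\, g_{rr}^*(e^{j\omega}) \left(\mathbf{ex}_1\mathbf{ex}_2^H\right)^* d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}(e^{j\omega})\, g_{rr}^*(e^{j\omega}) \begin{bmatrix} e^{j(0-0)\omega} & \cdots & e^{j(0-(N-1))\omega} \\ \vdots & \ddots & \vdots \\ e^{j((N-1)-0)\omega} & \cdots & e^{j((N-1)-(N-1))\omega} \end{bmatrix} d\omega$$

Each element of $\mathbf{T}$ is

$$\begin{aligned}
\mathbf{T}(p, q) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}(e^{j\omega})\, g_{rr}^*(e^{j\omega})\, e^{j(p-q)\omega}\, d\omega \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\sum_{n=0}^{L-1} g_{lr}(n)\, e^{-j\omega n}\right) \left(\sum_{m=0}^{L-1} g_{rr}(m)\, e^{j\omega m}\right) e^{j(p-q)\omega}\, d\omega \\
&= \sum_{n=0}^{L-1}\sum_{m=0}^{L-1} g_{lr}(n)\, g_{rr}(m)\, \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j(p+m-q-n)\omega}\, d\omega \\
&= \sum_{m=0}^{L-1} g_{lr}(m-q)\, g_{rr}(m-p)
\end{aligned} \tag{3.2.34}$$

where $p = 0 \sim N-1$ and $q = 0 \sim N-1$. The last step uses the fact that the integral equals 1 when $n = m + p - q$ and 0 otherwise, followed by a shift of the summation index.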

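We can find that the results in Equation (3.2.33) and Equation (3.2.34) are the same. Therefore, $\frac{1}{2\pi}\int_{-\pi}^{\pi} g_{lr}(e^{j\omega})\, g_{rr}^*(e^{j\omega}) \left(\mathbf{ex}_1\mathbf{ex}_2^H\right)^* d\omega$ equals $\mathbf{G}_{rr}^T\mathbf{G}_{lr}$. In the same way, we can prove that $\frac{1}{2\pi}\int_{-\pi}^{\pi} g_{ll}(e^{j\omega})\, g_{rl}^*(e^{j\omega}) \left(\mathbf{ex}_1\mathbf{ex}_2^H\right)^* d\omega$ equals $\mathbf{G}_{rl}^T\mathbf{G}_{ll}$. Hence $\mathbf{O}_2 = \mathbf{O}_2'$, and in the same way the other elements can be proved equal. Therefore, we can say

$$\mathbf{O}' = \mathbf{O} \tag{3.2.35}$$

Now, we consider the terms $\mathbf{m}$ and $\mathbf{m}'$. We know that

$$\mathbf{m}' = \mathbf{G}^T\mathbf{q}_1 = \begin{bmatrix} \mathbf{G}_{rr}^T\mathbf{d} \\ \mathbf{G}_{lr}^T\mathbf{d} \end{bmatrix}$$

We first consider the upper term $\mathbf{G}_{rr}^T\mathbf{d}$, which is written as follows:

$$\mathbf{G}_{rr}^T\mathbf{d} = \left[ g_{rr}(\tau)\ \ g_{rr}(\tau-1)\ \cdots\ g_{rr}(\tau-(N-1)) \right]^T \tag{3.2.36}$$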

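The upper term in $\mathbf{m}$ is also considered first. Each element of the upper term is as follows:

$$\begin{aligned}
\mathbf{m}_{upper}(s) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Re}\left\{ e^{j\omega\tau} \cdot g_{rr}(e^{j\omega}) \cdot e^{-j\omega s} \right\} d\omega, \quad s = 0 \sim N-1 \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{g_{rr}(e^{j\omega}) \cdot e^{j\omega(\tau-s)} + g_{rr}^*(e^{j\omega}) \cdot e^{-j\omega(\tau-s)}}{2}\, d\omega \\
&= g_{rr}(\tau - s)
\end{aligned} \tag{3.2.37}$$

We can find that the results in Equation (3.2.36) and Equation (3.2.37) are the same. In the same way, we can prove that the lower terms in $\mathbf{m}$ and $\mathbf{m}'$ are the same. Therefore, we can say that

$$\mathbf{m}' = \mathbf{m} \tag{3.2.38}$$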
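The two equalities $\mathbf{O}' = \mathbf{O}$ and $\mathbf{m}' = \mathbf{m}$ can also be checked numerically. The sketch below is only an illustration under stated assumptions: the four impulse responses, `N`, `tau`, and the grid size `K` are hypothetical, and the frequency integrals are replaced by averages over `K` uniformly spaced frequencies, which is exact once `K` exceeds the largest exponent that appears in the integrands.

```python
import numpy as np

def conv_matrix(g, n_taps):
    """L x N convolution matrix of g (Eq. 3.2.8)."""
    g = np.asarray(g, dtype=float)
    mat = np.zeros((len(g) + n_taps - 1, n_taps))
    for col in range(n_taps):
        mat[col:col + len(g), col] = g
    return mat

def freq_response(h, w):
    """h(e^{jw}) on the frequency grid w."""
    h = np.asarray(h, dtype=float)
    return np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h

# Hypothetical, deliberately asymmetric paths so no block matches by accident.
g_rr, g_lr = [1.0, 0.5, 0.2], [0.0, 0.4, 0.3]
g_rl, g_ll = [0.1, 0.3, 0.2], [0.9, 0.6, 0.1]
N, tau, K = 4, 2, 64
L = len(g_rr) + N - 1

# Time-domain quantities: O' = G^T G and m' = G^T q_1.
G = np.block([[conv_matrix(g_rr, N), conv_matrix(g_lr, N)],
              [conv_matrix(g_rl, N), conv_matrix(g_ll, N)]])
q1 = np.zeros(2 * L)
q1[tau] = 1.0                       # q_1 = [d; 0] with d a delayed impulse
O_time = G.T @ G
m_time = G.T @ q1

# Frequency-domain quantities: O and m of Eq. (3.2.25) on a uniform grid.
w = 2 * np.pi * np.arange(K) / K
ex = np.exp(-1j * np.outer(w, np.arange(N)))
top = np.hstack([freq_response(g_rr, w)[:, None] * ex,
                 freq_response(g_lr, w)[:, None] * ex])
bot = np.hstack([freq_response(g_rl, w)[:, None] * ex,
                 freq_response(g_ll, w)[:, None] * ex])
O_freq = (top.conj().T @ top + bot.conj().T @ bot).real / K
m_freq = (np.exp(1j * w * tau)[:, None] * top).real.mean(axis=0)

max_O_diff = np.abs(O_time - O_freq).max()
max_m_diff = np.abs(m_time - m_freq).max()
```

Both differences should be at the level of floating-point rounding, which is the numerical counterpart of Equations (3.2.35) and (3.2.38).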

From the results in Equation (3.2.35) and Equation (3.2.38), we know that the FIR crosstalk cancellers designed in the time domain and in the frequency domain are the same.

Chapter 4 IIR Crosstalk Canceller

In this chapter, the crosstalk canceller is designed in IIR form. We again use the two criteria, matrix inverse and direct LSE, to design the IIR filters.

4.1 Matrix Inverse Design

4.1.1 Design in Time Domain

Referring to Equation (2.3.5) and Equation (3.1.1), we know the theoretical solution of the crosstalk canceller. We want each term of $\mathbf{G}^{-1}$ to be approximated in IIR form, as expressed in Equation (4.1.1). The IIR structure of the crosstalk canceller is shown in Figure 4.1.

$$\frac{b(z)}{a(z)} \approx \frac{h_x(z)}{h_y(z)} \cdot z^{-\tau} \tag{4.1.1}$$

Figure 4.1: The structure of the crosstalk canceller in IIR form (filters $b_{11}(z)/a_{11}(z)$, $b_{12}(z)/a_{12}(z)$, $b_{21}(z)/a_{21}(z)$, $b_{22}(z)/a_{22}(z)$; inputs $s_R$, $s_L$; acoustic paths $g_{rr}$, $g_{rl}$, $g_{lr}$, $g_{ll}$; ear signals $e_R$, $e_L$)

Now, the criterion to design the IIR filter is to minimize the error as follows:

$$\mathrm{error}_{\mathrm{IIR}}(z) = \frac{b(z)}{a(z)} - \frac{h_x(z)}{h_y(z)} \cdot z^{-\tau} \tag{4.1.2}$$

$$\left( b(n),\ a(n) \right) = \arg\min \left\{ \sum_{n=0}^{\infty} \mathrm{error}_{\mathrm{IIR}}^2(n) \right\} \tag{4.1.3}$$

Let $u(z) = \dfrac{1}{a(z)}$, and $a(0) = 1$. We can get

$$u(n) = \delta(n) - \sum_{k=1} a(k)\, u(n-k)$$
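The recursion for $u(n)$ can be sketched in a few lines. The denominator coefficients below are a hypothetical first-order example with $a(0) = 1$, used only to show that each sample of $u(n)$ depends on earlier samples and on $a(k)$.

```python
import numpy as np

def inverse_impulse_response(a, n_samples):
    """First n_samples of u(n), where u(z) = 1 / a(z) and a[0] == 1."""
    u = np.zeros(n_samples)
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0          # delta(n)
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * u[n - k]            # - sum_k a(k) u(n - k)
        u[n] = acc
    return u

a = [1.0, -0.5]                                # hypothetical denominator
u = inverse_impulse_response(a, 5)
check = np.convolve(a, u)[:5]                  # a(n) * u(n) should be delta(n)
```

Because every $u(n)$ is built from products of the $a(k)$, an error expressed through $u(n)$ is a nonlinear function of the denominator coefficients.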

Therefore, $\mathrm{error}_{\mathrm{IIR}}(n)$ can be rewritten as follows:

$$\mathrm{error}_{\mathrm{IIR}}(n) = \sum_{t=0} b(t) \left( \delta(n-t) - \sum_{k=1} a(k)\, u(n-t-k) \right) - r(n), \tag{4.1.4}$$

where $r(n)$ is the impulse response of $h_x(z) \cdot z^{-\tau} / h_y(z)$.

Equation (4.1.4) is therefore a function of $a(k)$ and $b(t)$. We may want to differentiate Equation (4.1.4) with respect to $a(k)$ and $b(t)$ and set the derivatives to zero. However, nonlinear equations will appear, and solving them would be very hard work. Therefore, we use the concept of Prony's method [15] to linearize the problem by multiplying (filtering) by the denominator $a(z)$. Equation (4.1.2) can be rewritten as follows:

$$\mathrm{error}_{\mathrm{IIR\_filtered\_1}}(z) = a(z) \cdot \mathrm{error}_{\mathrm{IIR}}(z) = b(z) - a(z) \cdot \frac{h_x(z)}{h_y(z)} \cdot z^{-\tau} \tag{4.1.5}$$

The filtered error $\mathrm{error}_{\mathrm{IIR\_filtered\_1}}(z)$ can be expressed in the time domain as follows:

$$\mathrm{error}_{\mathrm{IIR\_filtered\_1}}(n) = b(n) - \sum_{k=1} a(k)\, r(n-k) - r(n) \tag{4.1.6}$$

From the discussion in the previous section, we know that $r(n)$ will diverge. Therefore, we multiply by $h_y(z)$ to further stabilize the problem and minimize the following filtered error $\mathrm{error}_{\mathrm{IIR\_filtered\_2}}(z)$:

$$\mathrm{error}_{\mathrm{IIR\_filtered\_2}}(z) = -h_y(z) \cdot \mathrm{error}_{\mathrm{IIR\_filtered\_1}}(z) = a(z) \cdot h_x(z) \cdot z^{-\tau} - b(z) \cdot h_y(z) \tag{4.1.7}$$

The overall sign is reversed relative to Equation (4.1.5); this does not change the squared error.

The block diagram of the filtered error is given in Figure 4.2.

Figure 4.2: Block diagram of the filtered error $\mathrm{error}_{\mathrm{IIR\_filtered\_2}}(z)$ (blocks $h_x(z)z^{-\tau}$, $a(z)$, $h_y(z)$, and $b(z)$ combined at a summing node)

Our goal is to minimize the filtered error $\mathrm{error}_{\mathrm{IIR\_filtered\_2}}(z)$, and its expression in the time domain is

$$\mathrm{error}_{\mathrm{IIR\_filtered\_2}}(n) = a(n) * h_x(n-\tau) - b(n) * h_y(n), \tag{4.1.8}$$

where $*$ denotes convolution; $b(n)$ and $a(n)$ are FIR filters with $nb$ and $na$ taps. Let $a(0) = 1$ and $u(n) = h_x(n-\tau)$. Equation (4.1.8) can be rewritten as follows:

$$\mathrm{error}_{\mathrm{IIR\_filtered\_2}}(n) = u(n) + \sum_{l=1}^{na-1} u(n-l)\, a(l) - \sum_{m=0}^{nb-1} h_y(n-m)\, b(m) \tag{4.1.9}$$

The above equation can be rewritten in matrix form as follows:

$$\mathbf{error}_{\mathrm{IIR\_filtered\_2}} = \mathbf{u} - \left( \mathbf{U} \cdot (-\mathbf{a}) + \mathbf{H}_y \cdot \mathbf{b} \right), \tag{4.1.10}$$

where the vectors $\mathbf{a}$ and $\mathbf{b}$ hold the filter coefficients:

$$\mathbf{a} = [a(1)\ \ a(2)\ \cdots\ a(na-1)]^T; \quad \mathbf{b} = [b(0)\ \ b(1)\ \cdots\ b(nb-1)]^T;$$

$$\mathbf{u} = [\underbrace{0, \cdots, 0}_{\tau},\ h_x(0),\ h_x(1),\ \cdots,\ h_x(M-1),\ \underbrace{0, \cdots, 0}_{L-M-\tau}]^T, \quad L = 2M + \max(nb, na) - 2.$$

$\mathbf{U}$ and $\mathbf{H}_y$ are the convolution matrices given by

$$\mathbf{U} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
u(0) & 0 & \cdots & 0 \\
u(1) & u(0) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
u(L-2) & u(L-3) & \cdots & u(L-na)
\end{bmatrix}_{L \times (na-1)} \tag{4.1.11}$$

$$\mathbf{H}_y = \begin{bmatrix}
h_y(0) & 0 & \cdots & 0 \\
h_y(1) & h_y(0) & \ddots & \vdots \\
\vdots & h_y(1) & \ddots & 0 \\
h_y(2M-2) & \vdots & \ddots & h_y(0) \\
0 & h_y(2M-2) & \ddots & h_y(1) \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_y(2M-2)
\end{bmatrix}_{L \times nb} \tag{4.1.12}$$

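Equation (4.1.10) can be rewritten as follows:

$$\mathbf{error}_{\mathrm{IIR\_filtered\_2}} = \mathbf{u} - \begin{bmatrix} \mathbf{U} & \mathbf{H}_y \end{bmatrix} \begin{bmatrix} -\mathbf{a} \\ \mathbf{b} \end{bmatrix}$$

Our goal is to minimize $\mathbf{error}_{\mathrm{IIR\_filtered\_2}}$, so the criterion is as follows:

$$\begin{bmatrix} -\mathbf{a} \\ \mathbf{b} \end{bmatrix} = \arg\min \left\{ \left\| \mathbf{error}_{\mathrm{IIR\_filtered\_2}} \right\|^2 \right\} = \arg\min \left\{ \left\| \mathbf{u} - \begin{bmatrix} \mathbf{U} & \mathbf{H}_y \end{bmatrix} \begin{bmatrix} -\mathbf{a} \\ \mathbf{b} \end{bmatrix} \right\|^2 \right\} \tag{4.1.13}$$

Let $\mathbf{W} = \begin{bmatrix} \mathbf{U} & \mathbf{H}_y \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} -\mathbf{a} \\ \mathbf{b} \end{bmatrix}$. We can get the least squares solution of the filtered problem as follows:

$$\mathbf{v} = \left( \mathbf{W}^T \cdot \mathbf{W} \right)^{-1} \cdot \mathbf{W}^T \cdot \mathbf{u} \tag{4.1.14}$$

Because the first $na-1$ elements of $\mathbf{v}$ hold $-\mathbf{a}$, the filter coefficients follow from the above equation as:

$$a(n) = [1\ \ -v(0)\ \ -v(1)\ \cdots\ -v(na-2)]^T \tag{4.1.15}$$

$$b(n) = [v(na-1)\ \ v(na)\ \cdots\ v(na+nb-2)]^T \tag{4.1.16}$$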
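The linearized design of Equations (4.1.8) to (4.1.16) can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: `h_x`, `h_y`, `na`, `nb`, and `tau` are hypothetical toy values chosen so that an exact solution exists, the length `L` is simply taken long enough to hold both convolutions instead of using the formula above, and `numpy.linalg.lstsq` replaces the explicit inverse in Equation (4.1.14).

```python
import numpy as np

def linearized_iir(h_x, h_y, na, nb, tau):
    """Solve min ||u - W v||^2 with W = [U, H_y] and v = [-a(1..na-1); b]."""
    h_x = np.asarray(h_x, dtype=float)
    h_y = np.asarray(h_y, dtype=float)
    # Assumption: L is long enough to hold a * u and b * h_y completely.
    L = max(len(h_x) + tau + na - 1, len(h_y) + nb - 1)
    u = np.zeros(L)
    u[tau:tau + len(h_x)] = h_x                 # u(n) = h_x(n - tau)
    hy = np.zeros(L)
    hy[:len(h_y)] = h_y
    U = np.zeros((L, na - 1))
    for l in range(1, na):
        U[l:, l - 1] = u[:L - l]                # column l-1 holds u(n - l)
    H = np.zeros((L, nb))
    for m in range(nb):
        H[m:, m] = hy[:L - m]                   # column m holds h_y(n - m)
    W = np.hstack([U, H])
    v, *_ = np.linalg.lstsq(W, u, rcond=None)   # Eq. (4.1.14)
    a = np.concatenate([[1.0], -v[:na - 1]])    # Eq. (4.1.15), a(0) = 1
    b = v[na - 1:]                              # Eq. (4.1.16)
    return a, b

# Hypothetical example where b(z)/a(z) = h_x(z) z^{-tau} / h_y(z) holds exactly.
h_x = [1.0, 0.5]
h_y = [1.0, -0.3, 0.1]
a, b = linearized_iir(h_x, h_y, na=3, nb=3, tau=1)
```

In this toy case the denominator reproduces `h_y` and the numerator reproduces the delayed `h_x`, so the filtered error is zero; with real responses the solution is only a least-squares fit of the filtered error, not of the original error in Equation (4.1.2).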

4.1.2 Design in Frequency Domain

Similarly, we will try to find the filters in the frequency domain. The direct error


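Acknowledgments

I thank Professor 謝世福 (S. F. Hsieh) for his patient guidance, which allowed this thesis to be completed smoothly. The physical intuition he emphasizes has benefited me greatly and has deepened my understanding of how to analyze problems. Next, I thank my parents and family, whose support and care gave me confidence. I especially thank my eldest sister, who eased my stress when I was tired and gave me the energy to keep going. I also thank my friends 懷嘉, 俊煒, 威諭, and many others, whose encouragement helped me finish my studies. Finally, I thank my lab mates, with whom I spent two happy years. Their joint effort and mutual encouragement made this thesis possible.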
