## National Chiao Tung University

### College of Electrical and Computer Engineering, Industrial Technology R & D Master Program in IC Design

### Master's Thesis

### Nonlinear Acoustic Echo Suppression Based on Correlated Higher-orders Nonlinear Echoes

### Student: 王宏益 (H. Y. Wang)

### Advisor: Prof. 謝世福 (S. F. Hsieh)

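### Nonlinear Acoustic Echo Suppression Based on Correlated Higher-orders Nonlinear Echoes

### Student: 王宏益 Advisor: 謝世福

### Industrial Technology R & D Master Program, College of Electrical and Computer Engineering, National Chiao Tung University

### Abstract (Chinese version)

To compensate for the nonlinear loudspeaker echo that arises in hands-free telephones or video-conferencing systems, a nonlinear acoustic echo canceller is usually used to remove the echo. However, these approaches converge too slowly, which degrades performance. Nonlinear echo suppression alleviates this problem, because it does not try to identify the actual nonlinear path; instead it estimates the power spectral density of the nonlinear component.

This thesis introduces two ways of estimating the power spectral density of the nonlinear echo. The first estimates it from the correlation of the echoes. The second obtains it with nonlinear adaptive filters. The first approach produces larger errors, because the higher-order echoes are not linearly related to the first-order echo. The second requires a large number of nonlinear filters to estimate the higher-order echoes. This thesis proposes a new algorithm that uses nonlinear adaptive filters to estimate the low-order echoes accurately and uses the correlation of the echoes to estimate the higher-order echoes roughly. The algorithm estimates the energy of the nonlinear echo more accurately at a low computational cost. In addition, an analysis method for basis selection is proposed, which further improves the accuracy of the nonlinear echo estimate.

**Nonlinear Acoustic Echo Suppression Based on Correlated Higher-orders Nonlinear Echoes**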

**Student : H. Y. Wang Advisor : S. F. Hsieh **

**Industrial Technology R & D Master Program of **

**Electrical and Computer Engineering College **

**National Chiao Tung University **

**Abstract **

To compensate for the nonlinear distortion in hands-free telephones or teleconferencing systems, nonlinear acoustic echo cancellation can be used to cancel the nonlinear acoustic echo. However, these methods converge too slowly. Nonlinear acoustic echo suppression resolves this problem, because it does not require an exact identification of the nonlinear components of the echo path; it aims instead at estimating the power spectral density of the nonlinear components.

In this thesis, two previous methods for estimating the power spectral density of the nonlinear residual error are introduced. The first uses the linear relation between echoes; the second uses a nonlinear adaptive filter. We propose a new method that estimates the low-order echoes accurately and the high-order echoes roughly. It requires less computation than the second method and estimates the nonlinear residual error more accurately than the first. We also discuss how to choose the echo basis.

**Acknowledgments **

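Time flies. Looking back on more than two years of graduate research, I have grown a great deal. First, I would like to thank my advisor, Professor 謝世福 (S. F. Hsieh). His patient and rigorous guidance profoundly shaped how I pursue knowledge and approach my work, and I have benefited greatly from it. Thank you, Professor! I also thank the senior, fellow, and junior students in the laboratory: you gave me motivation when I was discouraged and made my research life colorful. Finally, I thank my family, who were always quietly by my side with support and encouragement, which allowed me to finish despite the difficulties.
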
**Contents **

**Chinese Abstract**

**English Abstract**

**Acknowledgments**

**Contents**

**List of Figures**

**List of Tables**

**1. Introduction**

**2. Overview of Nonlinear Acoustic Echo Cancellation**

2.1 Memoryless Nonlinear AEC

2.1.1 Hammerstein model

2.1.2 Orthogonalized Power Filter model

2.2 Memory Nonlinear AEC

2.2.1 Wiener Model

2.2.2 Volterra model

2.3 Computation Reduction of Simplified Volterra Model

**3. Estimate Nonlinear Residual Error for Nonlinear Acoustic Echo Suppression**

3.1 Wiener Filter

3.2 Acoustic Echo Suppression Structures

3.2.1 AEC+NR structure

3.2.2 NR+AEC structure

3.3 Estimation nonlinear residual error for NAES

3.3.1 Based on Highly Correlated Nonlinear Residual Echo

3.3.2 Power Filter Model

3.4 Suppression of nonlinear residual error and background noise

3.4.1 AEC+NR suppression of noise and nonlinear residual error

3.4.2 NR+AEC suppression of noise and nonlinear residual error

3.5 Volterra structure for NAES

3.6 Using high order nonlinear echo to estimate nonlinear residual error

**4. Computer Simulations**

4.1 Parameters and speech signal of simulations

4.2 Compare two combined structures in three conditions

4.2.1 Background noise

4.2.2 Nonlinear residual error

4.2.3 Background noise and nonlinear residual error

4.3 Performance of Volterra Structure for NAES

4.4 Simulation of Highly Nonlinear Residual Errors

4.4.1 Single talk

4.4.2 Double Talk

4.5 Statistics Distribution of Higher-Order Nonlinear Residual Errors

**5. Conclusions**

**List of Figures**

1.1 The simplified diagram of hands-free telephone system

2.1 Hammerstein structure

2.2 Wiener model

2.3 Cascade model of the system and mirror adaptive system

2.4 Simplified second order Volterra filter

3.1 Linear AEC and nonlinear acoustic echo suppression structure

3.2 Block diagram representation of the statistical filtering problem

3.3 Block diagram of a combined echo canceling and noise reduction system

3.4 Block diagram of a combined noise reduction and echo canceling system

3.5 Block diagram of NR+AEC to suppress noise and nonlinear residual error

3.6 Block diagram of NR+AEC to suppress noise and nonlinear residual error

3.7 The Volterra structure for NAES

3.8 Block diagram of Sec. 3.3.1

3.9 Block diagram of the proposed method

3.10 Sigmoid function

4.1 Speech signal

4.2 Pseudo room impulse response

4.3 Combined structures for suppressing background noise (SNR=20dB)

4.5 Combined structures for suppressing nonlinear residual error (SNR=20dB)

4.6 Combined structures for suppressing nonlinear residual error and background noise

4.7 The Volterra structure for NAES using WGN

4.8 ERLE for speech signal & real system

4.9 ERLE for speech signal & polynomial system

4.10 ERLE for WGN & polynomial system

4.11 Statistic distribution of AEC1/slope1 in speech signal + real system

4.12 Statistic distribution of AEC13/AES3/slope1 in speech signal + real system

4.13 Statistic distribution of AEC13/AES3/slope3 in speech signal + real system

4.15 Statistic distribution of AEC13/AES3/slope1 in WGN signal + polynomial system

4.16 Statistic distribution of AEC13/AES3/slope3 in WGN signal + polynomial system

4.17 Statistic distribution of AEC1/slope1 in speech signal + polynomial system

4.18 Statistic distribution of AEC13/AES3/slope1 in speech signal + polynomial system

4.19 Statistic distribution of AEC13/AES3/slope3 in speech signal + polynomial system

4.20 Statistic distribution of AEC1/slope1 in WGN signal + real system

4.21 Statistic distribution of AEC13/AES3/slope1 in WGN signal + real system

**List of Tables**

2.1 The parameter of simplified Volterra Model

4.1 Notation of six algorithms

4.2 Simulation of double-talk

**Chapter 1 **

**Introduction **

When using a hands-free telephone or a teleconferencing system, the far-end speaker often hears his own speech. This is the acoustic echo problem: the far-end speech played by the near-end loudspeaker is picked up by the near-end microphone and sent back to the far-end user through the far-end loudspeaker. A simplified diagram of a hands-free telephone system is shown in Fig. 1.1. If the far-end signal picked up by the near-end microphone can be cancelled, the acoustic echo problem is overcome, and if the echo path is estimated accurately, the echo can be cancelled. This method is called acoustic echo cancellation (AEC) and is shown in Fig. 1.1. In hands-free telephony or teleconferencing, the room impulse response can change very often, so the AEC must be time-variant and track the echo path to provide satisfactory speech communication quality [1-5].

When the volume exceeds the capability of the loudspeaker amplifier, the amplifier is overdriven and the loudspeaker response is no longer linear. A linear AEC is then insufficient to model the acoustic echo channel, and performance degrades at high volume. Many methods have been proposed to overcome this nonlinear problem. The loudspeaker nonlinearity can be classified as nonlinearity with or without memory [6-10]. In this thesis, we introduce several nonlinear AEC methods.

To overcome the nonlinear effect of the power amplifier, a high-order nonlinear AEC (NAEC) structure is commonly used. However, its computation is more complex and its convergence is slow: a high-order NAEC must adapt many coefficients simultaneously at every iteration, and the coefficients interfere with each other. The objective of nonlinear acoustic echo suppression (NAES) is to increase both the attenuation of the nonlinearly distorted residual echo and the convergence speed. In contrast to the application of adaptive filters to nonlinear echo cancellation, we are not interested in an exact identification of the nonlinear components of the echo path; we aim instead at estimates of the power spectral density of the nonlinear components. These time-variant estimates are used to adjust the frequency-dependent gain of the echo suppressor. One method [11] assumes a linear relation between the linear echo and the nonlinear residual error to estimate the power spectral density of the nonlinear residual error, which saves a large amount of computation. However, the nonlinear residual error is in fact not linearly related to the linear echo, so the high-order nonlinear echoes cannot be suppressed. Another method [12] uses nonlinear adaptive filters to estimate the high-order nonlinear residual error, which is more accurate than [11] but requires a large amount of computation.

We propose a new NAES method that combines the methods of [11] and [12]. Because the nonlinear residual error is not linearly related to the linear echo, the proposed method is designed to overcome this drawback of [11] while requiring less computation than [12]. The new method splits the nonlinear residual error into two parts: the low-order nonlinear residual echoes are estimated accurately by nonlinear adaptive filters, and the high-order nonlinear residual echoes are estimated roughly from their correlation with the low-order echoes.

This thesis is organized as follows. Chapter 2 gives more details about nonlinear AEC with and without memory. Chapter 3 introduces nonlinear acoustic echo suppression (NAES) and proposes a new method that uses a low-order power filter to estimate the low-order nonlinear residual error and uses higher-order nonlinear echoes as a basis to estimate the high-order nonlinear residual error. Chapter 4 presents computer simulations of the methods discussed in Chapters 2 and 3. Finally, Chapter 5 concludes our work.

**Chapter 2 **

**Overview of Nonlinear Acoustic Echo **

**Cancellation **

This chapter introduces several time-domain nonlinear AEC methods, both memoryless and with memory. The traditional way to avoid acoustic echo is to block all signals from the far-end loudspeaker, but such half-duplex communication is inconvenient for the user. Linear acoustic echo cancellation overcomes this difficulty and allows hands-free telephones and teleconferencing systems to work in full duplex. The loudspeakers in such systems are usually small and cheap, so they saturate at high speech levels. Once saturation occurs, the loudspeaker is no longer linear, and the residual error left by a purely linear acoustic echo canceller is very large. We therefore discuss nonlinear acoustic echo cancellation as a way to overcome this problem.

For some loudspeakers, the nonlinear effects have memory. If a memoryless structure is used to model them, the echo is not cancelled completely. Structures with memory, e.g. the Volterra model, are in general complex. Below, we introduce several memoryless and memory structures.

**2.1 Memoryless Nonlinear AEC**

**2.1.1 Hammerstein model **

The Hammerstein model, shown in Fig. 2.1, is used to model the loudspeaker and the acoustic channel [6]. The structure has two layers. The first layer consists of the weights of the nonlinear orders and models the static loudspeaker nonlinearity. The second layer is a linear FIR filter that models the dynamic echo path. The parameters of the nonlinear part in the first layer can be estimated separately from the linear part.

Fig. 2.1 Hammerstein structure
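The two-layer structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the first layer is a memoryless polynomial with weights `a[0..P-1]` (for the powers 1 to P) and the second layer is an FIR filter `h`; the function name and the argument layout are hypothetical and not taken from the thesis.

```python
def hammerstein_output(a, h, x):
    """Hammerstein sketch: memoryless polynomial followed by an FIR filter.

    a : polynomial weights, a[p-1] multiplies x**p (assumed first layer)
    h : FIR taps of the linear echo path (assumed second layer)
    x : list of input samples x[0], x[1], ...
    """
    # First layer: static nonlinearity s[n] = sum_p a_p * x[n]**p
    s = [sum(a_p * v ** p for p, a_p in enumerate(a, start=1)) for v in x]
    # Second layer: linear convolution y[n] = sum_l h[l] * s[n-l]
    return [
        sum(h[l] * s[n - l] for l in range(len(h)) if n - l >= 0)
        for n in range(len(x))
    ]
```

Because the nonlinearity has no memory, all dynamics sit in `h`, which is why the two layers can be adapted separately.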

**2.1.2 Orthogonalized Power Filter model **

This method uses a power filter to model the nonlinear effect [13-14]. A $P$-th order power filter is defined by its input/output relation

$$y[n] = \sum_{p=1}^{P} \sum_{l=0}^{N_p-1} h_{p,l}[n]\, x^p[n-l] \tag{2.1.5}$$

From (2.1.5) we notice that power filters can be considered as linear multiple-input/single-output systems, where the input of the $p$-th channel is given by the $p$-th power of $x[n]$. The input of each channel is then filtered by an associated linear filter $h_{p,l}$ with channel length $N_p$. For compactness, we write (2.1.5) in matrix notation:

$$y[n] = \sum_{p=1}^{P} \mathbf{h}_p^T\, \mathbf{x}_p[n] \tag{2.1.6}$$

with the vectors

$$\mathbf{x}_p[n] = \left[x^p[n],\; x^p[n-1],\; \ldots,\; x^p[n-N_p+1]\right]^T \tag{2.1.7}$$

$$\mathbf{h}_p = \left[h_{p,0},\; h_{p,1},\; \ldots,\; h_{p,N_p-1}\right]^T \tag{2.1.8}$$

Note that the input signals of the channels are in general not mutually orthogonal, i.e., $E\{\mathbf{x}_i^T[n]\,\mathbf{x}_j[n]\} \neq 0$. Thus, a direct adaptive implementation of the non-orthogonalized power filter suffers from slow convergence. Therefore, a new set of mutually orthogonal input signals has been introduced [14]:

$$\mathbf{x}_{o,1}[n] = \mathbf{x}_1[n] \tag{2.1.9}$$

$$\mathbf{x}_{o,p}[n] = \mathbf{x}_p[n] + \sum_{i=1}^{p-1} q_{p,i}\, \mathbf{x}_i[n] \tag{2.1.10}$$

for $1 < p \le P$. The orthogonalization coefficients $q_{p,i}$ can be determined using the Gram-Schmidt orthogonalization method. Applying standard gradient descent techniques, it follows directly from (2.1.6) that the normalized least mean square (NLMS) update of the coefficients of the orthogonalized structure is given by

$$\mathbf{h}_{o,p}[n+1] = \mathbf{h}_{o,p}[n] + \alpha_p\, \frac{\mathbf{x}_{o,p}[n]\, e[n]}{\mathbf{x}_{o,p}^T[n]\, \mathbf{x}_{o,p}[n]} \tag{2.1.11}$$
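As a rough illustration of (2.1.6) and of an NLMS step in the style of (2.1.11), the following Python sketch evaluates a power filter and updates every channel once. It is a simplification under stated assumptions: the Gram-Schmidt orthogonalization is skipped (the raw powers of the input are used as channel inputs), a single step size `alpha` replaces the per-channel $\alpha_p$, and a small `eps` is added to the normalization to avoid division by zero. The function names are hypothetical.

```python
def power_vector(x_hist, p):
    """x_p[n] = [x^p[n], x^p[n-1], ...]; x_hist[0] is the newest sample."""
    return [s ** p for s in x_hist]


def power_filter_output(h, x_hist):
    """y[n] = sum_p h_p^T x_p[n], cf. (2.1.6); h[p-1] holds the taps of channel p."""
    return sum(
        sum(tap * s for tap, s in zip(h_p, power_vector(x_hist, p)))
        for p, h_p in enumerate(h, start=1)
    )


def nlms_step(h, x_hist, d, alpha=0.25, eps=1e-8):
    """One NLMS step on every channel (no orthogonalization); returns e[n]."""
    e = d - power_filter_output(h, x_hist)
    for p, h_p in enumerate(h, start=1):
        x_p = power_vector(x_hist, p)
        norm = sum(s * s for s in x_p) + eps  # x_p^T x_p plus regularization
        for l in range(len(h_p)):
            h_p[l] += alpha * e * x_p[l] / norm
    return e
```

With `P` channels each taking a step of size `alpha`, the a-posteriori error for the same input shrinks by roughly the factor `1 - P * alpha`, so `alpha` should be chosen with the number of channels in mind.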

**2.2 Memory Nonlinear AEC**

In Sec. 2.1, we assumed that the nonlinear effect of the loudspeaker is memoryless. When the loudspeaker nonlinearity has memory, however, a nonlinear AEC with memory should be used.

**2.2.1 Wiener Model **

The Wiener model can model a loudspeaker with memory [15]. It has three layers: the first and second layers model the nonlinear memory effect, and the third layer models the linear echo path. The Wiener model is shown in Fig. 2.2.

Fig. 2.2 Wiener model

**2.2.2 Volterra Model **

As shown in Fig. 2.3, the system is composed of two modules organized in a cascaded structure: the first level models the nonlinear loudspeaker effect with a polynomial Volterra structure, and the second level models the room impulse response of the acoustic path with a standard linear filter.

Fig. 2.3 Cascade model of the system and mirror adaptive system

The global system, comprising the loudspeaker and the acoustic path, may be modeled by a parallel (nonlinear/linear) system. The system, whose output is denoted by $d[n]$, is then modeled by a parallel structure with first-, second-, and third-order kernels:

$$d[n] = \sum_{i=0}^{L} h_1^o[i]\, x[n-i] + \sum_{i=0}^{L}\sum_{j=i}^{L} h_2^o[i,j]\, x[n-i]\,x[n-j] + \sum_{i=0}^{L}\sum_{j=i}^{L}\sum_{k=j}^{L} h_3^o[i,j,k]\, x[n-i]\,x[n-j]\,x[n-k] \tag{2.2.1}$$

$\hat{d}[n]$ is the far-end signal after passing through the two-layer adaptive filter that models the nonlinear loudspeaker and the linear channel:

$$\hat{d}[n] = \mathbf{X}_1^T[n]\,\mathbf{h}_1[n] + \mathbf{U}_2^T[n]\,\mathbf{h}_2[n] + \mathbf{U}_3^T[n]\,\mathbf{h}_3[n] \tag{2.2.2}$$

The input signal vectors for the second- and third-order kernels may be expressed as products of matrices [16]. We first define the filter $\mathbf{h}_2[n]$, corresponding to the second-order nonlinearity, as a vector of dimension $L_2 \times 1$, where $L_2 = L(L+1)/2$.

$$\mathbf{U}_2[n] = \begin{bmatrix} x^2[n] & x^2[n-1] & \cdots & x^2[n-N+1] \\ x[n]\,x[n-1] & x[n-1]\,x[n-2] & \cdots & x[n-N+1]\,x[n-N] \\ \vdots & \vdots & & \vdots \\ x[n]\,x[n-L+1] & x[n-1]\,x[n-L] & \cdots & x[n-N+1]\,x[n-L-N+2] \\ \vdots & \vdots & & \vdots \\ x^2[n-L+1] & x^2[n-L] & \cdots & x^2[n-L-N+2] \end{bmatrix}_{L_2 \times N} \tag{2.2.3}$$

$$\mathbf{h}_2[n] = \begin{bmatrix} h_2[0,0;n] \\ h_2[0,1;n] \\ \vdots \\ h_2[0,L-1;n] \\ h_2[1,1;n] \\ \vdots \\ h_2[L-1,L-1;n] \end{bmatrix}_{L_2 \times 1} \tag{2.2.4}$$

$$e[n] = d[n] - \hat{d}[n]$$

The update equations of the vectors $\mathbf{h}_1[n]$, $\mathbf{h}_2[n]$, $\mathbf{h}_3[n]$ using the NLMS algorithm are given by [17-19]:

$$\begin{aligned} \mathbf{h}_1[n+1] &= \mathbf{h}_1[n] + \mu_1\, \frac{\mathbf{X}_1[n]}{\mathbf{X}_1^T[n]\,\mathbf{X}_1[n] + \delta}\, e[n] \\ \mathbf{h}_2[n+1] &= \mathbf{h}_2[n] + \mu_2\, \frac{\mathbf{U}_2[n]}{\mathbf{U}_2^T[n]\,\mathbf{U}_2[n] + \delta}\, e[n] \\ \mathbf{h}_3[n+1] &= \mathbf{h}_3[n] + \mu_3\, \frac{\mathbf{U}_3[n]}{\mathbf{U}_3^T[n]\,\mathbf{U}_3[n] + \delta}\, e[n] \end{aligned} \tag{2.2.5}$$

The Volterra structure performs best among these structures, but its computational complexity is also the highest.
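To make the size of the second-order kernel concrete, the sketch below builds the second-order input for a single time instant in the same ordering as the coefficient vector of (2.2.4), that is, all products $x[n-i]\,x[n-j]$ with $j \ge i$. It is an illustration only: the function names are hypothetical, and the block (multi-column) form of the input is not modeled.

```python
def second_order_input(x_hist):
    """Products x[n-i]*x[n-j] for 0 <= i <= j < L, ordered as in (2.2.4).

    x_hist[i] holds x[n-i], so x_hist[0] is the newest sample.
    """
    L = len(x_hist)
    return [x_hist[i] * x_hist[j] for i in range(L) for j in range(i, L)]


def second_order_output(h2, x_hist):
    """Second-order contribution: inner product of h2 with the product vector."""
    u2 = second_order_input(x_hist)
    if len(h2) != len(u2):
        raise ValueError("h2 must have L(L+1)/2 entries")
    return sum(h * u for h, u in zip(h2, u2))
```

For a memory of `L` samples the vector has `L * (L + 1) // 2` entries, which is the quadratic growth that motivates the simplified structure of Sec. 2.3.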

**2.3 Computation Reduction of Simplified Volterra Model **

An echo canceller for such loudspeakers must be capable of modeling a nonlinear system with memory. The Volterra filter is the common method for dealing with such systems. Unfortunately, it suffers from high computational complexity. Therefore, we introduce an adaptive structure representing a simplified realization of a special second-order Volterra filter [20]. The simplified second-order Volterra filter is illustrated in Fig. 2.4 and includes a linear branch and a second-order nonlinear branch in parallel. The nonlinear branch consists of the cascade of an FIR filter, a multiplier, and a second-stage filter.

Fig. 2.4 Simplified second order Volterra filter

A second-order Volterra filter with memory length $N_v$ has $\frac{N_v(N_v+1)}{2}$ coefficients. The simplified second-order Volterra filter, however, has only $N_v$ coefficients, which is the sum $N_c + N_w$.

Below, we discuss how the total number of coefficients is distributed between $N_c$ and $N_w$. For example, for $N_v = 5$ we expand three cases: $N_c = 4,\ N_w = 1$; $N_c = 2,\ N_w = 3$; and $N_c = 1,\ N_w = 4$.

$N_c = 4,\ N_w = 1$:

$$v[n] = \sum_{i=0}^{3} c_i\, x[n-i] \tag{2.2.6}$$

$$u[n] = \left(\sum_{i=0}^{3} c_i\, x[n-i]\right)^2 = \sum_{i=0}^{3}\sum_{j=0}^{3} c_i c_j\, x[n-i]\,x[n-j] \tag{2.2.7}$$

$$\begin{aligned} y[n] &= w_0\, u[n] = \sum_{i=0}^{3}\sum_{j=0}^{3} w_0 c_i c_j\, x[n-i]\,x[n-j] \\ &= w_0c_0c_0\, x[n]x[n] + (w_0c_0c_1 + w_0c_1c_0)\, x[n]x[n-1] + (w_0c_0c_2 + w_0c_2c_0)\, x[n]x[n-2] \\ &\quad + (w_0c_0c_3 + w_0c_3c_0)\, x[n]x[n-3] + w_0c_1c_1\, x[n-1]x[n-1] + (w_0c_1c_2 + w_0c_2c_1)\, x[n-1]x[n-2] \\ &\quad + (w_0c_1c_3 + w_0c_3c_1)\, x[n-1]x[n-3] + w_0c_2c_2\, x[n-2]x[n-2] \\ &\quad + (w_0c_2c_3 + w_0c_3c_2)\, x[n-2]x[n-3] + w_0c_3c_3\, x[n-3]x[n-3] \end{aligned} \tag{2.2.8}$$

$N_c = 2,\ N_w = 3$:

$$v[n] = \sum_{i=0}^{1} c_i\, x[n-i] \tag{2.2.9}$$

$$u[n] = \left(\sum_{i=0}^{1} c_i\, x[n-i]\right)^2 = \sum_{i=0}^{1}\sum_{j=0}^{1} c_i c_j\, x[n-i]\,x[n-j] \tag{2.2.10}$$

$$\begin{aligned} y[n] &= \sum_{k=0}^{2} w_k\, u[n-k] = \sum_{k=0}^{2} w_k \sum_{i=0}^{1}\sum_{j=0}^{1} c_i c_j\, x[n-k-i]\,x[n-k-j] \\ &= w_0c_0c_0\, x[n]x[n] + w_1c_0c_0\, x[n-1]x[n-1] + w_2c_0c_0\, x[n-2]x[n-2] \\ &\quad + (w_0c_0c_1 + w_0c_1c_0)\, x[n]x[n-1] + (w_1c_0c_1 + w_1c_1c_0)\, x[n-1]x[n-2] + (w_2c_0c_1 + w_2c_1c_0)\, x[n-2]x[n-3] \\ &\quad + w_0c_1c_1\, x[n-1]x[n-1] + w_1c_1c_1\, x[n-2]x[n-2] + w_2c_1c_1\, x[n-3]x[n-3] \end{aligned} \tag{2.2.11}$$

$N_c = 1,\ N_w = 4$:

$$v[n] = c_0\, x[n] \tag{2.2.12}$$

$$u[n] = \left(c_0\, x[n]\right)^2 = c_0^2\, x^2[n] \tag{2.2.13}$$

$$\begin{aligned} y[n] &= \sum_{k=0}^{3} w_k\, u[n-k] = \sum_{k=0}^{3} w_k\, c_0 c_0\, x[n-k]\,x[n-k] \\ &= w_0c_0c_0\, x[n]x[n] + w_1c_0c_0\, x[n-1]x[n-1] + w_2c_0c_0\, x[n-2]x[n-2] + w_3c_0c_0\, x[n-3]x[n-3] \end{aligned} \tag{2.2.14}$$

Table 2.1 The parameter of simplified Volterra Model

| Kernel | $N_c=4,\ N_w=1$ | $N_c=2,\ N_w=3$ | $N_c=1,\ N_w=4$ |
|---|---|---|---|
| $h_{00}$ | $w_0c_0c_0$ | $(w_0+w_1+w_2)\,c_0c_0$ | $(w_0+w_1+w_2+w_3)\,c_0c_0$ |
| $h_{01}$ | $w_0c_0c_1 + w_0c_1c_0$ | $(2w_0+2w_1+2w_2)\,c_0c_1$ | 0 |
| $h_{02}$ | $w_0c_0c_2 + w_0c_2c_0$ | 0 | 0 |
| $h_{03}$ | $w_0c_0c_3 + w_0c_3c_0$ | 0 | 0 |
| $h_{11}$ | $w_0c_1c_1$ | $(w_0+w_2)\,c_1c_1$ | 0 |
| $h_{12}$ | $w_0c_1c_2 + w_0c_2c_1$ | $w_2c_1c_2$ | 0 |
| $h_{13}$ | $w_0c_1c_3 + w_0c_3c_1$ | 0 | 0 |
| $h_{22}$ | $w_0c_2c_2$ | 0 | 0 |
| $h_{23}$ | $w_0c_2c_3 + w_0c_3c_2$ | 0 | 0 |
| $h_{33}$ | $w_0c_3c_3$ | 0 | 0 |

The larger $N_c$ is, the more coefficients are available for the nonlinear effect. From Table 2.1, the case $N_c = 4$, $N_w = 1$ models the nonlinear memory effect in the same way as the Volterra filter, but its linear adaptive filter has only one tap. The case $N_c = 1$, $N_w = 4$ is the same as a 4-tap linear adaptive filter, and this case cannot estimate the nonlinear memory effect at all. The total numbers of coefficients modeling the nonlinear memory effect are given by (2.2.15) for the simplified Volterra structure and by (2.2.16) for the Volterra structure:

$$\frac{N_c(N_c+1)}{2} + N_c\,(N_w-1) \tag{2.2.15}$$

$$\frac{N_v(N_v+1)}{2} \tag{2.2.16}$$
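The count in (2.2.15), taken here as $N_c(N_c+1)/2 + N_c(N_w-1)$, can be checked by enumerating the distinct products $x[n-a]\,x[n-b]$ that the simplified structure generates. The sketch below does this by brute force; the function names are hypothetical, and the enumeration assumes the nonlinear branch is an FIR filter of length $N_c$, a squarer, and an FIR filter of length $N_w$, as described for Fig. 2.4.

```python
def simplified_volterra_terms(n_c, n_w):
    """Distinct delay pairs (a, b), a <= b, of products x[n-a]*x[n-b]
    produced by FIR(n_c) -> squarer -> FIR(n_w)."""
    terms = set()
    for k in range(n_w):          # delay of the second-stage filter
        for i in range(n_c):      # delays inside the squared first stage
            for j in range(n_c):
                a, b = sorted((k + i, k + j))
                terms.add((a, b))
    return terms


def count_by_formula(n_c, n_w):
    """Closed form assumed for (2.2.15)."""
    return n_c * (n_c + 1) // 2 + n_c * (n_w - 1)
```

For the three cases above, `len(simplified_volterra_terms(4, 1))`, `(2, 3)` and `(1, 4)` give 10, 7 and 4 distinct products, which agrees with `count_by_formula` in each case.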

**Chapter 3 **

**Estimate Nonlinear Residual Error **

**for Nonlinear Acoustic Echo **

**Suppression **

Loudspeakers and amplifiers of mobile communication devices may cause significant nonlinear distortion in the acoustic echo path, which limits the performance of a purely linear acoustic echo canceller [1-5]. Several nonlinear acoustic echo cancellers [6-10] have been proposed to overcome this distortion, but they converge too slowly when the room impulse response keeps changing. We therefore introduce nonlinear acoustic echo suppression (NAES), which converges faster than nonlinear acoustic echo cancellation (NAEC) [12]. Moreover, NAES suppresses the residual echo that remains after a purely linear AEC better than NAEC cancels the acoustic echo. The linear acoustic echo cancellation and nonlinear acoustic echo suppression structure is shown in Fig. 3.1.

Fig. 3.1 Linear AEC and nonlinear acoustic echo suppression structure

The near-end microphone picks up the signal $d[n]$, which in a silent room consists of the acoustic echo $y[n]$ and the near-end speech $b[n]$. The acoustic echo $y[n]$ is the far-end speech after passing through the nonlinear loudspeaker and the room impulse response. We use a linear adaptive filter to identify the room impulse response and to generate a replica $\hat{y}[n]$ of the acoustic echo, which is subtracted from the desired signal $d[n]$. However, this adaptive filter cannot model the nonlinear channel of the loudspeaker. The residual error left by linear acoustic echo cancellation (AEC) alone is very large when a high-power speech signal is fed into a small loudspeaker, because the loudspeaker then operates in its saturation region. We therefore look for a filter $G(k,m)$ after the AEC that suppresses the nonlinear residual error. If the optimum gain can be found, the suppressed output signal will be close to the near-end signal $b[n]$.

**3.1 Wiener Filter **

The foundation of nonlinear acoustic echo suppression is the Wiener filter [1]. Consider the block diagram of Fig. 3.2, built around a linear discrete-time filter. The filter input consists of a time series $u(0), u(1), u(2), \ldots$, and the filter is itself characterized by the impulse response $w_0, w_1, w_2, \ldots$. At some discrete time $n$, the filter produces an output denoted by $y(n)$. The output is used to provide an estimate of a desired response designated by $d(n)$. With the filter input and the desired response representing single realizations of respective stochastic processes, the estimation is ordinarily accompanied by an error with statistical characteristics of its own. In particular, the estimation error, denoted by $e(n)$, is defined as the difference between the desired response $d(n)$ and the filter output $y(n)$.

Fig. 3.2 Block diagram representation of the statistical filtering problem

The optimum linear filter in Fig. 3.2 is given in Eq. (3.1.13). Assuming that $S_{UU}(\Omega) \neq 0$ for all $\Omega$, we find the following transfer function of the noncausal Wiener filter:

$$H_{opt}(\Omega) = \frac{S_{UD}^{*}(\Omega)}{S_{UU}(\Omega)} \tag{3.1.13}$$

**3.2 Acoustic Echo Suppression Structures **

The integration of noise reduction and echo cancellation [21] has the advantage of exploiting the synergy among its components. An important issue is the placement of the two algorithms. These structures can only be implemented in the frequency domain, because noise reduction requires a frequency-domain implementation using the FFT. Since the performance of the NLMS algorithm degrades significantly in the presence of high-level background noise, an immediate suggestion would be to place the noise reduction before the echo cancellation. The drawback, however, is that the noise reduction introduces nonlinearity into the echo path.

In this section, we consider only the linear echo and the background noise. Using only the AEC to cancel the echo is not enough to ensure good listening quality, because the far-end user still hears the background noise and the background noise interferes with the AEC algorithm. We introduce two combined structures below.

**3.2.1 AEC+NR structure **

The combined system is shown in Fig. 3.3 [21]. We use a conventional echo canceller, consisting of a time-variant FIR filter adapted by the NLMS algorithm, and a combined residual echo and noise reduction filter implemented in the frequency domain. $x[n]$ denotes the far-end speech, $b[n]$ the near-end speech, and $v[n]$ the noise. The microphone signal $d[n]$ is made up of the echo $y[n]$ as well as the near-end speech and the noise:

$$d[n] = y[n] + b[n] + v[n] \tag{3.2.1}$$

The estimated echo $\hat{y}[n]$ is subtracted from $d[n]$, forming the echo-compensated signal $e[n]$:

$$\begin{aligned} e[n] &= y[n] - \hat{y}[n] + b[n] + v[n] \\ &= \Delta y[n] + b[n] + v[n] \end{aligned} \tag{3.2.2}$$

Depending on the effectiveness of the echo canceller, the residual echo $\Delta y[n] = y[n] - \hat{y}[n]$ must be more or less attenuated by the filter $G$. The output signal of the system is denoted by $z[n]$. If the AEC is perfect, the residual error $e[n]$ is given by (3.2.3):

$$e[n] \approx b[n] + v[n] \tag{3.2.3}$$

The suppression gain $G$ suppresses the noise $v[n]$, so that the output $z[n]$ approximates $b[n]$:

$$Z(w) = G(w) \cdot E(w) \tag{3.2.4}$$

The suppression gain follows from the Wiener filter concept in Eq. (3.1.13) of Sec. 3.1.

$$G(w) = \frac{S_{EB}}{S_{EE}} = \frac{S_{BB}}{S_{EE}} \quad \text{if } b \text{ and } v \text{ are uncorrelated} \tag{3.2.5}$$

$S_{BB}(w)$ is the power spectral density of the near-end speech and $S_{EE}(w)$ is the power spectral density of the residual error.

The error signal $e[n]$ is then processed by the noise reduction to obtain the noise-free signal $z[n]$. With an optimal echo canceller, the echo is completely cancelled by the first filter, leaving the useful signal and the noise unchanged. The output of the AEC is ideally $b[n] + v[n]$. The second stage aims at reducing the noise through the Wiener gain filter $\frac{S_{BB}}{S_{BB} + S_{VV}}$. A disadvantage of this integrated structure is that the AEC has to process noisy signals. In practice, the AEC is adaptive, and its coefficients are disturbed by the omnipresent ambient noise.

Fig. 3.3 Block diagram of a combined echo canceling and noise reduction system

**3.2.2 NR+AEC structure **

Fig. 3.4 [21] shows the implementation of the integrated noise reduction and echo cancellation where noise reduction precedes echo cancellation. $x[n]$ is processed by the AEC filter $H$ to generate the echo estimate $\hat{y}[n]$, which is subtracted from $d'[n]$, the microphone signal $d[n]$ after noise suppression, to generate $e[n]$. The effects of noise on the AEC can be minimized by placing the noise reduction upstream of the AEC. The noise reduction enhances the signal-to-noise ratio (SNR), which can improve the AEC behavior. However, the noise reduction causes nonlinear distortion, which disturbs the AEC.

Fig. 3.4 Block diagram of a combined noise reduction and echo canceling system

**3.3 Estimation nonlinear residual error for NAES **

One of the most basic filters for echo suppression is the Wiener filter in Fig. 3.3. The Wiener filter gain is given in (3.3.1):

$$G = \frac{S_{EZ}}{S_{EE}} \tag{3.3.1}$$
If $Z = B$, the suppression performs optimally. $E$ comprises three parts: the nonlinear residual echo $Y_{nl}$, the near-end speech $B$, and the background noise $N$. Assuming the residual echo, the near-end speech, and the near-end noise are mutually uncorrelated, we have

$$G = \frac{S_{EB}}{S_{EE}} = \frac{S_{BB}}{S_{BB} + S_{Y_{nl}Y_{nl}} + S_{NN}} \tag{3.3.2}$$

In practice, $S_{BB}$ is unknown, so we use $S_{EE}$, $S_{Y_{nl}Y_{nl}}$, and $S_{NN}$ to estimate $G$:

$$G = \frac{S_{EE} - S_{Y_{nl}Y_{nl}} - S_{NN}}{S_{EE}} \tag{3.3.3}$$

$S_{EE}$ can be easily estimated from $E$. $S_{Y_{nl}Y_{nl}}$ is more difficult to estimate.

We assume the background noise is very small in a quiet room [21]:

$$S_{NN} \approx 0 \tag{3.3.4}$$

This would be the case in quiet offices or in cars that are not moving and whose engine is switched off. The nonlinear residual echo suppression filter is used to reduce the nonlinear echo further. For the transfer function of this filter, a Wiener filter is often applied:

$$H_{RE} = \frac{\hat{S}_{EB}}{\hat{S}_{EE}} = \frac{\hat{S}_{EE} - \hat{S}_{Y_{nl}Y_{nl}}}{\hat{S}_{EE}} = 1 - \frac{\hat{S}_{Y_{nl}Y_{nl}}}{\hat{S}_{EE}} \tag{3.3.5}$$

It should be noted that any impact of the residual echo suppression filter on residual echoes also affects the local speech signal. When applying Eq. (3.3.5), the estimated power spectral densities $\hat{S}_{EE}$ and $\hat{S}_{Y_{nl}Y_{nl}}$ contain estimation errors. Therefore, the quotient may become larger than one. To prevent that, the filter transfer function in Eq. (3.3.6) can be used, where $H_{min}$ determines the maximum attenuation of the filter. The overestimation parameter $\beta$ [21] can be used to control the "aggressiveness" of the filter.

$$H_{RE} = \max\left[\,1 - \beta\, \frac{\hat{S}_{Y_{nl}Y_{nl}}}{\hat{S}_{EE}},\; H_{min}\right] \tag{3.3.6}$$

In order to estimate the short-term power spectral density $\hat{S}_{EE}$ of the error signal, first-order IIR smoothing is applied to the squared magnitudes of the frequency-domain error signal $E$. Here $m$ represents the frame index.

$$\hat{S}_{EE}(m) = (1-\gamma)\,|E(m)|^2 + \gamma\,\hat{S}_{EE}(m-1) \tag{3.3.7}$$

Because the nonlinear residual echo $Y_{nl}$ itself is not accessible, its short-term power spectral density $\hat{S}_{Y_{nl}Y_{nl}}$ cannot be estimated in the same manner as $\hat{S}_{EE}$.
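The recursive smoothing of Eq. (3.3.7) can be illustrated with a few lines of Python. The sketch below is not taken from the thesis: the function name, the value of the smoothing constant, and the input spectra are hypothetical.

```python
def smooth_psd(prev_psd, spectrum, gamma=0.9):
    """One update of Eq. (3.3.7) for every frequency bin of the current frame."""
    return [(1.0 - gamma) * abs(e) ** 2 + gamma * s
            for e, s in zip(spectrum, prev_psd)]


# Two frequency bins, three frames of a hypothetical error spectrum E(m).
frames = [[3 + 4j, 2.0], [3 + 4j, 2.0], [0j, 0.0]]
psd = [0.0, 0.0]  # assumed initial state of the estimate
for frame in frames:
    psd = smooth_psd(psd, frame, gamma=0.5)
    print(psd)
```

A value of `gamma` close to one gives a smooth but slowly reacting estimate, while a small value follows the instantaneous power more closely.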

In the following, we discuss how to estimate $\hat{S}_{Y_{nl}Y_{nl}}$. If $\hat{S}_{Y_{nl}Y_{nl}}$ can be estimated accurately, a low residual echo level and minimum distortion of the near-end speech can be achieved.

**3.3.1 Based on Highly Correlated Nonlinear Residual Echo **

This method [11] proposes a new residual-echo model based on the spectral correlation between the residual echo and the echo replica. It requires a sufficiently long initial period without near-end speech. First, the ratio of the AEC residual error $E(k,m)$ to the AEC echo replica $Y(k,m)$ is found when there is only single talk. Here $k$ and $m$ represent the frame index and the frequency bin, respectively. In a quiet room, $E(k,m)$ is approximated by $E_{nl}(k,m)$:

$$E(k,m) \approx E_{nl}(k,m) \approx a(k,m)\cdot Y(k,m) \tag{3.3.8}$$

Let us consider approximating $a(k,m)$ by $\hat{a}(k)$ using averaged absolute values of the residual echo and the echo replica. Then,

$$\hat{a}(k) = \left.\frac{\overline{|E(k,m)|}}{\overline{|Y(k,m)|}}\right|_{\text{single talk}} \tag{3.3.9}$$

where the overline denotes an averaging operation. When there is no near-end speech, i.e. $B(k,m) = 0$, approximating $a(k,m)$ in (3.3.8) with the regression coefficient $\hat{a}(k)$ models the residual echo $E_{nl}(k,m)$ as the product of $\hat{a}(k)$ and $Y(k,m)$:

$$E_{nl}(k,m) \approx \hat{E}_{nl}(k,m) = \hat{a}(k)\cdot Y(k,m) \tag{3.3.10}$$

So, we can find the suppression magnitude gain $\hat{G}_{opt}(k,m)$:

$$\hat{G}_{opt}(k,m) = \frac{S_{BB}(k,m)}{S_{EE}(k,m)} = \frac{S_{EE}(k,m) - S_{E_{nl}E_{nl}}(k,m)}{S_{EE}(k,m)} \tag{3.3.11}$$

However, this method has a drawback: in a real system, the relationship between the nonlinear residual error and the linear echo is not linear.
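The slope method of Eqs. (3.3.9) to (3.3.11) can be sketched as below. Several choices are assumptions made only for illustration: the averages are taken over the single-talk frames separately for each frequency bin, the squared magnitude of the modeled residual echo is used directly as its power spectral density, and the gain is clipped at zero. All names and numbers are hypothetical.

```python
def estimate_slope(err_mags, rep_mags):
    """Eq. (3.3.9): averaged |E| over averaged |Y|, one slope per frequency bin.

    err_mags and rep_mags are lists of single-talk frames; each frame lists the
    magnitudes of the AEC residual and of the echo replica for every bin.
    """
    n_frames = len(err_mags)
    n_bins = len(err_mags[0])
    slopes = []
    for b in range(n_bins):
        mean_e = sum(frame[b] for frame in err_mags) / n_frames
        mean_y = sum(frame[b] for frame in rep_mags) / n_frames
        # Assumption: a bin without replica energy gets slope 0.
        slopes.append(mean_e / mean_y if mean_y > 0.0 else 0.0)
    return slopes


def slope_gain(s_ee, rep_mag, slopes):
    """Eqs. (3.3.10)-(3.3.11) for one frame, clipped at zero."""
    gains = []
    for see, y, a in zip(s_ee, rep_mag, slopes):
        s_nl = (a * y) ** 2  # |E_nl|^2 taken as the PSD of the residual echo
        gains.append(max((see - s_nl) / see, 0.0) if see > 0.0 else 0.0)
    return gains


# Hypothetical single-talk magnitudes: two frames, two bins.
err_mags = [[0.5, 0.25], [1.5, 0.75]]
rep_mags = [[4.0, 1.0], [4.0, 3.0]]
slopes = estimate_slope(err_mags, rep_mags)
print(slopes)                                      # [0.25, 0.25]
print(slope_gain([4.0, 1.0], [4.0, 2.0], slopes))  # [0.75, 0.75]
```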

**3.3.2 Power Filter Model **

We seek a frequency-dependent gain $G(k,m)$ that is cascaded with the residual error from the AEC [12]. If the optimum gain is found, the suppressed error will be close to the near-end signal. In the frequency domain, the output $Z(k,m)$ of the AES with input $e[n]$ reads

$$Z(k,m) = G(k,m)\,E(k,m) \tag{3.3.17}$$

Here, $E(k,m)$ denotes the STFT of $e[n]$, where $k$ represents the block time index and $m$ represents the frequency bin. For the power spectral densities, if we want the optimum gain $G(w)$, $S_{ZZ}(w)$ must equal the near-end power spectral density $S_{BB}(w)$, while $S_{EE}(w)$, the power spectral density of the AEC residual error, consists of $S_{BB}(w)$ and the high-order nonlinear echo power spectral density $S_{nl}(w)$:

$$G_{opt}(k,m) = \frac{S_{ZZ}(k,m)}{S_{EE}(k,m)} = \frac{S_{BB}(k,m)}{S_{BB}(k,m) + S_{nl}(k,m)} = \frac{S_{EE}(k,m) - S_{nl}(k,m)}{S_{EE}(k,m)} \tag{3.3.19}$$

In the above equation, $S_{EE}(w)$ can be obtained from the AEC, but $S_{nl}(w)$ cannot.

We therefore try to approximate the power spectral density of the high-order nonlinear echo. This method uses a power filter model to represent the nonlinear effect of the loudspeaker.

The input/output relation of a $P$th-order power filter is given by

$$y[n] = \sum_{p=1}^{P} \sum_{l=0}^{N-1} h_{p,l}\, x^{p}[n-l] \tag{3.3.20}$$

where $h_{p,l}$ denotes the filter coefficients of the $p$th channel, whose input is $x^{p}[n]$. Aiming at a frequency-domain implementation of power filters, we give the short-time Fourier transform (STFT) representation of (3.3.20):

$$Y(k,m) = \sum_{p=1}^{P} H_{p,m}\, X_{p}(k,m) \tag{3.3.21}$$

Here, $X_{p}(k,m)$ denotes the STFT of $x^{p}[n]$ of length $M$, and $k$, $m$ represent the $k$th frame and the $m$th frequency bin. $H_{p,m}$ is the coefficient of the $p$th power filter corresponding to the $m$th frequency bin. In order to decorrelate the channel inputs, we use the equivalent orthogonal structure (EOS) [14].
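As a small time-domain illustration of the power filter in Eq. (3.3.20), the sketch below sums the outputs of the parallel channels driven by the powers of the input. It is not the frequency-domain or orthogonalized implementation discussed above; the function name, the zero initial conditions, and the coefficients are assumptions.

```python
def power_filter_output(x, h):
    """y[n] = sum over p and l of h[p-1][l] * x[n-l]**p, as in Eq. (3.3.20).

    h[p-1] holds the N taps of the p-th order channel. Samples before n = 0
    are taken as zero (an assumption about the initial state).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for p, taps in enumerate(h, start=1):
            for l, coeff in enumerate(taps):
                if n - l >= 0:
                    acc += coeff * x[n - l] ** p
        y.append(acc)
    return y


# Hypothetical coefficients: P = 3 channels with N = 2 taps each.
h = [[1.0, 0.5],   # first-order (linear) channel
     [0.0, 0.0],   # second-order channel, unused here
     [0.1, 0.0]]   # third-order channel
print(power_filter_output([1.0, 2.0, 0.0], h))
```

With these values the three output samples are approximately 1.1, 3.3 and 1.0; the third-order channel contributes 0.1 and 0.8 to the first two samples.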

The adaptation of the channels $\hat{H}_{o,p,m}(k,m)$ is performed independently for each nonlinear channel ($p>2$) with respect to the channel-dependent error signal

$$E_{o,p}(k,m) = E(k,m) - \hat{Y}_{o,p}(k,m) \tag{3.3.27}$$

In contrast to the application of adaptive power filters to nonlinear echo cancellation, the aim here is not an exact identification of the nonlinear components $H_{o,p,m}(k,m)$ of the echo path by their adaptive counterparts $\hat{H}_{o,p,m}(k,m)$ [15], but rather an estimate of $S_{Y_{o,p}}(k,m)$.

**3.4 Suppression of nonlinear residual error and background noise **

The combined structure from Sec. 3.2 can suppress the background noise. If the loudspeaker is ideal, the combined structures of AEC and NR can cancel the linear echo and suppress the background noise. However, the loudspeakers used for mobile communication are cheap and small, so their nonlinear effect is significant. The methods in Sec. 3.3 ignore the background noise; they only cancel the linear echo and suppress the nonlinear residual error. In this section, we suppress the noise first, and then use this concept in the NR+AEC structure of Section 3.3.1.

**3.4.1 AEC+NR suppression of noise and nonlinear residual error **

The AEC+NR structure for suppressing the background noise and the nonlinear residual error is shown in Fig. 3.5. The microphone receives the linear echo, the nonlinear echo, the near-end speech signal, and the noise signal. Under optimal conditions, the AEC cancels the linear echo and the suppression gain $G_{ANnl}$ in (3.4.1) suppresses the nonlinear residual error and the background noise.

$$G_{ANnl} = \frac{S_{B\_ANnl\_avg}}{S_{E\_avg}} \tag{3.4.1}$$

Fig. 3.5 Block diagram of AEC+NR to suppress noise and nonlinear residual error

$S_{E\_avg}$ is the average power spectral density of the residual error from the AEC, given in (3.4.2).

$$S_{E\_avg} = \beta \times S_{E\_avg} + (1-\beta) \times S_{E} \tag{3.4.2}$$

Under ideal conditions, $S_{B\_ANnl\_avg}$ approximates $S_{B\_avg}$. However, we cannot know $S_{B\_avg}$, so we estimate $S_{B\_ANnl\_avg}$ to replace it: $S_{N\_avg}$ and $S_{Nonlinear\_avg}$ are eliminated from $S_{E\_avg}$ as in (3.4.3). The background noise $N$ can be estimated first, when there is no far-end speech signal.

$$S_{B\_ANnl\_avg} = S_{E\_avg} - S_{N\_avg} - S_{Nonlinear\_avg} \tag{3.4.3}$$

$$S_{N\_avg} = \beta \times S_{N\_avg} + (1-\beta) \times S_{N} \tag{3.4.4}$$

$$S_{Nonlinear\_avg} = \beta \times S_{Nonlinear\_avg} + (1-\beta) \times S_{Nonlinear} \tag{3.4.5}$$
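Equations (3.4.1) to (3.4.5) can be combined for a single frequency bin as in the following sketch. The gain floor `g_min`, the guard for zero error power, and all numerical values are assumptions added for illustration.

```python
def smooth(avg, inst, beta=0.9):
    """Recursive average used in Eqs. (3.4.2), (3.4.4) and (3.4.5)."""
    return beta * avg + (1.0 - beta) * inst


def gain_aec_nr(s_e_avg, s_n_avg, s_nonlinear_avg, g_min=0.0):
    """Eqs. (3.4.1) and (3.4.3) for one frequency bin."""
    if s_e_avg <= 0.0:
        return g_min  # assumption: no residual power, use the floor
    s_b = s_e_avg - s_n_avg - s_nonlinear_avg  # Eq. (3.4.3)
    return max(s_b / s_e_avg, g_min)           # Eq. (3.4.1), floored


# Hypothetical averaged PSD values for one bin.
print(smooth(4.0, 8.0, beta=0.5))   # 6.0
print(gain_aec_nr(10.0, 2.0, 3.0))  # 0.5
```

The floor is not part of Eq. (3.4.1); it is included here only because the subtraction in Eq. (3.4.3) can become negative when the noise and nonlinear estimates are too large.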

**3.4.2 NR+AEC suppression of noise and nonlinear residual error **

background noise continuous. However, background noise disturbs the adaptive filter used for echo cancellation. The NR+AEC structure in Fig. 3.6 overcomes this disturbance.

Fig. 3.6 Block diagram of NR+AEC to suppress noise and nonlinear residual error

The suppression gain $G_{NAnl}$ is

$$G_{NAnl} = \frac{S_{D'\_avg}}{S_{D\_avg}} \tag{3.4.6}$$

$S_{D\_avg}$ is the average power spectral density of the signal received from the microphone, given in (3.4.7).

$$S_{D\_avg} = \beta \times S_{D\_avg} + (1-\beta) \times S_{D} \tag{3.4.7}$$

Under ideal conditions, $S_{D'\_avg}$ approximates the sum of $S_{B\_avg}$ and the average power spectral density of the linear echo, $S_{y\_avg}$. However, we cannot know $S_{B\_avg}$ and $S_{y\_avg}$. We therefore estimate $S_{D'\_avg}$ by eliminating $S_{N\_avg}$ and $S_{Nonlinear\_avg}$ from $S_{D\_avg}$, and use this estimate in place of the optimal $S_{D'\_avg}$, as in Eq. (3.4.8).

$S_{D'\_avg}$ includes the linear echo, whereas the linear echo has already been removed from $S_{B\_ANnl\_avg}$. So, $S_{D'\_avg}$ is larger than $S_{B\_ANnl\_avg}$, and $G_{NAnl}$ causes larger distortion of the near-end speech signal than $G_{ANnl}$.

$$S_{D'\_avg} = S_{D\_avg} - S_{N\_avg} - S_{Nonlinear\_avg} \tag{3.4.8}$$
The NR+AEC structure can suppress the background noise before the adaptive filter. However, the suppression gain distorts the microphone signal, so the adaptive filter cannot estimate the real room impulse response and the AEC will not cancel the echo perfectly.

**3.5 Volterra structure for NAES **

The high-order echo basis of the nonlinear acoustic echo suppression can be used to estimate the power spectral density of the nonlinear residual error. However, a nonlinear memory effect may arise in some microphones. The nonlinear acoustic echo suppression cannot suppress this error, so we use the Volterra structure of Section 2.2.3 to overcome the nonlinear memory effect. In this section, we use a second-order Volterra structure to suppress the nonlinear memory effect in the frequency domain [22].

First, we take the Fourier transform of the second-order Volterra terms of the far-end speech signal. The Fourier transform of the linear far-end speech signal is shown in (3.5.1).

$$X(w) = \sum_{k=0}^{N-1} x[k]\, e^{-j\frac{2\pi k w}{N}} \tag{3.5.1}$$

The Fourier transform of the second-order Volterra terms of the far-end speech signal is shown in (3.5.2). A second-order discrete Volterra filter with input $x[k]$, frequency-domain input $Xv(w)$, and memory length $L$ can be described as

$$Xv(w) = \sum_{k=0}^{N-1} \sum_{p=0}^{L-1} \sum_{q=i}^{L-1} x[k-p]\, x[k-q]\, e^{-j\frac{2\pi k w}{N}} \tag{3.5.2}$$

Fig. 3.7 The Volterra structure for NAES

The Volterra structure can estimate the second-order nonlinear residual error $\hat{Yv}_2(k)$, and this value is used to estimate the slope $m_v(k)$. $m_v(k)$ is more accurate than $\hat{a}(k)$ in Eq. (3.3.9) for estimating the nonlinear residual error, because the second-order residual error has been removed.

$$m_v(k) = \left.\frac{E(k) - \hat{Yv}_2(k)}{\hat{Y}_1(k)}\right|_{b=0} \tag{3.5.3}$$
Then, we find the nonlinear residual error from $m_v(k)$. The nonlinear residual error is equal to the linear echo $\hat{Y}_1(k,m)$ multiplied by the value $m_v(k)$, which is fixed for every frame, plus the second-order Volterra echo.

$$Ev_{nl}(k,m) = \hat{Y}_1(k,m)\cdot m_v(k) + \hat{Yv}_2(k,m) \tag{3.5.4}$$

To find the suppression gain, we use $S^{avg}_{Ev}$ and $S^{avg}_{Ev_{nl}}$, the averaged power spectral densities of the linear residual error $Ev(k,m)$ and of the nonlinear residual error $Ev_{nl}(k,m)$, respectively.

$$G_{volterra}(k,m) = \frac{S^{avg}_{Ev} - S^{avg}_{Ev_{nl}}}{S^{avg}_{Ev}} \tag{3.5.5}$$

The suppressed output is obtained by applying the gain to the instantaneous power spectrum of the linear residual error:

$$Z(k,m) = G_{volterra}(k,m)\cdot S^{instant}_{Ev} \tag{3.5.6}$$

Using the Volterra structure for NAES can suppress the nonlinear residual error more accurately than the Hammerstein structure.

**3.6 Using high order nonlinear echo to estimate nonlinear residual **

**error **

In Section 3.3.1, it is assumed that the nonlinear residual errors of order higher than two are linearly related to the linear echo $\hat{Y}$. In fact, they are not, so this assumption can be wrong. Fig. 3.8 is a block diagram of the method of Section 3.3.1. However, if we could estimate the higher orders of the residual error, the slope $m(k)$ would estimate the nonlinear residual error more accurately.

[Fig. 3.8: block diagram of the method of Section 3.3.1]

The nonlinear power filter of Section 3.3.2 can estimate the nonlinear residual error for NAES. We exploit this idea to find the medium-order nonlinear echoes accurately, and use the slope method of Section 3.3.1 to estimate the high-order nonlinear echoes roughly.

We propose the new method shown in Fig. 3.9. The block diagram includes first- and third-order adaptive filters. To find $m_{pro}(k)$, the only difference from Section 3.3.1 is that the third-order echo is removed first.

$$m_{pro}(k) = \left.\frac{E(k) - \hat{Y}_3(k)}{\hat{Y}_1(k)}\right|_{b=0} \tag{3.6.1}$$

The nonlinear residual error consists of two parts: the third-order echo, and the echoes of fifth order and above.

$$E_{nl}(k,m) = \hat{Y}_3(k,m) + \hat{Y}_1(k,m)\cdot m_{pro}(k) \tag{3.6.2}$$

[Fig. 3.9: block diagram of the proposed method]

To find the suppression gain, we use $S^{avg}_{E}$ and $S^{avg}_{E_{nl}}$, the averaged power spectral densities of the linear residual error $E(k,m)$ and of the nonlinear residual error $E_{nl}(k,m)$, respectively.

$$G_{proposed}(k,m) = \frac{S^{avg}_{E} - S^{avg}_{E_{nl}}}{S^{avg}_{E}} \tag{3.5.7}$$

The suppressed output is obtained by applying the gain to the instantaneous power spectrum of the linear residual error:

$$Z(k,m) = G_{proposed}(k,m)\cdot S^{instant}_{E} \tag{3.5.8}$$

A good combined system should cancel or suppress the echo strongly while preserving the near-end speech signal.
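The steps of the proposed method, Eqs. (3.6.1), (3.6.2) and (3.5.7), can be sketched as follows. The sketch works on magnitudes and forms the slope as a ratio of sums over the supplied single-talk samples; both choices, as well as the gain floor and all names and numbers, are assumptions that the equations themselves do not specify.

```python
def slope_after_third_order(e, y3, y1):
    """Eq. (3.6.1): ratio of summed |E - Y3| to summed |Y1| during single talk."""
    num = sum(abs(ei - y3i) for ei, y3i in zip(e, y3))
    den = sum(abs(y1i) for y1i in y1)
    return num / den if den > 0.0 else 0.0


def nonlinear_residual(y3, y1, m_pro):
    """Eq. (3.6.2): third-order echo plus slope-scaled linear echo, per bin."""
    return [y3i + y1i * m_pro for y3i, y1i in zip(y3, y1)]


def proposed_gain(s_e_avg, s_enl_avg, g_min=0.0):
    """Eq. (3.5.7) for one bin; the floor g_min is an added assumption."""
    if s_e_avg <= 0.0:
        return g_min
    return max((s_e_avg - s_enl_avg) / s_e_avg, g_min)


# Hypothetical magnitudes for two frequency bins during single talk.
e = [1.5, 3.0]    # AEC residual
y3 = [1.0, 2.0]   # third-order echo estimate
y1 = [5.0, 10.0]  # linear echo estimate
m_pro = slope_after_third_order(e, y3, y1)
print(m_pro)
print(nonlinear_residual(y3, y1, m_pro))
print(proposed_gain(4.0, 1.0))  # 0.75
```

In this example the residual left after removing the third-order estimate is one tenth of the linear echo, so the reconstructed nonlinear residual matches the AEC residual in both bins.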

The linear AEC only slightly cancels the far-end speech and keeps the near-end speech signal. With the nonlinear AEC, the noise sounds even louder than the near-end speech, because in the double-talk situation the near-end speech interferes with the adaptive filters that estimate the linear and nonlinear channels. The linear AEC and the nonlinear AEC do not use a suppression structure, so the gain floor Gmin of Eq. (3.3.6) is not discussed for them. The nonlinear AES can greatly suppress the nonlinear residual error, but the near-end speech signal is seriously degraded because the suppression gain causes a large disturbance. Therefore, we should set Gmin for the suppression gain. With Gmin=0.25, the near-end speech is only slightly degraded, although the far-end speech is cancelled less than with Gmin=0.

**3.6.1 Basis Selection for proposed slope method **

In Section 3.2, only the linear echo is estimated, by the linear AEC. So, there is only one choice of basis for finding the slope: the linear echo. For the proposed method, there are several bases to choose from, because we use the nonlinear AEC to estimate both the linear echo and the lower-order nonlinear echo. Next, we discuss how to select the basis for the proposed method.

A commonly used function for modeling saturation is the sigmoid function $\varphi(u)$ in (3.5.9) [15]. It is a popular model of the nonlinear effect of a loudspeaker. Fig. 3.10 shows it for the parameters $\alpha = 3.5$, $\beta = 1$. We fit the sigmoid function by the seventh-order polynomial in Eq. (3.5.10).

$$\varphi(u) = \left( \frac{2}{1 + e^{-\alpha u}} - 1 \right)\beta \tag{3.5.9}$$

$$s(x) = 3.4761x - 3.1740x^{3} + 2.3999x^{5} - 0.8233x^{7} \tag{3.5.10}$$

Fig. 3.10 Sigmoid function
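For readers who want to evaluate the two loudspeaker models numerically, the sketch below implements the sigmoid of Eq. (3.5.9) with $\alpha = 3.5$, $\beta = 1$ and the polynomial of Eq. (3.5.10) on the input range of Fig. 3.10. The function names are hypothetical and no claim is made here about how closely the two curves agree.

```python
import math


def sigmoid(u, alpha=3.5, beta=1.0):
    """Eq. (3.5.9): beta * (2 / (1 + exp(-alpha * u)) - 1)."""
    return beta * (2.0 / (1.0 + math.exp(-alpha * u)) - 1.0)


def poly7(x):
    """Eq. (3.5.10): odd polynomial with terms of order 1, 3, 5 and 7."""
    return 3.4761 * x - 3.1740 * x**3 + 2.3999 * x**5 - 0.8233 * x**7


# Tabulate both functions on a few points of the interval [-1, 1].
for u in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(u, sigmoid(u), poly7(u))
```

Both functions are odd, so only odd-order echoes (first, third, fifth, seventh) arise from the polynomial model.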

The nonlinear AEC includes a linear and a third-order adaptive filter. If the nonlinear AEC is ideal, the residual error contains only the fifth- and seventh-order echoes. We select the first- or third-order echo as the basis for estimating the sum of the fifth- and seventh-order echoes. However, we cannot tell which order of echo is more linearly related to the summed echoes in every frequency bin. To select the basis, we only observe the statistical distribution and the ERLE in Chapter 4. From the simulations, the third-order echo is more linearly related to the summed echoes than the first-order echo.

To quantify this linear relationship accurately, we define the correlation mismatch factor $\varepsilon$, which is the variance between the estimated nonlinear residual error and the real nonlinear residual error. The smaller $\varepsilon$ is, the more linearly the basis echo of that order is related to the nonlinear residual error.

$$\varepsilon = E_{nl} - E_{nl\_est} \tag{3.5.16}$$

$$E_{nl\_est} = \hat{Y} \times m \tag{3.5.17}$$

The $\varepsilon$ of the third-order basis is smaller than that of the first-order basis. So, we select the highest-order echo that we can obtain as the basis.
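The basis comparison built on Eqs. (3.5.16) and (3.5.17) can be illustrated as below. Summarizing the mismatch as the mean squared difference over a set of samples is an assumption made for this sketch, and the data are invented solely to show the mechanics; they are not simulation results from the thesis.

```python
def mismatch_factor(e_nl, basis, slope):
    """Mean squared value of eps = E_nl - slope * basis, Eqs. (3.5.16)-(3.5.17)."""
    diffs = [abs(e - slope * y) ** 2 for e, y in zip(e_nl, basis)]
    return sum(diffs) / len(diffs)


# Hypothetical magnitudes of the real nonlinear residual error.
e_nl = [1.0, 2.0, 3.0]
# Hypothetical first-order and third-order echo bases with their slopes.
eps_first = mismatch_factor(e_nl, [10.0, 20.0, 35.0], slope=0.1)
eps_third = mismatch_factor(e_nl, [2.0, 4.0, 6.0], slope=0.5)
print(eps_first, eps_third)
print("third" if eps_third < eps_first else "first")
```

The basis with the smaller mismatch factor is the one whose scaled copy tracks the residual error more closely, which is the selection rule stated above.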

**Chapter 4 **

**Computer Simulations **

In this chapter, we present simulations that verify the algorithms of Chapter 3. First, we define the parameters and the speech signals in Section 4.1. Second, in Section 4.2, we compare the two combined structures, AEC+NR and NR+AEC, under three conditions: background noise, nonlinear residual error, and both together. Third, we use the Volterra structure to estimate the second-order memory echo for the nonlinear AES. Fourth, in Section 4.3, we compare the ERLE of six methods: linear AEC, linear & third-order AEC, linear AEC & third-order AES, linear AEC & slope1 AES, AEC1 & third-order AES & slope1, and AEC1 & third-order AES & slope3. Finally, in Section 4.4, we analyze the statistical distributions and the correlation mismatch factor ε of the three slope methods discussed in Section 4.3, and select the order of the nonlinear echo to use as the basis.

**4.1 Parameters and speech signal of simulations **

For nonlinear systems, the sigmoid function shown in Fig. 3.10 is commonly used [15]. The sigmoid function is $\varphi(u) = \left( \frac{2}{1 + e^{-\alpha u}} - 1 \right)\beta$. In the following nonlinear model, the parameters of the sigmoid function are $\alpha = 3.5$ and $\beta = 1$. We use a real speech signal as the far-end signal, and the resulting output of the near-end microphone is defined as the desired signal, as shown in Fig. 4.1.

Fig. 4.1 Speech signal (far-end signal and desired signal; amplitude versus iterations)

**4.2 Compare two combined structures in three conditions **

can suppress background noise or nonlinear residual error first, which prevents the noise from disturbing the operation of the AEC. However, if we use NR first, the desired signal is distorted, and the AEC can no longer estimate the real impulse response. We discuss the two structures under three different noise conditions.

**4.2.1 Background noise **

The input signal is white Gaussian noise. The loudspeaker is perfect and the room impulse response is an exponential decay with 128 taps, shown in Fig. 4.2. The SNR is 20dB and -5dB. The adaptive filter is linear and its length is 128.

As the evaluation criterion we use the echo return loss enhancement (ERLE), defined as

$$\mathrm{ERLE} = 10\log_{10}\frac{E\{d^{2}(k)\}}{E\{z^{2}(k)\}} \;[\mathrm{dB}] \tag{4.2.1}$$

If the output signal after the cancellation or suppression structure is less correlated with the input signal, the performance of the structure is better. From Eq. (4.2.1), a large ERLE indicates a good cancellation or suppression structure.
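The ERLE of Eq. (4.2.1) can be computed from sample sequences as in this sketch, where the expectations are replaced by sample means over the whole sequence. That replacement, the function name, and the test signals are assumptions; an ERLE curve over iterations would instead use short-time averages.

```python
import math


def erle_db(d, z):
    """ERLE = 10 log10(E{d^2} / E{z^2}) in dB, with sample means as expectations."""
    p_d = sum(v * v for v in d) / len(d)
    p_z = sum(v * v for v in z) / len(z)
    return 10.0 * math.log10(p_d / p_z)


# Hypothetical microphone signal d and processed output z with ten times
# smaller amplitude, i.e. one hundred times less power.
d = [1.0, -1.0, 1.0, -1.0]
z = [0.1, -0.1, 0.1, -0.1]
print(erle_db(d, z))  # about 20 dB
```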

Fig. 4.2 Pseudo room impulse response (amplitude versus sample)

Fig. 4.3 and Fig. 4.4 show the simulation results for SNR = 20dB and SNR = -5dB, respectively. In Fig. 4.3, the AEC+NR structure is better than the NR+AEC structure by 3.5dB at SNR = 20dB. However, in Fig. 4.4, the AEC+NR structure is worse than the NR+AEC structure by 1.5dB at SNR = -5dB. This is because, at high SNR, applying NR first causes a noticeable nonlinear distortion that disturbs the AEC, whereas at low SNR, applying the AEC first lets the background noise disturb the AEC. Since the background noise is large (SNR = -5dB) in Fig. 4.4, the performance of both structures is poor.

Fig. 4.3 Combined structures for suppressing background noise (SNR=20dB)

[Fig. 4.4: Combined Structures: SNR = -5dB; ERLE [dB] versus iterations for far-end (WGN), Linear AEC, AEC+NR, NR+AEC]

**4.2.2 Nonlinear residual error **

We use the two combined structures to suppress the nonlinear residual error. The environment is the same as in Section 4.2.1, except that the loudspeaker has a nonlinear effect and SNR = 50dB. The nonlinear effect is modeled by the sigmoid function in Fig. 3.10 and Eq. (3.5.9). To estimate the nonlinear residual error for the NR structure, we use the method "Based on Highly Correlated Nonlinear Residual Echo" of Section 3.3.1. In Fig. 4.5, the AEC+NR structure is better than the NR+AEC structure by 3dB.

Fig. 4.5 Combined structures for suppressing nonlinear residual error (SNR=20dB)

**4.2.3 Background noise and nonlinear residual error **

In Section 3.4, we introduced two combined structures, AEC+NR and NR+AEC, to suppress the background noise and the nonlinear residual error simultaneously. The simulation result is shown in Fig. 4.6. The AEC+NR structure is better than the NR+AEC structure by 3dB.

Fig. 4.6 Combined structures for suppressing nonlinear residual error and background noise

**4.3 Performance of Volterra Structure for NAES **

In Section 3.5, we showed that the Volterra structure of Section 2.2.3 can cancel the nonlinear memory effect. The simulation results are shown in Fig.4.5 and Fig.4.6. The linear filter length is N=64 and the second-order Volterra filter has length V = 4. For a WGN input signal, the Volterra structure is better than NAEC. However, for a speech input signal, the Volterra structure is worse than NAEC, because the convergence speed of the Volterra structure is slow. The linear filter $\hat{H}_1$ has length N=128, and the nonlinear system is the memory polynomial shown in (4.5.1). The Volterra structure for NAES is not better than the power filter, because the Volterra structure needs to adapt too many coefficients.

$$S(x[n]) = x[n] + 0.05\,x^{3}[n] + 0.1\,\{\, x[n-1]x[n-2] + x[n-1]x[n-3] + x[n-1]x[n-4] + x[n-2]x[n-3] + x[n-2]x[n-4] + x[n-3]x[n-4] \,\} \tag{4.5.1}$$

[Figure: speech + polynomial system; ERLE [dB] versus iterations for far-end (WGN), AEC1, AEC1v2, AEC1/AESv2, AEC1/AESv2/slope1]

**4.4 Simulation of Highly Nonlinear Residual Errors **

**4.4.1 Single talk **

The linear AEC is used to cancel the linear echo only. The nonlinear residual echo of order higher than two is not cancelled at all. The nonlinear AEC (NAEC) method can cancel the high-order residual echo, and its performance depends on the number of adaptive filters used. We implement the third-order NAEC. We use the two concepts of "slope" and "power filter" from Section 3.3.1 and Section 3.3.2, together with the proposed method of Section 3.6, to run the simulations. The notation of the six algorithms is listed in Table 4.1.

Table 4.1 Notation of six algorithms

| No. | Notation | Comment | Reference |
|---|---|---|---|
| 1 | AEC1 | AEC | [1] |
| 2 | AEC13 | NAEC | [3] |
| 3 | AEC1/AES3 | AEC/NAES | [9] |
| 4 | AEC1/slope1→3+ | AEC/slope (linear echo) | [10] |
| 5 | AEC1/AES3/slope1→5+ | AEC/NAES/slope (linear echo) | Proposed |
| 6 | AEC1/AES3/slope3→5+ | AEC/NAES/slope (third echo) | Proposed |

AEC1 uses the linear AEC to cancel the linear echo and cannot cancel the nonlinear residual error. AEC13 uses the linear AEC and the third-order nonlinear AEC to cancel the linear echo and the third-order echo, but the nonlinear residual error of fifth order and above remains. The performance of AEC13 is better than that of AEC1. AEC1/AES3 uses the linear AEC to cancel the linear echo and suppresses the third-order nonlinear residual error by the "power filter" method of Section 3.3.2. The performance of AEC1/AES3 is better than that of AEC13, because AEC1/AES3 converges faster than AEC13. AEC1/slope1→3+ uses the linear AEC to cancel the linear echo and suppresses the nonlinear error of all orders by the "slope" method of Section 3.3.1 with the linear echo as basis. If the total high-order nonlinear residual error is larger than the third-order nonlinear echo, the performance of AEC1/slope1→3+ is better than that of AEC13. AEC1/AES3/slope1→5+ and AEC1/AES3/slope3→5+ use the linear AEC to cancel the linear echo, third-order suppression to suppress the third-order echo, and the "slope" method with the linear echo or the third-order echo as basis to suppress the nonlinear residual error of fifth order and above. AEC1/AES3/slope1→5+ and AEC1/AES3/slope3→5+ are better than AEC1/AES3 and AEC1/slope1→3+, because they suppress the third-order echo accurately and also suppress the nonlinear residual error of fifth order and above. For a real speech signal played through the loudspeaker, the high-order nonlinear residual error is more linearly related to the third-order echo.

In Fig. 4.8, we use the real speech signal and the real system. In this case, the total nonlinear residual error of fifth order and above is larger than the third-order echo. So, AEC1/slope1 is better than AEC1/AES3. AEC13/AES3/slope1 suppresses the third-order echo more accurately than AEC1/slope1, so its performance is better. The nonlinear residual error of fifth order and above is more linearly related to the third-order echo than to the first-order echo. So, AEC13/AES3/slope3 is better than AEC13/AES3/Slope1.

In Fig. 4.9, we use a speech signal as the far-end signal, and the nonlinear system is the polynomial function $s(x) = 3.4761x - 3.1740x^{3} + 2.3999x^{5} - 0.8233x^{7}$. The proposed method performs better than all the others.

In Fig. 4.10, we use white Gaussian noise as the far-end signal, and the nonlinear system is the polynomial function $s(x) = 3.4761x - 3.1740x^{3} + 2.3999x^{5} - 0.8233x^{7}$. The third-order echo is large, so if the third-order echo cannot be estimated accurately, the ERLE will be poor. So, AEC1/AES3 is better than AEC1/slope1. The proposed method performs better than all the others.

[Fig. 4.8: speech + real system; ERLE [dB] versus iterations for far-end (speech), AEC1, AEC13, AEC1/AES3, AEC1/slope1, AEC1/AES3/slope1, AEC1/AES3/slope3]

Fig. 4.9 ERLE for speech signal & polynomial system

[Fig. 4.10: WGN + poly system; ERLE [dB] versus iterations for far-end (WGN), AEC1, AEC13, AEC1/AES3, AEC1/slope1, AEC1/AES3/slope1, AEC1/AES3/slope3]