## National Chiao Tung University

## Department of Communication Engineering

## Master's Thesis

### Study on Fast Converging Nonlinear Echo Cancellation Based on Optimum Step Size and Channel Shortening Approaches

**Study on Fast Converging Nonlinear Echo Cancellation Based on Optimum Step Size and Channel Shortening Approaches**

### Student: C. S. Shih

### Advisor: Dr. S. F. Hsieh

### Study on Fast Converging Nonlinear Echo Cancellation Based on Optimum Step Size and Channel Shortening Approaches

### Student: C. S. Shih   Advisor: S. F. Hsieh

### Department of Communication Engineering, National Chiao Tung University

**Chinese Abstract**

To cancel the nonlinear echo arising in hands-free telephony or video conferencing, Volterra filters or Hammerstein filters are conventionally used to track the nonlinear echo path. The major drawbacks of these two filters, however, are slow convergence and high computational cost.

In this thesis, we propose an optimum adaptive step-size algorithm applied to the Volterra filter in order to speed up convergence. The step size is derived by minimizing the mean-square error (MSE) between the estimated filter coefficients and the true kernel coefficients; each tap's step size is adjusted as its coefficient error changes. Since this algorithm requires knowledge of the true echo path, we further propose modeled channels for practical application.

Besides step-size control, a channel shortening structure is also employed to address the slow convergence and high complexity of the Hammerstein filter. We carry out theoretical analyses of the least-squares and adaptive algorithms, and propose a multiple-stage coefficient-update method to further accelerate convergence. Finally, computer simulations are provided to support the preceding analyses.
**Study on Fast Converging Nonlinear Echo Cancellation Based on Optimum Step Size and Channel Shortening Approaches**

**Student: C. S. Shih   Advisor: S. F. Hsieh**

**Department of Communication Engineering**

**National Chiao Tung University**

**Abstract**

In order to cancel nonlinear acoustic echo in hands-free telephones or teleconferencing systems, adaptive Volterra filters and Hammerstein models are commonly used to track the nonlinear echo path. However, their major drawbacks are slow convergence rate and high computational complexity.

In this thesis, we propose an optimum time- and tap-variant step size for the Volterra filter in order to speed up the convergence rate. The step size is based on the MMSE criterion of the coefficient errors. As the optimum step size requires the real echo path coefficients, we propose an exponential model for practical implementations.

In addition to adaptive step-size control, a channel shortening structure is proposed to overcome the slow convergence rate and high computational complexity of the Hammerstein structure. We perform least-squares and adaptive-algorithm analyses, from which a multiple-stage update scheme is proposed to speed up the convergence rate. Computer simulations justify our analysis and show the improved performance.

**Acknowledgement **

I would like to express my deepest gratitude to my advisor, Dr. S. F. Hsieh, for his enthusiastic guidance and great patience, especially the autonomy he allowed me in research. Throughout the composition of this thesis, Dr. Hsieh provided me with many enlightening viewpoints and insightful suggestions. My special thanks go to C. W. Huang, S. H. Weng, J. H. Dai, and H. C. Chen for their inspiration and encouragement. I also appreciate my friends for their inspiration and help. Finally, I would like to thank my parents, sisters, and girlfriend for their unceasing encouragement and love.


**Contents **

**CHINESE ABSTRACT**

**ENGLISH ABSTRACT**

**ACKNOWLEDGEMENT**

**CONTENTS**

**LIST OF FIGURES**

**LIST OF TABLES**

**CHAPTER 1 INTRODUCTION**

**CHAPTER 2 ADAPTIVE NONLINEAR ACOUSTIC ECHO CANCELLATION**

**2.1 Nonlinear AEC Structures**
**2.1.1 Memoryless nonlinear AEC**
**2.1.2 Memory nonlinear AEC**
**2.2 Convergence Rate Speed-up Algorithms for NAEC**
**2.2.1 Input signal decorrelation**
**2.2.2 Orthogonal polynomial basis**

**CHAPTER 3 OPTIMUM STEP SIZE FOR NONLINEAR AEC**

**3.1 Conventional Step Size Control**
**3.1.1 Linear AEC**
**3.1.2 Nonlinear AEC**
**3.2 Derivation of OTTLMS Algorithm**
**3.3 Extension to OTTNLMS Algorithm**
**3.4 Practical Implementations of OTTLMS Algorithm**
**3.4.1 Exponential models of linear and quadratic kernel**
**3.4.2 Exponential models of time-varying step size**
**3.5 Echo Path and Double Talk Conditions**
**3.6 Computation Complexity**
**3.7 Summary**

**CHAPTER 4 CHANNEL SHORTENING STRUCTURE FOR NONLINEAR AEC**

**4.1 Channel Shortening Approach**
**4.2 Theoretical Analysis of Linear Echo Channel**
**4.2.1 Least-squares solutions**
**4.2.2 Adaptive LMS algorithm and its convergence analysis**
**4.2.3 Non-unique solution problem**
**4.3 Multiple-Stage Update in Channel Shortening Structure**
**4.4 Volterra with Channel Shortening and OTTLMS**

**CHAPTER 5 SIMULATION**

**5.1 Simulation Parameters**
**5.2 ERLE Convergence Rate Comparison**
**5.2.1 Comparison of OTTLMS, only-linear OTTLMS, and LMS**
**5.2.2 Comparison of OTTLMS with different parameters of the model function**
**5.2.3 Comparison of OTTLMS and OTLMS**
**5.2.4 Exponentially approximated temporal function and LMS**
**5.2.5 OTTNLMS and Kuech's approach**
**5.2.7 Echo-path and double-talk conditions**
**5.3 Performance Comparison for Channel Shortening Structure**
**5.3.1 Theoretical shortening and original channel**
**5.3.2 Different length effect**
**5.3.3 Comparison of LMS convergence analysis and simulation**
**5.3.4 Comparison of adaptive LMS algorithm and least-squares solution**
**5.3.5 Multiple-stage update**
**5.3.6 Volterra with channel shortening and OTTLMS**

**CHAPTER 6 CONCLUSION**

**BIBLIOGRAPHY**

**List of Figures **

Fig 1.1 Hands-free telephone system
Fig 1.2 Nonlinear acoustic echo cancellation system
Fig 2.1 Hammerstein structure
Fig 2.2 Volterra structure
Fig 2.3 Wiener structure
Fig 2.4 Wiener and Hammerstein structure
Fig 2.5 Second-order Volterra with decorrelation filter
Fig 3.1 Trade-off of step size in the LMS algorithm
Fig 3.2 Second-order Volterra acoustic echo canceller
Fig 3.3 Linear kernel and exponential model of the envelope
Fig 3.4 Real quadratic kernel of the nonlinear loudspeaker
Fig 3.5 Exponential model of the envelope of the quadratic kernel
Fig 3.6 Linear kernel step-size temporal function
Fig 3.7 Quadratic kernel step-size temporal function
Fig 3.8 OTTLMS during echo-path variations
Fig 4.1 Channel shortening structure for nonlinear AEC
Fig 4.2 Comparison of classical Hammerstein and channel shortening structures
Fig 4.4 Shortening structure for the linear loudspeaker
Fig 5.1.1 Room impulse response
Fig 5.1.2 Quadratic kernel
Fig 5.1.3 Speech signal
Fig 5.2.1 Comparison of OTTLMS and LMS algorithms (with white Gaussian input)
Fig 5.2.2 Comparison of OTTLMS and LMS algorithms (with real speech)
Fig 5.2.3 Comparison of OTTLMS, only-linear-LMS, and LMS algorithms (with white Gaussian input)
Fig 5.2.4 Comparison of OTTLMS, only-linear-LMS, and LMS algorithms (with real speech)
Fig 5.2.5 RIR and model function
Fig 5.2.6 Quadratic kernel and model function: (a) quadratic kernel, (b) under-model, (c) matched model, (d) over-model
Fig 5.2.7 Comparison of inaccurate model functions (with white Gaussian input)
Fig 5.2.8 Comparison of inaccurate model functions (with real speech)
Fig 5.2.9 Comparison of OTTLMS and OTLMS (with white Gaussian input)
Fig 5.2.10 Step size of practical OTLMS (with white Gaussian input)
Fig 5.2.11 Comparison of OTTLMS and OTLMS (with real speech)
Fig 5.2.12 Step size of practical OTLMS (with real speech)
Fig 5.2.14 Comparison of OTTLMS and EAOTTLMS (with real speech)
Fig 5.2.15 Step size of EAOTTLMS
Fig 5.2.16 Comparison of OTTNLMS and Kuech's approach (with white Gaussian input)
Fig 5.2.17 Comparison of OTTNLMS and Kuech's approach (with real speech)
Fig 5.2.18 Comparison of OTTNLMS and Kuech's approach under an imperfect model condition
Fig 5.2.19 Comparison of OTTNLMS and Kuech's approach under EPC and DT conditions
Fig 5.3.1 Shortened channel from least-squares solution coefficients
Fig 5.3.2 Coefficient error effect power of different lengths in the FIR filter
Fig 5.3.3 Coefficient error effect of different lengths in the shortening filter
Fig 5.3.4 Coefficient error effect of different lengths in the two filters
Fig 5.3.5 Comparison of theoretical and simulated results (coefficient error)
Fig 5.3.6 Comparison of theoretical and simulated results (mean-square error)
Fig 5.3.7 Coefficient error comparison of the LMS algorithm and the least-squares solution
Fig 5.3.8 Comparison of different multiple-stage update strategies (with white Gaussian input)
Fig 5.3.9 Comparison of different multiple-stage update strategies (with real speech)
Fig 5.3.10 Channel shortening for second-order Volterra structure (with white Gaussian input)
Fig 5.3.11 Channel shortening for second-order Volterra structure (with real speech)

**List of Tables**

Table 3.1 OTTLMS algorithm
Table 3.2 OTTNLMS algorithm
Table 3.3 Approximated exponential temporal function of step size in OTLMS
Table 3.4 Computation complexity comparison of different algorithms
Table 4.1 Computation complexity comparison of classical and shortening structures
Table 5.1 Normalized power comparison of original channel and shortened channel

**Chapter 1 **

**Introduction **

In recent years, hands-free telephone and teleconference systems have come into wide use. However, these systems usually suffer from the annoying acoustic echo problem: the far-end speech is transmitted back through the microphone at the near end. A hands-free telephone system is shown in Fig 1.1. The main task of acoustic echo cancellation (AEC) is to replicate the unknown echo path and subtract the replicated echo components from the microphone output. Since the echo path may be time-variant due to objects moving around the room, an adaptive filter is commonly used to track it. If the AEC estimates the echo path accurately, the echo is cancelled and the communication quality is enhanced.

Fig 1.1 Hands-free telephone system

Many adaptive algorithms have been proposed [13]. The least-mean-square (LMS) algorithm is well known for its low computational cost; the affine projection algorithm (APA) and the recursive least-squares (RLS) algorithm have the advantage of a fast convergence rate, but at a higher computational complexity than the LMS algorithm.

However, competitive consumer audio products require not only cheap signal-processing hardware but also low-cost analog equipment and sound transducers. The echo path therefore contains nonlinear components caused by the limited capability of the power amplifier (PA) [1], which can be overdriven and thus introduce nonlinear distortion into the far-end speech. Consequently, a linear AEC is not sufficient to estimate the acoustic echo path, and many methods have been proposed to overcome this problem. A general nonlinear AEC system is shown in Fig 1.2: the far-end speech signal passes through the nonlinear loudspeaker and the room impulse response and is then picked up by the microphone.

To overcome the nonlinear acoustic echo caused by the power amplifier, the popular methods are based on polynomial functions, i.e., the Hammerstein model [5], the Volterra model [2]-[3], the Wiener model [17], and the Wiener-Hammerstein model [18]. These four nonlinear models will be introduced in Chapter 2.

Another approach is echo suppression, which increases the attenuation of the nonlinearly distorted residual echo and the convergence speed; however, this approach distorts the near-end speech and does not eliminate the acoustic echo completely.

In this thesis, in order to avoid near-end speech distortion and eliminate the acoustic echo completely, we focus on nonlinear acoustic echo cancellation and employ the Volterra and Hammerstein models to track the nonlinear echo path. To overcome their slow convergence rate and high complexity, we propose an optimum time- and tap-variant step size for the Volterra filter to speed up the convergence rate in Chapter 3.

The channel shortening structure has been proposed in [14] to overcome the high computational complexity of the Hammerstein structure. In Chapter 4, we perform theoretical analysis in the LMS and LS senses for the case of a linear loudspeaker. In addition, we propose a multiple-stage update scheme to speed up the convergence rate.

We provide computer simulations to justify our analysis and show the improved performance of the proposed nonlinear acoustic echo canceller in Chapter 5. Finally, we conclude our work.

**Chapter 2 **

**Adaptive Nonlinear Acoustic Echo Cancellation**

The loudspeakers used in hands-free telephones or teleconferencing are usually small and cheap, so a loudspeaker will saturate at high speech levels. When this saturation occurs, the loudspeaker is no longer linear [1], and the residual error of a purely linear acoustic echo canceller is very large. We discuss nonlinear acoustic echo cancellation to overcome this problem.

For some loudspeakers, the nonlinear effects have memory. If memoryless structures are used to model them, the canceller cannot eliminate the nonlinear echo perfectly. Structures that model memory, e.g., the Volterra model, are in general complex. We introduce and compare several memoryless and memory structures in Section 2.1.

However, the major drawbacks of nonlinear models are slow convergence rate and high computational complexity. In Section 2.2, algorithms that improve the convergence rate are introduced.

**2.1 Nonlinear AEC structures **

**2.1.1 Memoryless nonlinear AEC **

**Hammerstein structure **

In this section, we focus on the case where the nonlinearity in the echo can be considered memoryless. The Hammerstein structure [5], a well-known model of memoryless nonlinearity, is a cascade of a memoryless polynomial filter and an FIR filter, as shown in Fig 2.1.

Fig 2.1 Hammerstein structure

In Fig 2.1, the output $z(k)$ of the $K$-th order Hammerstein model can be expressed by

$$z(k) = \sum_{l=0}^{M-1} h_l\, u(k-l) \qquad (2.1)$$

where $u(k)$ is the polynomial filter output and $M$ is the FIR memory length:

$$u(k) = \sum_{i=1}^{K} a_i\, [x(k)]^i \qquad (2.2)$$
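As a concrete illustration of (2.1)-(2.2), the following Python sketch evaluates a Hammerstein model output for a short input. The function name and all coefficient values are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def hammerstein_output(x, a, h):
    """Sketch of the K-th order Hammerstein model (2.1)-(2.2):
    a memoryless polynomial u(k) = sum_i a_i [x(k)]^i followed by
    an FIR filter z(k) = sum_l h_l u(k-l)."""
    # Memoryless polynomial nonlinearity, powers 1..K
    u = sum(a_i * x ** (i + 1) for i, a_i in enumerate(a))
    # FIR filtering; truncate the full convolution to the input length
    return np.convolve(u, h)[:len(x)]

x = np.array([1.0, 0.5, -0.25, 0.0])
a = [1.0, 0.2]          # u(k) = x(k) + 0.2 x(k)^2
h = [0.8, 0.3]          # two-tap FIR filter
z = hammerstein_output(x, a, h)
```

For this input, u = [1.2, 0.55, -0.2375, 0], and the FIR stage yields z = [0.96, 0.80, -0.025, -0.07125].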

**2.1.2 Memory nonlinear AEC **

**[A] Volterra structure **

Fig 2.2 Volterra structure

As shown in Fig 2.2, another common approach to model the nonlinear behavior of loudspeakers is the Volterra filter [2]-[3]. In the following we assume the unknown echo path, i.e., the cascade of the nonlinear loudspeaker and the room impulse response, can be expressed by

$$z(k) = \sum_{i=1}^{K} z^{(i)}(k) \qquad (2.3)$$

where the I/O relation of the $p$-th order Volterra kernel with finite memory length $N_p$ yields

$$z^{(p)}(k) = \sum_{l_{p,1}=0}^{N_p-1} \, \sum_{l_{p,2}=l_{p,1}}^{N_p-1} \cdots \sum_{l_{p,p}=l_{p,p-1}}^{N_p-1} h^{(p)}_{l_{p,1},l_{p,2},\ldots,l_{p,p}} \prod_{i=1}^{p} x(k-l_{p,i}) \qquad (2.4)$$

**[B] Wiener Model**

In addition to the Volterra model, the Wiener model can model a loudspeaker with memory [17]. It consists of two parts, a cascade of an FIR filter and a memoryless polynomial filter, as shown in Fig 2.3.

Fig 2.3 Wiener structure

The output $z(k)$ of the $K$-th order Wiener model can be expressed by

$$z(k) = \sum_{i=1}^{K} a_i\, [x_h(k)]^i \qquad (2.5)$$

where $x_h(k)$ is the FIR filter output and $M_h$ is the FIR memory length:

$$x_h(k) = \sum_{l=0}^{M_h-1} h_l\, x(k-l) \qquad (2.6)$$

**[C] Wiener and Hammerstein Model **

We have already introduced the Wiener structure and the Hammerstein structure; Bershad [18] proposed a combination of the two, which cascades an FIR filter, a memoryless polynomial filter, and a second FIR filter.

Fig 2.4 Wiener and Hammerstein structure

As shown in Fig 2.4, $x_h(k)$ denotes the output of the first FIR filter, and $M_1$ and $M_2$ are the memory lengths of $h_1$ and $h_2$, respectively. $u(k)$ denotes the output of the $K$-th order memoryless polynomial filter:

$$x_h(k) = \sum_{l=0}^{M_1-1} h_{1,l}\, x(k-l)$$

$$u(k) = \sum_{i=1}^{K} a_i\, [x_h(k)]^i$$

Thus the output of the Wiener-Hammerstein structure can be expressed as

$$z(k) = \sum_{l=0}^{M_2-1} h_{2,l}\, u(k-l)$$

Having introduced these general nonlinear structures, we can summarize and compare them. (a) The advantage of both the Wiener and Hammerstein models is that few parameters are needed; the disadvantage is a low convergence rate, because the nonlinear and linear parameters (i.e., $a$ and $h$) are dependent. (b) The advantage of the Volterra model is that it can capture all distortion terms caused by the nonlinear loudspeaker, so its performance is the best among these models; the disadvantage is that its computational complexity is the highest, which causes a low convergence rate. (c) All of these nonlinear models can be considered particular subclasses of the Volterra model.

In this thesis, we focus on the Hammerstein and Volterra models, whose main drawback is a low convergence rate. To overcome it, much work has been proposed, for example input decorrelation [21], orthogonal polynomial bases [22]-[23], and step-size control [12], [19]. We introduce these approaches in the next section.

**2.2 Convergence rate speed-up algorithms for NAEC **

To accelerate convergence, algorithms such as input decorrelation [21], orthogonal polynomial bases [22], and step-size control [12], [19] have been proposed. We discuss the first two here. In Chapter 3, we introduce step-size control approaches and propose a new step-size approach for NAEC.

**2.2.1 Input Signal decorrelation **

In the field of acoustic echo cancellation, such undesired signal components are removed by adaptive filtering. However, the adaptation performance of the LMS algorithm suffers from slow convergence if the input signal is strongly correlated.

A way to overcome this problem is to first decorrelate the input signal and then use the decorrelated signal as the excitation for the adaptation of the echo canceller.

Kuech [21] derived an efficient configuration of decorrelation filters for use within a nonlinear AEC based on a second-order Volterra filter, assuming that the unknown echo can be modeled by a finite-length second-order Volterra filter, as shown in Fig 2.5.

Fig 2.5 Second-order Volterra with decorrelation filter

An optimum decorrelation filter in Kuech [21] produces the signal

$$u(k) = \sum_{n=0}^{K_{AR}} b_n\, x(k-n)$$

where $K_{AR}$ denotes the order of the AR (autoregressive) random process, and

$$b_0 = 1, \qquad b_n = -b_{AR,n} \quad \forall\, 1 \le n \le K_{AR}$$

The coefficients $b_n$ are chosen so that the following orthogonality relations hold for $u(k)$ and its products $u_r(k)$, respectively:

$$E\{u(k)\, u_r(k-a)\} = 0 \quad \forall\, r$$
$$E\{u_r(k)\, u_s(k-a)\} = 0 \quad \forall\, r \ne s$$
$$E\{u(k)\, u(k-a)\} = 0 \quad \forall\, a \ne 0$$
$$E\{u_r(k)\, u_r(k-a)\} = 0 \quad \forall\, a \ne 0 \wedge r \ne 0$$

Here, the adaptation equations by means of a joint normalized LMS algorithm read

$$h^{(1)}_{l}(k+1) = h^{(1)}_{l}(k) + \mu_1\, e(k)\, u(k-l)$$
$$h^{(2)}_{l_1,l_2}(k+1) = h^{(2)}_{l_1,l_2}(k) + \mu_2\, e(k)\, u_r(k-l_1)$$

where $u_r(k) = u(k)\, u(k-r)$ denotes the input of the quadratic kernel, with $r = l_2 - l_1$.
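The prewhitening idea behind the decorrelation filter can be illustrated on a first-order AR input. This Python sketch is an illustrative idealization: the AR coefficient `b_ar`, the seed, and the signal length are assumptions, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: x is an AR(1) process, so a first-order decorrelation
# (prewhitening) filter with taps b = [1, -b_AR,1] removes the lag-1
# correlation, as in the configuration described above.
b_ar = 0.9
w = rng.standard_normal(4000)           # white driving noise
x = np.zeros_like(w)
for k in range(1, len(x)):
    x[k] = b_ar * x[k - 1] + w[k]       # strongly correlated input

u = x[1:] - b_ar * x[:-1]               # u(k) = x(k) - b_AR,1 x(k-1)

def lag1_corr(s):
    """Normalized lag-1 autocorrelation estimate."""
    return np.dot(s[1:], s[:-1]) / np.dot(s, s)

rho_x = lag1_corr(x)   # close to b_ar = 0.9
rho_u = lag1_corr(u)   # close to 0: u is (nearly) white
```

The prewhitened signal `u` then serves as the excitation for the adaptive kernels, which is what speeds up the LMS adaptation.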

**2.2.2 Orthogonal polynomial-basis **

In [22]-[23], G. Y. Jiang and Kuech proposed an orthogonal polynomial adaptive filter to accelerate the convergence of the polynomial model. In general, the input signals of the different channels are not mutually orthogonal, i.e., $E\{x^i(k)\, x^j(k)\} \ne 0\ \forall\, i \ne j$. Thus, a new set of mutually orthogonal input signals was introduced [23]:

$$p_1(k) = x(k)$$
$$p_u(k) = x^u(k) + \sum_{i=1}^{u-1} q_{u,i}\, x^i(k)$$

for $1 < u \le K$. The orthogonalization coefficients $q_{u,i}$ can be determined using the Gram-Schmidt orthogonalization process.
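The Gram-Schmidt construction of mutually orthogonal channel inputs can be sketched empirically in Python. Here sample averages replace the expectations, and the signal length, seed, and order `K` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50000)

# Build mutually orthogonal channel inputs p_u(k) from the powers x^u(k)
# by empirical Gram-Schmidt; the q_{u,i} coefficients are the (negated)
# projection coefficients estimated from sample cross-correlations.
K = 3
p = [x.copy()]                      # p_1(k) = x(k)
for u in range(2, K + 1):
    new = x ** u
    for pi in p:                    # subtract projections onto earlier channels
        new = new - (np.dot(new, pi) / np.dot(pi, pi)) * pi
    p.append(new)

# empirical cross-correlations between channels are now near zero
c12 = np.dot(p[0], p[1]) / len(x)
c13 = np.dot(p[0], p[2]) / len(x)
```

Feeding `p[0]`, `p[1]`, `p[2]` to the adaptive channels instead of `x`, `x**2`, `x**3` decorrelates the channel inputs, which is what accelerates the polynomial model's convergence.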

In addition to these approaches, step-size control is also commonly used to overcome the problem of low convergence rate. In Chapter 3, we introduce several conventional step-size algorithms and propose a new step-size control approach.

**Chapter 3 **

**Optimum Step Size for Nonlinear AEC **

In addition to [21]-[23], step-size control is also commonly used to overcome the problem of low convergence rate.

In this chapter, we focus on step-size control in the Volterra structure. There is a tradeoff between fast convergence rate and small residual error power: in the LMS algorithm, a large step size normally gives faster convergence but larger residual error power. Thus an optimum step size should provide fast convergence and small residual error power at the same time.

In the following sections, conventional step-size control is introduced in Section 3.1. In Section 3.2, we derive the optimum time- and tap-variant step-size LMS (OTTLMS) algorithm by introducing an optimality criterion: the MMSE between the coefficient errors of the real kernel and the adaptive coefficients. Practical implementations are proposed in Section 3.4, and echo path change and double-talk conditions are considered in Section 3.5.

**3.1 Conventional step size adjustment **

Fig 3.1 Trade-off of step size in the LMS algorithm (ERLE versus iteration for large and small step sizes)

Fig 3.1 shows the echo return loss enhancement (ERLE). Because of the tradeoff between fast convergence rate and small residual error with a traditional constant step size, various approaches employing a varying step size in linear echo cancellation have been proposed, including time-varying [9], tap-varying [10], and both time- and tap-varying [11]. In this thesis, "time-varying" means all taps use an identical step size that varies in time; "tap-varying" means each tap has an individual, time-invariant step size; and "time- and tap-varying" means each tap has its own time-variant step size.

In addition to linear acoustic echo cancellation, Kuech proposed a time- and tap-varying approach for the second-order Volterra structure [12] in the nonlinear echo cancellation field.

Typical step-size adjustment approaches for linear AEC are summarized below:

**3.1.1 Linear AEC **

**[A] Variable step size LMS algorithm **

The VSLMS approach [9] employs a time-varying (time-variant) step size controlled by the power of the error signal. A large step size is used when the AEC filter coefficients are far from the optimal solution, speeding up convergence; similarly, when the coefficients are near the optimal solution, a small step size is used to achieve a lower MSE and thus better overall performance. The variable step-size LMS algorithm works as follows:

$$\mu'(k+1) = \alpha\,\mu'(k) + \gamma\, e^2(k), \qquad 0 < \alpha < 1,\ \gamma > 0$$

where the time-variant step size is controlled by

$$\mu(k+1) = \begin{cases} \mu_{\max} & \text{if } \mu'(k+1) > \mu_{\max} \\ \mu_{\min} & \text{if } \mu'(k+1) < \mu_{\min} \\ \mu'(k+1) & \text{otherwise} \end{cases}$$

The motivation is that a large residual error $e(k)$ produces a large step size, providing faster tracking of the echo path. Similarly, when the residual error is small, the step size is decreased to yield a smaller residual error.

The constant $\mu_{\max}$ is chosen to ensure that the MSE remains bounded, and $\mu_{\min}$ is chosen to provide a minimum level of tracking ability.
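A minimal Python sketch of the VSLMS step-size recursion above; all parameter values (`alpha`, `gamma`, `mu_min`, `mu_max`) are illustrative, not taken from [9].

```python
def vslms_step(mu_prev, e, alpha=0.97, gamma=1e-3,
               mu_min=1e-4, mu_max=0.1):
    """One step of the VSLMS recursion:
    mu'(k+1) = alpha*mu'(k) + gamma*e(k)^2, clipped to [mu_min, mu_max]."""
    mu = alpha * mu_prev + gamma * e ** 2
    return min(max(mu, mu_min), mu_max)

# large residual error -> step size grows toward mu_max;
# small residual error -> step size decays toward mu_min
mu = 1e-3
for _ in range(200):
    mu = vslms_step(mu, e=2.0)      # persistently large residual
mu_large = mu
for _ in range(200):
    mu = vslms_step(mu, e=1e-3)     # residual nearly zero
mu_small = mu
```

With a persistently large error the recursion saturates at `mu_max`; once the error shrinks, the step size decays geometrically (at rate `alpha`) toward `mu_min`.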

**[B] Exponentially weighted step size NLMS algorithm **

The exponentially weighted step-size NLMS (ESNLMS) algorithm [10] uses a different (tap-varying) step size for each tap of the adaptive filter. These step sizes are time-invariant and weighted proportionally to the expected variation of a room impulse response. As a result, the algorithm adjusts coefficients with large echo path variation in large steps, and coefficients with small echo path variation in small steps. The ESNLMS algorithm is expressed as:

$$\mathbf{h}(k+1) = \mathbf{h}(k) + \mathbf{U}_{ESNLMS}\, \frac{e(k)}{\|\mathbf{x}(k)\|^2}\, \mathbf{x}(k)$$

where $\mathbf{U}_{ESNLMS}$ is the diagonal step-size matrix that accounts for the tap-variant step sizes:

$$\mathbf{U}_{ESNLMS} = \begin{bmatrix} \mu_1 & & \\ & \ddots & \\ & & \mu_M \end{bmatrix}$$

with $\mu_l = \mu_0\, \gamma^l$ for $l = 1, \ldots, M$, where $\gamma$ $(0 < \gamma < 1)$ is the exponential attenuation factor of the room. The elements $\mu_l$ are time-invariant and decrease exponentially from $\mu_1$ to $\mu_M$ with the same ratio $\gamma$, which depends on the decay rate of the real room impulse response $\mathbf{c}$.
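A small Python sketch of the ESNLMS update above; the filter length `M` and the values of `mu_0` and `gamma` are illustrative assumptions.

```python
import numpy as np

# Exponentially weighted tap-variant step sizes mu_l = mu_0 * gamma^l,
# matching the exponential decay of a room impulse response.
M, mu_0, gamma = 8, 0.5, 0.8
mu = mu_0 * gamma ** np.arange(1, M + 1)
U = np.diag(mu)                     # diagonal step-size matrix U_ESNLMS

def esnlms_update(h, x_vec, e):
    """h(k+1) = h(k) + U e(k) x(k) / ||x(k)||^2"""
    return h + U @ x_vec * e / np.dot(x_vec, x_vec)

h = np.zeros(M)
x_vec = np.ones(M)
h1 = esnlms_update(h, x_vec, e=1.0)  # early taps move in larger steps
```

Because `mu` decays geometrically, the leading taps (where most of the echo path energy lies) adapt fastest, while the tail taps take correspondingly smaller steps.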
**[C] Optimum time-& tap-variant step size algorithm **

The OTTLMS approach [11] minimizes each tap's coefficient-error variance $g_l(k)$ at each iteration step, where the coefficient error is the difference between the real kernel and the adaptive coefficients. The optimum step size is obtained by setting the derivative of the tap coefficient-error variance with respect to $\mu_l(k)$ equal to zero. The OTTLMS algorithm is expressed as:

$$\mathbf{h}(k+1) = \mathbf{h}(k) + \mathbf{U}_{OTTLMS}(k)\, e(k)\, \mathbf{x}(k)$$

where

$$\mathbf{U}_{OTTLMS}(k) = \begin{bmatrix} \mu_{1,OTTLMS}(k) & & \\ & \ddots & \\ & & \mu_{M,OTTLMS}(k) \end{bmatrix}$$

$$\mu_{l,OTTLMS}(k) = \frac{g_l(k)}{\sigma_x^2 \sum_{i=1}^{M} g_i(k) + \sigma_n^2}$$

$$g_l(k+1) = \left(1 - \sigma_x^2\, \mu_{l,OTTLMS}(k)\right) g_l(k)$$

Since the optimum step size needs the room impulse response to evaluate the coefficient error, which is not accessible in general, the author employed the recursive relation of the second-moment coefficient error and used an exponential-decay model of the room impulse response for practical implementation [11].
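The interplay between the OTTLMS step-size rule and the coefficient-error variance recursion quoted above can be sketched numerically in Python. This is a white-input idealization; the filter length and the input and noise powers are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: M taps, white input of power sigma_x2, noise
# power sigma_n2, and equal initial per-tap error variances g_l(0).
M, sigma_x2, sigma_n2 = 16, 1.0, 1e-2
g = np.full(M, 0.1)

history = []
for _ in range(100):
    # mu_l(k) = g_l(k) / (sigma_x^2 * sum_i g_i(k) + sigma_n^2)
    mu = g / (sigma_x2 * g.sum() + sigma_n2)
    # g_l(k+1) = (1 - sigma_x^2 * mu_l(k)) g_l(k)
    g = (1.0 - sigma_x2 * mu) * g
    history.append(g.sum())
```

Since `sigma_x2 * mu < 1` at every step, each variance `g_l` stays positive and decreases monotonically, i.e., the total coefficient-error power shrinks toward the noise-limited floor.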

**3.1.2 Nonlinear AEC **

**[A] Proportionate NLMS for second-order Volterra filters **

For acoustic echo cancellation, it is reasonable to assume that the echo path is sparse, i.e., many coefficients are zero, so only the nonzero active coefficients need to be identified (updated). This is the idea behind the proportionate NLMS (PNLMS) algorithm [20]. It exploits the sparseness of such impulse responses to achieve significantly faster adaptation than NLMS.

Kuech [19] proposed an extension of proportionate NLMS to second-order Volterra filters, assuming that the unknown echo can be modeled by a finite-length second-order Volterra filter. The nonlinear echo cancellation system model is summarized in Fig 3.2: the microphone signal $y(k)$ is composed of the echo signal $y'(k)$, the noise signal $n(k)$ accounting for background noise, and the speech signal of a near-end talker $s(k)$.

Fig 3.2 Second-order Volterra acoustic echo canceller

By (2.3) and (2.4), the input/output relation of a second-order Volterra filter is given by

$$z(k) = z^{(1)}(k) + z^{(2)}(k) = \sum_{l=0}^{M-1} h^{(1)}_{l}(k)\, x(k-l) + \sum_{l_1=0}^{N_2-1} \sum_{l_2=l_1}^{N_2-1} h^{(2)}_{l_1,l_2}(k)\, x(k-l_1)\, x(k-l_2)$$

$$= [\mathbf{h}^{(1)}(k)]^T \mathbf{x}^{(1)}(k) + [\mathbf{h}^{(2)}(k)]^T \mathbf{x}^{(2)}(k)$$

with

$$\mathbf{x}^{(1)}(k) = [x(k), x(k-1), \ldots, x(k-M+1)]^T$$
$$\mathbf{x}^{(2)}(k) = [x^2(k), x(k)\,x(k-1), \ldots, x^2(k-N_2+1)]^T$$
$$\mathbf{h}^{(1)}(k) = [h^{(1)}_{0}(k), h^{(1)}_{1}(k), \ldots, h^{(1)}_{M-1}(k)]^T$$
$$\mathbf{h}^{(2)}(k) = [h^{(2)}_{0,0}(k), h^{(2)}_{0,1}(k), \ldots, h^{(2)}_{N_2-1,N_2-1}(k)]^T$$

$M$ and $N_2$ represent the memory lengths of the linear and quadratic kernels; the lengths of $\mathbf{x}^{(2)}(k)$ and $\mathbf{h}^{(2)}(k)$ are both $L_2 = N_2(N_2+1)/2$.
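A direct (unoptimized) Python sketch of this second-order Volterra I/O relation; the kernel values are illustrative, and `h2` is stored as an upper-triangular array indexed by `(l1, l2)` with `l2 >= l1`.

```python
import numpy as np

def volterra2_output(x, h1, h2):
    """z(k) = sum_l h1[l] x(k-l)
            + sum_{l1} sum_{l2 >= l1} h2[l1, l2] x(k-l1) x(k-l2)."""
    M, N2 = len(h1), h2.shape[0]
    z = np.zeros(len(x))
    for k in range(len(x)):
        for l in range(M):                  # linear kernel
            if k - l >= 0:
                z[k] += h1[l] * x[k - l]
        for l1 in range(N2):                # quadratic kernel
            for l2 in range(l1, N2):
                if k - l2 >= 0:             # implies k - l1 >= 0 too
                    z[k] += h2[l1, l2] * x[k - l1] * x[k - l2]
    return z

x = np.array([1.0, 2.0])
h1 = np.array([0.5])
h2 = np.array([[0.1, 0.2],
               [0.0, 0.3]])                 # taps h2[l1, l2], l2 >= l1
z = volterra2_output(x, h1, h2)
```

Equivalently, stacking the products `x(k-l1) x(k-l2)` into the vector x^(2)(k) turns the quadratic part into the inner product [h^(2)]^T x^(2)(k), which is why a second-order Volterra filter can be adapted like a linear filter of length M + L2.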

The PNLMS algorithm updates each filter coefficient independently of the others by adjusting the adaptation step size in proportion to the estimated filter coefficient. The extension of proportionate NLMS to second-order Volterra filters is thus summarized as:

$$\mathbf{h}^{(i)}(k+1) = \mathbf{h}^{(i)}(k) + \mu\, \frac{\mathbf{P}^{(i)}(k)\, \mathbf{x}^{(i)}(k)\, \hat{e}_i(k)}{[\mathbf{x}^{(i)}(k)]^T \mathbf{P}^{(i)}(k)\, \mathbf{x}^{(i)}(k)}$$

$$\mathbf{P}^{(1)}(k) = \mathrm{diag}\{p^{(1)}_1(k), \ldots, p^{(1)}_M(k)\}, \qquad \mathbf{P}^{(2)}(k) = \mathrm{diag}\{p^{(2)}_1(k), \ldots, p^{(2)}_{L_2}(k)\}$$

$$p^{(1)}_l(k) = \frac{1-\alpha}{2M} + \alpha\, \frac{|h^{(1)}_l(k)|}{2\, \|\mathbf{h}^{(1)}(k)\|_1}, \qquad p^{(2)}_l(k) = \frac{1-\alpha}{2L_2} + \alpha\, \frac{|h^{(2)}_l(k)|}{2\, \|\mathbf{h}^{(2)}(k)\|_1}$$

For $i \in \{1, 2\}$, $\hat{e}_i(k)$ is used to avoid unstable behavior [19], and $\alpha$ is a scalar.

_{i}The step-sizes are calculated from the last estimate of the filter coefficients so
that a large coefficient receives a large step-size, it is intuitive that if the someone tap
of adaptive filter coefficient (i.e. ( )*i* _{( )}

*l*

*h* *k ) is large value, the coefficient error of this tap *

should be large, thus if we give large step size to update, it may be increase the convergence rate. Hence, PNLMS converges much faster than NLMS.

We observe that both the ESNLMS and PNLMS algorithms rely on the concept of using a large step size for a large tap. It is intuitive that a large tap will produce a large estimated tap-coefficient error and should use a large step size for fast tracking; this is appropriate at the initial stage of adaptation.
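A small Python sketch of the proportionate gains and update described above; the value of `alpha`, the regularization constants, and the toy echo path are illustrative assumptions.

```python
import numpy as np

def pnlms_gains(h, alpha=0.5):
    """Proportionate gains: each tap gets a share proportional to its
    current magnitude, plus a small uniform floor so that inactive
    taps still adapt."""
    M = len(h)
    return (1 - alpha) / (2 * M) + alpha * np.abs(h) / (2 * np.abs(h).sum() + 1e-12)

def pnlms_update(h, x_vec, e, mu=0.5):
    """h(k+1) = h(k) + mu * P x e / (x^T P x), with P = diag(p_l)."""
    p = pnlms_gains(h)
    return h + mu * p * x_vec * e / (np.dot(x_vec, p * x_vec) + 1e-12)

# a sparse echo path: PNLMS concentrates adaptation on the active tap
h = np.array([0.0, 1.0, 0.0, 0.0])
p = pnlms_gains(h)
h1 = pnlms_update(h, np.array([1.0, 0.0, 0.0, 0.0]), e=0.5)
```

Note how the active tap's gain dominates while the zero taps keep only the uniform floor; this concentration of the update on active coefficients is exactly what makes PNLMS fast on sparse echo paths.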

**[B] Optimum step-size for adaptive second-order Volterra filters **

Approach [A] directly uses the adaptive filter coefficients to control the step size. In approach [B], Kuech [12] derived the optimum step size theoretically and proposed an approximated model for practical application.

The concept of the optimum step size in Kuech's approach [12] is identical to the OTTLMS approach: it is derived by minimizing the MSE between the coefficient errors of the Volterra filter and the real echo path.

The desired optimum step sizes for the linear and quadratic kernels are, respectively,

$$
\mu_{l,opt}^{(1)}(k)=\frac{E\{[v_l^{(1)}(k)]^2\}}{E\{\varepsilon^2(k)+n^2(k)+s^2(k)\}},\qquad
\mu_{j,opt}^{(2)}(k)=\frac{E\{[v_j^{(2)}(k)]^2\}}{E\{\varepsilon^2(k)+n^2(k)+s^2(k)\}}
\tag{3.1}
$$

The linear and quadratic kernel coefficient errors at time $k$ are defined by

$$
\mathbf v^{(1)}(k)=\mathbf h^{(1)}(k)-\mathbf c^{(1)}
\tag{3.2}
$$

$$
\mathbf v^{(2)}(k)=\mathbf h^{(2)}(k)-\mathbf c^{(2)}
\tag{3.3}
$$

where

$$
\mathbf v^{(1)}(k)=[v_1^{(1)}(k),v_2^{(1)}(k),\ldots,v_M^{(1)}(k)]^T,\qquad
\mathbf v^{(2)}(k)=[v_1^{(2)}(k),v_2^{(2)}(k),\ldots,v_{L_2}^{(2)}(k)]^T
$$

The residual echo comes from the filter coefficient errors of the linear and quadratic kernels:

$$
\varepsilon(k)=\varepsilon^{(1)}(k)+\varepsilon^{(2)}(k)
=\mathbf x^{(1)T}(k)\,\mathbf v^{(1)}(k)+\mathbf x^{(2)T}(k)\,\mathbf v^{(2)}(k)
$$

For a better understanding of the optimum step size, the author of [12] introduced the auxiliary step-size factors

$$
\mu_{dt}(k)=\frac{E\{\varepsilon^2(k)+n^2(k)\}}{E\{\varepsilon^2(k)+n^2(k)+s^2(k)\}},\qquad
\mu_{bn}(k)=\frac{E\{\varepsilon^2(k)\}}{E\{\varepsilon^2(k)+n^2(k)\}}
$$

$$
\mu_{\varepsilon}^{(i)}(k)=\frac{E\{[\varepsilon^{(i)}(k)]^2\}}{E\{\varepsilon^2(k)\}},\quad i\in\{1,2\},\qquad
\alpha_l^{(1)}(k)=\frac{E\{[v_l^{(1)}(k)]^2\}}{E\{x^2(k)\}\displaystyle\sum_{j=1}^{M}E\{[v_j^{(1)}(k)]^2\}}
$$

$$
\alpha_j^{(2)}(k)=\frac{E\{[v_j^{(2)}(k)]^2\}}{E^2\{x^2(k)\}\displaystyle\sum_{j'=1}^{L_2}E\{[v_{j'}^{(2)}(k)]^2\}}
$$

These definitions factorize the optimum step sizes according to

$$
\mu_{l,opt}^{(1)}(k)=\mu_{dt}(k)\,\mu_{bn}(k)\,\mu_{\varepsilon}^{(1)}(k)\,\alpha_l^{(1)}(k),\qquad
\mu_{j,opt}^{(2)}(k)=\mu_{dt}(k)\,\mu_{bn}(k)\,\mu_{\varepsilon}^{(2)}(k)\,\alpha_j^{(2)}(k)
$$
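The factorization can be checked numerically for the linear kernel. This is a sketch under the assumption of zero-mean uncorrelated regressor entries, so that $E\{[\varepsilon^{(1)}]^2\}=\sigma_x^2\sum_l E\{[v_l^{(1)}]^2\}$ and $E\{[\varepsilon^{(2)}]^2\}=\sigma_x^4\sum_j E\{[v_j^{(2)}]^2\}$; all variance values below are illustrative, not measured.

```python
import numpy as np

rng = np.random.default_rng(1)
gv1 = rng.uniform(0.01, 0.1, size=8)     # E{[v_l^(1)]^2} (illustrative)
gv2 = rng.uniform(0.001, 0.01, size=4)   # E{[v_j^(2)]^2} (illustrative)
sx2 = 1.0                                # E{x^2(k)}

e1 = sx2 * gv1.sum()                     # E{[eps^(1)]^2}
e2 = sx2**2 * gv2.sum()                  # E{[eps^(2)]^2}
eps2 = e1 + e2                           # total residual-echo power
n2, s2 = 0.01, 0.05                      # noise and near-end speech powers

mu_dt = (eps2 + n2) / (eps2 + n2 + s2)   # double-talk factor
mu_bn = eps2 / (eps2 + n2)               # background-noise factor
mu_e1 = e1 / eps2                        # linear-kernel share of eps^2
alpha1 = gv1 / (sx2 * gv1.sum())         # per-tap distribution factor

mu_opt1 = gv1 / (eps2 + n2 + s2)         # optimum step sizes (3.1), linear kernel
# the product of the four factors recovers mu_opt1 exactly
```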
As the parameters $E\{[\varepsilon^{(i)}(k)]^2\}$, $E\{[v_l^{(1)}(k)]^2\}$, and $E\{[v_j^{(2)}(k)]^2\}$ are not accessible in general, the author introduces models for estimating them:

1. $E\{[\varepsilon^{(i)}(k)]^2\}$ is proportional to the adaptive filter output power of the linear and quadratic kernels, respectively:

$$
E\{[\varepsilon^{(i)}(k)]^2\}\approx\gamma(k)\big[\delta_i+\beta_i(k)\,|z^{(i)}(k)|^2\big],\qquad i\in\{1,2\}
$$

2. The second moment of the coefficient error is proportional to the magnitude of the corresponding adaptive coefficient:

$$
E\{[v_l^{(1)}(k)]^2\}\approx\gamma_1(k)\big[\rho_1(k)\,|h_l^{(1)}(k)|+\lambda_1(k)\big],\qquad
E\{[v_j^{(2)}(k)]^2\}\approx\gamma_2(k)\big[\rho_2(k)\,|h_j^{(2)}(k)|+\lambda_2(k)\big]
$$

Compared with [11], our work extends [11] to a nonlinear system, the second-order Volterra filter. We exploit the recursive relation for the second moment of the coefficient error in [11] and propose a practical implementation different from Kuech's approach [12].

Next, in Section 3.2, we derive the optimum time- and tap-variant step-size LMS (OTTLMS) algorithm, which is obtained from an optimality criterion: the MMSE between the real kernel coefficients and the adaptive coefficients. A practical implementation is proposed in Section 3.4. Echo path change and double-talk conditions are considered in Section 3.5.

**3.2 Derivation of optimum time-& tap-variant step-size LMS **

**(OTTLMS) algorithm **

In this section, we extend [11] to the second-order Volterra filter by obtaining a recursive relation for the coefficient errors. This extension speeds up convergence not only for linear acoustic echo but also for nonlinear echo cancellation. The notation is identical to Section 3.1 (see Fig 3.2).

We want to find the step size at time $k$ that minimizes each tap's coefficient error variance at time $k+1$, i.e., the MSE at each iteration step. Hence, we use diagonal matrices $\mathbf U^{(1)}(k)$ and $\mathbf U^{(2)}(k)$ to replace the scalar step size of the conventional LMS algorithm [13]; the corresponding LMS updates can be rewritten as

$$
\mathbf h^{(1)}(k+1)=\mathbf h^{(1)}(k)+\mathbf U^{(1)}(k)\,e(k)\,\mathbf x^{(1)}(k)
\tag{3.4}
$$

$$
\mathbf h^{(2)}(k+1)=\mathbf h^{(2)}(k)+\mathbf U^{(2)}(k)\,e(k)\,\mathbf x^{(2)}(k)
\tag{3.5}
$$
$$
e(k)=y(k)-\mathbf h^{(1)T}(k)\,\mathbf x^{(1)}(k)-\mathbf h^{(2)T}(k)\,\mathbf x^{(2)}(k)
\tag{3.6}
$$
Here $\mathbf U^{(1)}(k)$ and $\mathbf U^{(2)}(k)$ denote the linear and quadratic step-size matrices of interest:

$$
\mathbf U^{(1)}(k)=\begin{bmatrix}\mu_1^{(1)}(k)& &0\\ &\ddots& \\ 0& &\mu_M^{(1)}(k)\end{bmatrix},\qquad
\mathbf U^{(2)}(k)=\begin{bmatrix}\mu_1^{(2)}(k)& &0\\ &\ddots& \\ 0& &\mu_{L_2}^{(2)}(k)\end{bmatrix}
$$

where the $l$th element of each step-size matrix is chosen to minimize the $l$th coefficient error variance at time $k+1$. The criterion is summarized as

$$
\mu_l^{(i)}(k)=\arg\min_{\mu_l^{(i)}(k)}\;E\big\{[h_l^{(i)}(k+1)-c_l^{(i)}]^2\big\}
$$

where $i\in\{1,2\}$ denotes the linear and quadratic kernel, respectively.
By (3.2), (3.3), and (3.6), we get the recursive relation of the linear kernel:

$$
\begin{aligned}
\mathbf h^{(1)}(k+1)&=\mathbf h^{(1)}(k)+\mathbf U^{(1)}(k)\,e(k)\,\mathbf x^{(1)}(k)\\
&=\mathbf h^{(1)}(k)+\mathbf U^{(1)}(k)\big[y(k)-\mathbf h^{T}(k)\,\mathbf x(k)\big]\mathbf x^{(1)}(k)\\
&=\mathbf h^{(1)}(k)+\mathbf U^{(1)}(k)\big[-\mathbf v^{T}(k)\,\mathbf x(k)+n(k)+s(k)\big]\mathbf x^{(1)}(k)
\end{aligned}
\tag{3.7}
$$

with $y(k)=\mathbf c^{T}\mathbf x(k)+n(k)+s(k)$,
where

$$
\mathbf x(k)=[\mathbf x^{(1)T}(k)\;\;\mathbf x^{(2)T}(k)]^T,\qquad
\mathbf v(k)=[\mathbf v^{(1)T}(k)\;\;\mathbf v^{(2)T}(k)]^T
$$

Using (3.2) and (3.7), we may rewrite the linear kernel coefficient error $\mathbf v^{(1)}(k)$ as

$$
\begin{aligned}
\mathbf v^{(1)}(k+1)&=\mathbf v^{(1)}(k)+\mathbf U^{(1)}(k)\big[-\mathbf x^{T}(k)\,\mathbf v(k)+n(k)+s(k)\big]\mathbf x^{(1)}(k)\\
&=\big[\mathbf I-\mathbf U^{(1)}(k)\,\mathbf x^{(1)}(k)\,\mathbf x^{(1)T}(k)\big]\mathbf v^{(1)}(k)
-\mathbf U^{(1)}(k)\,\mathbf x^{(1)}(k)\,\mathbf x^{(2)T}(k)\,\mathbf v^{(2)}(k)\\
&\quad+\mathbf U^{(1)}(k)\,\mathbf x^{(1)}(k)\,n(k)+\mathbf U^{(1)}(k)\,\mathbf x^{(1)}(k)\,s(k)
\end{aligned}
\tag{3.8}
$$

Similar to the processing in [11], we can derive the autocorrelation matrix of the linear kernel coefficient errors as follows, using the direct-averaging method [13]:

$$
\begin{aligned}
\mathbf R_{\mathbf v}^{(1)}(k+1)&\approx\big[\mathbf I-2\,\mathbf U^{(1)}(k)\,\mathbf R_{\mathbf x}^{(1)}\big]\mathbf R_{\mathbf v}^{(1)}(k)\\
&\quad+\mathbf U^{(1)}(k)\,E\big\{\mathbf x^{(1)}(k)\mathbf x^{(1)T}(k)\,\mathbf v^{(1)}(k)\mathbf v^{(1)T}(k)\,\mathbf x^{(1)}(k)\mathbf x^{(1)T}(k)\big\}\,\mathbf U^{(1)T}(k)\\
&\quad+\mathbf U^{(1)}(k)\,E\big\{\mathbf x^{(1)}(k)\mathbf x^{(2)T}(k)\,\mathbf v^{(2)}(k)\mathbf v^{(2)T}(k)\,\mathbf x^{(2)}(k)\mathbf x^{(1)T}(k)\big\}\,\mathbf U^{(1)T}(k)\\
&\quad+\mathbf U^{(1)}(k)\,\big[\sigma_n^2+\sigma_s^2(k)\big]\,\mathbf R_{\mathbf x}^{(1)}\,\mathbf U^{(1)T}(k)
\end{aligned}
\tag{3.9}
$$
In (3.9), $E\{\cdot\}$ denotes expectation. By assuming mutual independence of $x(k)$, $n(k)$, and $s(k)$, and that the probability density function of $x(k)$ is even so that $E\{x^3(k)\}=0$, the cross products among the terms $[\mathbf I-\mathbf U^{(1)}(k)\mathbf x^{(1)}(k)\mathbf x^{(1)T}(k)]\mathbf v^{(1)}(k)$, $\mathbf U^{(1)}(k)\mathbf x^{(1)}(k)\mathbf x^{(2)T}(k)\mathbf v^{(2)}(k)$, $\mathbf U^{(1)}(k)\mathbf x^{(1)}(k)\,n(k)$, and $\mathbf U^{(1)}(k)\mathbf x^{(1)}(k)\,s(k)$ in (3.9) can be neglected.
The $l$th diagonal term of the autocorrelation matrix, denoting the $l$th mean-square linear coefficient error $g_l^{(1)}(k)=E\{[v_l^{(1)}(k)]^2\}$, can be written as

$$
\begin{aligned}
g_l^{(1)}(k+1)&\approx\big(1-2\mu_l^{(1)}(k)\sigma_x^2\big)\,g_l^{(1)}(k)
+[\mu_l^{(1)}(k)]^2\Big[m_x^4\,g_l^{(1)}(k)+\sigma_x^4\!\!\sum_{p=1,\,p\neq l}^{M}\!g_p^{(1)}(k)\Big]\\
&\quad+[\mu_l^{(1)}(k)]^2\Big[m_x^6\,g_{q_0}^{(2)}(k)+\sigma_x^6\!\!\sum_{q=1,\,q\neq q_0}^{L_2}\!g_q^{(2)}(k)\Big]
+[\mu_l^{(1)}(k)]^2\sigma_x^2\sigma_n^2+[\mu_l^{(1)}(k)]^2\sigma_x^2\sigma_s^2(k)
\end{aligned}
\tag{3.10}
$$

where $q_0$ indexes the quadratic tap aligned with the $l$th linear tap, $\sigma_x^2$ is the far-end input variance, and $m_x^4=E\{x^4(k)\}$ and $m_x^6=E\{x^6(k)\}$ denote the 4th and 6th moments of $x^{(1)}(k)$. As the lengths $M$ and $L_2$ of the linear and quadratic kernels of the Volterra filter are sufficiently large, we can approximate $m_x^4\approx\sigma_x^4$ and $m_x^6\approx\sigma_x^6$, so

$$
g_l^{(1)}(k+1)\approx\big(1-2\mu_l^{(1)}(k)\sigma_x^2\big)\,g_l^{(1)}(k)
+[\mu_l^{(1)}(k)]^2\,\sigma_x^2\Big[\sigma_x^2\sum_{j=1}^{M}g_j^{(1)}(k)+\sigma_x^4\sum_{j=1}^{L_2}g_j^{(2)}(k)+\sigma_n^2+\sigma_s^2(k)\Big]
\tag{3.11}
$$

The optimum time- and tap-variant step size is obtained by taking the derivative of (3.11) with respect to $\mu_l^{(1)}(k)$ and setting the result equal to zero:

$$
\nabla_{\mu_l^{(1)}(k)}\,g_l^{(1)}(k+1)
=-2\sigma_x^2\,g_l^{(1)}(k)
+2\mu_l^{(1)}(k)\,\sigma_x^4\sum_{j=1}^{M}g_j^{(1)}(k)
+2\mu_l^{(1)}(k)\,\sigma_x^2\sigma_n^2
+2\mu_l^{(1)}(k)\,\sigma_x^2\sigma_s^2(k)
+2\mu_l^{(1)}(k)\,\sigma_x^6\sum_{j=1}^{L_2}g_j^{(2)}(k)=0
$$

Thus we obtain the optimum time- and tap-variant step size of the linear kernel:

$$
\mu_{l,OTTLMS}^{(1)}(k)=\frac{g_l^{(1)}(k)}{\sigma_x^2\displaystyle\sum_{l'=1}^{M}g_{l'}^{(1)}(k)+\sigma_n^2+\sigma_s^2(k)+\sigma_x^4\displaystyle\sum_{j=1}^{L_2}g_j^{(2)}(k)}
\tag{3.12}
$$

Analogously, the optimum step size of the quadratic kernel is given by

$$
\mu_{j,OTTLMS}^{(2)}(k)=\frac{g_j^{(2)}(k)}{\sigma_x^4\displaystyle\sum_{j'=1}^{L_2}g_{j'}^{(2)}(k)+\sigma_n^2+\sigma_s^2(k)+\sigma_x^2\displaystyle\sum_{l=1}^{M}g_l^{(1)}(k)}
\tag{3.13}
$$

From (3.12) and (3.13), we see that the optimum step sizes are directly proportional to the coefficient error variances: when the coefficient error variance is large (e.g., in the initial state), the optimum step size is large; as it becomes small, the optimum step size shrinks to yield a small residual error. This fits our intuition. The numerators of (3.12) and (3.13) are the second-moment coefficient errors of the linear and quadratic kernels, $g_l^{(1)}(k)=E\{[v_l^{(1)}(k)]^2\}$ and $g_j^{(2)}(k)=E\{[v_j^{(2)}(k)]^2\}$, while the denominators are the sum of the residual-echo powers and the near-end powers, since

$$
\sigma_x^2\sum_{l=1}^{M}g_l^{(1)}(k)=E\big\{[\varepsilon^{(1)}(k)]^2\big\},\qquad
\sigma_x^4\sum_{j=1}^{L_2}g_j^{(2)}(k)=E\big\{[\varepsilon^{(2)}(k)]^2\big\}
$$

Thus the results (3.12) and (3.13) agree with (3.1) in [12].
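The optimality of this step-size form can be sanity-checked on a single-tap, linear-only special case of the cost in (3.11) (one tap, no quadratic kernel, no double talk). The sketch below compares a fine grid search over the expected cost with the closed form; the values of $g$, $\sigma_x^2$, and $\sigma_n^2$ are illustrative.

```python
import numpy as np

# One linear tap (M=1, L2=0, s(k)=0): expected cost J(mu) = E{[v(k+1)]^2}
g, sx2, sn2 = 0.25, 1.0, 0.01
J = lambda mu: (1 - 2 * mu * sx2) * g + mu**2 * sx2 * (sx2 * g + sn2)

mus = np.linspace(0.0, 2.0, 200001)
mu_grid = mus[np.argmin(J(mus))]      # numerical minimizer on a fine grid
mu_opt = g / (sx2 * g + sn2)          # closed form (3.12) for this special case
```

Plugging the closed-form step back into the cost also reproduces the recursion $g(k+1)=(1-\mu_{opt}\sigma_x^2)\,g(k)$ of (3.14).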

Similar to the processing in [11], substituting the optimum time- and tap-variant step sizes of the linear and quadratic kernels back into (3.10), we obtain the recursions for the mean-square coefficient errors:

$$
g_l^{(1)}(k+1)=\big(1-\mu_{l,OTTLMS}^{(1)}(k)\,\sigma_x^2\big)\,g_l^{(1)}(k),\qquad
g_j^{(2)}(k+1)=\big(1-\mu_{j,OTTLMS}^{(2)}(k)\,\sigma_x^4\big)\,g_j^{(2)}(k)
\tag{3.14}
$$

for $l=1,\ldots,M$ and $j=1,\ldots,L_2$. These results agree with those for traditional AEC [11].

The double-talk condition is not considered in this section, so we set $s(k)=0$; double-talk and echo-path-change conditions will be considered in Section 3.5. The approximated OTTLMS algorithm for the second-order Volterra filter is summarized in Table 3.1:

$$
e(k)=d(k)-\mathbf x^{(1)T}(k)\,\mathbf h^{(1)}(k)-\mathbf x^{(2)T}(k)\,\mathbf h^{(2)}(k)
$$

$$
\mathbf U^{(1)}(k)=\frac{1}{\Delta(k)}\operatorname{diag}\big\{g_1^{(1)}(k),\ldots,g_M^{(1)}(k)\big\},\qquad
\mathbf U^{(2)}(k)=\frac{1}{\Delta(k)}\operatorname{diag}\big\{g_1^{(2)}(k),\ldots,g_{L_2}^{(2)}(k)\big\}
$$

$$
\Delta(k)=\sigma_x^2\sum_{l=1}^{M}g_l^{(1)}(k)+\sigma_x^4\sum_{j=1}^{L_2}g_j^{(2)}(k)+\sigma_n^2
$$

$$
\mathbf h^{(i)}(k+1)=\mathbf h^{(i)}(k)+\mathbf U^{(i)}(k)\,e(k)\,\mathbf x^{(i)}(k),\qquad i\in\{1,2\}
$$

$$
g_l^{(1)}(k+1)=\Big(1-\frac{g_l^{(1)}(k)}{\Delta(k)}\,\sigma_x^2\Big)g_l^{(1)}(k),\qquad
g_j^{(2)}(k+1)=\Big(1-\frac{g_j^{(2)}(k)}{\Delta(k)}\,\sigma_x^4\Big)g_j^{(2)}(k)
$$

Table 3.1: OTTLMS algorithm
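The Table 3.1 recursions can be sketched on a toy system in NumPy. This is a minimal sketch, not the thesis's simulation setup: the kernel sizes, the Gaussian input, the diagonal-only quadratic regressor, and the use of the true kernels to initialize $g$ are all illustrative assumptions (Section 3.4 replaces that prior knowledge with an envelope model).

```python
import numpy as np

rng = np.random.default_rng(0)
M, L2 = 4, 3                          # toy linear / quadratic kernel lengths
c1 = rng.normal(size=M) * 0.5         # assumed "true" linear kernel
c2 = rng.normal(size=L2) * 0.1        # assumed "true" quadratic kernel (diagonal terms)
sx2, sn2 = 1.0, 1e-3                  # far-end and noise variances
sx4 = 3.0 * sx2**2                    # E{x^4} for a Gaussian input

h1, h2 = np.zeros(M), np.zeros(L2)
g1, g2 = c1**2, c2**2                 # g(0) = c^2 since h(0) = 0

x = rng.normal(size=3000)
for k in range(M, len(x)):
    x1 = x[k - M:k][::-1]             # linear regressor
    x2 = x[k - L2:k][::-1] ** 2       # diagonal quadratic regressor
    d = c1 @ x1 + c2 @ x2 + np.sqrt(sn2) * rng.normal()
    e = d - h1 @ x1 - h2 @ x2
    delta = sx2 * g1.sum() + sx4 * g2.sum() + sn2   # common denominator Delta(k)
    mu1, mu2 = g1 / delta, g2 / delta               # per-tap optimum step sizes
    h1 += mu1 * e * x1                              # kernel updates
    h2 += mu2 * e * x2
    g1 *= 1.0 - mu1 * sx2                           # recursion (3.14), linear kernel
    g2 *= 1.0 - mu2 * sx4                           # recursion (3.14), quadratic kernel
```

Note that $\mu_l^{(1)}(k)\sigma_x^2\le 1$ by construction, so each $g$ stays nonnegative and decays monotonically.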

**3.3 Extension to OTTNLMS algorithm **

The above discussion is based on the LMS algorithm. However, when the input is large, the LMS algorithm suffers from a gradient-noise amplification problem. To overcome this difficulty, we extend it to the normalized LMS (NLMS) algorithm. By the approximations [13] $\mathbf x^{(1)T}(k)\,\mathbf x^{(1)}(k)\approx M\sigma_x^2$ and $\mathbf x^{(2)T}(k)\,\mathbf x^{(2)}(k)\approx L_2\sigma_x^4$, the step sizes of OTTNLMS can be shown to be $\mu_{l,OTTNLMS}^{(1)}(k)=(M\sigma_x^2+L_2\sigma_x^4)\,\mu_{l,OTTLMS}^{(1)}(k)$ and $\mu_{j,OTTNLMS}^{(2)}(k)=(M\sigma_x^2+L_2\sigma_x^4)\,\mu_{j,OTTLMS}^{(2)}(k)$.

So we can rewrite (3.12) and (3.13) as

$$
\mu_{l,OTTNLMS}^{(1)}(k)=\frac{\big(M\sigma_x^2+L_2\sigma_x^4\big)\,g_l^{(1)}(k)}{\sigma_x^2\displaystyle\sum_{l'=1}^{M}g_{l'}^{(1)}(k)+\sigma_n^2+\sigma_x^4\displaystyle\sum_{j=1}^{L_2}g_j^{(2)}(k)}
$$

$$
\mu_{j,OTTNLMS}^{(2)}(k)=\frac{\big(M\sigma_x^2+L_2\sigma_x^4\big)\,g_j^{(2)}(k)}{\sigma_x^4\displaystyle\sum_{j'=1}^{L_2}g_{j'}^{(2)}(k)+\sigma_n^2+\sigma_x^2\displaystyle\sum_{l=1}^{M}g_l^{(1)}(k)}
$$

Similarly, (3.14) can be rewritten as

$$
g_l^{(1)}(k+1)=\bigg(1-\frac{\mu_{l,OTTNLMS}^{(1)}(k)\,\sigma_x^2}{M\sigma_x^2+L_2\sigma_x^4}\bigg)g_l^{(1)}(k),\qquad
g_j^{(2)}(k+1)=\bigg(1-\frac{\mu_{j,OTTNLMS}^{(2)}(k)\,\sigma_x^4}{M\sigma_x^2+L_2\sigma_x^4}\bigg)g_j^{(2)}(k)
$$

for $l=1,\ldots,M$ and $j=1,\ldots,L_2$.
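The rewritten recursion describes the same quantity as (3.14): scaling the step size by $M\sigma_x^2+L_2\sigma_x^4$ and dividing it out again leaves $g$ unchanged. A short numeric check (all values illustrative):

```python
M, L2 = 64, 16
sx2 = 1.0
sx4 = 3.0 * sx2**2                # E{x^4} for a Gaussian input (assumed)
norm = M * sx2 + L2 * sx4         # M*sigma_x^2 + L2*sigma_x^4

g = 0.04                          # some tap's mean-square coefficient error
D = 0.9                           # denominator of (3.12) (illustrative value)
mu_lms = g / D                    # OTTLMS step size
mu_nlms = norm * mu_lms           # OTTNLMS step size

g_lms = (1 - mu_lms * sx2) * g            # recursion (3.14)
g_nlms = (1 - mu_nlms * sx2 / norm) * g   # NLMS-rewritten recursion
```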

Thus the OTTNLMS algorithm for the second-order Volterra filter is summarized in Table 3.2:

$$
e(k)=d(k)-\mathbf x^{(1)T}(k)\,\mathbf h^{(1)}(k)-\mathbf x^{(2)T}(k)\,\mathbf h^{(2)}(k)
$$

$$
\mathbf U^{(1)}(k)=\frac{1}{\Delta(k)}\operatorname{diag}\big\{g_1^{(1)}(k),\ldots,g_M^{(1)}(k)\big\},\qquad
\mathbf U^{(2)}(k)=\frac{1}{\Delta(k)}\operatorname{diag}\big\{g_1^{(2)}(k),\ldots,g_{L_2}^{(2)}(k)\big\}
$$

$$
\Delta(k)=\frac{\sigma_x^2\sum_{l=1}^{M}g_l^{(1)}(k)+\sigma_x^4\sum_{j=1}^{L_2}g_j^{(2)}(k)+\sigma_n^2}{M\sigma_x^2+L_2\sigma_x^4}
$$

$$
\mathbf h^{(i)}(k+1)=\mathbf h^{(i)}(k)+\mathbf U^{(i)}(k)\,e(k)\,\mathbf x^{(i)}(k),\qquad i\in\{1,2\}
$$

$$
g_l^{(1)}(k+1)=\bigg(1-\frac{\mu_{l,OTTNLMS}^{(1)}(k)\,\sigma_x^2}{M\sigma_x^2+L_2\sigma_x^4}\bigg)g_l^{(1)}(k),\qquad
g_j^{(2)}(k+1)=\bigg(1-\frac{\mu_{j,OTTNLMS}^{(2)}(k)\,\sigma_x^4}{M\sigma_x^2+L_2\sigma_x^4}\bigg)g_j^{(2)}(k)
$$

with $\mu_{l,OTTNLMS}^{(1)}(k)=g_l^{(1)}(k)/\Delta(k)$ and $\mu_{j,OTTNLMS}^{(2)}(k)=g_j^{(2)}(k)/\Delta(k)$.

Table 3.2: OTTNLMS algorithm

**3.4 Practical implementations of OTTLMS algorithm **

In Section 3.2, we derived the optimum time- and tap-variant step sizes for the LMS and NLMS algorithms, summarized in Table 3.1 and Table 3.2. The OTTLMS and OTTNLMS algorithms require not only the prior statistics $\sigma_x^2$, $\sigma_x^4$, and $\sigma_n^2$, but also prior knowledge of the second moments of the coefficient errors $g_l^{(1)}(k)$ and $g_j^{(2)}(k)$. Thus the real room impulse response $\mathbf c^{(1)}$ and the second-order kernel $\mathbf c^{(2)}$ caused by the nonlinear loudspeaker must be known. In general, the echo paths $\mathbf c^{(1)}$ and $\mathbf c^{(2)}$ are not accessible. In Section 3.4.1, we propose a model function to estimate these parameters for application in nonlinear acoustic echo cancellation.

**3.4.1 Exponential models of linear and quadratic kernel **

Unlike the approximation approach of Kuech [12] in Section 3.1.2, we introduced the recursive formula (3.14), so we only need to know the envelope of the real echo path, i.e., $g_l^{(1)}(0)=E\{[h_l^{(1)}(0)-c_l^{(1)}]^2\}=[c_l^{(1)}]^2$. We therefore propose exponential models for the implementation.

We reasonably assume that the real linear and quadratic kernels $\mathbf c^{(1)}$ and $\mathbf c^{(2)}$ can be modeled by exponentially decaying envelopes, as shown in Fig 3.3 and Fig 3.5. Let the linear and quadratic envelope functions be modeled as

$$
w_l^{(1)}=w_0^{(1)}\,(r^{(1)})^{l}\qquad\text{for }l=1\sim M
\tag{3.15}
$$

$$
w_{l_1,l_2}^{(2)}=w_0^{(2)}\,(r^{(2)})^{\,l_1+l_2}\qquad\text{for }l_1,l_2=1\sim N_2
\tag{3.16}
$$

where $r^{(1)}$ and $r^{(2)}$ are the linear and quadratic kernel exponential decay factors.

Fig 3.3: Real linear kernel (room impulse response) and the exponential model of its envelope

Fig 3.5: Exponential model of the envelope of the quadratic kernel

The diagonal elements of the tap coefficient error variance matrices $\mathbf R_{\mathbf v}^{(1)}(k)$ and $\mathbf R_{\mathbf v}^{(2)}(k)$ are $g_l^{(1)}(k)=E\{[h_l^{(1)}(k)-c_l^{(1)}]^2\}$ and $g_j^{(2)}(k)=E\{[h_j^{(2)}(k)-c_j^{(2)}]^2\}$, respectively. We let the initial linear and quadratic tap coefficients be zero, i.e., $h_l^{(1)}(0)=0$ and $h_j^{(2)}(0)=0$, so

$$
g_l^{(1)}(0)=E\{[h_l^{(1)}(0)-c_l^{(1)}]^2\}=[c_l^{(1)}]^2\approx[w_l^{(1)}]^2,\qquad
g_j^{(2)}(0)=E\{[h_j^{(2)}(0)-c_j^{(2)}]^2\}=[c_j^{(2)}]^2\approx[w_{l_1,l_2}^{(2)}]^2
$$

By (3.12) and (3.13), given $g_l^{(1)}(0)$ and $g_j^{(2)}(0)$, we can compute the initial step sizes $\mu_{l,OTTLMS}^{(1)}(0)$ and $\mu_{j,OTTLMS}^{(2)}(0)$; plugging these into the recursion (3.14) gives $g_l^{(1)}(1)$ and $g_j^{(2)}(1)$, and so forth. Thus $\mu_{l,OTTLMS}^{(1)}(k)$ and $\mu_{j,OTTLMS}^{(2)}(k)$ can be found recursively. The practical OTTLMS algorithm with exponential envelope model functions is summarized as follows:

1. Measure the exponential decay factors $r^{(1)}$ and $r^{(2)}$ of the linear and quadratic kernels to obtain $w_l^{(1)}=w_0^{(1)}(r^{(1)})^{l}$ and $w_{l_1,l_2}^{(2)}=w_0^{(2)}(r^{(2)})^{\,l_1+l_2}$.
2. Set the initial values $g_l^{(1)}(0)\approx[w_l^{(1)}]^2$ for $l=1,\ldots,M$ and $g_j^{(2)}(0)\approx[w_{l_1,l_2}^{(2)}]^2$ for $l_1,l_2=1,\ldots,N_2$.
3. Proceed according to Table 3.1.
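Steps 1 and 2 can be sketched as follows; the decay factors and initial amplitudes are illustrative placeholders for the measured values.

```python
import numpy as np

M, N2 = 256, 20
w0_1, r1 = 0.5, 0.97              # linear envelope: w_l = w0 * r^l (assumed values)
w0_2, r2 = 0.1, 0.90              # quadratic envelope: w_{l1,l2} = w0 * r^(l1+l2)

l = np.arange(1, M + 1)
w1 = w0_1 * r1 ** l               # linear envelope, length M

l1, l2 = np.meshgrid(np.arange(1, N2 + 1), np.arange(1, N2 + 1), indexing="ij")
w2 = w0_2 * r2 ** (l1 + l2)       # quadratic envelope, N2 x N2

g1_0 = w1 ** 2                    # initial g_l^(1)(0) ~ [w_l^(1)]^2
g2_0 = w2.ravel() ** 2            # initial g_j^(2)(0) ~ [w^(2)_{l1,l2}]^2
```

These initial values seed the recursion of Table 3.1; by (3.14), each $g$ then decays monotonically as the filter adapts.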

By using the exponential functions to model the linear and quadratic kernels, we can practically implement the OTTLMS algorithm; its performance will be verified in Chapter 5.

**3.4.2 Exponentially approximated temporal function of step-size in OTTLMS and OTTNLMS**

In Section 3.4.1, we proposed a practical implementation of OTTLMS. We now examine the behavior of the step size during adaptation.

Fig 3.6: Linear kernel step-size temporal function

Fig 3.7: Quadratic kernel step-size temporal function

From Figs 3.6 and 3.7, we observe that our optimum step sizes in AEC are large at the initial stage and become small once the filter has converged.