
Performance Evaluation of Double-Talk Detectors in Echo Cancellation


Academic year: 2021




















 



Doubletalk Detector Performance Evaluation in

Acoustic Echo Cancellation



















































































Doubletalk Detector Performance

Evaluation in Acoustic Echo

Cancellation

Student: H. R. Chang Advisor: S. F. Hsieh

Department of Communication Engineering

National Chiao Tung University

Abstract

In adaptive acoustic echo cancellation, double talk can make the adaptive filter diverge from the optimum. In this thesis, the cross-correlation between the microphone signal and the estimated echo is used to judge whether double talk arises. We also derive the theoretical miss probability as a function of the false alarm probability. To distinguish echo path change from double talk, we propose a modified cross-correlation double talk detector that uses the microphone energy. We develop a way to evaluate DTD algorithms when either double talk or echo path change arises, and we calculate the corresponding miss probability. Computer simulations validate our derivations and proposed methods.








Acknowledgement

First, I would like to express my deepest gratitude to my advisor, Dr. S. F. Hsieh. Without his assistance, enthusiasm, and patience, my thesis would not have been completed. I also want to thank my parents and my sisters, who always encourage me when I am in a bad mood. Finally, thanks to my lab friends, I can always be happy doing my research.

Contents

English Abstract
Contents
List of Figures
List of Tables

1 Introduction

2 Double Talk Detector For Acoustic Echo Canceller
2.1 Double talk detector algorithms
2.1.1 Geigel
2.1.2 Gradient vector correlation
2.1.3 Two echo path model
2.1.4 Cross correlation
2.2 Comparisons of double talk detectors
2.3 Robust double talk detector
2.3.1 Robust Geigel DTD
2.3.2 Robust cross-correlation DTD
2.4 Theoretical analysis of cross correlation DTD
2.4.1 Correlation of microphone and estimated signals
2.4.2 Correlation of microphone and error signals
2.5 Theoretical analysis of miss probability in double talk period

3 The Modified Cross Correlation Double Talk Detector By Microphone Energy
3.1 Cross-correlation DTD in echo path change and double talk
3.2 The modified double talk detector by microphone energy
3.3 Evaluating DTD in echo path change and double talk
3.3.1 A novel technique for DTD evaluation in echo path change
3.3.2 Miss probability in echo path change and double talk
3.3.3 Weighted miss probability
3.4 Variant thresholds in cross correlation DTD

4 Computer Simulations
4.1 Simulation parameters and room impulse response
4.2 Comparisons of double talk detectors
4.3 Robust DTD
4.4 Analysis of the cross-correlation DTD
4.5 Miss probability analysis of the cross-correlation DTD
4.6 Nonlinear effect for cross-correlation DTD
4.7 Modified cross-correlation DTD by microphone energy
4.8 A technique for evaluating DTD algorithms in double talk and echo path change
4.9 Variant thresholds in cross correlation DTD
4.10 Square inverse law

List of Figures

Fig 1.1 Structure of acoustic echo canceller
Fig 2.1 Structure of double talk detector
Fig 2.2 Structure of two echo path model
Fig 2.3 Speech activity detector
Fig 2.4 Input and output of the speech activity detector
Fig 2.5 Evaluation procedure of DTD
Fig 2.6 The difference $\xi_{Geigel}$ in single talk and double talk
Fig 2.7 The relation of the step size and $\xi_{Geigel}$
Fig 2.8 Comparison of the step size of the Geigel and robust Geigel DTD
Fig 2.9 Comparison of the step size of the cross-correlation and robust cross-correlation DTD
Fig 2.10 The method of calculating miss probability
Fig 2.11 The PDF of $\rho_{d,\hat{y}}$ in single talk
Fig 2.12 The PDF of $\rho_{d,\hat{y}}$ in DT
Fig 2.13 Structure of the nonlinear loudspeaker
Fig 3.1 Variation of $\rho_{d,\hat{y}}$ between DT and EPC
Fig 3.2 Structure of the modified cross-correlation DTD by microphone energy
Fig 3.3 Flow chart of the modified cross correlation DTD algorithm by microphone energy
Fig 3.4 The method of calculating false alarm probability
Fig 3.5 The method of calculating miss probability
Fig 3.6 Flow chart of the miss probabilities computations for DTD and EPC
Fig 3.8 The cross correlation $\rho_{d,\hat{y}}$ under variant threshold
Fig 3.9 Algorithm of the variant threshold selection in cross-correlation DTD
Fig 4.1 Echo path impulse response
Fig 4.2 Far end signal and near end speech
Fig 4.3 Comparison of DTDs miss probability in different near end signal energies
Fig 4.4 Comparison of the robust and conventional Geigel DTD
Fig 4.5 Comparison of the robust and conventional cross-correlation DTD
Fig 4.6 Simulated and theoretical mic/AEC correlation $\rho_{d,\hat{y}}$ in DT and EPC
Fig 4.7 Simulated and theoretical mic/error correlation $\rho_{d,e}$ in DT and EPC
Fig 4.8 Simulated and theoretical thresholds in cross-correlation DTD under fixed false alarm probability
Fig 4.9 Simulated and theoretical cross correlation DTD miss probability under fixed false alarm probability
Fig 4.10 The effect of near end speech energy and the miss probability under fixed false alarm probability
Fig 4.11 Simulated and theoretical miss probabilities in cross-correlation DTD
Fig 4.12 Nonlinear effect for cross-correlation $\rho_{d,\hat{y}}$
Fig 4.13 Cross correlation $\rho_{d,\hat{y}}$ of the different EPC under fixed small DT energy
Fig 4.14 Microphone energy of the different EPC under fixed small DT energy
Fig 4.15 ERLE of the modified cross-correlation DTD in different EPC under fixed small DT energy
Fig 4.16 Cross-correlation $\rho_{d,\hat{y}}$ of the different EPC under fixed large DT energy
Fig 4.18 ERLE of the modified cross-correlation DTD in different EPC under fixed large DT energy
Fig 4.19 The miss probability in DT period of modified cross correlation DTD in different DT energy
Fig 4.20 Comparison of the miss probabilities of modified cross correlation DTD and cross correlation DTD in different EPC
Fig 4.21 Comparison of the weighted miss probability with the miss
Fig 4.22 Comparison of the weighted and no weighted miss probability
Fig 4.23 Comparison of the fixed threshold and the variant threshold in cross correlation DTD
Fig 4.24 The miss probability for different $\kappa'$ under fixed false alarm probability

List of Tables

2.1 DTD evaluation procedure in case of double talk
3.1 Cross correlation DTD in different cases
3.2 DTD evaluation procedure in case of echo path change

Chapter 1

Introduction

For hands-free communication systems, it is important to provide users with a comfortable, high-quality conversation. In these systems, acoustic echo is a major issue that degrades speech quality. An echo canceller removes the echo caused by the coupling between the loudspeaker and the microphone along the echo path. Double talk (DT) is a serious problem in adaptive acoustic echo cancellation: it can cause the adaptive filter to fail to track the room impulse response, especially for error-feedback adaptive filters such as LMS and RLS [1].

A teleconference system with an acoustic echo canceller (AEC) is shown in Fig 1.1, where a linear filter $\hat{h}$ is used to model the echo path $h$ between the loudspeaker and the microphone. A replica $\hat{y}(k)$ of the far-end speaker's echo is generated and subtracted from the microphone signal $d(k)$, which contains the echo. The AEC filter is typically updated using an adaptive algorithm to account for any changes in the room impulse response.

Implementing such a system is not as easy as it seems, because the performance of an algorithm is affected by the long impulse response that the linear filter must model, by how quickly it converges for inputs such as speech, and by how quickly it adapts to variations in the echo path. Among all the adaptive algorithms for AEC, the LMS algorithm and the normalized LMS (NLMS) algorithm [1] are popular for their simplicity and predictable behavior.

Fig 1.1 The structure of the acoustic echo canceller

An adaptive echo canceller updates the tap coefficients of an adaptive filter to model the echo path using the error signal $e(k)$, as shown in Fig 1.1. If the tap coefficients are updated during double talk, that is, when the microphone input contains both the near-end talker signal $v(k)$ and the echo signal $y(k)$, they can fluctuate greatly or diverge, so that the impulse response of the echo path is misestimated. Hence, the AEC should stop the filter adaptation during the double talk period.
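The adaptation loop described above can be sketched in a few lines. This is a minimal illustration of an NLMS echo canceller in single talk, not the thesis implementation: the function names, the four-tap echo path, the step size `mu`, and the noise-free setup are all assumptions chosen only to keep the example short.

```python
import random

def nlms_echo_canceller(x, d, num_taps=8, mu=0.5, eps=1e-8):
    """Run NLMS over far-end samples x and microphone samples d.

    Returns (h_hat, e): the final tap estimates and the error signal e(k).
    """
    h_hat = [0.0] * num_taps
    buf = [0.0] * num_taps                 # x(k), x(k-1), ..., x(k-num_taps+1)
    e = []
    for xk, dk in zip(x, d):
        buf = [xk] + buf[:-1]
        y_hat = sum(w * b for w, b in zip(h_hat, buf))   # echo replica y_hat(k)
        ek = dk - y_hat                                  # residual error e(k)
        norm = sum(b * b for b in buf) + eps             # input energy (regularized)
        h_hat = [w + mu * ek * b / norm for w, b in zip(h_hat, buf)]
        e.append(ek)
    return h_hat, e

def simulate(num_samples=4000, seed=0):
    """Single-talk toy run: white far-end signal, hypothetical 4-tap echo path."""
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.2, 0.1]              # hypothetical echo path (assumption)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d, buf = [], [0.0] * len(h)
    for xk in x:
        buf = [xk] + buf[:-1]
        d.append(sum(a * b for a, b in zip(h, buf)))     # echo only: no v(k), no n(k)
    h_hat, e = nlms_echo_canceller(x, d, num_taps=len(h))
    return h, h_hat, e
```

Calling `simulate()` returns the true path, the estimate, and the error signal. Adding a near-end signal to `d` while the update keeps running is an easy way to observe the coefficient fluctuation that motivates a double talk detector.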

Several double talk detectors (DTDs) have been proposed. The conventional double talk detection algorithms are classified into several categories.

(I) The level comparison type detects double talk by comparing the microphone signal level [2] or the error signal level [3] with the primary input signal level.

(II) The CLMS algorithm [11] is used to distinguish DT from a varying echo path. ECLMS performs better than CLMS, but both have the drawback of higher computational complexity.

(III) The cross-correlation type [7] [12] [13] [14] detects double talk using different correlations. In this thesis, we adopt the cross-correlation DTD, which we prefer to the other two methods because it is affected only slightly by changes in the microphone or loudspeaker volume.

(IV) Recently, some DTD algorithms have also been developed that are specifically suited to subband processing [15].

(V) One way to guarantee that the adaptive filter is not unnecessarily halted is to use a secondary FIR filter, as in the two-path algorithm [6] [9] [16].

Several double talk detectors (DTDs)/step-gain controllers, which halt the adaptation during double talk, have been proposed. However, a badly tuned DTD risks halting the adaptive filter when it should not be halted, e.g., when the echo path changes. A critical problem is that merely measuring these signals cannot discriminate between double talk and echo path change. If an echo path change is mislabeled as double talk, AEC performance degrades.

In Chapter 2, we compare several DTDs using the technique from [8]. The comparison in [8] considered only the Geigel and normalized cross-correlation DTDs; we add further DTDs to the comparison. We also derive the theoretical cross-correlation under double talk and under echo path change. The miss probability in [8] is a simulated value rather than a theoretical one, so we derive the theoretical miss probability in the double talk period for the technique of [8]. The effect of a nonlinear loudspeaker on the cross-correlation DTD is also discussed.

In Chapter 3, we modify the cross-correlation DTD. The modified cross-correlation DTD by microphone energy can decide correctly in any double talk or echo path change case. We also propose a DTD evaluation technique that can calculate the miss probability when echo path change is present, and we use a variant threshold to improve the performance.

In Chapter 4, simulations verify the results of our analysis. We compare the simulated and analytical cross-correlation, verify the modified cross-correlation DTD in different cases, and compare the simulated and analytical miss probability in the double talk period. Finally, conclusions are given in Chapter 5.

Chapter 2

Double Talk Detector For Acoustic

Echo Canceller

In this chapter, we discuss double talk, a serious problem in AEC. An adaptive echo canceller [1] updates the tap coefficients of an adaptive filter to model the echo path using the error signal $e(k)$, as shown in Fig 2.1. If the tap coefficients are updated during double talk, that is, when the microphone input contains both the near-end talker signal $v(k)$ and the echo signal $y(k)$, they can fluctuate greatly or diverge, so that the impulse response of the echo path is misestimated. Hence, the AEC should freeze the filter adaptation during the double talk period.

In Section 2.1, we introduce several double talk detector algorithms. In Section 2.2, we compare the DTDs of Section 2.1 and calculate the miss probability in the DT period. In Section 2.3, we modify the DTDs to make them more robust. In Section 2.4, we derive the theoretical cross-correlation under double talk and echo path change. The theoretical miss probability is derived in Section 2.5. The nonlinear loudspeaker effect on the cross-correlation double talk detector is discussed in Section 2.6.

Fig 2.1 Structure of double talk detector
[Figure labels: far-end signal $x(k)$, loudspeaker, echo path $h$, echo signal $y(k)$, near-end speech $v(k)$, noise $n(k)$, microphone signal $d(k)$, adaptive filter $\hat{h}$, echo estimate $\hat{y}(k)$, residual error $e(k)$, DTD]

2.1 Double talk detector algorithms

We introduced several double talk detectors in the Introduction. We now discuss the Geigel, gradient vector correlation, two echo path model, and cross-correlation DTDs in detail.

2.1.1 Geigel DTD

One simple DTD algorithm is due to Geigel [2]. The algorithm is given as follows.

$$\xi_{Geigel} = \frac{|d(k)|}{\max\{|x(k-1)|, \ldots, |x(k-N)|\}} \qquad (2.1.1)$$

This detection scheme is based on a waveform level comparison between the microphone signal $d(k)$ and the far-end speech $x(k)$, assuming that the near-end speech $v(k)$ in the microphone signal is typically stronger than the echo signal.

When $\xi_{Geigel}$ is larger than the threshold $T_{Geigel}$, the DTD decides that double talk is present, and the adaptation is halted. $T_{Geigel}$ compensates for the energy level of the echo path response $h$. However, when the magnitude of $d(k)$ is -6 dB, the Geigel DTD fails to detect the double talk. For an AEC, moreover, it is not easy to set a universal threshold that works reliably in all situations, because the loss through the acoustic echo path can vary greatly depending on many factors.
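The Geigel decision rule can be sketched as follows. This is an illustrative reading of Eq. (2.1.1) that takes the ratio of $|d(k)|$ to the largest of the previous $N$ far-end magnitudes; the function name and the default `window` and `threshold` values are assumptions, not values taken from the thesis.

```python
def geigel_dtd(d, x, window=4, threshold=0.5):
    """Return one flag per sample: True where double talk is declared.

    xi(k) = |d(k)| / max(|x(k-1)|, ..., |x(k-window)|), compared with threshold.
    """
    flags = []
    for k in range(len(d)):
        recent = [abs(v) for v in x[max(0, k - window):k]]   # |x(k-window)| .. |x(k-1)|
        x_max = max(recent) if recent else 0.0
        xi = abs(d[k]) / x_max if x_max > 0 else 0.0         # no far-end history: no decision
        flags.append(xi > threshold)
    return flags
```

For example, with a constant far-end level of 1.0, a microphone level of 0.3 stays below a threshold of 0.5 and is treated as echo only, while a level of 0.9 is flagged as double talk.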

2.1.2 Gradient vector correlation DTD

Rohrs and Younce [4] specifically targeted the DT problem. Their algorithm considers the correlation between the instantaneous gradient estimate and the average of previous estimates. The gradient vector is defined as follows:

$$\nabla(k) = x(k)\, e(k)$$

$$\bar{\nabla}(k) = \bar{\nabla}(k-1) + \nabla(k) - \nabla(k-B)$$

$$S(k) = \nabla(k) \cdot \bar{\nabla}(k-1)$$

where $B > 0$ depends on the filter length. $\nabla(k)$ is the instantaneous far-end signal $x(k)$ multiplied by the momentary error $e(k)$, and $\bar{\nabla}(k)$ is the average of previous estimates.

detector adjusts the weights using a fixed step size LMS update; otherwise, the coefficients are frozen. The algorithm is effective for a short adaptive filter, but its performance degrades considerably for a long adaptive filter.

Creasy and Aboulnasr [5] addressed this problem. Their algorithm uses a variable step size NLMS-based approach driven by the gradient correlation. The algorithm is given as follows:

$$\nabla(k) = x(k)\, e(k)$$

$$\bar{\nabla}(k) = \bar{\nabla}(k-1) + \nabla(k) - \nabla(k-B)$$

$$S(k) = \nabla(k) \cdot \bar{\nabla}(k-1)$$

$$p(k) = \beta\, p(k-1) + (1-\beta)\, \mathrm{sign}[S(k)]$$

$$\mu(k) = \alpha\, \mu(k-1) + (1-\alpha)\, \mathrm{sign}[p(k)]\, p^2(k) \qquad (2.1.2)$$

$\mu(k)$ is the step size. When the step size $\mu(k)$ becomes very small, the DTD decides that double talk is present. Even if the adaptation keeps updating the coefficients, the adaptive filter will not diverge, so this algorithm is more robust.

2.1.3 Two Echo Path Model

Another DTD structure, shown in Fig 2.2, is the two echo path model [6]. This structure is a good choice for implementation in a real environment because of its excellent stability. It is based on two filters, a background filter $\hat{h}_{BG}$ and a foreground filter $\hat{h}_{FG}$.

Fig. 2.2 Structure of the two echo path model
[Figure labels: background filter $\hat{h}_{BG}$, foreground filter $\hat{h}_{FG}$, echo estimate $\hat{y}(k)$, background error $e_b(k)$, foreground error $e_f(k)$, microphone signal $d(k)$, near-end speech $v(k)$, DTD]

If the background filter $\hat{h}_{BG}$ is estimated to perform better than the foreground filter $\hat{h}_{FG}$, its coefficients are copied to the foreground filter. The double talk detector is controlled by comparisons between the short-term powers of the signals $d(k)$, $x(k)$, $e_f(k)$ and $e_b(k)$.

The update conditions for the foreground filter are basically given by

$$a = \frac{P_{e_b}(k)}{P_d(k)}, \quad b = \frac{P_d(k)}{P_X(k)}, \quad c = \frac{P_{e_b}(k)}{P_{e_f}(k)} \qquad (2.1.3)$$

where $P_X(k) = \frac{1}{M}\sum_{i=0}^{M-1} X^2(k-i)$ and $M$ is the update interval.

When $a$, $b$ and $c$ are simultaneously larger than the thresholds $T_a$, $T_b$ and $T_c$, the DTD decides that double talk is present. The background filter is then not copied to the foreground filter, and the foreground filter retains its converged coefficients.

2.1.4 Cross Correlation DTD

Ye and Wu [7] proposed a double talk detection algorithm based on the cross-correlation between $x(k)$ and $e(k)$. However, a cross-correlation DTD can be built on many different correlations among $d(k)$, $\hat{y}(k)$, $x(k)$, and $e(k)$, so we must choose one of them. We use the cross-correlation between $d(k)$ and $\hat{y}(k)$. The cross-correlation $\rho_{d,\hat{y}}$ is defined as:

$$\rho_{d,\hat{y}}(k) = \frac{P_{d,\hat{y}}(k)}{\sqrt{P_d(k)\, P_{\hat{y}}(k)}} \qquad (2.1.4)$$

where

$$P_{d,\hat{y}}(k) = (1-\lambda)\, P_{d,\hat{y}}(k-1) + \lambda\, d(k)\, \hat{y}(k)$$

$$P_d(k) = (1-\lambda)\, P_d(k-1) + \lambda\, d^2(k)$$

$$P_{\hat{y}}(k) = (1-\lambda)\, P_{\hat{y}}(k-1) + \lambda\, \hat{y}^2(k)$$

and $\lambda$ is the forgetting factor, $0 < \lambda < 1$.

When $\rho_{d,\hat{y}}$ is below the threshold $T_{d,\hat{y}}$, the DTD decides that double talk is present. Then the adaptation is halted.
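The recursive estimates behind Eq. (2.1.4) can be sketched as below. The sketch assumes the normalized form of the correlation (cross power divided by the square root of the product of the two powers); the function name, the forgetting factor `lam = 0.1`, and the small `eps` guard against division by zero are illustrative assumptions. The threshold of 0.7 is the value quoted later in Section 2.3.2.

```python
import math

def cross_correlation_dtd(d, y_hat, lam=0.1, threshold=0.7, eps=1e-12):
    """Track rho_{d,y_hat}(k) recursively and flag double talk when it drops.

    Returns (rho, flags): the correlation track and one boolean per sample.
    """
    p_dy = p_d = p_y = 0.0
    rho, flags = [], []
    for dk, yk in zip(d, y_hat):
        p_dy = (1 - lam) * p_dy + lam * dk * yk    # smoothed cross power
        p_d = (1 - lam) * p_d + lam * dk * dk      # smoothed microphone power
        p_y = (1 - lam) * p_y + lam * yk * yk      # smoothed echo-estimate power
        r = p_dy / math.sqrt(p_d * p_y + eps)
        rho.append(r)
        flags.append(r < threshold)                # True means double talk declared
    return rho, flags
```

When `d` equals `y_hat` the track stays at about 1; adding an uncorrelated near-end component to `d` increases the microphone power without increasing the cross power, so the track falls below the threshold.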


2.2 Comparisons of Double talk detectors

Several algorithms have been proposed to detect double talk in an acoustic echo canceller. Jun H. Cho and R. Morgan [8] proposed an objective technique for evaluating double talk detectors, which calculates the miss probability of a double talk detector. In [8], the Geigel DTD is compared with the normalized cross-correlation DTD. In this section, we extend the technique to evaluate and compare four DTDs: the Geigel, gradient vector, two echo path model, and cross-correlation DTDs.

In Section 2.1.3, we introduced the two echo path model, which detects double talk using Eq. (2.1.3). We modify the condition for deciding double talk: we only use $a = P_{e_b}(k)/P_d(k)$ and $b = P_d(k)/P_X(k)$ to detect double talk. For simplicity, we use a forgetting factor to smooth $a$ and $b$.

The first step of the objective technique is to calculate the threshold for a fixed false alarm probability. The false alarm probability is measured as the proportion of the far-end speech in which double talk is declared although there is no near-end speech.

The probability of false alarm at each threshold point is calculated as

$$P_f = \frac{\phi \cdot \nu_x}{N} \qquad (2.2.1)$$

where $\phi$ is the DTD output, $\nu_x$ is the output of the far-end speech activity detector in Fig 2.3, and $N$ is the length of the entire far-end speech signal $x$.

From Fig 2.4, the output of the speech activity detector for the far-end signal is either one or zero, and likewise for the near-end speech. The output is zero when the far-end signal is silent. As Fig 2.5 shows, the logical AND with the far-end activity is necessary to disregard false alarms during innocuous periods of inactivity. The threshold is then chosen to achieve the given false alarm probability.

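The second step is to calculate the miss probability. The miss probability is measured as the proportion of near-end speech duration that remains undetected, at different levels of the near-end to far-end speech energy ratio (NFR $= \sigma_v^2/\sigma_x^2$).

Once the threshold $T$ is determined, the near-end speech is applied at different attenuation levels, and the detection procedure is run again. The miss probability is calculated as follows.

$$P_m = 1 - \frac{\phi \cdot \nu_x \cdot \nu_v}{\nu_x \cdot \nu_v} \qquad (2.2.2)$$

Here $\nu_x$ and $\nu_v$ are the speech activity detector outputs for the far-end and near-end signals, and the products are summed over all samples.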

In Chapter 4, we will use this technique to compare the four kinds of DTD. The complete DTD evaluation technique is summarized as follows.

1) Set near-end signal v = 0.
(a) Select threshold T.
(b) Compute false alarm probability $P_f$ using (2.2.1).
(c) Repeat steps a, b over a range of threshold values.
(d) Select the threshold value that corresponds to $P_f = 0.1$.

2) Select NFR value.
(a) Add near-end signal.
(b) Compute $P_m$ using (2.2.2).
(c) Repeat steps a, b over all conditions.

3) Repeat step 2 over a range of NFR values.

4) Plot average $P_m$ as a function of NFR.
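The two probabilities used in this procedure can be sketched as below, assuming that (2.2.1) and (2.2.2) are sample-wise products of 0/1 sequences summed over the signal. The function names, the `act_x`/`act_v` names for the activity detector outputs, and the convention that double talk is declared when the detection statistic falls below the threshold (as in the cross-correlation DTD) are assumptions made for illustration.

```python
def false_alarm_probability(phi, act_x):
    """P_f: fraction of all N samples where DT is declared while the far end is active."""
    return sum(p * a for p, a in zip(phi, act_x)) / len(act_x)

def miss_probability(phi, act_x, act_v):
    """P_m: fraction of true double-talk samples (both ends active) left undetected."""
    both = sum(a * b for a, b in zip(act_x, act_v))
    hit = sum(p * a * b for p, a, b in zip(phi, act_x, act_v))
    return 1.0 - hit / both

def select_threshold(statistic, act_x, candidates, target_pf=0.1):
    """Step 1 of the procedure: with v = 0, pick the candidate threshold whose
    false alarm probability is closest to the target value."""
    def pf(t):
        phi = [1 if s < t else 0 for s in statistic]   # DT declared below threshold
        return false_alarm_probability(phi, act_x)
    return min(candidates, key=lambda t: abs(pf(t) - target_pf))
```

In a full evaluation, `select_threshold` would be run once on a single-talk recording, and `miss_probability` would then be computed for each NFR value with the selected threshold held fixed.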

Fig 2.3 Speech activity detector

Fig 2.4 Input and output of the speech activity detector
[Top panel: input of the speech activity detector; bottom panel: output of the speech activity detector; horizontal axis: iteration]

Fig 2.5 Evaluation procedure of DTD

2.3 The robust double talk detector

In Section 2.1, we introduced several double talk detectors, including the Geigel DTD, the two echo path model, and the cross-correlation DTD. These DTDs decide double talk by a threshold, as in Fig 2.6, so the decision is dichotomous. We can soften the decision to obtain a robust double talk detector. As shown in Fig 2.7, the robust DTD can keep adapting the coefficients whether double talk or single talk is present. The robust DTD can therefore alleviate the miss probability, and we do not need a finely tuned threshold to avoid detection errors.

Fig 2.6 The difference $\xi_{Geigel}$ in single talk and double talk
[Axes: iteration vs. Geigel parameter; annotations: DT, threshold, detection errors]

Fig 2.7 The relation of the step size and $\xi_{Geigel}$

2.3.1 Robust Geigel DTD

In Section 2.1.1, we introduced the Geigel DTD. The detector uses the ratio of the microphone signal to the far-end signal in Eq. (2.1.1) to decide double talk. $\xi_{Geigel}$ changes quickly from one iteration to the next in both double talk and single talk, and it increases in the DT period. If $\xi_{Geigel}$ is larger than $T_{Geigel}$, we detect double talk and stop adapting; otherwise, the filter continues to adapt its coefficients. However, this decision is dichotomous. Instead, we can let $\xi_{Geigel}$ control the adaptation of the coefficients even when double talk is present, which means that the step size is very small in the double talk period.

Before deriving the PDF of $\xi_{Geigel}$, we modify the Geigel DTD. The modified Geigel DTD is given as follows.

$$P_d(k) = (1-\lambda)\, P_d(k-1) + \lambda\, |d(k)|$$

$$P_x(k) = (1-\lambda)\, P_x(k-1) + \lambda \max(|x(k)|)$$

$$\rho_{Geigel}(k) = \frac{P_d(k)}{P_x(k)} \qquad (2.3.1)$$

Because $\xi_{Geigel}$ changes rapidly at every time step, we smooth it to avoid sudden changes. $\rho_{Geigel}$ is the ratio of the microphone signal amplitude to the far-end signal amplitude. However, $\rho_{Geigel}$ is difficult to analyze, so we change the Geigel DTD criterion to an energy ratio.

The modified Geigel DTD is given as follows.

$$P_d(k) = (1-\lambda)\, P_d(k-1) + \lambda\, d^2(k)$$

$$P_x(k) = (1-\lambda)\, P_x(k-1) + \lambda \max(x^2(k))$$

$$P_\alpha(k) = \frac{P_d(k)}{P_x(k)} \qquad (2.3.2)$$

$P_\alpha$ is very similar to $\xi_{Geigel}$: $\xi_{Geigel}$ is an amplitude ratio, whereas $P_\alpha$ is an energy ratio. We now derive the PDF of $P_\alpha$ [17]. From Fig 2.7, the detector makes some erroneous decisions; analyzing the PDF of $P_\alpha$ lets us alleviate these detection errors.

First, we analyze the microphone signal. The microphone signal includes the echo signal, near-end speech, and noise.

$$d(k) = h^T x(k) + v(k) + n(k)$$

$$d^2(k) = h^2 x^2(k) + v^2(k) + n^2(k)$$

We assume that the far-end signal $x(k)$, near-end speech $v(k)$, and noise $n(k)$ are normally distributed, so that $x^2(k)$, $v^2(k)$, and $n^2(k)$ are chi-square distributed.

For the r.v. $x$ (Gaussian distribution):

$$f_x(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{x^2}{2\sigma_x^2}} \qquad (2.3.3)$$

For the r.v. $x^2$ (chi-square distribution):

$$f_{x^2}(x) = \frac{1}{\sqrt{2\pi x}\,\sigma_x}\, e^{-\frac{x}{2\sigma_x^2}} \qquad (2.3.4)$$

We also assume that the microphone signal energy and $\max(x^2(k))$ are also

Chi-Square. The Pα PDF is given as follows. 2 2 2 1 ( ) 1 ( ) P d x d f P P P α α α α π σ σ σ = + (2.3.5)

From Eq. (2.3.5), the expectation of the random variable $P_\alpha$ is calculated as follows. We also assume that the far-end signal energy is equal to one.

[ ] (4 ) 2 2 d P E Pα µ α π σ π − = = (2.3.6) var[ ] 2 [( ) ]2 P P E P α α =σ = α −µ 2 2 2 0.02863 ( 0.116 0.2375 ) d d d σ σ π σ = − + + (2.3.7)

where the microphone signal energy is $\sigma_d^2 = h^2\sigma_x^2 + \sigma_v^2 + \sigma_n^2$.

If $\sigma_d^2 = 1$, which means only single talk, then $E[P_\alpha] \approx 0.13$ and $\mathrm{var}[P_\alpha] = 0.05$. The converged value in the modified Geigel DTD is thus 0.13. If $P_\alpha$ is close to the converged value, the filter has converged. If $P_\alpha$ is larger than the converged value, double talk is possibly present.

We set the soft threshold near Pα mean. However, from (2.3.5), we can find that Pα PDF is not symmetric. The soft threshold lies between 2 2 0.12

Pα Pα

µ − σ = and

2 0.135


Fig 2.8 shows the relation between the step size and $P_\alpha$. The Geigel DTD decides double talk with a single threshold, but this decision is too hard. The robust Geigel DTD instead sets a buffer range to improve DTD performance, which means that the detector has a double threshold. Using the buffer range makes the DTD more robust.

Fig 2.8 Comparison of the step size of the Geigel and robust Geigel DTD
[Axes: Geigel parameter vs. step size; curves: Geigel DTD, robust Geigel DTD]

2.3.2 Robust cross-correlation DTD

We now discuss the cross-correlation DTD. In Section 2.3.1, we set a buffer range to make the Geigel DTD robust. Using the same idea, we can make the cross-correlation DTD robust. We also consider two kinds of buffer range.

In Section 2.1.4, we introduced the cross-correlation DTD. We now modify the decision rule. The threshold is set to 0.7, so we set the buffer range around 0.7: one buffer range lies between 0.6 and 0.8, and the other lies between 0.65 and 0.8.

Fig 2.9 Comparison of the step size of the cross-correlation and robust cross-correlation DTD
[Axes: cross correlation vs. step size; curves: cross-correlation DTD, robust cross-correlation DTD (I), robust cross-correlation DTD (II)]

We can analyze the PDF of the DTD parameter to set the buffer range. With a buffer range, the robust DTD performs better than the conventional DTD. The simulation results are given in Chapter 4.
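The difference between the hard decision and the buffer range can be sketched as a mapping from the cross-correlation value to the step size. The thesis text does not state the shape of the mapping inside the buffer range, so the linear ramp below is an assumption, as are the function names and the maximum step size `mu_max = 0.2`. The bounds 0.6 and 0.8 and the threshold 0.7 are the values quoted in this section.

```python
def hard_step_size(rho, threshold=0.7, mu_max=0.2):
    """Conventional DTD: adapt at full step size or not at all."""
    return mu_max if rho >= threshold else 0.0

def soft_step_size(rho, low=0.6, high=0.8, mu_max=0.2):
    """Robust DTD sketch: step size ramps linearly across the buffer range
    (the linear shape is an assumption, not taken from the thesis)."""
    if rho <= low:
        return 0.0          # clearly double talk: freeze adaptation
    if rho >= high:
        return mu_max       # clearly single talk: adapt at full speed
    return mu_max * (rho - low) / (high - low)
```

With this mapping, a correlation value that hovers around 0.7 produces a moderate step size instead of toggling between full adaptation and a full stop.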


2.4 Theoretical analysis of cross-correlation DTD

In this section, we analyze two kinds of cross-correlation DTD: the mic/AEC correlation and the mic/error correlation. When we analyze the correlation under double talk or echo path change, we assume that the adaptive filter has converged in single talk. We also assume that the far-end signal $x(k)$, noise $n(k)$ and near-end signal $v(k)$ are white Gaussian signals and are mutually independent. If we know the exact cross-correlation, we can set an appropriate threshold.

2.4.1 Correlation of microphone and estimated signals

Earlier, we discussed the cross-correlation DTD and found that the correlation value decreases whether double talk or echo path change arises. However, we do not know exactly how much it degrades for a given degree of double talk or echo path change.

(I) Cross-correlation $\rho_{DT}$ in double talk

The cross-correlation in Eq. (2.1.4) is rewritten here for convenience.

$$\rho_{d,\hat{y}}(k) = \frac{P_{d,\hat{y}}(k)}{\sqrt{P_d(k)\, P_{\hat{y}}(k)}}$$

We assume that the adaptive filter $\hat{h}$ is close to the echo path $h$, that is, $\hat{h} \approx h$. The cross-correlation DTD uses the forgetting factor $\lambda$ to smooth $\rho_{d,\hat{y}}(k)$ and implement an online DTD, but the forgetting factor is hard to analyze. Fortunately, the smoothed $\rho_{d,\hat{y}}(k)$ is close to its expectation-based value once it has converged. That is,

$$\rho_{d,\hat{y}}(k) = \frac{P_{d,\hat{y}}(k)}{\sqrt{P_d(k)\, P_{\hat{y}}(k)}} \approx \frac{E[d\,\hat{y}]}{\sqrt{E[d^2]\, E[\hat{y}^2]}} = \rho_{DT}$$

First, we examine $P_{d,\hat{y}}(k)$.

$$P_{d,\hat{y}} = E[d(k)\,\hat{y}(k)] = E[(y(k) + v(k) + n(k))\,\hat{y}(k)] \approx E[y(k)\,\hat{y}(k)]$$

With $y(k) = h^T x(k)$ and $\hat{y}(k) = \hat{h}^T x(k)$, $P_{d,\hat{y}}$ can be expressed as

$$P_{d,\hat{y}} \approx E[h^T x(k)\, x^T(k)\, \hat{h}]$$

We assume $\hat{h} \approx h$, so

$$P_{d,\hat{y}} \approx E[h^T x(k)\, x^T(k)\, h] = h^T h\, \sigma_x^2 \qquad (2.4.1)$$

Next, we proceed to find $P_d(k)$.

$$P_d = E[d(k)\, d(k)] = E[(y(k) + v(k) + n(k))(y(k) + v(k) + n(k))] \approx E[h^T x(k)\, x^T(k)\, h + v^2(k) + n^2(k)] = h^T h\, \sigma_x^2 + \sigma_v^2 + \sigma_n^2 \qquad (2.4.2)$$

Last,

$$P_{\hat{y}} = E[\hat{y}(k)\,\hat{y}(k)] = E[\hat{h}^T x(k)\, x^T(k)\, \hat{h}] = h^T h\, \sigma_x^2 \qquad (2.4.3)$$

Finally, we combine (2.4.1) and (2.4.2) with (2.4.3).

$$\rho_{DT} = \frac{E[d\,\hat{y}]}{\sqrt{E[d^2]\, E[\hat{y}^2]}} = \begin{cases} \dfrac{h^T h\, \sigma_x^2}{\sqrt{(h^T h\, \sigma_x^2 + \sigma_n^2)(h^T h\, \sigma_x^2)}} = \dfrac{1}{\sqrt{1 + \dfrac{\sigma_n^2}{h^T h\, \sigma_x^2}}} & \text{(Single-talk)} \quad (2.4.4) \\[3ex] \dfrac{h^T h\, \sigma_x^2}{\sqrt{(h^T h\, \sigma_x^2 + \sigma_v^2 + \sigma_n^2)(h^T h\, \sigma_x^2)}} = \dfrac{1}{\sqrt{1 + \dfrac{\sigma_n^2 + \sigma_v^2}{h^T h\, \sigma_x^2}}} & \text{(Double-talk)} \quad (2.4.5) \end{cases}$$

From (2.4.4), the correlation value is close to 1 when single talk is present. From (2.4.5), it decreases as the near-end signal variance increases in the double talk period. With this theoretical analysis, we know $\rho_{DT}$ in any DT situation.
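As a numerical check of this result, the sketch below compares the closed form $1/\sqrt{1 + (\sigma_n^2 + \sigma_v^2)/(h^T h\,\sigma_x^2)}$ with a sample estimate obtained from white Gaussian signals. It assumes a perfectly converged filter ($\hat{h} = h$), and the two-tap echo path, the variances, and the function names are hypothetical values chosen for illustration.

```python
import math
import random

def rho_dt_theory(h, var_x, var_v, var_n):
    """Closed-form mic/AEC correlation; var_v = 0 gives the single-talk case."""
    echo_power = sum(c * c for c in h) * var_x          # h^T h * sigma_x^2
    return 1.0 / math.sqrt(1.0 + (var_n + var_v) / echo_power)

def rho_dt_simulated(h, var_x, var_v, var_n, num_samples=50000, seed=1):
    """Sample correlation between d(k) and y_hat(k), assuming h_hat = h."""
    rng = random.Random(seed)
    buf = [0.0] * len(h)
    s_dy = s_dd = s_yy = 0.0
    for _ in range(num_samples):
        buf = [rng.gauss(0.0, math.sqrt(var_x))] + buf[:-1]
        y = sum(c * b for c, b in zip(h, buf))          # echo, equal to y_hat here
        d = y + rng.gauss(0.0, math.sqrt(var_v)) + rng.gauss(0.0, math.sqrt(var_n))
        s_dy += d * y
        s_dd += d * d
        s_yy += y * y
    return s_dy / math.sqrt(s_dd * s_yy)
```

For an echo path with $h^T h = 1$, unit far-end variance, and a near-end variance of 1 (NFR of 0 dB) without noise, the closed form gives $1/\sqrt{2} \approx 0.707$, which is close to the 0.7 threshold used in Section 2.3.2.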

(II) Cross-correlation $\rho_{EPC}$ in echo path change $h_c$

Next, we analyze the cross-correlation when the filter $\hat{h}$, converged in single talk, undergoes an abrupt echo path change to $h_c$. This means $\hat{h} \approx h \neq h_c$, where $h$ is the original echo path. First, we analyze $P_{d,\hat{y}}(k)$.

$$P_{d,\hat{y}} = E[d(k)\,\hat{y}(k)] = E[(y(k) + n(k))\,\hat{y}(k)] \approx E[y(k)\,\hat{y}(k)] = h_c^T \hat{h}\, \sigma_x^2 \qquad (2.4.6)$$

Next, we proceed to find $P_d(k)$.

$$P_d = E[d(k)\, d(k)] = E[(y(k) + n(k))(y(k) + n(k))] \approx h_c^T h_c\, \sigma_x^2 + \sigma_n^2 \qquad (2.4.7)$$

Last,

$$P_{\hat{y}} = E[\hat{y}(k)\,\hat{y}(k)] = \hat{h}^T \hat{h}\, \sigma_x^2 \qquad (2.4.8)$$

Finally, we combine (2.4.6) and (2.4.7) with (2.4.8).

$$\rho_{d,\hat{y}}(k) = \frac{P_{d,\hat{y}}(k)}{\sqrt{P_d(k)\, P_{\hat{y}}(k)}} \approx \frac{E[d\,\hat{y}]}{\sqrt{E[d^2]\, E[\hat{y}^2]}} = \rho_{EPC}$$

$$\rho_{EPC} \approx \frac{h_c^T \hat{h}\, \sigma_x^2}{\sqrt{(h_c^T h_c\, \sigma_x^2)(\hat{h}^T \hat{h}\, \sigma_x^2)}} = \frac{h_c^T \hat{h}}{\sqrt{(h_c^T h_c)(\hat{h}^T \hat{h})}} \approx \frac{h_c^T h}{\sqrt{(h_c^T h_c)(h^T h)}} = \rho_{h_c h} \qquad (2.4.9)$$

From (2.4.9), $\rho_{d,\hat{y}}(k)$ also decreases, in accordance with the correlation between the original channel $h$ and the changed channel $h_c$. Since $\rho_{d,\hat{y}}(k)$ decreases under both double talk and echo path change, the cross-correlation DTD has difficulty distinguishing double talk from echo path change.
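The echo path change result is the normalized inner product of the two impulse responses, which the sketch below evaluates directly. The function name and the example vectors are hypothetical, and the noise term is neglected as in the final approximation of (2.4.9).

```python
import math

def rho_epc_theory(h, h_c):
    """Normalized correlation between the original path h and the changed path h_c."""
    dot = sum(a * b for a, b in zip(h_c, h))
    return dot / math.sqrt(sum(a * a for a in h_c) * sum(b * b for b in h))
```

One consequence of the normalization is that a pure gain change, such as `h_c = 0.5 * h`, leaves the value at 1. Under this approximation only a change in the shape of the impulse response lowers the correlation.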

2.4.2 Correlation of microphone and error signals

We now discuss another double talk detector based on the correlation between the microphone signal $d(k)$ and the error $e(k)$.

The mic/error correlation DTD algorithm is expressed as:

$$\rho_{d,e}(k) = \frac{P_{d,e}(k)}{\sqrt{P_d(k)\, P_e(k)}} \qquad (2.4.10)$$

where

$$P_{d,e}(k) = (1-\lambda)\, P_{d,e}(k-1) + \lambda\, d(k)\, e(k)$$

$$P_d(k) = (1-\lambda)\, P_d(k-1) + \lambda\, d^2(k)$$

$$P_e(k) = (1-\lambda)\, P_e(k-1) + \lambda\, e^2(k)$$

and $\lambda$ is the forgetting factor, $0 < \lambda < 1$.

(I) Cross-correlation in double talk

$$\rho_{d,e}(k) = \frac{P_{d,e}(k)}{\sqrt{P_d(k)\, P_e(k)}} \approx \frac{E[d\,e]}{\sqrt{E[d^2]\, E[e^2]}}$$

With $\Delta h = h - \hat{h}$,

$$P_{d,e} = E[d(k)\, e(k)] = E[(h^T x(k) + n(k))(h^T x(k) - \hat{h}^T x(k) + n(k))] = E[(h^T x(k) + n(k))(\Delta h^T x(k) + n(k))]$$

$$P_{d,e} \approx E[h^T x(k)\, x^T(k)\, \Delta h + n(k)\, n(k)] = h^T \Delta h\, \sigma_x^2 + \sigma_n^2 \qquad (2.4.11)$$

Next, we proceed to find $P_d(k)$.

$$P_d = E[d(k)\, d(k)] = E[(h^T x(k) + n(k))(h^T x(k) + n(k))] \approx h^T h\, \sigma_x^2 + \sigma_n^2 \qquad (2.4.12)$$

Last,

$$P_e = E[e(k)\, e(k)] \approx \Delta h^T \Delta h\, \sigma_x^2 + \sigma_n^2 \qquad (2.4.13)$$

Finally, we combine (2.4.11), (2.4.12) and (2.4.13) to get

$$\rho_{d,e}(k) = \frac{P_{d,e}(k)}{\sqrt{P_d(k)\, P_e(k)}} \approx \frac{h^T \Delta h\, \sigma_x^2 + \sigma_n^2}{\sqrt{(h^T h\, \sigma_x^2 + \sigma_n^2)(\Delta h^T \Delta h\, \sigma_x^2 + \sigma_n^2)}}$$

We assumed earlier that the adaptive filter has converged, $\hat{h} = h$, so $\Delta h \approx 0$ and

$$\rho_{d,e}(k) \approx \frac{\sigma_n^2}{\sqrt{(h^T h\, \sigma_x^2 + \sigma_n^2)\, \sigma_n^2}} = \frac{1}{\sqrt{1 + \dfrac{h^T h\, \sigma_x^2}{\sigma_n^2}}}$$

$$\rho_{d,e}(k) \approx \begin{cases} \dfrac{1}{\sqrt{1 + \dfrac{h^T h\, \sigma_x^2}{\sigma_n^2}}} & \text{(Single talk)} \quad (2.4.14) \\[3ex] \dfrac{1}{\sqrt{1 + \dfrac{h^T h\, \sigma_x^2}{\sigma_n^2 + \sigma_v^2}}} & \text{(Double talk)} \quad (2.4.15) \end{cases}$$

From (2.4.14), the single-talk value is very close to 0 when the SNR is very large. This means that the microphone signal differs significantly from the error signal, so the correlation is small. In (2.4.15), the near-end speech is treated as additional noise by the AEC, so during a double-talk period the effective noise power is the sum $\sigma_n^2 + \sigma_v^2$. Consequently, the correlation in (2.4.15) becomes large during double talk.
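To make the contrast between (2.4.14) and (2.4.15) concrete, the sketch below evaluates both expressions for one illustrative operating point. The parameter names and the numerical values (30 dB echo-to-noise ratio, near-end power equal to echo power) are assumptions chosen for the example.

```python
import math

def rho_de_theory(h_energy, sigma_x2, sigma_n2, sigma_v2=0.0):
    """Evaluate 1 / sqrt(1 + h^T h * sigma_x^2 / (sigma_n^2 + sigma_v^2)).

    With sigma_v2 == 0 this is (2.4.14); with sigma_v2 > 0 it is (2.4.15).
    """
    echo_to_noise = h_energy * sigma_x2 / (sigma_n2 + sigma_v2)
    return 1.0 / math.sqrt(1.0 + echo_to_noise)

# Illustrative values (assumed): unit echo power, noise 30 dB below the echo
single_talk = rho_de_theory(h_energy=1.0, sigma_x2=1.0, sigma_n2=0.001)
double_talk = rho_de_theory(h_energy=1.0, sigma_x2=1.0, sigma_n2=0.001, sigma_v2=1.0)
```

Under these assumptions the single-talk value is about 0.03 and the double-talk value about 0.71, matching the qualitative statement that the mic/error correlation rises when near-end speech appears.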

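(II) Cross-correlation in echo path change

First, we examine $P_{d,e}(k)$. With $\Delta h = h_c - \hat h$,

$$P_{d,e} = E[d(k)\,e(k)] = E[(h_c^T x(k)+n(k))\cdot((h_c-\hat h)^T x(k)+n(k))] \approx h_c^T (h_c - \hat h)\,\sigma_x^2 + \sigma_n^2 = h_c^T \Delta h\,\sigma_x^2 + \sigma_n^2 \tag{2.4.16}$$

Next, we proceed to find $P_d(k)$:

$$P_d = E[d(k)\,d(k)] \approx h_c^T h_c\,\sigma_x^2 + \sigma_n^2 \tag{2.4.17}$$

Last,

$$P_e = E[e(k)\,e(k)] \approx \Delta h^T \Delta h\,\sigma_x^2 + \sigma_n^2 \tag{2.4.18}$$

Combining these,

$$\rho_{d,e}(k) = \frac{P_{d,e}(k)}{\sqrt{P_d(k)\,P_e(k)}} \approx \frac{h_c^T \Delta h\,\sigma_x^2 + \sigma_n^2}{\sqrt{(h_c^T h_c\,\sigma_x^2 + \sigma_n^2)\,(\Delta h^T \Delta h\,\sigma_x^2 + \sigma_n^2)}} \approx \frac{h_c^T \Delta h\,\sigma_x^2}{\sqrt{(h_c^T h_c\,\sigma_x^2)\,(\Delta h^T \Delta h\,\sigma_x^2)}} = \rho_{h_c \Delta h} \tag{2.4.19}$$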

From (2.4.19), after an echo path change to $h_c$, $\rho_{d,e}(k)$ equals the correlation between the changed channel $h_c$ and the difference $\Delta h$ between the new channel and the original channel $h$. From (2.4.15) and (2.4.19), $\rho_{d,e}(k)$ moves away from its small single-talk value in both double talk and echo path change, so the mic/error correlation DTD also has difficulty separating the two situations.

2.5 Theoretical analysis of miss probability in double talk period

In Section 2.2, we introduced the technique for evaluating double talk detectors and calculating the miss probability. There, however, the miss probability was obtained by numerical simulation, as illustrated in Fig 2.10. In this section, we derive the miss probability theoretically by analyzing the probability density function (PDF) of the DTD parameter. We derive the PDF of the cross-correlation $\rho_{d,\hat y}(k)$, an important DTD parameter. From the PDF, the miss probability can then be calculated.

Fig 2.10 The methods of calculating miss probability

Now, we analyze the correlation $\rho_{d,\hat y}(k) = P_{d,\hat y}(k) / \sqrt{P_d(k)\,P_{\hat y}(k)}$ from (2.1.4). In (2.1.4), $\rho_{d,\hat y}(k)$ is smoothed with a forgetting factor, but it is still a random variable. Discarding the forgetting factor in (2.1.4), the instantaneous random variable is

$$\rho_{d,\hat y}(k) = \frac{d(k)\,\hat y(k)}{\sqrt{d^2(k)\,\hat y^2(k)}} = \frac{(h^T x(k)+v(k)+n(k))\cdot(\hat h^T x(k))}{\sqrt{(h^T x(k)+v(k)+n(k))^2\cdot(\hat h^T x(k))^2}}$$

where $\hat h \approx h$ is assumed, since the filter converges in single talk. Keeping only the squared terms,

$$\rho_{d,\hat y}(k) \approx \frac{h^2 x^2(k)}{\sqrt{(h^2 x^2(k)+v^2(k)+n^2(k))\,(h^2 x^2(k))}} = \frac{1}{\sqrt{1+\dfrac{v^2(k)+n^2(k)}{h^2 x^2(k)}}} \tag{2.5.1}$$

From (2.5.1), $\rho_{d,\hat y}(k)$ is a function of the random variables $x^2(k)$, $v^2(k)$ and $n^2(k)$. Since the random variable $x(k)$ is normally distributed, $x^2(k)$ is chi-square distributed. The complete DTD evaluation technique [8] calculates the miss probability. Before the miss probability is measured, the DTD threshold is set so that it meets the given false alarm probability.
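The two-stage procedure described above (fix the threshold from the false alarm probability, then measure misses) can be sketched with a small Monte Carlo experiment. This is only an illustration built on the simplified model (2.5.1); it is not the evaluation procedure of [8]. The Gaussian signals, the value $\beta = h^T h = 1$, the power levels and all function names are assumptions.

```python
import math
import random

def rho_sample(rng, beta, sigma_n, sigma_v):
    """One draw of the model (2.5.1): 1 / sqrt(1 + (v^2 + n^2) / (beta * x^2))."""
    x = rng.gauss(0.0, 1.0)                      # far-end sample, unit variance (assumed)
    n = rng.gauss(0.0, sigma_n)                  # background noise
    v = rng.gauss(0.0, sigma_v) if sigma_v > 0.0 else 0.0  # near-end speech
    x2 = max(x * x, 1e-300)                      # guard against division by zero
    return 1.0 / math.sqrt(1.0 + (v * v + n * n) / (beta * x2))

def empirical_threshold(single_talk_rho, p_f):
    """Threshold T below which roughly a fraction p_f of single-talk samples fall."""
    ordered = sorted(single_talk_rho)
    return ordered[int(p_f * len(ordered))]

def miss_rate(double_talk_rho, threshold):
    """Fraction of double-talk samples at or above the threshold (missed detections)."""
    return sum(r >= threshold for r in double_talk_rho) / len(double_talk_rho)

rng = random.Random(1)
# Step 1: single talk only, choose T for a false alarm probability of 0.1
single = [rho_sample(rng, beta=1.0, sigma_n=0.03, sigma_v=0.0) for _ in range(50000)]
T = empirical_threshold(single, p_f=0.1)
# Step 2: add near-end speech and count how often the statistic stays above T
weak = [rho_sample(rng, 1.0, 0.03, 0.1) for _ in range(50000)]
strong = [rho_sample(rng, 1.0, 0.03, 1.0) for _ in range(50000)]
p_miss_weak, p_miss_strong = miss_rate(weak, T), miss_rate(strong, T)
```

In this toy setting the miss rate drops as the near-end speech gets stronger, which is the qualitative behaviour a miss probability versus NFR curve is meant to show.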

[Fig 2.10 panels: miss probability versus NFR for the cross-correlation DTD; cross correlation versus iteration with the threshold T, with false-alarm and miss events marked.]

First, we set the near-end signal to zero ($v(k)=0$). The correlation becomes

$$\rho_{d,\hat y}(k) = \frac{h^T x x^T h}{\sqrt{(h^T x x^T h + n^2)\cdot(h^T x x^T h)}} = \frac{1}{\sqrt{1+\dfrac{n^2}{(h^T x)^2}}} \tag{2.5.2}$$

To find the DTD threshold, we must derive the PDF of (2.5.2). For this we need the PDF of $h^T h\,x^2$. We define $s \triangleq \beta x^2 = h^T h\,x^2$, where $\beta = h^T h$; its PDF is also a chi-square distribution (written here for a unit-variance far-end signal):

$$f_s(s) = \frac{1}{\sqrt{2\pi\beta s}}\,e^{-\frac{s}{2\beta}}$$

We define the random variable $z \triangleq n^2 / s$. By derivation, the PDF of $z$ is

$$f_z(z) = \frac{1}{\pi\sqrt{\dfrac{\sigma_n^2}{\beta}\,z}\,\left(1+\dfrac{\beta}{\sigma_n^2}\,z\right)}$$

The cross correlation now simplifies to a function of the random variable $z$:

$$\rho_{d,\hat y}(k) = \frac{1}{\sqrt{1+z}}$$

We continue to transform the random variable. Defining $w \triangleq \sqrt{1+z}$, the PDF of $w$ is

$$f_w(w) = \frac{2w}{\pi\sqrt{\dfrac{\sigma_n^2}{\beta}}\,\sqrt{w^2-1}\,\left(1+\dfrac{\beta}{\sigma_n^2}\,(w^2-1)\right)}, \quad w > 1$$

The correlation then becomes $\rho_{d,\hat y}(k) = 1/w$. Last, we define $\rho_{d\hat y} \triangleq 1/w$. The PDF of $\rho_{d\hat y}$ is

$$f_{\rho_{d\hat y}}(\rho) = \frac{2}{\pi\sqrt{\dfrac{\sigma_n^2}{\beta}}}\cdot\frac{1}{\sqrt{1-\rho^2}\,\left(\rho^2+\dfrac{\beta}{\sigma_n^2}\,(1-\rho^2)\right)}, \quad 0 < \rho < 1 \tag{2.5.3}$$

With the PDF of the correlation in the absence of a near-end signal, we can calculate the theoretical DTD threshold for a fixed false alarm probability. The false alarm probability $P_f$ is the probability that the detector declares double talk when double talk is not present:

$$P_f = P(\text{DT is detected} \mid \text{DT is not present})$$

The detector declares double talk when the correlation is below the threshold. Given a fixed false alarm probability, we can therefore calculate the theoretical threshold from (2.5.4).

Fig 2.11 The PDF of $\rho_{d\hat y}$ in single talk

In Fig 2.11, once the threshold is chosen, the false alarm probability is determined as well. The area marked by the dotted line equals the false alarm probability.

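$$P_f = \int_0^T f_{\rho_{d\hat y}}(\rho)\,d\rho = \int_0^T \frac{2}{\pi\sqrt{\dfrac{\sigma_n^2}{\beta}}}\cdot\frac{1}{\sqrt{1-\rho^2}\,\left(\rho^2+\dfrac{\beta}{\sigma_n^2}\,(1-\rho^2)\right)}\,d\rho \tag{2.5.4}$$

where $T$ is the theoretical threshold and $\beta \triangleq h^T h$. Evaluating the integral gives

$$P_f = \frac{2}{\pi}\arctan\left[\sqrt{\frac{\sigma_n^2\,T^2}{\beta\,(1-T^2)}}\right] \tag{2.5.5}$$

Solving for $T$,

$$T = \sqrt{\frac{\dfrac{\beta}{\sigma_n^2}\left[\tan\left(\dfrac{\pi}{2}P_f\right)\right]^2}{\dfrac{\beta}{\sigma_n^2}\left[\tan\left(\dfrac{\pi}{2}P_f\right)\right]^2+1}}$$

$$T = \frac{1}{\sqrt{1+\alpha^2}} \approx 1-\frac{\alpha^2}{2} \approx 1-\frac{1}{2}\cdot\frac{4\,\sigma_n^2}{\pi^2\,P_f^2\,h^T h} \tag{2.5.6}$$

where

$$\alpha \triangleq \frac{\sigma_n}{\|h\|}\left[\tan\left(\frac{\pi}{2}P_f\right)\right]^{-1} \approx \frac{\sigma_n}{\|h\|}\left(\frac{\pi}{2}P_f\right)^{-1}$$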

From (2.5.5), we obtain the theoretical threshold as a function of the false alarm probability. The simplified form (2.5.6) makes the relation between $T$ and $P_f$ easier to see. Now we add the near-end signal $v(k)$ to (2.5.2), which changes the PDF of $\rho_{d,\hat y}(k)$:

$$\rho_{d,\hat y}(k) = \frac{1}{\sqrt{1+\dfrac{n^2+v^2}{h^T h\,x^2}}} \tag{2.5.7}$$

Compared with (2.5.2), (2.5.7) contains an additional random variable for the near-end signal. For simplicity, we assume that the random variable $r = n^2 + v^2$ is still chi-square distributed.

The PDF of $\rho_{d,\hat y}(k)$ with DT is then

$$f_{\rho_{d\hat y}}(\rho) = \frac{2}{\pi\sqrt{\dfrac{\sigma_n^2+\sigma_v^2}{\beta}}}\cdot\frac{1}{\sqrt{1-\rho^2}\,\left(\rho^2+\dfrac{\beta}{\sigma_n^2+\sigma_v^2}\,(1-\rho^2)\right)}, \quad 0 < \rho < 1 \tag{2.5.8}$$

Comparing (2.5.3) with (2.5.8), the only difference between the two PDFs is that the noise power $\sigma_n^2$ is replaced by the sum of the near-end signal power and the noise power, $\sigma_n^2 + \sigma_v^2$. Now we can calculate the miss probability $P_m$ from (2.5.8).

Fig 2.12 The PDF of $\rho_{d\hat y}$ in DT [plot of PDF versus cross correlation showing the single-talk PDF and the double-talk PDF, with the threshold $T$ and the area $P_m$ marked]

$$P_m = P(\text{DT is not detected} \mid \text{DT is present})$$

The PDF of $\rho_{d\hat y}$ in DT is shown in Fig 2.12, with the same threshold as in Fig 2.10. When $\rho_{d\hat y}$ is below the threshold during a double-talk period, the detector correctly declares DT. Conversely, when double talk occurs but the correlation is above the threshold, the detector declares no double talk and therefore makes an error. From Fig 2.11, we define the miss probability as in (2.5.9). The area marked by the dotted line equals the miss probability.

The integral

$$\int_0^T f_{\rho_{d\hat y}}(\rho, \sigma_v^2)\,d\rho$$

is the probability of a correct decision, that is, the probability that the DTD detects double talk during a double-talk period. Therefore, the miss probability is $P_m = 1 -$ (probability of a correct decision):

$$P_m = 1-\int_0^T f_{\rho_{d\hat y}}(\rho, \sigma_v^2)\,d\rho = \int_T^1 f_{\rho_{d\hat y}}(\rho, \sigma_v^2)\,d\rho \tag{2.5.9}$$

where $T$ is the theoretical threshold. Because (2.5.9) is cumbersome to work with directly, we do not write it in closed form here. From (2.5.9), the miss probability depends on the near-end signal energy and on the threshold. In Section 2.4 we analyzed the mean of $\rho_{d\hat y}$; in Section 2.5 we analyze $\rho_{d\hat y}$ in more depth.

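Using the definition of the integral with a single rectangle evaluated at the midpoint, we can approximate (2.5.9):

$$P_m = \int_T^1 f_{\rho_{d\hat y}}(\rho)\,d\rho \approx f_{\rho_{d\hat y}}\!\left(\rho = \frac{1+T}{2}\right)\cdot(1-T) \approx \kappa\cdot(1-T) \approx \kappa\cdot\frac{2\,\sigma_n^2}{\pi^2\,\beta}\cdot\frac{1}{P_f^2} \approx \kappa'\,\frac{1}{P_f^2} \tag{2.5.10}$$

In (2.5.10), we find an inverse-square law between the false alarm probability and the miss probability. We can use this law to simplify the computation: without solving for the PDF of the DTD parameter, we only need to find $\kappa'$ to obtain the miss probability.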
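The chain from false alarm probability to threshold to miss probability can also be evaluated numerically. The sketch below assumes the threshold relation $P_f = \frac{2}{\pi}\arctan\left(\sigma_n T / \sqrt{\beta(1-T^2)}\right)$ and a PDF of the form $f(\rho) = \frac{2}{\pi c}\,\frac{1}{\sqrt{1-\rho^2}\,(\rho^2 + (1-\rho^2)/c^2)}$ with $c^2 = \sigma^2/\beta$, where $\sigma^2$ is $\sigma_n^2$ in single talk and $\sigma_n^2+\sigma_v^2$ in double talk. Function names, step count and example values are illustrative assumptions.

```python
import math

def threshold_from_pf(p_f, beta, sigma_n2):
    """Invert the arctan relation (2.5.5): threshold T for a given false alarm probability."""
    t = math.tan(0.5 * math.pi * p_f)
    return 1.0 / math.sqrt(1.0 + sigma_n2 / (beta * t * t))

def prob_below(T, beta, sigma2, steps=20000):
    """Integrate the assumed PDF from 0 to T.

    The substitution rho = sin(theta) removes the 1/sqrt(1 - rho^2)
    endpoint singularity, leaving a smooth integrand in theta.
    """
    c = math.sqrt(sigma2 / beta)
    d_theta = math.asin(T) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta              # midpoint rule
        s, co = math.sin(theta), math.cos(theta)
        total += 2.0 / (math.pi * c * (s * s + co * co / (c * c)))
    return total * d_theta

def miss_probability(p_f, beta, sigma_n2, sigma_v2):
    """P_m = 1 - P(rho < T | double talk), with T fixed by the single-talk P_f."""
    T = threshold_from_pf(p_f, beta, sigma_n2)
    return 1.0 - prob_below(T, beta, sigma_n2 + sigma_v2)

# Illustrative values (assumed): beta = 1, sigma_n^2 = 9e-4, sigma_v^2 = 1
T_example = threshold_from_pf(0.1, 1.0, 0.0009)
pm_example = miss_probability(0.1, 1.0, 0.0009, 1.0)
```

A useful sanity check is that `prob_below(T_example, 1.0, 0.0009)` returns the false alarm probability that was used to set the threshold, and that allowing a larger false alarm probability lowers the miss probability.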

2.6 Nonlinear loudspeaker effect for DTD

So far we have assumed that acoustic echo cancellers use linear adaptive filter structures, such as the FIR filter $\hat h(k)$ described by its impulse response, to model the acoustic path of the loudspeaker-enclosure-microphone system. However, low-cost applications employ small loudspeakers operating beyond their range of linear transduction, and mobile communication terminals may be designed to tolerate clipping of large amplitudes in the amplifier to achieve high sound levels [10].

Unlike the linear loudspeaker considered earlier, the echo component now has a nonlinear part and a linear part. The cross-correlation double talk detector can detect double talk with a linear loudspeaker. But can it still detect double talk with the nonlinear loudspeaker of Fig 2.13?

Fig 2.13 Structure of the nonlinear loudspeaker [block diagram with the far-end signal $x(k)$ and the microphone signal $d(k)$]

Next, we analyze the cross-correlation detector in the case of a nonlinear loudspeaker. First, we assume that the nonlinear function in Fig 2.13 is

$$s(x) = a_1 x + a_3 x^3$$

where $s(x)$ is the output signal of the nonlinear loudspeaker, $x$ is the far-end signal, and $a_1$, $a_3$ are the coefficients of the nonlinear function. The function therefore contains both a linear and a nonlinear part. Now, we analyze $P_{d,\hat y}(k)$:

$$P_{d,\hat y} = E[d(k)\,\hat y(k)] = E[(y(k)+v(k)+n(k))\cdot\hat y(k)] = E[(h^T(a_1 x + a_3 x^3)+v(k)+n(k))\cdot\hat h^T x(k)] \approx h^T \hat h\,(a_1\sigma_x^2 + a_3 E[x^4]) \tag{2.6.1}$$

Then, we focus on $P_d(k)$:

$$P_d = E[d(k)\,d(k)] = a_1^2\,E[h^T x(k)\,x^T(k)\,h] + a_3^2\,E[h^T x^3(k)\,(x^3(k))^T h] + 2\,h^T h\,a_1 a_3\,E[x^4(k)] + \sigma_n^2 + \sigma_v^2 \tag{2.6.2}$$

Last,

$$P_{\hat y} = E[\hat y(k)\,\hat y(k)] = E[(\hat h^T x(k))\cdot(\hat h^T x(k))] = E[\hat h^T x^2(k)\,\hat h] = \hat h^T \hat h\,\sigma_x^2 \tag{2.6.3}$$

From (2.6.1), (2.6.2) and (2.6.3), $\rho_{d,\hat y}(k)$ becomes

$$\rho_{d,\hat y}(k) = \frac{P_{d,\hat y}(k)}{\sqrt{P_d(k)\,P_{\hat y}(k)}} = \frac{a_1\sigma_x^2 + a_3 E[x^4]}{\sqrt{\sigma_x^2\left(a_1^2\sigma_x^2 + 2a_1 a_3 E[x^4] + a_3^2 E[x^6] + \dfrac{\sigma_n^2+\sigma_v^2}{h^T h}\right)}}$$

$$= \frac{1 + \dfrac{a_3}{a_1}\dfrac{E[x^4]}{\sigma_x^2}}{\sqrt{1 + 2\dfrac{a_3}{a_1}\dfrac{E[x^4]}{\sigma_x^2} + \dfrac{a_3^2}{a_1^2}\dfrac{E[x^6]}{\sigma_x^2} + \dfrac{\sigma_n^2+\sigma_v^2}{a_1^2\,h^T h\,\sigma_x^2}}}$$

We assume $a_1 \gg a_3$ and $h^T h = \|h\|^2 = 1$. As before, the far-end signal $x(k)$ is a white signal, with $E[x^4] = 3\sigma_x^4$ and $E[x^2] = \sigma_x^2$. Then

$$\rho_{d,\hat y} \approx \frac{1 + 3\dfrac{a_3}{a_1}\sigma_x^2}{\sqrt{1 + 6\dfrac{a_3}{a_1}\sigma_x^2 + \dfrac{\sigma_n^2+\sigma_v^2}{a_1^2\,h^T h\,\sigma_x^2}}} \tag{2.6.4}$$

From (2.6.4), the ratio $a_3/a_1$ affects the cross-correlation DTD with a nonlinear loudspeaker. If the ratio $a_3/a_1$ is larger, $\rho_{d,\hat y}(k)$ is smaller in the double-talk period.

In Section 2.4, we derived the theoretical cross correlation for a linear loudspeaker. Setting $a_1 = 1$ and $a_3 = 0$ means that the loudspeaker has only a linear part, and (2.6.4) reduces to

$$\rho_{d,\hat y} = \frac{1}{\sqrt{1 + \dfrac{\sigma_n^2+\sigma_v^2}{h^T h\,\sigma_x^2}}} \tag{2.6.5}$$

Expression (2.6.5) is the same as (2.4.5), which confirms that the theoretical correlation derived for the nonlinear loudspeaker is consistent with the linear case. We also see that a nonlinear loudspeaker affects the cross-correlation value.
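The effect of the cubic term can be explored numerically. The sketch below evaluates the correlation once from the three power terms without dropping the $(a_3/a_1)^2$ contribution, and once from the approximation (2.6.4). It assumes a zero-mean Gaussian far-end signal, so that $E[x^4] = 3\sigma_x^4$ and $E[x^6] = 15\sigma_x^6$, and a converged filter $\hat h = h$. The function names and parameter values are illustrative.

```python
import math

def rho_nonlinear_full(a1, a3, h_energy, sigma_x2, noise2):
    """Correlation P_dy / sqrt(P_d * P_y) with all terms of s(x) = a1*x + a3*x^3 kept.

    noise2 is sigma_n^2 + sigma_v^2; Gaussian moments are an assumption.
    """
    ex4 = 3.0 * sigma_x2 ** 2       # E[x^4] for a zero-mean Gaussian signal
    ex6 = 15.0 * sigma_x2 ** 3      # E[x^6] for a zero-mean Gaussian signal
    p_dy = h_energy * (a1 * sigma_x2 + a3 * ex4)
    p_d = h_energy * (a1 ** 2 * sigma_x2 + 2.0 * a1 * a3 * ex4 + a3 ** 2 * ex6) + noise2
    p_y = h_energy * sigma_x2
    return p_dy / math.sqrt(p_d * p_y)

def rho_nonlinear_approx(a1, a3, h_energy, sigma_x2, noise2):
    """Approximation (2.6.4), valid for a1 >> a3 (the (a3/a1)^2 term is dropped)."""
    r = a3 / a1
    num = 1.0 + 3.0 * r * sigma_x2
    den = math.sqrt(1.0 + 6.0 * r * sigma_x2 + noise2 / (a1 ** 2 * h_energy * sigma_x2))
    return num / den

# Linear loudspeaker (a1 = 1, a3 = 0): both forms reduce to (2.6.5)
linear_full = rho_nonlinear_full(1.0, 0.0, 1.0, 1.0, 1.0)
linear_approx = rho_nonlinear_approx(1.0, 0.0, 1.0, 1.0, 1.0)
# Mild nonlinearity (assumed a3 / a1 = 0.01)
mild_full = rho_nonlinear_full(1.0, 0.01, 1.0, 1.0, 1.0)
mild_approx = rho_nonlinear_approx(1.0, 0.01, 1.0, 1.0, 1.0)
```

Comparing the two functions over a range of `a3 / a1` gives a quick feel for where the approximation (2.6.4) stops being accurate.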


Chapter 3

The Modified Cross Correlation Double Talk Detector by Microphone Energy

In Chapter 2, we derived the theoretical correlations and the miss probability. There is, however, a serious problem, which we discuss in Section 3.1: when an echo path change happens, the correlation value decreases just as it does in a double-talk period, so the detector confuses double talk with echo path change. This distinction is important because the adaptive filter coefficients should be updated continuously during an echo path change but not during a double-talk period. In Section 3.2, we propose the modified cross-correlation double talk detector, which uses the microphone energy to distinguish echo path change from double talk.

In Section 3.3, we propose a technique to evaluate a DTD under different echo path changes. Morgan's technique [8] can only evaluate a DTD in a double-talk period; combining our technique with Morgan's technique allows the DTD to be evaluated under both double talk and echo path change. In Section 3.4, we show that a variable threshold can improve DTD performance: because the cross-correlation DTD is very sensitive, a variable threshold makes it more robust.

3.1 Cross-correlation DTD in echo path change and double talk

One of the problems of a double talk detector is that it is difficult to distinguish an echo path change [3] from DT. This distinction is important because the adaptive filter coefficients should be updated continuously during an echo path change but not during double-talk periods. When there is an abrupt echo path change (EPC) in the near-end room, an adaptive filter with a fast rate of convergence is required to track it. A DTD must therefore be able to distinguish between the DT situation and the echo path change in order to obtain appropriate tracking performance of the adaptive filter. In both cases, DT and echo path change, the misadjustment of the adaptive filter, and thus the error signal, increases drastically. The error signal alone therefore cannot serve as a DTD, since it cannot distinguish between these two events.

For example, in Fig 3.1 the cross correlation $\rho_{d\hat y}(k)$ decreases during the double-talk period. However, $\rho_{d\hat y}(k)$ also decreases when an echo path change is present. In Fig 3.1, double talk occurs from 1k to 1.5k and the echo path change occurs at 2.2k. The cross-correlation double talk detector therefore cannot distinguish between DT and EPC. The variation of $\rho_{d\hat y}(k)$ falls into four cases, so the conventional cross-correlation DTD cannot handle all of them.

Fig 3.1 Variation of $\rho_{d\hat y}$ between DT and EPC [plot of cross correlation versus iteration, with the DT and EPC intervals marked]

In the next section, we propose the modified cross-correlation double talk detector, which can distinguish the four cases. Whatever the level of double talk or echo path change, the modified DTD can make the correct decision.

3.2 The modified cross-correlation double talk detector

In Section 3.1, we discussed that the cross-correlation DTD can hardly differentiate between double talk and echo path change. In this section, we propose the modified cross-correlation DTD. Although the conventional cross-correlation DTD can handle one situation, in the general case it confuses DT with EPC.

Table 3.1 considers four cases of the cross-correlation, depending on the near-end signal energy and the degree of the echo path change. A small EPC means that the changed channel is close to the original channel. A large DT means that the near-end speech energy is large.

Table 3.1 shows four typical cases of double talk and echo path change, but the conventional cross-correlation DTD works in only one of them. To make the detector robust, we extend the cross-correlation DTD by incorporating the microphone energy. Fig 3.2 shows the structure of the modified cross-correlation DTD.

Table 3.1 The cross correlation $\rho_{d\hat y}(k)$ in four cases [four plots of cross-correlation versus iteration, one for each combination of small or large DT with small or large EPC]

Fig 3.2 Structure of the modified cross-correlation DTD by including the microphone energy [block diagram; surviving labels: $\hat h$, $\hat y(k)$]

The modified cross-correlation DTD includes a microphone energy detector. The energy detector can detect double talk and helps the cross-correlation DTD make the correct decision. It resembles the Geigel double talk detector. With the energy detector, the modified cross-correlation DTD can correctly detect the four typical cases. Relying on the microphone energy has a drawback, however: when the near-end speech energy is small, the modified cross-correlation DTD has difficulty detecting double talk. The Geigel DTD shares this drawback and, in addition, has difficulty detecting echo path change.

Fig 3.3 Flow chart of the modified cross correlation DTD algorithm by microphone energy [flow chart with inputs $d(k)$, $x(k)$ and $\hat y(k)$; $\rho_{d\hat y}(k)$ is compared with a threshold $T$, followed by a second threshold comparison]

First, we use the correlation $\rho_{d\hat y}(k)$ in (2.1.4) to decide between single talk and double talk (or echo path change): double talk or echo path change is present when the correlation is smaller than some threshold. Second, we use the microphone energy to detect double talk. For simplicity, the microphone energy detector of (2.3.2) is written as

$$P_\alpha(k) = \frac{P_d(k)}{P_x(k)}$$

The microphone energy detector is in effect a Geigel DTD with smoothed microphone and far-end signal energies. If $P_\alpha(k)$ is larger than the threshold, we declare double talk. Once double talk is declared, the detection is held for a minimum period of time. If $P_\alpha(k)$ is smaller than the threshold, we declare echo path change.

With the two detectors together, we can detect all situations and decide between double talk and echo path change with more confidence. The cross-correlation DTD can detect double talk when the near-end speech energy is small, whereas the Geigel DTD cannot. In this way the cross-correlation DTD becomes more robust. In Chapter 4, simulations of all cases are performed to verify the effectiveness of the modified cross-correlation DTD.
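The two-stage decision described above can be sketched as follows. This is a minimal illustration of the decision logic only: it takes the smoothed quantities $\rho_{d\hat y}(k)$, $P_d(k)$ and $P_x(k)$ as given, and the names `Decision`, `modified_dtd_step`, `run_modified_dtd`, the `hold` length and the guard constant `eps` are assumptions, not part of the thesis.

```python
from enum import Enum

class Decision(Enum):
    SINGLE_TALK = "single talk"
    DOUBLE_TALK = "double talk"
    ECHO_PATH_CHANGE = "echo path change"

def modified_dtd_step(rho_dy, p_d, p_x, rho_threshold, energy_threshold, eps=1e-12):
    """Classify one sample from the smoothed correlation and energy estimates."""
    # Stage 1: a high correlation means neither DT nor EPC is present
    if rho_dy >= rho_threshold:
        return Decision.SINGLE_TALK
    # Stage 2: the energy ratio P_alpha = P_d / P_x separates DT from EPC
    p_alpha = p_d / (p_x + eps)
    if p_alpha > energy_threshold:
        return Decision.DOUBLE_TALK
    return Decision.ECHO_PATH_CHANGE

def run_modified_dtd(rho_seq, p_d_seq, p_x_seq, rho_threshold, energy_threshold, hold=0):
    """Apply the step rule to sequences, holding a DT decision for `hold` extra samples."""
    decisions, hold_left = [], 0
    for rho_dy, p_d, p_x in zip(rho_seq, p_d_seq, p_x_seq):
        decision = modified_dtd_step(rho_dy, p_d, p_x, rho_threshold, energy_threshold)
        if decision is Decision.DOUBLE_TALK:
            hold_left = hold                      # restart the hold period
        elif hold_left > 0:
            decision, hold_left = Decision.DOUBLE_TALK, hold_left - 1
        decisions.append(decision)
    return decisions
```

In an echo canceller, a `DOUBLE_TALK` decision would typically freeze adaptation, while an `ECHO_PATH_CHANGE` decision would keep the filter adapting so that it can track the new path.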
