Outlines of Thesis - Multichannel Speech Enhancement Using Nullforming Constructed with Relative Transfer Functions

The thesis can be divided into two parts: the adaptive filter with a nullformer, and the adaptive nullforming algorithm. The topics of the chapters are as follows.

Chapter 2: The problem is formulated in this chapter. Then the reference signal based adaptive filter (RSAB) is reviewed, including its architecture and mathematical description.

Chapter 3: The linear constrained minimum variance (LCMV) problem is described and solved with the Frost algorithm. Finally, the generalized sidelobe canceler (GSC) using relative transfer functions is derived from the Frost algorithm.

Chapter 4: The differential microphone is introduced, and the null space of the interfering signals is found using singular value decomposition (SVD). Then the fixed nullformer is applied to RSAB and GSC.

Chapter 5: A variable nullforming algorithm using order recursive least square estimation (ORLS) and subspace distance is presented. Then a voice activity detection method based on the algorithm is proposed. Finally, the variable nullforming is applied to GSC to reconstruct the desired signal.

Chapter 6: Experimental results show the performance of RSAB, GSC, RSAB with nullforming and GSC with nullforming.

Chapter 7: Conclusion and future study

Chapter 2 Reference Signal Based Adaptive Beamforming

2.1 Introduction

The time domain reference signal based adaptive beamforming (RSAB) was introduced by Dahl et al. [2]. The work in [11] proposed a frequency domain RSAB, which optimizes the performance in each frequency bin. In RSAB, adjusting the filter weighting serves two purposes: one is to suppress the interfering sources and noise, and the other is to equalize the channel effect. The architecture of RSAB is discussed in the following sections.

2.2 Problem Formulation

Consider an array with M sensors in a noisy, reverberant environment that receives one nonstationary desired source and some stationary interfering signals. The received signal in the time domain is

$$x_m(n) = a_m^D(n) * s^D(n) + n_m(n), \quad m = 1, \ldots, M \tag{2-1}$$

where each symbol represents:

$*$: convolution operation
$x_m(n)$: signal received by the mth sensor
$a_m^D(n)$: transfer function (TF) between the desired source and the mth microphone
$s^D(n)$: desired source
$n_m(n)$: noise received by the mth sensor

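The received signal is analyzed frame by frame in the frequency domain, so its short-time Fourier transform (STFT) can be approximated as

$$x_m(k,\omega) \approx a_m^D(\omega)\, s^D(k,\omega) + n_m(k,\omega), \quad m = 1, \ldots, M$$

where k is the frame index and ω is the frequency.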

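For the case with two or more interfering sources, the TF vectors of the desired source and the interfering sources are linearly independent. The received signal in the frequency domain with one desired source and N interfering sources from different directions can then be written in vector form as

$$\mathbf{x}(k,\omega) = \mathbf{a}^D(\omega)\, s^D(k,\omega) + \sum_{i=1}^{N} \mathbf{a}_i(\omega)\, s_i(k,\omega) + \mathbf{n}(k,\omega) \tag{2-3}$$

where $\mathbf{a}_i(\omega)$ is the vector of TFs between the ith interfering source and the microphone array, and $s_i(k,\omega)$ is the ith interfering source.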
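The vector model can be made concrete at a single frequency bin. The sketch below builds the received STFT coefficients from randomly drawn TF vectors. The array sizes, the Gaussian TFs and the noise level are illustrative assumptions and are not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, K = 4, 2, 200  # microphones, interfering sources, frames (illustrative values)


def crandn(*shape):
    """Circular complex Gaussian samples with unit variance (illustrative helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Hypothetical TFs at one frequency bin
a_D = crandn(M)      # desired source TF vector a^D(w)
A_I = crandn(M, N)   # column i is the interferer TF vector a_i(w)

s_D = crandn(K)      # desired source s^D(k, w) over K frames
s_I = crandn(N, K)   # interfering sources s_i(k, w)
noise = 0.01 * crandn(M, K)

# Vector model for all frames at once: column k is x(k, w)
X = np.outer(a_D, s_D) + A_I @ s_I + noise
```

Without the noise term, every column of X is a combination of the N+1 TF vectors, so X has rank N+1. The SVD-based nullformer in Chapter 4 exploits this low-rank structure.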

2.3 Reference Signal Based Adaptive Filter

RSAB requires prior information before the beamformer is executed. The prior information consists of the pre-recorded signals received by the microphone array, $s_1(k,\omega), \ldots, s_M(k,\omega)$, and the reference signal $r(k,\omega)$. The pre-recorded speech signals are collected by placing a source at the desired position and letting it emit for a short while in a quiet environment. They provide a priori information about the path between the desired source and the microphone array. The reference signal can be the original source signal, or the source signal received with good quality by another microphone.

After the pre-recorded signals and the reference signal are collected, the RSAB procedure is divided into two phases: the training phase and the filtering phase. Figure 2-1 shows the overall system architecture.

Figure 2-1 Reference signal based adaptive beamformer

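In the algorithm, voice activity detection (VAD) is used to detect the activity of the desired signal. When the VAD indicates that the desired signal is inactive, the system runs the training phase using the normalized least mean square (NLMS) algorithm. In the training phase, the error signal at frequency ω is written as

$$e(k,\omega) = r(k,\omega) - \mathbf{w}^\dagger(k,\omega)\left[\mathbf{x}(k,\omega) + \mathbf{s}(k,\omega)\right]$$

where $\mathbf{s}(k,\omega) = [s_1(k,\omega), \ldots, s_M(k,\omega)]^T$ is the pre-recorded desired source.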

The purpose of RSAB is to minimize the mean square error between the filter output and the desired (reference) signal. The mean square error is

$$J_{LMS}(k,\omega) = E\left\{ e(k,\omega)\, e^*(k,\omega) \right\}$$

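The mean square error is then minimized with respect to the weighting. The optimal solution is obtained by setting the derivative of the previous equation with respect to $\mathbf{w}^*$ to zero. Because the cost is quadratic, this stationary point is the minimum. However, the optimal solution is not practical to implement.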

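Therefore, an adaptive solution is introduced. In the adaptive solution, the weighting $\mathbf{w}(k,\omega)$ is updated along the steepest descent direction:

$$\mathbf{w}(k+1,\omega) = \mathbf{w}(k,\omega) - \mu\,\frac{\partial J_{LMS}(k,\omega)}{\partial \mathbf{w}^*(k,\omega)} \tag{2-7}$$

where μ is the step size.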

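From (2-6) and (2-7), the NLMS algorithm is used to achieve a stable solution at each frequency. The filter weighting update is therefore

$$\mathbf{w}(k+1,\omega) = \mathbf{w}(k,\omega) + \frac{\mu}{\left\|\mathbf{x}(k,\omega)+\mathbf{s}(k,\omega)\right\|^2}\left[\mathbf{x}(k,\omega)+\mathbf{s}(k,\omega)\right] e^*(k,\omega)$$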
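Below is a minimal NumPy sketch of the two RSAB phases at one frequency bin. It assumes that the training input at frame k is the sum of the received and pre-recorded signals, and that the reference is r(k,ω). The function names, the step size and the regularizer `eps` are hypothetical choices. The VAD that switches between the phases is not modeled.

```python
import numpy as np


def rsab_train_nlms(U, r, mu=0.5, eps=1e-8):
    """NLMS training of the RSAB weighting at one frequency bin (illustrative sketch).

    U   : (M, K) training inputs; column k stands for x(k,w) + s(k,w)
    r   : (K,) reference signal r(k,w)
    mu  : step size, 0 < mu < 2 for NLMS stability
    eps : small constant guarding against division by zero
    """
    M, K = U.shape
    w = np.zeros(M, dtype=complex)
    for k in range(K):
        u = U[:, k]
        e = r[k] - np.vdot(w, u)  # e(k,w) = r(k,w) - w^H u(k,w)
        w = w + mu * u * np.conj(e) / (np.vdot(u, u).real + eps)
    return w


def rsab_filter(w, X):
    """Filtering phase: y(k,w) = w^H x(k,w) for every column of X (illustrative)."""
    return np.conj(w) @ X
```

In a full system, these functions would run independently in every frequency bin. The VAD would decide, frame by frame, which of the two is called.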

When the VAD detects that the received signal contains desired speech, the system switches to the filtering phase. It filters the received signal with the weighting w trained in the training phase:

$$y(k,\omega) = \mathbf{w}^\dagger(\omega)\,\mathbf{x}(k,\omega)$$

where $y(k,\omega)$ denotes the output signal and $\mathbf{x}(k,\omega)$ denotes the received signals in the filtering phase. The flow of the procedure is described in Figure 2-2.


Figure 2-2 Flow of the reference signal based domain adaptive beamformer

Chapter 3 Linear Constrained Minimum Variance Beamforming

3.1 Introduction

Frost [7] proposed a method that minimizes the output power of an array subject to a linear constraint on its response to the target signal.

Griffiths and Jim [8] reconsidered Frost's algorithm and obtained the generalized sidelobe canceler (GSC), which is widely used to cope with interfering signals. Gannot et al. [4] applied the relative transfer function (RTF) to the GSC to enhance its performance when there is a nonstationary desired source in a reverberant room. This chapter first introduces the Frost algorithm and then the RTF-based GSC.

3.2 Frequency Domain Frost Algorithm

3.2.1 Optimal Solution

We start from the problem formulated in section 2.2. The purpose is to find a set of weightings that filter the received signal and recover the original desired source.

The filter weighting in vector form is

$$\mathbf{w}(k,\omega) = \left[w_1(k,\omega)\ \ w_2(k,\omega)\ \ \cdots\ \ w_M(k,\omega)\right]^T$$

The set of filter weightings is used to filter the received signal, so the output is

$$y(k,\omega) = \mathbf{w}^\dagger(k,\omega)\,\mathbf{x}(k,\omega)$$

would be zero. Therefore, a constraint is set as

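$$\mathbf{w}^\dagger(k,\omega)\,\mathbf{a}^D(\omega) = f^*(\omega)$$

where $f(\omega)$ is the desired response, typically a simple delay. The linear constrained minimum variance (LCMV) problem can then be formulated as:

$$\min_{\mathbf{w}}\ \mathbf{w}^\dagger(k,\omega)\,\boldsymbol{\Phi}_{xx}(k,\omega)\,\mathbf{w}(k,\omega) \quad \text{subject to} \quad \mathbf{w}^\dagger(k,\omega)\,\mathbf{a}^D(\omega) = f^*(\omega) \tag{2-4}$$

where $\boldsymbol{\Phi}_{xx}(k,\omega)$ is the power spectral density matrix of the received signals.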

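Solving the problem with complex Lagrange multipliers gives the optimal weighting

$$\mathbf{w}_{opt}(k,\omega) = \frac{\boldsymbol{\Phi}_{xx}^{-1}(k,\omega)\,\mathbf{a}^D(\omega)\, f(\omega)}{\mathbf{a}^{D\dagger}(\omega)\,\boldsymbol{\Phi}_{xx}^{-1}(k,\omega)\,\mathbf{a}^D(\omega)}$$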
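The closed-form LCMV weighting can be checked numerically. The sketch below assumes that the correlation matrix Φ_xx is known at one bin; in practice it would have to be estimated. It uses `np.linalg.solve` instead of forming an explicit inverse. The function name is illustrative.

```python
import numpy as np


def lcmv_weights(Phi_xx, a, f=1.0):
    """Closed-form LCMV weighting at one frequency bin (illustrative sketch).

    Solves min w^H Phi_xx w subject to w^H a = conj(f):
        w = Phi_xx^{-1} a f / (a^H Phi_xx^{-1} a)
    Phi_xx must be Hermitian positive definite.
    """
    Phi_inv_a = np.linalg.solve(Phi_xx, a)  # Phi^{-1} a without an explicit inverse
    return Phi_inv_a * f / np.vdot(a, Phi_inv_a)
```

Any other weighting that meets the same constraint differs from this one by a vector orthogonal to a. Such a change can only increase the output power w^H Φ_xx w.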

3.2.2 Adaptive Solution

The closed-form optimal solution is impractical in the real world. The room impulse responses are difficult to find with system identification methods, and the closed form cannot track changes in the environment [4]. Therefore Frost [7] introduced an adaptive form, which is more useful in practical environments.

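Consider the steepest descent adaptive algorithm:

$$\mathbf{w}(k+1,\omega) = \mathbf{w}(k,\omega) - \mu\left[\boldsymbol{\Phi}_{xx}(k,\omega)\,\mathbf{w}(k,\omega) + \mathbf{a}^D(\omega)\,\lambda(k,\omega)\right]$$

where $\lambda(k,\omega)$ is the Lagrange multiplier. Choosing the Lagrange multiplier so that $\mathbf{w}(k+1,\omega)$ satisfies the constraint yields

$$\mathbf{w}(k+1,\omega) = \mathbf{P}(\omega)\left[\mathbf{w}(k,\omega) - \mu\,\boldsymbol{\Phi}_{xx}(k,\omega)\,\mathbf{w}(k,\omega)\right] + \mathbf{F}(\omega)$$

with

$$\mathbf{P}(\omega) = \mathbf{I} - \mathbf{a}^D(\omega)\left[\mathbf{a}^{D\dagger}(\omega)\mathbf{a}^D(\omega)\right]^{-1}\mathbf{a}^{D\dagger}(\omega), \qquad \mathbf{F}(\omega) = \mathbf{a}^D(\omega)\left[\mathbf{a}^{D\dagger}(\omega)\mathbf{a}^D(\omega)\right]^{-1} f(\omega)$$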

3.3 Generalized Sidelobe Canceler

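From the Frost algorithm, the filter weighting can be separated into two parts. The first is a fixed part $\mathbf{F}(\omega)$ that satisfies the constraint. The second is an adaptive part that lies in the null space of $\mathbf{a}^{D\dagger}(\omega)$. Comparing the filter weighting with the adaptive Frost algorithm, let

$$\mathbf{w}_{FBF}(\omega) = \mathbf{F}(\omega) = \mathbf{a}^D(\omega)\left[\mathbf{a}^{D\dagger}(\omega)\mathbf{a}^D(\omega)\right]^{-1} f(\omega)$$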

The architecture of GSC can be separated into three parts: the fixed beamformer (FBF), the blocking matrix (BM) and the adaptive noise canceler (ANC). The FBF produces a signal that contains the desired source and stationary noise. The BM blocks the desired source to extract the stationary noise. The ANC then uses a multichannel Wiener filter to estimate the noise in the FBF output and cancel it. Figure 3-1 shows the entire architecture of GSC, and each element is discussed in detail below.

(Block diagram: Fixed Beamformer, Blocking Matrix, Adaptive Noise Canceler)

Figure 3-1 Generalized sidelobe canceler

1. Fixed Beamformer (FBF):

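From (2-15), the output of the FBF is

$$y_{FBF}(k,\omega) = \mathbf{w}_{FBF}^\dagger(\omega)\,\mathbf{x}(k,\omega) = f^*(\omega)\, s^D(k,\omega) + \mathbf{w}_{FBF}^\dagger(\omega)\,\mathbf{n}(k,\omega)$$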

Because f^( ) is just a simple delay, the output of FBF contains undistorted desired source and noise. This is an optimal solution where the desired source is just a simple delay from output of FBF. The issue of the optimal solution is that the actual TFs are difficult to find so Gannot et al. applied relative transfer function (RTF) to GSC to approach the suboptimal solution [4]. The RTF is easy to obtained by system

identification method proposed by [12]. RTF is the ratio of RIR between two microphones. Let the first microphone be the reference microphone then the RTF is

 

Take the vector form

   

function of desired source to the first microphone.
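As an illustration, the RTF and an FBF built from it can be computed per bin. The matched weighting w = h/‖h‖² is an assumption here; it is one common choice that satisfies w^H h = 1. With this choice, the desired component of the FBF output is the desired source as received at the reference microphone, a_1^D(ω)s^D(k,ω), rather than s^D(k,ω) itself. Function names are illustrative.

```python
import numpy as np


def rtf_from_tf(a_D):
    """Illustrative: RTF vector h = a^D / a^D_1 with microphone 1 as reference (h[0] == 1)."""
    return a_D / a_D[0]


def rtf_fbf(h):
    """Assumed RTF-matched FBF: w = h / ||h||^2, so that w^H h = 1."""
    return h / np.vdot(h, h).real
```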

2. Blocking Matrix (BM):

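Using the RTFs, the BM can properly block the desired signal: the columns of the BM form a basis of the null space of the desired signal RTF. Consider the following matrix

$$\mathbf{H}(\omega) = \begin{bmatrix} -h_2^*(\omega) & -h_3^*(\omega) & \cdots & -h_M^*(\omega) \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

which satisfies $\mathbf{H}^\dagger(\omega)\,\mathbf{h}(\omega) = \mathbf{0}$. Therefore, the output of the BM would be a noise-only signal.

According to the criterion of GSC, the statistics of the BM output should be independent of k because the noise is assumed stationary. In practice, however, the BM cannot block the entire desired signal, so the BM output changes when the source is nonstationary. The vector form of the BM output is

$$\mathbf{u}(k,\omega) = \mathbf{H}^\dagger(\omega)\,\mathbf{x}(k,\omega)$$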
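The blocking matrix can be built directly from the RTF vector. The sketch below assumes the common RTF-based form with the first microphone as reference. Any other basis of the same null space would also block the desired signal. The function name is illustrative.

```python
import numpy as np


def rtf_blocking_matrix(h):
    """Illustrative BM from the RTF: H = [[-h_2^*, ..., -h_M^*], I_{M-1}] (M x (M-1)).

    With h[0] == 1, each column of H is orthogonal to h, so H^H h = 0 and
    the BM output u = H^H x has entries u_m = x_m - h_m x_1.
    """
    M = h.shape[0]
    top_row = -np.conj(h[1:])[np.newaxis, :]
    return np.vstack([top_row, np.eye(M - 1)])
```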

 

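3. Adaptive Noise Canceler (ANC):

The output of the ANC is a noise-only signal, because the BM output lies in the null space of the desired signal.

Recalling (2-4), the purpose is to minimize the output power, so the ANC minimizes

$$E\left\{\left|y_{FBF}(k,\omega) - \mathbf{g}^\dagger(k,\omega)\,\mathbf{u}(k,\omega)\right|^2\right\}$$

where $\mathbf{g}(k,\omega)$ is the ANC weighting and $\mathbf{u}(k,\omega)$ is the BM output. The multichannel Wiener filter would then be

$$\mathbf{g}(k,\omega) = \boldsymbol{\Phi}_{uu}^{-1}(k,\omega)\,\boldsymbol{\Phi}_{uy}(k,\omega)$$

In practice, the NLMS algorithm is used to recursively update the weighting of the ANC [4]:

$$\mathbf{g}(k+1,\omega) = \mathbf{g}(k,\omega) + \frac{\mu}{P_{est}(k,\omega)}\,\mathbf{u}(k,\omega)\, y^*(k,\omega)$$

where $y(k,\omega) = y_{FBF}(k,\omega) - \mathbf{g}^\dagger(k,\omega)\,\mathbf{u}(k,\omega)$ is the GSC output. $P_{est}(k,\omega)$ is an estimate of the power of $\mathbf{u}(k,\omega)$; it normalizes the power and makes the recursion more stable.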

Chapter 4 Nullforming

4.1 Introduction

This chapter introduces several methods to achieve nullforming and explains the associated algorithms that let the adaptive filters enhance the desired source.

4.2 Differential Microphone

The delay-and-sum beamformer is commonly used under the far-field and free-field assumptions. The method enhances the signal from the desired direction.

(Diagram: source $s(k,\omega)$ arriving from angle θ at two microphones receiving $x_1(k,\omega)$ and $x_2(k,\omega)$, delay compensation $e^{-j\omega d\sin\theta_T/v}$, output $y(k,\omega)$)

Figure 4-1 Differential microphone

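Elko et al. [13] proposed the differential microphone to reduce the signal from a target direction. Figure 4-1 shows the architecture of the differential microphone. The method forms a null by subtracting the signals of two microphones after delay compensation. Assuming a far-field plane wave and perfectly matched microphones, the output of the differential microphone pair can be written as

$$y(k,\omega) = x_1(k,\omega)\,e^{-j\omega d\sin\theta_T/v} - x_2(k,\omega) = s(k,\omega)\left(e^{-j\omega d\sin\theta_T/v} - e^{-j\omega d\sin\theta/v}\right)$$

where θ is the direction of arrival and $\theta_T$ is the target (null) direction. v is the speed of sound and d is the distance between the two microphones. Rearranging the formula, the magnitude of the output is

$$|y(k,\omega)| = 2\,|s(k,\omega)|\,\left|\sin\!\left(\frac{\omega d\,(\sin\theta - \sin\theta_T)}{2v}\right)\right| \tag{3-2}$$

Figure 4-2 shows the beam patterns given by equation (3-2) for different microphone distances. They are plotted with a source magnitude of 1, a speed of sound of 343 m/s and a fixed target direction.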

The differential microphone behaves like a high-pass filter, so it amplifies noise at high frequencies. The microphone distance determines which frequency band can be handled. With a short microphone distance, the lower frequency band is strongly attenuated from almost every direction.

Figure 4-2 Beam pattern of differential microphone with d=0.12 m (left) and d=0.24 m (right)
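A short sketch can evaluate the differential-microphone beam pattern. It assumes the far-field, perfectly matched two-microphone model with inter-microphone delay d·sinθ/v. The sin convention for the angle and the function name are assumptions.

```python
import numpy as np


def diff_mic_gain(freq_hz, theta, theta_t, d, v=343.0):
    """Illustrative magnitude response |y|/|s| = 2 |sin(w d (sin(theta) - sin(theta_t)) / (2 v))|.

    freq_hz : frequency in Hz (scalar or array)
    theta   : direction of arrival in radians (scalar or array)
    theta_t : target (null) direction in radians
    d       : microphone spacing in metres; v: speed of sound in m/s
    """
    omega = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    delta = np.sin(np.asarray(theta, dtype=float)) - np.sin(theta_t)
    return 2 * np.abs(np.sin(omega * d * delta / (2 * v)))
```

Evaluating the function on an `np.meshgrid` of frequencies and angles for d = 0.12 m and d = 0.24 m gives beam patterns of the kind compared in Figure 4-2. At low frequencies the gain is small in every direction, and more so for the shorter spacing, because sin(x) ≈ x scales with d there.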

4.3 Nullforming Using Null Space of Interfering Signal

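The previous section shows a nullformer for one interfering source. In a practical environment there may be two or more interfering sources. This thesis uses singular value decomposition (SVD) to find the null space of all the interfering signals. Assume there are N interfering sources in the environment and that their RTFs are available, as described by (2-17) in chapter 3. The RTFs of the interfering sources are

$$\mathbf{h}_i(\omega) = \left[1\ \ h_{i,2}(\omega)\ \ \cdots\ \ h_{i,M}(\omega)\right]^T, \quad i = 1, \ldots, N$$

Taking the complex conjugate transpose of these RTFs in matrix form gives

$$\mathbf{H}_I^\dagger(\omega) = \left[\mathbf{h}_1(\omega)\ \ \mathbf{h}_2(\omega)\ \ \cdots\ \ \mathbf{h}_N(\omega)\right]^\dagger$$

Applying SVD to $\mathbf{H}_I^\dagger(\omega)$, the right singular vectors $\mathbf{u}_j(\omega)$ associated with the M−N zero singular values satisfy

$$\mathbf{H}_I^\dagger(\omega)\,\mathbf{u}_j(\omega) = \mathbf{0}, \quad j = 1, \ldots, M-N$$

This null space is a fixed nullformer, where

$$\mathbf{U}(\omega) = \left[\mathbf{u}_1(\omega)\ \ \mathbf{u}_2(\omega)\ \ \cdots\ \ \mathbf{u}_{M-N}(\omega)\right] \tag{3-4}$$

is an M-input, (M−N)-output filter.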
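The SVD step can be sketched as follows. The sketch assumes that the interferer RTFs at one bin are stacked as the columns of a matrix. The relative tolerance that decides which singular values count as zero is an assumption. The returned matrix plays the role of the nullformer U(ω) in (3-4). The function name is illustrative.

```python
import numpy as np


def svd_nullformer(H_I, rtol=1e-10):
    """Illustrative fixed nullformer from the interferer RTFs.

    H_I : (M, N) matrix whose columns are the interferer RTF vectors h_i.
    Returns U (M x (M - rank)) with orthonormal columns and U^H h_i = 0.
    """
    # SVD of H_I^H (N x M); full_matrices=True (default) returns all M right singular vectors
    _, s, Vh = np.linalg.svd(H_I.conj().T)
    rank = int(np.sum(s > rtol * s.max()))
    # Rows of Vh beyond the rank, conjugated, span the null space of H_I^H
    return Vh[rank:].conj().T
```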

In the following section, the fixed nullformer would be applied to RSAB and GSC.

4.4 Reference Signal Based Adaptive Filter with Fixed Nullforming

The nullformer blocks the interfering signals, so applying it to RSAB eliminates the residual noise and allows the desired source to be reconstructed. Figure 4-3 shows the architecture of the adaptive filter with nullformer. The effect of this system can be regarded as the convolution of the room impulse response with the impulse response of the nullformer, followed by the adaptive filter.

(Block diagram: Room impulse response → Nullformer → Adaptive filter)

Figure 4-3 System of RSAB with fixed nullformer

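For the case with multiple interfering sources described in (2-3), let the product of the RIRs and the nullformer be a new room impulse response

$$\mathbf{R}_{FN}(\omega) = \mathbf{U}^\dagger(\omega)\,\mathbf{A}(\omega) \tag{3-5}$$

where $\mathbf{A}(\omega) = \left[\mathbf{a}^D(\omega)\ \ \mathbf{a}_1(\omega)\ \ \cdots\ \ \mathbf{a}_N(\omega)\right]$ collects the TF vectors of all sources.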

and the new input of the adaptive filter is

$$\mathbf{x}_{FN}(k,\omega) = \mathbf{U}^\dagger(\omega)\,\mathbf{x}(k,\omega)$$

The nullformer causes great distortion because it is a high-pass filter. Therefore, the reference signal of RSAB is used to reconstruct the desired signal. In the pre-recording procedure shown in Figure 4-4, the pre-recorded signals $s_1(k,\omega), \ldots, s_{M-N}(k,\omega)$ are taken at the outputs of the nullformer.

Figure 4-4 Pre-recording procedure of RSAB

The procedures of the training phase and the filtering phase are the same as described in section 2.3. The only difference is that a nullformer is placed before the input of the adaptive filter. Figure 4-5 shows the architecture of RSAB with nullforming.

(Block diagram: Nullforming → RSAB)

Figure 4-5 System architecture of RSAB

4.5 Generalized Sidelobe Canceler with Fixed Nullforming

The ordinary GSC does not work when there is a nonstationary interfering signal in the environment, because such a signal violates the criterion of GSC. The work in [10] proposed a dual-source transfer function GSC (DTF-GSC) that eliminates a directional nonstationary source by modifying the FBF and BM. DTF-GSC can block one nonstationary source, but when there are two or more interfering sources it is not effective in blocking all of them.

Applying the nullformer to GSC brings some particular features. Figure 4-2 shows that the nullformer is a high-pass filter, and this high-pass characteristic causes great distortion of the received signal. Therefore, the fixed beamformer and blocking matrix must be modified to fit the architecture of GSC with nullformer.

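From (3-5), the effect of nullforming is the product of the RIR and the nullformer response. Multiplying the nullformer weighting from (3-8) with the desired signal RTF gives the new RTF

$$\mathbf{h}^{Null}(\omega) = \mathbf{U}^\dagger(\omega)\,\mathbf{h}(\omega)$$

where $\mathbf{h}(\omega)$ is the desired signal RTF and

$$\mathbf{h}^{Null}(\omega) = \left[h_1^{Null}(\omega)\ \ h_2^{Null}(\omega)\ \ \cdots\ \ h_{M-N}^{Null}(\omega)\right]^T$$

is the new RTF, expressed in the null space of the interfering signals. From (3-4),

$$\mathbf{h}^{Null}(\omega) = \left[\mathbf{u}_1^\dagger(\omega)\mathbf{h}(\omega)\ \ \mathbf{u}_2^\dagger(\omega)\mathbf{h}(\omega)\ \ \cdots\ \ \mathbf{u}_{M-N}^\dagger(\omega)\mathbf{h}(\omega)\right]^T$$

Applying SVD to the newly obtained RTF gives

$$\mathbf{h}^{Null}(\omega) = \mathbf{V}(\omega)\,\boldsymbol{\Sigma}(\omega), \quad \mathbf{V}(\omega) = \left[\mathbf{v}_1(\omega)\ \ \mathbf{v}_2(\omega)\ \ \cdots\ \ \mathbf{v}_{M-N}(\omega)\right], \quad \boldsymbol{\Sigma}(\omega) = \left[\sigma_1(\omega)\ \ 0\ \ \cdots\ \ 0\right]^T \tag{3-14}$$

Therefore, the new FBF is obtained as

$$\mathbf{w}_{NFBF}(\omega) = \mathbf{U}(\omega)\,\mathbf{w}_F(\omega), \quad \mathbf{w}_F(\omega) = \frac{\mathbf{v}_1(\omega)}{\sigma_1(\omega)} \tag{3-15}$$

By (3-14) and (3-15), the output of the FBF is

$$y_{NFBF}(k,\omega) = \mathbf{w}_{NFBF}^\dagger(\omega)\,\mathbf{x}(k,\omega) = a_1^D(\omega)\,s^D(k,\omega) + \mathbf{w}_F^\dagger(\omega)\,\mathbf{U}^\dagger(\omega)\,\mathbf{n}(k,\omega)$$

The FBF blocks the interfering signals and forms a beam toward the desired signal. Let

$$\mathbf{V}_2(\omega) = \left[\mathbf{v}_2(\omega)\ \ \mathbf{v}_3(\omega)\ \ \cdots\ \ \mathbf{v}_{M-N}(\omega)\right]$$

The blocking matrix is then

$$\mathbf{H}_N(\omega) = \mathbf{U}(\omega)\,\mathbf{V}_2(\omega)$$

Recalling section 3.3, the ANC is the same as described in (2-24). The architecture of GSC with nullforming is shown in Figure 4-6.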

Figure 4-6 Generalized sidelobe canceler with nullforming
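Putting the pieces together, the sketch below builds the nullforming FBF and blocking matrix at one bin. It assumes that the FBF in the nullformer output space is w_F = h^Null/‖h^Null‖², which equals v_1/σ_1 up to a unit-magnitude phase. It also assumes that the blocking matrix is U(ω) times the remaining left singular vectors of h^Null. The function name and tolerance are illustrative.

```python
import numpy as np


def nullforming_fbf_bm(h_D, H_I, rtol=1e-10):
    """Illustrative FBF and BM of a GSC with a fixed nullformer at one frequency bin.

    h_D : (M,) desired-source RTF vector h(w)
    H_I : (M, N) interferer RTF vectors as columns
    Returns w_NFBF (M,) and the blocking matrix B_N (M x (M - N - 1)).
    """
    # Nullformer U(w): orthonormal basis of the null space of H_I^H
    _, s, Vh = np.linalg.svd(H_I.conj().T)
    rank = int(np.sum(s > rtol * s.max()))
    U = Vh[rank:].conj().T
    # New RTF seen through the nullformer
    h_null = U.conj().T @ h_D
    # SVD of the single-column matrix h_null: V[:, 0] is parallel to h_null,
    # V[:, 1:] spans its orthogonal complement and sv[0] = ||h_null||
    V, sv, _ = np.linalg.svd(h_null[:, np.newaxis])
    w_F = h_null / sv[0] ** 2   # assumed FBF in nullformer space: w_F^H h_null = 1
    w_NFBF = U @ w_F            # overall FBF applied to x(k, w)
    B_N = U @ V[:, 1:]          # blocks both the desired source and the interferers
    return w_NFBF, B_N
```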

Chapter 5 Variable Nullforming Adaptive Filter

5.1 Introduction

In previous chapters, several methods to approach nullforming are introduced.

These nullforming methods are fixed, so they cannot track the interfering sources. For example, the nullformer weighting is set in advance for one predetermined direction, so the nullformer works well only when the interfering sources emit from exactly that direction. When new interfering sources appear from other directions, or an original interfering source changes its direction, such fixed nullformers cannot block the interfering sources.

In this chapter, a novel method to construct a variable nullformer is proposed. The nullforming algorithm can track changes of the sources. The algorithm then applies the variable nullforming to the generalized sidelobe canceler to obtain the reconstructed desired signal.

5.2 Variable Nullforming

5.2.1 Estimate Signal Subspace Using Order Recursive Least Square

The starting point is the estimation of the RTF vector, which is obtained from the output of the blocking matrix [4]. The blocking matrix contains the bases spanning the null space of the RTF vector. When only one sound source is active, rearranging (2-20) in chapter 3 gives the signals received by the microphones:

$$x_m(k,\omega) = h_m(\omega)\,x_1(k,\omega) + u_m(k,\omega), \quad m = 2, \ldots, M \tag{4-1}$$

where $u_m(k,\omega)$ is the BM output associated with the mth microphone.

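Assume that the statistics of the desired signal are stationary within each frame and that the RIR changes slowly over a short period. Considering the cross PSD of the kth frame,

$$\Phi_{x_m x_1}^{(k)}(\omega) = h_m(\omega)\,\Phi_{x_1 x_1}^{(k)}(\omega) + \Phi_{u_m x_1}(\omega)$$

where $\Phi_{u_m x_1}(\omega)$ is independent of the frame index k. [12] proposed a system identification method for nonstationary signals. It applies least square (LS) estimation to the following equation over K frames:

$$\begin{bmatrix} \Phi_{x_m x_1}^{(1)}(\omega) \\ \vdots \\ \Phi_{x_m x_1}^{(K)}(\omega) \end{bmatrix} = \begin{bmatrix} \Phi_{x_1 x_1}^{(1)}(\omega) & 1 \\ \vdots & \vdots \\ \Phi_{x_1 x_1}^{(K)}(\omega) & 1 \end{bmatrix} \begin{bmatrix} h_m(\omega) \\ \Phi_{u_m x_1}(\omega) \end{bmatrix}$$

When there is more than one source in the environment, this method cannot describe the RTF properly. The work in [10] proposed a method to estimate the blocking matrix when two sources emit simultaneously in the environment. As the number of sources increases, the number of reference microphones must increase, and increasing the number of reference microphones increases the number of nullforming directions.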
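Below is a sketch of this LS fit at one bin. It assumes that the per-frame auto- and cross-PSDs have already been estimated. The estimate works only because the desired signal is nonstationary. If the auto-PSD were the same in every frame, the two columns of the LS matrix would be proportional, and h_m could not be separated from the constant term. The function name is illustrative.

```python
import numpy as np


def rtf_ls_nonstationary(phi_x1x1, phi_xmx1):
    """Illustrative LS estimate of h_m from per-frame PSDs.

    Fits phi_xmx1[k] = h_m * phi_x1x1[k] + c for k = 1..K, where c stands for
    the frame-independent term Phi_{u_m x_1}(w).
    """
    phi_x1x1 = np.asarray(phi_x1x1, dtype=complex)
    A = np.column_stack([phi_x1x1, np.ones_like(phi_x1x1)])
    (h_m, c), *_ = np.linalg.lstsq(A, np.asarray(phi_xmx1, dtype=complex), rcond=None)
    return h_m, c
```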

Considering N sources emitting simultaneously, the linear equation would become

 

An excessive number of reference microphones is not needed when only a few sources are active. Because it is difficult to know the number of active sources at each time, the signal subspace is estimated by order recursive least square estimation (ORLS).

The method increases the number of reference microphones as the number of emitting sources grows. ORLS was originally used in line fitting by increasing the order [15]. Rewriting (4-4) as the linear equation



 







to the (k−1)th microphone, and $\theta^m_k$ is the mth entry of the signal subspace under the kth order.

 

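In each order, the parameters are estimated by LS, and the estimator is

$$\hat{\boldsymbol{\theta}}_i^m = \left(\mathbf{M}_i^\dagger \mathbf{M}_i\right)^{-1} \mathbf{M}_i^\dagger\,\mathbf{m}_m$$

where $\mathbf{M}_i$ is the data matrix of order i and $\mathbf{m}_m$ is the observation vector of the mth microphone. To simplify the representation, the following derivation omits the arguments of the parameters. The previous equation can be reformulated in recursive form. To improve efficiency, the matrix inversion is also formulated recursively. The least square error of the mth microphone under the ith order is

$$J_{LS}^{m,i} = \left(\mathbf{m}_m - \mathbf{M}_i\,\hat{\boldsymbol{\theta}}_i^m\right)^\dagger \left(\mathbf{m}_m - \mathbf{M}_i\,\hat{\boldsymbol{\theta}}_i^m\right)$$

This least square error can also be updated in recursive form.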

Figure 5-1 shows that the least square error shrinks gradually as the order of the recursion increases, so the recursion should be stopped when the error falls below a threshold:

Stop ORLS when $J_{LS}^{m,i} \le J_{LS}^{threshold}$

Because the scale of the least square error depends on the present data, a more robust stopping condition sets the threshold relative to the error of the first recursion:

Stop ORLS when $J_{LS}^{m,i} \le \eta\, J_{LS}^{m,1}$

where η is a chosen ratio.
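The relative stopping rule can be illustrated with a small order-recursive LS sketch. It adds the columns of a generic data matrix one order at a time. The LS error is updated through a Gram-Schmidt residual recursion, J_i = J_{i−1} − |q_i^H m|², rather than through the specific recursion of the thesis. The columns are assumed linearly independent. The ratio η and the function name are hypothetical.

```python
import numpy as np


def orls_fit(M_data, m_obs, eta=0.05):
    """Illustrative order-recursive LS with the stopping rule J_i <= eta * J_1.

    Columns of M_data (assumed linearly independent) are added one order at a
    time; each new column is orthogonalized against the previous ones, so the
    LS error is updated recursively as J_i = J_{i-1} - |q_i^H m|^2.
    Returns (order, theta, errors) where errors[i - 1] is the error at order i.
    """
    K, P = M_data.shape
    Q = np.zeros((K, 0), dtype=complex)
    J = np.vdot(m_obs, m_obs).real  # error with no regressors
    errors = []
    for i in range(P):
        v = M_data[:, i] - Q @ (Q.conj().T @ M_data[:, i])
        q = v / np.linalg.norm(v)
        Q = np.column_stack([Q, q])
        J = max(J - abs(np.vdot(q, m_obs)) ** 2, 0.0)
        errors.append(J)
        if J <= eta * errors[0]:  # relative threshold from the first recursion
            break
    order = len(errors)
    theta, *_ = np.linalg.lstsq(M_data[:, :order], m_obs, rcond=None)
    return order, theta, errors
```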

Figure 5-1 Effect of choosing time of iteration on least square error

The incoming data from a small microphone array are highly correlated, so $\mathbf{M}_i^\dagger \mathbf{M}_i$ becomes nearly singular, and this makes the estimate unstable. Tikhonov regularization [16], also called ridge regression, is used to avoid this ill-conditioning. Tikhonov regularization reduces the singularity by imposing a penalty term, generally $\lambda\mathbf{I}$. This may result in a biased estimate but stabilizes the estimator. The nonnegative complexity parameter λ controls the amount of bias. By the definition of Tikhonov regularization, the linear equation in (4-6) becomes

$$\hat{\boldsymbol{\theta}}_i^m = \left(\mathbf{M}_i^\dagger \mathbf{M}_i + \lambda\mathbf{I}\right)^{-1} \mathbf{M}_i^\dagger\,\mathbf{m}_m \tag{4-10}$$

The same regularization is applied within the ORLS recursion.
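Below is a minimal sketch of (4-10). It assumes that the data matrix and observation vector of one order are given. The value of λ and the function name are illustrative. With λ = 0 it reduces to ordinary LS. With two identical columns, M_i^H M_i is exactly singular, yet a small λ still yields a finite estimate that splits the weight evenly between the columns.

```python
import numpy as np


def tikhonov_ls(M_i, m_obs, lam):
    """Illustrative ridge / Tikhonov LS: theta = (M_i^H M_i + lam I)^{-1} M_i^H m."""
    G = M_i.conj().T @ M_i
    rhs = M_i.conj().T @ m_obs
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), rhs)
```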

Assuming the order of the estimation is N+1, the estimated RTFs at the mth microphone would be

Take the matrix form

     

length of bases being the same as the number of microphones. Therefore,

 

are the estimated subspaces of the currently active sources.

After the estimation, the number of estimated parameters indicates how many sources are needed to describe the subspace at each frequency. The estimated bases are linear combinations of the subspaces of the individual sources.

5.2.2 Estimate Subspace of Interfering Sources from Signal Subspace

The goal is to find the null space of the interfering sources. Therefore, after the signal subspace is estimated by ORLS, the desired signal subspace must be excluded from it. Assuming the subspaces of the sources from each direction have been obtained, the similarity between the estimated subspace and each previously known subspace can be evaluated with the subspace distance defined in [17]. The definition of subspace distance is

   

The subspace distance measures the dissimilarity between two subspaces and works like an inner product. The work in [17] proves that the measurement has the following properties.

1. Nonnegativity: $d(\mathbf{U},\mathbf{V}) \ge 0$, and $d(\mathbf{U},\mathbf{V}) = 0$ if and only if $\mathbf{U} = \mathbf{V}$
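For illustration, a subspace distance can be computed as below. The sketch assumes the common definition d = sqrt(max(a, b) − ‖U^H V‖_F²) for orthonormal bases U and V of dimensions a and b. Whether this matches [17] term by term is an assumption. Under this definition the distance is zero exactly when the two spans coincide, consistent with the nonnegativity property. The function name is illustrative.

```python
import numpy as np


def subspace_distance(A, B):
    """Illustrative subspace distance d = sqrt(max(a, b) - ||U^H V||_F^2) (assumed form).

    A, B : matrices whose columns span the two subspaces (full column rank).
    The columns are orthonormalized first, so only the spans matter.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    overlap = np.linalg.norm(Qa.conj().T @ Qb, "fro") ** 2
    return np.sqrt(max(max(Qa.shape[1], Qb.shape[1]) - overlap, 0.0))
```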

Figure 5-2 Subspace distance of desired signal (left) and interfering signal (right)

There are P previously estimated subspaces, so P values of subspace distance are obtained. These values are sorted, and the N RTF vectors with the smallest distances are chosen.

   

The indices of these N vectors are rearranged to 1, …, N:

$$\mathbf{B} = \left[\mathbf{b}_1\ \ \mathbf{b}_2\ \ \cdots\ \ \mathbf{b}_N\right] \tag{4-14}$$

where the columns of $\mathbf{B}$ are the most probable bases of the present sources. These bases are linear combinations of the N RTF vectors. Rearranging the indices $j_1, \ldots, j_N$ to $1, \ldots, N$, each estimated basis is

$$\mathbf{b}_i = \sum_{j=1}^{N} c_j^i\,\mathbf{h}_j$$

where $c_j^i$ is the coefficient of each RTF vector. Assume the RTF of the desired source is the dth vector in $\mathbf{B}$. Eliminating the component of the desired signal then yields

where $\mathbf{I}$ is a P by P identity matrix and

The bases may contain dependent vectors, so QR decomposition (orthogonal-triangular decomposition, QRD) with column pivoting is used to exclude them:

$$\mathbf{O}\mathbf{E} = \mathbf{Q}\mathbf{R} \tag{4-17}$$

where $\mathbf{Q}$ is a unitary matrix, $\mathbf{R}$ is an upper triangular matrix and $\mathbf{E}$ is a permutation matrix. The permutation matrix orders the columns so that the magnitudes of the diagonal entries of $\mathbf{R}$ decrease. The dependent bases correspond to the zero diagonal entries. Therefore, let

$$\hat{\mathbf{O}} = \left[\mathbf{q}_1\ \ \mathbf{q}_2\ \ \cdots\ \ \mathbf{q}_D\right]$$

where D is the number of independent bases and $\hat{\mathbf{O}}$ contains the independent bases of the interfering signal subspace.
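NumPy has no column-pivoted QR, so the sketch below uses Gram-Schmidt with column pivoting, which selects the same leading orthonormal columns in exact arithmetic. SciPy's `scipy.linalg.qr` with `pivoting=True` is the library route. The relative tolerance that marks a diagonal entry as zero and the function name are assumptions.

```python
import numpy as np


def independent_bases(O, rtol=1e-8):
    """Illustrative orthonormal basis of the independent columns of O.

    Gram-Schmidt with column pivoting, a NumPy-only stand-in for QR with
    column pivoting (O E = Q R): at each step the column with the largest
    remaining norm is taken, which corresponds to the next diagonal entry
    of R; columns whose remaining norm falls below the tolerance are treated
    as dependent (zero diagonal entries of R).
    """
    A = np.array(O, dtype=complex)
    tol = rtol * np.linalg.norm(A, axis=0).max()
    basis = []
    for _ in range(A.shape[1]):
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= tol:
            break
        q = A[:, j] / norms[j]
        basis.append(q)
        A = A - np.outer(q, q.conj() @ A)  # remove the q-component from every column
    return np.column_stack(basis) if basis else np.zeros((A.shape[0], 0), dtype=complex)
```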

(Diagram: RTF vector $\mathbf{h}_1$ and the interfering signal subspace)

Figure 5-3 Relation of desired signal subspace and interfering signal subspace

For an accurate estimate of the signal subspace, order recursive least square estimation needs a sufficient amount of data. The algorithm repeats the procedure to update the nullformer, and when the nonstationary signal changes rapidly, the update frequency is raised. Figure 5-4 shows the update procedure of the proposed algorithm. Assume the algorithm collects K frames of data for each estimation. The update
