Introduction - Adaptive Beamformer for Wideband Speech Enhancement Using a Kalman Filter with Second-Order Constraints

1.1 Motivation and Objective

Speech enhancement in noisy environments is an important research topic in speech signal processing, with a large impact on both speech recognition and voice communication. Although human hearing can pick out desired speech even in a noisy environment, the same task remains difficult for computers and other machines.

The common sensor for receiving sound waves is the microphone. A single microphone captures spectral information but no spatial information. A microphone array has the advantage of capturing both the spectral and the spatial information carried by the sound waves. Among the existing microphone-array-based speech enhancement algorithms, the adaptive spatial filter, called a beamformer, is one of the most effective and has been studied extensively in recent years for hands-free speech communication and recognition.

Background noise and reverberation, arising from undirected diffuse noise or directional interference, are the dominant causes of degraded signal quality. The noise and reverberation levels determine how much the desired signal is distorted. Although multichannel speech enhancement methods are used to reduce the effect of noise and reverberation, they do not perform well in practice when the environment violates the assumptions built into the adaptive spectral filter.

This motivates the thesis to study and propose innovative methods that handle both interference suppression and desired-source mismatch. Such methods are useful in scenarios such as a real-life conference in a meeting room or a conversation in a living room, where sound quality is degraded by the noise of other people talking and by reverberation in the room.

1.2 Literature Review

Microphone arrays can be used for spatial filtering, and such a spatial filter is generally called a beamformer (BF) [1]. Beamformers fall into two types: fixed beamformers and adaptive beamformers. Although fixed beamformers are often cheaper to implement than adaptive ones, their beamforming is not robust enough because the algorithm has no update mechanism.

Fixed beamformers include the delay-and-sum (DAS) beamformer [2], the constant directivity beamformer (CDB) [3], and fixed superdirective beamformers [4]. They use fixed weights to form a spatial filter from spatial information known in advance. The DAS beamformer has the simplest structure: it compensates for the relative time delays between the microphone signals and then sums the steered signals, with a fixed weight on each channel, to form a single output. The CDB maintains its spatial response over a wide frequency band, and the fixed superdirective beamformer keeps the desired source distortionless in a predefined direction while attempting to suppress noise from other directions. These approaches assume that the desired source and the interferences are at known locations in a stationary environment. Hence, they are sensitive to steering mismatches, which degrade the noise reduction capability and result in distortion of the desired source and signal self-cancellation.
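The delay-and-sum idea can be illustrated with a minimal single-frequency sketch. It assumes a uniform linear array, a far-field plane wave, and a sound speed of 343 m/s; the function names, the array geometry, and the frequency are hypothetical choices for illustration and are not taken from the thesis.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_delays(num_mics, spacing_m, angle_deg):
    """Relative arrival delays (s) of a far-field plane wave on a uniform linear array."""
    sin_theta = math.sin(math.radians(angle_deg))
    return [m * spacing_m * sin_theta / SPEED_OF_SOUND for m in range(num_mics)]


def das_weights(num_mics, spacing_m, look_deg, freq_hz):
    """Delay-and-sum weights at one frequency: phase-align each channel, then average."""
    delays = steering_delays(num_mics, spacing_m, look_deg)
    return [cmath.exp(-2j * math.pi * freq_hz * tau) / num_mics for tau in delays]


def array_response(weights, spacing_m, arrival_deg, freq_hz):
    """Output magnitude for a unit-amplitude plane wave arriving from arrival_deg."""
    delays = steering_delays(len(weights), spacing_m, arrival_deg)
    # Channel m observes exp(-j*2*pi*f*tau_m); the output is sum(conj(w_m) * x_m).
    out = sum(w.conjugate() * cmath.exp(-2j * math.pi * freq_hz * tau)
              for w, tau in zip(weights, delays))
    return abs(out)


# Hypothetical setup: 4 microphones, 5 cm spacing, 2 kHz, beam steered to broadside.
w = das_weights(num_mics=4, spacing_m=0.05, look_deg=0.0, freq_hz=2000.0)
on_target = array_response(w, 0.05, 0.0, 2000.0)    # source exactly at the look direction
mismatched = array_response(w, 0.05, 10.0, 2000.0)  # source 10 degrees off the look direction
off_axis = array_response(w, 0.05, 60.0, 2000.0)    # interferer far from the look direction
```

In this sketch the response is unity only when the source lies exactly in the look direction. A source that is a few degrees off is already attenuated, which is the steering-mismatch sensitivity described above.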

Instead of using fixed weights, an adaptive beamformer can form a beam toward the desired source direction and place nulls on undesired signals to suppress noise and interference. Many adaptive beamforming techniques have been studied extensively over the last three decades. The linearly constrained minimum variance (LCMV) beamformer was proposed in [5] to minimize the array output power under a look-direction constraint. A closely related special case of LCMV is the minimum variance distortionless response (MVDR) beamformer proposed by Capon in [6]. Another popular technique is the generalized sidelobe canceller (GSC) algorithm, which essentially transforms the LCMV constrained minimization problem into an unconstrained one [7].
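For reference, the textbook narrowband form of the MVDR problem and its closed-form solution are sketched below. The notation is an assumption made here and may differ from that used in later chapters: w is the weight vector, R is the covariance matrix of the array snapshot, and a(θ0) is the presumed steering vector for the look direction θ0.

```latex
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{subject to} \quad
\mathbf{w}^{H}\mathbf{a}(\theta_0) = 1,
\qquad
\mathbf{w}_{\mathrm{MVDR}}
= \frac{\mathbf{R}^{-1}\mathbf{a}(\theta_0)}
       {\mathbf{a}^{H}(\theta_0)\,\mathbf{R}^{-1}\,\mathbf{a}(\theta_0)}
```

LCMV generalizes the single distortionless constraint to a set of linear constraints C^H w = f, which gives w = R^{-1} C (C^H R^{-1} C)^{-1} f. MVDR is recovered when C = a(θ0) and f = 1.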

The MVDR formulation can be implemented with a Kalman filter using the state-space observer form. Owing to the undesirable mismatch between the actual steering vector of the desired signal and the presumed one in the single steered constraint, various adaptive beamformers have been proposed to improve performance. Signal mismatch can be induced by signal pointing error [8], imperfect array calibration [9], or channel effects. In the presence of these effects, an adaptive beamformer suppresses the desired signal instead of maintaining a distortionless response. This phenomenon is commonly referred to as signal self-nulling [10]. Various methods have been investigated to strengthen robustness against steering vector error [17], [19]. The Kalman filter can also be replaced by the second-order extended Kalman filter [18], [20] or the constrained Kalman filter [12], [13] to improve robustness and to reduce the effect of nonlinearity under mismatch.

Among adaptive beamformers realized with a Kalman filter, the constraint projection method combined with steering-vector bound regulation in the wideband setting is one solution to the signal mismatch problem. The related theory can be found in [17], [18], [21], [22].

1.3 Thesis Scope and Contribution

The contribution of this thesis is to propose and implement an innovative algorithm that counters the signal mismatch problem in speech enhancement. The scope of the thesis is divided into two parts. The first part formulates a constrained adaptive beamformer that takes into account the directivity of multiple arrays and the spatial coherence of spatial filtering. The second part handles beamforming constraints derived from voice activity detection to achieve better speech enhancement performance.

In the first part, the MVDR formulation under signal mismatch is given. To obtain the solution, the nonlinear second-order extended Kalman filter is applied to handle the nonlinear inequality constraints and to constrain the state prediction. In the optimal minimum mean-square error (MMSE) algorithm, the parameters are selected to avoid suppressing the desired signal component (signal self-nulling) in the broadband sense. Each selection gives a different result under different signal mismatch conditions. The principle behind the selection is investigated and explained.

In the second part, noise tracking performed while the desired source is absent is used to form null constraints for further enhancement. We incorporate these equality constraints (hard constraints) into the Kalman filter by projecting the updated state estimate onto the constrained region. Robustness against signal mismatch, for both directional noise suppression and dereverberation, is achieved by choosing appropriate parameters for different conditions (e.g., the number of microphone arrays and the mismatch angle). In particular, the information given by the voice activity detector can also be reused to select appropriate beamforming parameters. The performance of the algorithm is discussed and explained.
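The projection step can be illustrated with a minimal sketch for a single linear equality constraint of the form a^H w = target. It assumes an identity-weighted (least-squares) projection and one constraint only; the weighting matrix and the constraint set actually used in the thesis are not given in this chapter, and the function names and numeric values are hypothetical.

```python
def inner(a, w):
    """Complex inner product a^H w."""
    return sum(ai.conjugate() * wi for ai, wi in zip(a, w))


def project_onto_constraint(w, a, target=0.0):
    """Closest vector to w (in the least-squares sense) that satisfies a^H w = target.

    With target = 0 and a set to an interferer's steering vector, this enforces
    a null constraint. With target = 1 and a set to the look-direction steering
    vector, it enforces a distortionless constraint.
    """
    scale = (inner(a, w) - target) / inner(a, a)
    return [wi - ai * scale for wi, ai in zip(w, a)]


# Hypothetical unconstrained state estimate and interferer steering vector.
w_updated = [1 + 1j, 0.5 + 0j, -0.2j]
a_interferer = [1 + 0j, 1j, -1 + 0j]

w_projected = project_onto_constraint(w_updated, a_interferer, target=0.0)
residual = abs(inner(a_interferer, w_projected))  # zero up to rounding error
```

Projecting after the measurement update leaves the filter recursion itself unchanged and only moves the estimate onto the constraint surface. Several simultaneous constraints would require the matrix form of the same projection.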

1.4 Outline of Thesis

The remainder of this thesis is organized as follows.

Chapter 2: Adaptive spatial-filtering beamformers based on the robust beamforming technique known as minimum variance distortionless response (MVDR) are introduced. The ideal steered linear inequality constraint is incorporated into the steepest gradient method and into the state-space formulation to implement MVDR. A comparison of their pros and cons forms the foundation of the proposed algorithm.

Chapter 3: The detailed formulation of the second-order extended Kalman filter with nonlinear inequality constraints is presented. It includes the solutions to the signal mismatch problem and the beamformer null constraint for interference suppression, given the information from voice activity detection (VAD). The technique for choosing the appropriate parameter in wideband beamforming and its effect are also discussed. Finally, the overall flowchart and architecture are illustrated and explained.

Chapter 4: The simulation and experimental results are shown. They include a comparison between adaptive spatial beamformers and an assessment of beamforming capability against the signal mismatch problem, using simulated room impulse responses (RIR) and a real room, respectively. Objective indices are calculated to compare the performance of the proposed algorithm with that of existing algorithms.

Chapter 5: The conclusions of this thesis are given, and some issues for future study are discussed.
