Chapter 1 Introduction
1.1 Overview of Beamformers
In speech communication, if the desired signal and the interfering signals occupy the same frequency band, it is difficult for temporal or spectral filtering methods to separate the signal from the interference. However, the desired and interfering signals are usually emitted from different spatial locations, and a beamformer can exploit this difference to separate them. A beamformer uses an array of microphones, which provides spatial information about the acoustic dynamics of the sources. Typically, a beamformer linearly combines the spatially sampled waveforms from the microphones in the same way that a finite impulse response (FIR) filter combines temporally sampled data. The diagram of a beamformer with M microphones is shown in Fig. 1-1.
In the following, existing beamformers are described in two categories: fixed beamformers and adaptive beamformers.
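[Diagram: the microphone signals $x_1(n), x_2(n), \ldots, x_M(n)$ are multiplied by the weights $q_1, q_2, \ldots, q_M$ and summed, giving $\mathrm{output} = \sum_{m=1}^{M} q_m x_m(n)$.]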
Figure 1-1 Diagram of the beamformer
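The weighted sum in Fig. 1-1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name beamform and the nested-list signal layout are assumptions made here, while q and x follow the symbols used in the figure.

```python
from typing import List, Sequence


def beamform(x: Sequence[Sequence[float]], q: Sequence[float]) -> List[float]:
    """Return y(n) = sum over m of q[m] * x[m][n] (hypothetical helper)."""
    if not x or len(x) != len(q):
        raise ValueError("one weight per microphone is required")
    num_samples = len(x[0])
    if any(len(channel) != num_samples for channel in x):
        raise ValueError("all microphone signals must have the same length")
    # For each time index n, combine the current sample of every microphone.
    return [
        sum(q_m * channel[n] for q_m, channel in zip(q, x))
        for n in range(num_samples)
    ]


# Two microphones, two samples each, equal weights (made-up numbers).
y = beamform([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
```

Each output sample depends only on the current sample of every channel, which mirrors the way an FIR filter weights successive time samples of a single signal.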
1.1.1 Fixed Beamformers
Fixed beamformers include the delay-and-sum beamformer (DSB) [1], the constant directivity beamformer (CDB) [2-4], and fixed superdirective beamformers [5-7]. They use fixed coefficients to achieve a desired spatial response. The DSB is the simplest fixed beamformer: it first compensates for the relative time delays between the microphone signals and then sums the steered signals to form a single output. Jan and Flanagan [8] explicitly modeled the transfer function from the source to the sensors to replace the simple delay assumption, and extended the DSB concept by introducing the matched filter array beamformer. The CDB is designed so that the spatial response is the same over a wide frequency band, while the fixed superdirective beamformer attempts to suppress noise coming from all directions without affecting the desired speech signal from a principal direction. Fixed beamformers generally assume that the desired sound source, the interference signals, and the noise sources are slowly varying and at known locations. These algorithms are therefore sensitive to steering errors, which limit their noise suppression performance and cause distortion or cancellation of the desired signal. Their performance is also limited in highly reverberant environments.
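The compensate-then-sum idea behind the DSB can be illustrated with a minimal Python sketch. It assumes integer sample delays that are known in advance (a practical design would also need fractional delays), and it averages rather than sums so that the output keeps the source amplitude. The function name delay_and_sum and its argument layout are hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(x: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align each channel by its known arrival delay, then average.

    x[m] is the signal of microphone m; delays[m] is the (assumed known)
    number of samples by which the source arrives late at microphone m.
    """
    if not x or len(x) != len(delays):
        raise ValueError("one delay per microphone is required")
    num_mics = len(x)
    num_samples = len(x[0])
    out = []
    for n in range(num_samples):
        acc = 0.0
        for channel, delay in zip(x, delays):
            k = n + delay  # advance the channel to undo its arrival delay
            if 0 <= k < num_samples:  # samples outside the record count as zero
                acc += channel[k]
        out.append(acc / num_mics)
    return out


# Toy example: the same pulse reaches microphone 1 two samples later.
mic0 = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
aligned = delay_and_sum([mic0, mic1], [0, 2])
```

With the correct delays the two copies add coherently and the pulse is recovered. If the assumed delays are wrong, the copies are summed out of alignment, which is the steering-error sensitivity described above.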
1.1.2 Adaptive Beamformers
Instead of using fixed coefficients to suppress noise and interference signals, an adaptive beamformer [9-14] can adaptively form its directivity beam-pattern toward the desired signal and its null beam-pattern toward the undesired signals. In fixed beamformers, a null beam-pattern can be placed only when the noise direction is known and remains unchanged. To cope with environmental changes, various adaptive beamformers have been proposed. One of the key issues in adaptive beamformers is their sensitivity to mismatch between the actual steering vector of the desired signal and the presumed one [11], [12]. The mismatch can be induced by signal pointing errors [13], imperfect array calibration [14], or channel effects (e.g., the near-far problem [15], environment heterogeneity [16], and local scattering of the source [17]). In the presence of these effects, an adaptive beamformer can easily confuse the desired signal with interference components; that is, it suppresses the desired signal instead of maintaining a distortionless response. This phenomenon is commonly referred to as signal self-nulling [18]. As a result, much effort has been devoted to robustness [11].
Modifications of adaptive beamforming techniques to improve robustness have been studied extensively. The linearly constrained minimum variance (LCMV) beamformer was proposed in [9] to minimize the array output power under a look-direction constraint.
Another popular technique is the generalized sidelobe canceler (GSC), which essentially transforms the LCMV constrained minimization problem into an unconstrained one [10]. In the last decade, several techniques addressing steering vector mismatch in the LCMV or GSC structure have been developed [19]-[23]. For example, Hoshuyama et al. [20] proposed two robust constraints on the blocking matrix design. Spriet et al. [22] proposed a robust adaptive beamformer, the spatially pre-processed speech distortion weighted multichannel Wiener filter, which takes speech distortion into account in its optimization criterion and encompasses the standard GSC as a special case. Further, some ad hoc approaches have been discussed to handle arbitrary mismatches of the desired signal, such as diagonal loading of the sample covariance matrix [24], [25] and the eigenspace-based beamformer [26], [27].
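For reference, the LCMV problem, its GSC form, and diagonal loading can be written compactly in their textbook forms. The notation is an assumption introduced here and is not taken from the cited works: $\mathbf{q}$ is the column vector of beamformer weights with output $\mathbf{q}^{H}\mathbf{x}(n)$, $\mathbf{R}$ is the covariance matrix of the microphone signals, $\mathbf{C}$ and $\mathbf{f}$ are the constraint matrix and desired response vector, $\mathbf{B}$ is a blocking matrix, and $\gamma$ is a loading level.

```latex
% LCMV: minimize the output power subject to linear constraints
\min_{\mathbf{q}} \; \mathbf{q}^{H}\mathbf{R}\,\mathbf{q}
\quad \text{subject to} \quad \mathbf{C}^{H}\mathbf{q} = \mathbf{f},
\qquad
\mathbf{q}_{\mathrm{LCMV}} = \mathbf{R}^{-1}\mathbf{C}\,
\bigl(\mathbf{C}^{H}\mathbf{R}^{-1}\mathbf{C}\bigr)^{-1}\mathbf{f}

% GSC: the same problem in unconstrained form, with B^H C = 0
\mathbf{q} = \mathbf{q}_{0} - \mathbf{B}\,\mathbf{q}_{a},
\qquad
\mathbf{q}_{0} = \mathbf{C}\bigl(\mathbf{C}^{H}\mathbf{C}\bigr)^{-1}\mathbf{f},
\qquad
\min_{\mathbf{q}_{a}} \;
(\mathbf{q}_{0} - \mathbf{B}\mathbf{q}_{a})^{H}\,\mathbf{R}\,
(\mathbf{q}_{0} - \mathbf{B}\mathbf{q}_{a})

% Diagonal loading: regularize the sample covariance before inversion
\hat{\mathbf{R}} \;\rightarrow\; \hat{\mathbf{R}} + \gamma\,\mathbf{I},
\qquad \gamma > 0
```

Because $\mathbf{C}^{H}\mathbf{q}_{0} = \mathbf{f}$ and $\mathbf{C}^{H}\mathbf{B} = \mathbf{0}$, every choice of $\mathbf{q}_{a}$ satisfies the constraint, which is why the GSC minimization needs no constraint. With a single look-direction constraint ($\mathbf{C}$ a single steering vector and $\mathbf{f} = 1$), the LCMV solution reduces to the MVDR beamformer.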
1.1.3 Explicit Transfer Function Modeling for Adaptive Beamformers
Another way to mitigate steering vector mismatch in adaptive beamformers is to abandon the delay-only propagation assumption and explicitly model the sound propagation from the source to the microphones by an arbitrary transfer function (TF) [28]. Affes and Grenier presented a GSC-based near-field beamformer [29] using matched filters with signal subspace tracking. The matched filters, which can be identified by the proposed signal subspace tracking algorithm under the assumption of an FIR model and small displacements of the talker, are used to design the fixed beamformer (FB) of the GSC.
Rather than estimating the TF itself, Gannot et al. [30] proposed the transfer function ratio (TFR) concept and applied it to the GSC algorithm. The TFR can be estimated by exploiting the nonstationary characteristics of the desired signal. They proposed a suboptimal speech enhancement algorithm that uses the TFR to design the FB and the blocking matrix (BM) of the GSC. Several other adaptive beamformer algorithms based on the GSC structure and using TF ratio information have been proposed [31]-[33]. Dahl et al. [34] proposed a reference-signal-based adaptive beamformer that can suppress both nonstationary and stationary noise while compensating for reverberation. This method uses FIR-based normalized least-mean-square (NLMS) filtering to perform noise suppression and speech dereverberation, using pre-recorded speech signals and the desired signal acquired when the environment is quiet. Improvements regarding the finite number of taps in the FIR filters and relaxation of the disturbance assumption were studied in [35]. Huang et al. [36], [37] treated a microphone array as a multiple-input multiple-output (MIMO) system and proposed a two-stage procedure for separation and dereverberation of speech signals. The interference signals can be removed by using two microphones with known TFs, and the separated reverberant speech can then be dereverberated by using the multiple-input/output inverse theorem. However, stationary noise is neglected in this work, and the transfer function of each speech source must be identified in advance during a single-talk interval, which limits its practical application.
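As a rough illustration of the NLMS filtering that reference-signal-based methods build on, the following Python sketch adapts a single FIR filter so that its output tracks a reference signal. It is a generic single-channel NLMS update, not the multichannel algorithm of [34]. The function name nlms, the step size mu, the regularizer eps, and the toy two-tap system are all assumptions chosen for the example.

```python
import random
from typing import List, Sequence, Tuple


def nlms(x: Sequence[float], d: Sequence[float], num_taps: int,
         mu: float = 0.5, eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Adapt FIR weights w so that the filtered x tracks the reference d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(w_k * b_k for w_k, b_k in zip(w, buf))
        e_n = d_n - y_n
        # Normalizing by the input power makes the step size scale-free.
        power = eps + sum(b_k * b_k for b_k in buf)
        w = [w_k + mu * e_n * b_k / power for w_k, b_k in zip(w, buf)]
        errors.append(e_n)
    return w, errors


# Toy identification problem: recover a made-up two-tap FIR system.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h_true = [0.5, -0.3]
d = [h_true[0] * x[n] + h_true[1] * (x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
w_est, errors = nlms(x, d, num_taps=2)
```

In this noise-free toy case the estimated weights approach h_true. In a setting like the one described above, the reference d would instead come from the pre-recorded speech signals.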
1.1.4 Uncertainty of the Steering Vector for Adaptive Beamformers
Most of the early methods for making adaptive beamformers more robust to steering vector errors are rather ad hoc, in that the choice of their parameters or structural modifications is not directly related to the uncertainty of the steering vector [11]. Recently, Vorobyov et al. proposed a new approach to robust adaptive beamforming in the presence of an arbitrary unknown steering vector mismatch [38].
This approach is based on the optimization of worst-case performance. They also showed that the robust minimum variance distortionless response (MVDR) beamformer using worst-case performance optimization can be formulated as a second-order cone program and solved in polynomial time via the interior point method. In subsequent works [40]-[44], several extensions of the robust MVDR beamformer of [38] have been considered.
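The worst-case formulation can be stated compactly. The following is a sketch of how such a problem is commonly written, and the exact notation in [38] may differ. The symbols are assumptions introduced here: $\mathbf{q}$ is the weight vector, $\hat{\mathbf{R}}$ the sample covariance matrix, $\mathbf{a}$ the presumed steering vector, $\boldsymbol{\delta}$ the unknown mismatch, and $\varepsilon$ a bound on its norm.

```latex
% Worst-case constraint: the response must not drop below one
% for any admissible steering vector mismatch
\min_{\mathbf{q}} \; \mathbf{q}^{H}\hat{\mathbf{R}}\,\mathbf{q}
\quad \text{subject to} \quad
\bigl|\mathbf{q}^{H}(\mathbf{a} + \boldsymbol{\delta})\bigr| \ge 1
\quad \text{for all } \|\boldsymbol{\delta}\| \le \varepsilon

% Equivalent second-order cone program
\min_{\mathbf{q}} \; \mathbf{q}^{H}\hat{\mathbf{R}}\,\mathbf{q}
\quad \text{subject to} \quad
\mathbf{q}^{H}\mathbf{a} \ge \varepsilon\,\|\mathbf{q}\| + 1,
\qquad
\operatorname{Im}\{\mathbf{q}^{H}\mathbf{a}\} = 0
```

The second form follows because the smallest value of $|\mathbf{q}^{H}(\mathbf{a} + \boldsymbol{\delta})|$ over the uncertainty ball is $|\mathbf{q}^{H}\mathbf{a}| - \varepsilon\|\mathbf{q}\|$, and because rotating the phase of $\mathbf{q}$ does not change the objective, so $\mathbf{q}^{H}\mathbf{a}$ can be taken real and nonnegative.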