
Recursive EM and SAGE-Inspired Algorithms With Application to DOA Estimation

Pei-Jung Chung and Johann F. Böhme, Fellow, IEEE

Abstract—This paper is concerned with recursive estimation using augmented data. We study two recursive procedures closely linked with the well-known expectation and maximization (EM) and space alternating generalized EM (SAGE) algorithms. Unlike iterative methods, the recursive EM and SAGE-inspired algorithms give a quick update on estimates given new data. Under mild conditions, estimates generated by these procedures are strongly consistent and asymptotically normally distributed. These mathematical properties are valid for a broad class of problems. When applied to direction of arrival (DOA) estimation, the recursive EM and SAGE-inspired algorithms lead to a very simple and fast implementation of the maximum-likelihood (ML) method. The most complicated computation in each recursion is inversion of the augmented information matrix. Through data augmentation, this matrix is diagonal and easy to invert. More importantly, there is no search in such recursive procedures. Consequently, the computational time is much less than that associated with existing numerical methods for finding ML estimates. This feature greatly increases the potential of the ML approach in real-time processing. Numerical experiments show that both algorithms provide good results with low computational cost.

Index Terms—Array processing, DOA estimation, EM algorithm, recursive EM, recursive estimation, recursive SAGE, SAGE algorithm, stochastic approximation.

I. INTRODUCTION

THE central interest of this work is recursive parameter estimation using augmented data. The expectation and maximization (EM) [9] and space alternating generalized EM (SAGE) [12] algorithms are well-known iterative methods to locate modes of a likelihood function. If very large data sets are involved, numerical procedures can become very expensive. To overcome this problem, we propose two alternative procedures derived from the EM and SAGE algorithms in which the data are run through sequentially.

The first part of our paper is devoted to mathematical properties of the recursive EM and SAGE-inspired algorithms. The recursive EM and SAGE-inspired algorithms are stochastic approximation procedures with a specialized gain matrix derived from the augmented data.

Manuscript received February 19, 2003; revised July 13, 2004. This work was supported in part by the National Science Council (NSC), Taiwan, R.O.C., under Contract NSC-94-2213-E-009-050. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Constantinos B. Papadias.

P.-J. Chung is with the Department of Electronics Engineering, National Chiao Tung University, 300 Hsin Chu, Taiwan, R.O.C. (e-mail: peijung_chung@yahoo.com).

J. F. Böhme is with the Department of Electrical Engineering and Information Sciences, Ruhr-Universität Bochum, 44780 Bochum, Germany (e-mail: boehme@sth.ruhr-uni-bochum.de).

Digital Object Identifier 10.1109/TSP.2005.850339

Choosing an appropriate gain matrix is particularly important if a good convergence rate is desired [2]. The algorithms presented in this paper provide a convenient way to design the gain matrix. In the pioneering paper [25], Titterington suggested the recursive EM algorithm and proved weak consistency and asymptotic normality for the univariate version. The analysis therein is based on classical methods of stochastic approximation [10], [22]. The results of [25] are extended to the multivariate case in [5]. Here, we consider a more general procedure in which constraint sets are allowed. We show that under proper conditions, the recursive EM algorithm leads to strong consistency and asymptotic normality. Rather than following the approach in [25], our investigation is based on the ordinary differential equation (ODE) method developed by Kushner and Clark [16] and Ljung [18], which characterizes the limit behavior of the algorithm by a mean limit ODE. The motivation for choosing the ODE method is that it simplifies the treatment of classical cases and provides a general approach to deal with complicated noise and dynamics.

In addition to the convergence properties of recursive EM, we also present a recursive procedure inspired by the SAGE algorithm. The recursive SAGE-inspired algorithm is derived for the specific case in which the parameter subsets in each cycle are disjoint. We use the term "SAGE-inspired" to emphasize the fact that rather than updating parameter subsets sequentially, the suggested recursion updates all elements of the parameter simultaneously. Under similar conditions, the recursive SAGE-inspired algorithm enjoys strong consistency and asymptotic normality as well. Although recursive EM and SAGE-inspired algorithms do not have optimal convergence rates, they are computationally preferable to the "optimal" stochastic approximation procedure. The augmented information matrix required by the recursive EM or SAGE-inspired algorithm is generally much easier to compute and invert than the Fisher information matrix required by the "optimal" recursive procedure.

The asymptotic results given in Section IV are applicable to a broad class of problems. In the second part, the recursive EM and SAGE-inspired algorithms are applied to array processing problems. Using recursive algorithms, there is no need to wait for a long time to collect the whole data set. Upon the arrival of new data, such procedures give a quick update of the estimate. This supports the use of the maximum-likelihood (ML) approach in real-time processing.

Based on recursive EM, algorithms for recursive direction of arrival (DOA) estimation were proposed in [13] and [19]. The methods in [13] are aimed at tracking multiple targets. The signal waveforms were assumed to be known in the derivation.


In [19], both stochastic and conditional signal models are considered. Major differences between our algorithms and the previous work [19] include the following points.

1) The spectral parameters are updated by observed data rather than by augmented data to obtain better convergence rates and more stability.

2) The step size used in [19] is limited to $\gamma_t = 1/t$.

In the current paper, we have a more flexible choice $\gamma_t = t^{-\alpha}$, $1/2 < \alpha \le 1$. In most cases, a good choice of $\alpha$ is essential for obtaining satisfactory convergence rates. With proper modifications, the proposed algorithms can also be used in tracking moving sources. Unlike subspace tracking methods [28], the recursive procedures based on the ML approach are applicable not only in the narrow band case but also in the wideband case. The first experimental and numerical results of recursive EM and SAGE-inspired algorithms are published in [7] and [8].

In the following, the recursive EM and SAGE-inspired algorithms are formulated in Sections II and III, respectively. Section IV deals with asymptotic properties of the proposed algorithms. Based on the recursive EM and SAGE-inspired algorithms, we develop recursive procedures for DOA estimation and discuss their convergence in Section V. Finally, numerical results are presented in Section VI.

II. RECURSIVE EM ALGORITHM

Suppose $x_1, x_2, \ldots$ are independent observations, each with underlying density function $f(x; \theta)$. The augmented data $y_1, y_2, \ldots$ specified by the EM algorithm is independent with the density function $g(y; \theta)$. The $p$-dimensional vector $\theta$ represents the unknown parameter. According to [9], the augmented data is specified so that the mapping $y \mapsto x(y)$ is many-to-one. Let $\hat\theta^{(t)}$ denote the estimate after $t$ observations. The following procedure

$$\hat\theta^{(t+1)} = \hat\theta^{(t)} + \gamma_{t+1}\, I_c(\hat\theta^{(t)})^{-1}\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \qquad (1)$$

is aimed at finding the extremum of the likelihood function that would coincide with the ML estimator. The constants $\gamma_t$ are decreasing step sizes, and

$$\nabla_\theta \log f(x_{t+1}; \theta)\,\big|_{\theta = \hat\theta^{(t)}} \qquad (2)$$

$$I_c(\hat\theta^{(t)}) = -\mathrm{E}\!\left[ \nabla_\theta \nabla_\theta^T \log g(y_{t+1}; \theta) \,\middle|\, x_{t+1}; \theta \right]\Big|_{\theta = \hat\theta^{(t)}} \qquad (3)$$

represent the gradient vector and the augmented information matrix calculated at $\hat\theta^{(t)}$, respectively. $\nabla_\theta$ is a column gradient operator with respect to $\theta$. Recursion (1) is called a recursive EM algorithm because it is closely related to the EM algorithm. In practice, the user will not allow the iterate to go beyond an upper or a lower bound. The iterate is confined to a bounded set $G$. Formally, the constrained algorithm can be expressed as

$$\hat\theta^{(t+1)} = \Pi_G\!\left[ \hat\theta^{(t)} + \gamma_{t+1}\, I_c(\hat\theta^{(t)})^{-1}\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \right] \qquad (4)$$

where $\Pi_G$ is the projection onto the constraint set $G$. A simple example for $G$ is a hyperrectangle $\{\theta : a_i \le \theta_i \le b_i,\ i = 1, \ldots, p\}$.

As pointed out by Titterington [25], recursion (1) has a strong link to the EM algorithm. By proper formulation of the EM algorithm, it can be shown that, approximately, the augmented data log-likelihood is maximized by iterates generated by recursive EM.
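Since the extracted text preserves the equations above only in symbolic form, it may help to see the update spelled out in code. The following Python sketch is illustrative only, not the authors' implementation; the functions `score` (the gradient (2)) and `aug_info` (the augmented information matrix (3)) are assumptions to be supplied by the user for the model at hand, and the step size follows the choice $\gamma_t = t^{-\alpha}$ discussed in Section IV.

```python
import numpy as np

def recursive_em_step(theta, x_new, t, score, aug_info,
                      alpha=1.0, bounds=None):
    """One recursive EM update, cf. recursion (1) and its
    constrained form (4).

    theta    : current estimate (p-vector)
    x_new    : newly observed data point
    t        : recursion index (1, 2, ...)
    score    : score(x, theta) -> gradient of log f(x; theta), cf. (2)
    aug_info : aug_info(theta) -> augmented information matrix, cf. (3)
    alpha    : step-size exponent; gamma_t = t**(-alpha), 1/2 < alpha <= 1
    bounds   : optional (lower, upper) arrays defining a hyperrectangle G
    """
    gamma = t ** (-alpha)                 # decreasing step size
    Ic = aug_info(theta)                  # augmented information matrix
    theta_new = theta + gamma * np.linalg.solve(Ic, score(x_new, theta))
    if bounds is not None:                # projection onto G, cf. (4)
        theta_new = np.clip(theta_new, bounds[0], bounds[1])
    return theta_new
```

For a hyperrectangle constraint set, the projection $\Pi_G$ reduces to the componentwise clipping shown above.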

III. RECURSIVE SAGE-INSPIRED ALGORITHM

The SAGE algorithm [12] generalizes the idea of data augmentation to simplify computations of the EM algorithm. It preserves the stability of EM and can improve the convergence rate significantly in some settings. Instead of estimating all parameters at once, each iteration of SAGE consists of $K$ cycles. The parameter subset $\theta_k$ associated with the $k$th cycle is updated by maximizing the conditional expectation of the log-likelihood of the augmented data corresponding to this cycle. Like the EM algorithm, if the data sets are large, the required computational time may become long. Therefore, it could be advantageous to develop a recursive procedure finding the estimates generated by the SAGE algorithm.

For simplicity, we will only consider the case in which the parameter subsets $\theta_1, \ldots, \theta_K$ are disjoint. Let $y^k$ be the augmented data of the $k$th cycle corresponding to the observations $x$. The characterizing density function is denoted by $g_k(y^k; \theta_k, \theta_{\tilde k})$, where $\theta_{\tilde k}$ contains all components of $\theta$, except those of $\theta_k$. At the $k$th cycle, $\theta_{\tilde k}$ is kept at a fixed value, and the conditional expectation of the augmented data log-likelihood is maximized with respect to $\theta_k$. To find the maximizing point, we suggest the following recursive procedure:

$$\hat\theta^{(t+1)} = \hat\theta^{(t)} + \gamma_{t+1}\, I_s(\hat\theta^{(t)})^{-1}\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \qquad (5)$$

where the constants $\gamma_t$ are decreasing step sizes and $I_s(\theta)$ is a block diagonal matrix

$$I_s(\theta) = \mathrm{diag}\!\left( I_s^1(\theta), \ldots, I_s^K(\theta) \right) \qquad (6)$$

with the $k$th block

$$I_s^k(\theta) = -\mathrm{E}\!\left[ \nabla_{\theta_k} \nabla_{\theta_k}^T \log g_k(y^k; \theta) \,\middle|\, x; \theta \right]. \qquad (7)$$

If the constraint is considered, recursion (5) can be written as

$$\hat\theta^{(t+1)} = \Pi_G\!\left[ \hat\theta^{(t)} + \gamma_{t+1}\, I_s(\hat\theta^{(t)})^{-1}\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \right]. \qquad (8)$$

It was mentioned in the previous section that there is a strong relationship between the EM algorithm and the recursive EM algorithm. Recursions (5) and (8), on the other hand, are similarly linked to the SAGE algorithm. Based on the whole data set, the SAGE algorithm updates the parameter subsets sequentially in each iteration. Rather than giving a partial update on $\theta$, the recursive procedure (5) [or (8)] generates a simultaneous update on all elements of $\theta$ upon arrival of new data $x_{t+1}$. The possibility of a partial update on $\theta$ and its influence on convergence properties are still under investigation.
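Because the parameter subsets are disjoint, applying $I_s(\theta)^{-1}$ never requires inverting the full matrix: each block from (7) can be solved independently. A small Python sketch of that structure, under the same hypothetical interface as the earlier example:

```python
import numpy as np

def sage_gain_apply(blocks, grad):
    """Apply I_s(theta)^{-1} to the gradient, exploiting the block
    diagonal structure (6); `blocks` is a list of the per-cycle
    matrices I_s^k from (7), ordered to match the layout of `grad`."""
    out, offset = np.empty_like(grad), 0
    for B in blocks:
        k = B.shape[0]                       # size of the k-th parameter subset
        out[offset:offset + k] = np.linalg.solve(B, grad[offset:offset + k])
        offset += k
    return out
```

The block structure is what keeps the per-step cost of the recursive SAGE-inspired update low despite the cycle structure.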

(3)

The following derivation is based on Taylor expansion at the maximum point of each cycle. In the constrained case, we assume that the maximum point lies in the interior of $G$. Given the observations $x_1, \ldots, x_t$, the SAGE algorithm generates a sequence of estimates by repeating the following steps.

For $i = 1, 2, \ldots$
    Initialize the current estimate from the previous iteration.
    For $k = 1, \ldots, K$
        E-step: Evaluate the conditional expectation of the augmented data log-likelihood of the $k$th cycle (9).
        M-step: Maximize (9) with respect to $\theta_k$ (10); the remaining components are kept fixed (11).
    end
end

Now, consider the recursive formulation of SAGE. At time instant $t$, define the "recursive" augmented likelihood as

(12)

The new estimate is obtained by the following procedure.

For $k = 1, \ldots, K$

(13)

end

Finally,

(14)

To obtain a proper approximation of the recursion, we will consider the Taylor expansion of the first and second terms on the right-hand side of (12). Approximately

(15)

The first-order term vanishes because the gradient is zero at the maximum point of each cycle.

By the definition of the recursive augmented likelihood (12), the second term is approximately given by

(16)

Using Bayes' law and the corresponding conditional density functions, the augmented data log-likelihood can be expressed as

(17)

From [12], we know that the last term of (17) must be independent of $\theta_k$ to assure the monotonic increase of the observed data log-likelihood. Thus, in the $k$th cycle, it is regarded as a constant. Approximately, (17) is given by

const (18)

Given appropriate regularity

(19)

equations (15), (16), and (18) lead to the following expression:

(20) The maximizing parameter is given by

(21)

As the parameter subsets are disjoint, (21) can be implemented simultaneously for $k = 1, \ldots, K$. This implies

(22) Thus, given regularity conditions, recursions (13) and (14) can be approximated by (22). The iterates of the recursive SAGE-inspired algorithm are given by (22). The recursive procedures (5) and (8) are more general forms of (22).


IV. ASYMPTOTIC PROPERTIES

Two questions of particular importance about the recursive EM and SAGE-inspired algorithms are whether the sequence generated by (1), (4), (5), or (8) converges and how fast it converges. We will show that both algorithms are indeed stochastic approximation procedures with a specialized gain matrix for minimizing the Kullback–Leibler distance. The asymptotic behavior of the iterates is governed by a mean ordinary differential equation (ODE). Based on the stochastic approximation theory, we will show that under mild conditions, the estimate is strongly consistent and asymptotically normally distributed.

To begin with, define

$$w(\theta) = \mathrm{E}\,[\log f(x; \theta)] \qquad (23)$$

where the expectation is with respect to the true density $f(x; \theta_0)$ and $\theta_0$ denotes the true parameter. We cite a theorem from [26].

Theorem 1: $w(\theta)$ has the following properties: a) $\nabla_\theta w(\theta)\big|_{\theta = \theta_0} = 0$. b) There exists $B(\theta_0)$, a neighborhood of $\theta_0$, such that $w(\theta) < w(\theta_0)$ for $\theta \in B(\theta_0)$, $\theta \ne \theta_0$.

Proof: See Appendix A.

Now consider

$$h_c(\theta) = I_c(\theta)^{-1}\, \nabla_\theta w(\theta) \quad \text{and} \quad h_s(\theta) = I_s(\theta)^{-1}\, \nabla_\theta w(\theta). \qquad (24)$$

Then

$$h_c(\theta_0) = 0 \quad \text{and} \quad h_s(\theta_0) = 0 \qquad (25)$$

is a direct consequence of Theorem 1. It is clear that recursions (1), (4), (5), and (8) are procedures to find zeros of $h_c(\theta)$ and $h_s(\theta)$, respectively. Furthermore, the observation

$$Y_t = I_c(\hat\theta^{(t)})^{-1}\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \qquad (26)$$

can be decomposed as

$$Y_t = h_c(\hat\theta^{(t)}) + \delta M_t \qquad (27)$$

where $\delta M_t$ is a martingale difference noise [17], [27]. Note that $Y_t$ can be written as

(28)

The independence of the observations yields the following:

(29)

and

(30)

where the associated partial sum process is a martingale [27]. Then, (27) follows immediately. Similarly, in the recursive SAGE-inspired algorithm, the observation can be decomposed as

(31)

A. Convergence

From the previous discussion, we know that the observations $Y_t$ and their SAGE counterparts are unbiased observations of the functions $h_c$ and $h_s$ corrupted by a martingale difference noise. By probability inequalities for martingale sequences, it can be shown that the asymptotic behavior of the iterate is determined effectively by that of a "mean" ODE. Since $\nabla_\theta w$ is a gradient and $I_c(\theta)^{-1}$ [or $I_s(\theta)^{-1}$] is positive definite, the stationary points of the ODE associated with the recursive EM and SAGE-inspired algorithms are asymptotically stable. If the initial estimate is not too far from $\theta_0$, we can expect that $\hat\theta^{(t)}$ converges to $\theta_0$.

In order to describe the behavior of the constrained algorithm, it is necessary to define the projected ODE [17]

$$\dot\theta = \bar h(\theta) + z \qquad (32)$$

where $z$ is the projection term, the minimum term needed to keep $\theta$ in the constraint set $G$. $z$ is zero if $\theta$ lies in the interior of $G$. Assume $\theta$ is on the boundary of $G$. If $\theta$ is on the interior of a face of $G$ and $\bar h(\theta)$ points out of $G$, then $z$ points inward, orthogonal to the face. Note that $\lVert z \rVert \le \lVert \bar h(\theta) \rVert$ [17], where $\lVert \cdot \rVert$ denotes the Euclidean norm.

The probability one convergence of recursions (1), (4), (5), and (8) follows directly from convergence of a more general procedure

$$\hat\theta^{(t+1)} = \hat\theta^{(t)} + \gamma_{t+1}\, K(\hat\theta^{(t)})\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \qquad (33)$$

in the unconstrained case, and

$$\hat\theta^{(t+1)} = \Pi_G\!\left[ \hat\theta^{(t)} + \gamma_{t+1}\, K(\hat\theta^{(t)})\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \right] \qquad (34)$$

in the constrained case. We assume $K(\theta)$ is a positive definite matrix and continuous in $\theta$. Applying the ODE method widely used in the stochastic approximation theory [2], [17], we obtain the following results.

Theorem 2: Suppose a) and b) hold for the recursions (33) and (34). Then, $\hat\theta^{(t)}$ generated by the constrained algorithm (4) converges with probability one to some invariant set of the ODE

$$\dot\theta = K(\theta)\, \nabla_\theta w(\theta) + z \qquad (35)$$

in $G$, where $z$ is the projection term. For the unconstrained algorithm (1), we assume additionally that $\hat\theta^{(t)}$ is bounded with probability one. Then, $\hat\theta^{(t)}$ converges with probability one to the invariant set of the ODE

$$\dot\theta = K(\theta)\, \nabla_\theta w(\theta). \qquad (36)$$

Furthermore, $\hat\theta^{(t)}$ converges with probability one to the set of stationary points of the corresponding ODE.

Proof: See Appendix B.

Remark: The result of Theorem 2 holds for the recursive EM and SAGE-inspired algorithms if $K(\theta)$ is replaced by $I_c(\theta)^{-1}$ or $I_s(\theta)^{-1}$, and $K(\theta)\,\nabla_\theta w(\theta)$ is replaced by $h_c(\theta)$ or $h_s(\theta)$.


The previous theorem shows us that under proper conditions, $\hat\theta^{(t)}$ converges with probability one to the stationary points. For the unconstrained algorithm, the stationary points are points such that $\nabla_\theta w(\theta) = 0$. For the constrained algorithm, if the stationary points lie in the interior of $G$, the stationary condition is just $\nabla_\theta w(\theta) = 0$. On the boundary, the condition is $\bar h(\theta) + z = 0$. As pointed out in [17], the constraint can give rise to spurious stationary points on the boundary, but this is the only type of singular points that can be introduced by the constraint.

Now, assume that the constraint set is large enough so that the stationary points are inside $G$. From Theorem 1, we know the true parameter vector $\theta_0$ satisfies the stationary condition and is one of the limit points of $\hat\theta^{(t)}$. In view of the well-known multiple maxima that are possible on likelihood surfaces, one could, of course, not expect consistency of the recursions irrespective of the starting point [25]. In other words, the iterate converges to $\theta_0$ if the initial estimate is in the domain of attraction of $\theta_0$.

B. Asymptotic Distribution

In this subsection, we will be concerned with the convergence speed of the algorithm. The convergence speed is measured by the convergence rate, defined through the normalized error around the limit point $\theta_0$. Under assumptions made in Theorem 2, it can be shown that the normalized error is normally distributed with zero mean and covariance matrix $V$. To avoid mathematical difficulties, the limit point is supposed to be interior to the constraint set in the constrained algorithm.

First, we consider the following approximate expression.

Lemma 1: The function $\bar h(\theta)$ around the limit point $\theta_0$ can be approximated by

$$\bar h(\theta) \approx B\,(\theta - \theta_0), \qquad B = -K(\theta_0)\, I_F(\theta_0) \qquad (37)$$

where $B$ is a stable matrix. A stable matrix is a square matrix, all of whose eigenvalues have negative real parts. The information matrix of the observed data is $I_F(\theta) = \mathrm{E}\!\left[ \nabla_\theta \log f(x; \theta)\, \nabla_\theta \log f(x; \theta)^T \right]$.

Proof: See Appendix C.

Exploiting results of stochastic approximation theory [2], [17], [20], one can obtain the asymptotic normality of the recursive EM and SAGE-inspired algorithms. The following theorem shows how the rate of convergence depends on the choice of $\gamma_t$ and the objective function.

Theorem 3: Consider the recursions (33) and (34). Suppose a) $\gamma_t = t^{-\alpha}$, b) $1/2 < \alpha \le 1$, and c) $\hat\theta^{(t)}$ converges to an isolated stable point $\theta_0$ of the ODE (36).

Then, i) if $\alpha = 1$ and $B + \frac{1}{2}I$ is a stable matrix, $\sqrt{t}\,(\hat\theta^{(t)} - \theta_0)$ has asymptotic distribution $N(0, V)$, where $V$ is the solution of

$$\left( B + \tfrac{1}{2} I \right) V + V \left( B + \tfrac{1}{2} I \right)^{T} + S = 0 \qquad (38)$$

where $B$ is given by Lemma 1 and $S$ is the covariance matrix of the observation noise at $\theta_0$.

ii) If $1/2 < \alpha < 1$, $t^{\alpha/2}\,(\hat\theta^{(t)} - \theta_0)$ has asymptotic distribution $N(0, V)$, where $V$ is the solution of

$$B V + V B^{T} + S = 0. \qquad (39)$$

Proof: See Appendix D.

Remark: The results of Theorem 3 hold for the recursive EM and SAGE-inspired algorithms if $K(\theta)$ is replaced by $I_c(\theta)^{-1}$ or $I_s(\theta)^{-1}$.

From Theorem 3, we know that as $t \to \infty$, the expectation of the squared error decreases with the order $t^{-1}$ for $\alpha = 1$ and $t^{-\alpha}$ for $1/2 < \alpha < 1$. Consequently, the best choice is $\alpha = 1$. However, in practice, selecting $\alpha = 1$ may cause too small step sizes at the initial stages, and the algorithm may not provide sufficient changes in $\hat\theta^{(t)}$ to approach the true parameter.

Furthermore, when the gain matrix $I_c(\hat\theta^{(t)})^{-1}$ or $I_s(\hat\theta^{(t)})^{-1}$ is replaced by the inverse Fisher information matrix $I_F(\hat\theta^{(t)})^{-1}$, the recursion

$$\hat\theta^{(t+1)} = \hat\theta^{(t)} + \gamma_{t+1}\, I_F(\hat\theta^{(t)})^{-1}\, \nabla_\theta \log f(x_{t+1}; \hat\theta^{(t)}) \qquad (40)$$

yields the asymptotically optimal covariance $V = I_F(\theta_0)^{-1}$. Thus, the estimates generated by (40) are asymptotically efficient in the sense that the covariance matrix approaches the lower bound given by the asymptotic Cramér–Rao inequality [11]. However, the augmented information matrices $I_c(\theta)$ and $I_s(\theta)$ are, in general, much easier to calculate and invert than $I_F(\theta)$. Besides, an optimal rate of convergence can also be achieved by the recursive EM or SAGE-inspired algorithms if an additional averaging step

$$\bar\theta^{(t)} = \frac{1}{t} \sum_{i=1}^{t} \hat\theta^{(i)} \qquad (41)$$

is undertaken, which is known as the Polyak–Ruppert procedure [21]. More details about this method can be found in [17] and [20].
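The averaging step (41) is simple to add on top of either recursion. Below is a hedged Python sketch, reusing the hypothetical `recursive_em_step` from the earlier example; the running average, rather than the last iterate, is returned as the estimate.

```python
import numpy as np

def run_with_averaging(theta0, data, score, aug_info, alpha=0.7):
    """Polyak-Ruppert averaging [21]: run the recursion with a slowly
    decreasing step size (alpha < 1) and return the running average
    of the iterates, cf. (41)."""
    theta = np.asarray(theta0, dtype=float)
    theta_bar = theta.copy()
    for t, x in enumerate(data, start=1):
        theta = recursive_em_step(theta, x, t, score, aug_info, alpha=alpha)
        theta_bar += (theta - theta_bar) / t    # running mean of the iterates
    return theta_bar
```

The exponent `alpha=0.7` is an arbitrary illustrative value in the admissible range $1/2 < \alpha < 1$; the averaging is what restores the optimal $1/t$ rate.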

V. APPLICATION TO DOA ESTIMATION

The recursive EM and SAGE-inspired algorithms introduced previously are applied to the source localization problem. A narrow-band signal model is used in the derivation. As the proposed approach is based on the likelihood function, extension to the wideband case is straightforward if one applies the asymptotic properties of Fourier transformation to the data [3], [4]. The algorithms presented here are aimed at finding a fixed parameter. However, if a properly chosen small constant step size is used, the same procedures can also be used to track slowly varying parameters [2]. In the following, we give a brief description of the signal model and develop recursive methods for estimating DOA parameters. Also, we will show that the proposed algorithms are closely related to maximization of the concentrated likelihood function.

A. Signal Model

Consider an array of $m$ sensors receiving signals generated by $M$ far-field narrow-band sources. The $t$th snapshot of the array output is given by

$$x(t) = H(\theta)\, s(t) + n(t) \qquad (42)$$

where $\theta = [\theta_1, \ldots, \theta_M]^T$ summarizes the DOA parameters, $H(\theta) = [d(\theta_1), \ldots, d(\theta_M)]$ contains the steering vectors $d(\theta_i)$, $i = 1, \ldots, M$, and $s(t)$, $n(t)$ denote the signal waveform and noise vector, respectively. We assume 1) $s(t)$ is unknown and deterministic, 2) $n(t)$ is independent, identically complex normally distributed with zero mean and covariance matrix $\nu I$, where $I$ is an identity matrix. Consequently, $x(t)$ is independent, complex normally distributed with mean $H(\theta) s(t)$ and covariance matrix $\nu I$. The log-likelihood function of $x(t)$ conditioned on the signal waveform can be written as

$$l_t(\theta, s(t), \nu) = -m \log(\pi \nu) - \frac{1}{\nu}\, \lVert x(t) - H(\theta)\, s(t) \rVert^2 \qquad (43)$$

where $\lVert \cdot \rVert$ denotes the Euclidean norm. The problem of interest is to estimate $\theta$ sequentially.
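For concreteness, the snapshot model can be simulated in a few lines. This Python sketch is not part of the paper; the uniform linear array geometry (half-wavelength spacing, as in the simulations of Section VI) and all function names are assumptions for illustration.

```python
import numpy as np

def steering_matrix(doas_deg, n_sensors, spacing=0.5):
    """Steering vectors d(theta_i) of a uniform linear array;
    `spacing` is the interelement distance in wavelengths."""
    doas = np.deg2rad(np.asarray(doas_deg))
    k = np.arange(n_sensors)[:, None]          # sensor indices as a column
    return np.exp(2j * np.pi * spacing * k * np.sin(doas)[None, :])

def snapshot(doas_deg, s_t, noise_var, n_sensors=15):
    """One array output x(t) = H(theta) s(t) + n(t), cf. (42), with
    circular complex Gaussian noise of covariance noise_var * I."""
    H = steering_matrix(doas_deg, n_sensors)
    noise = np.sqrt(noise_var / 2) * (np.random.randn(n_sensors)
                                      + 1j * np.random.randn(n_sensors))
    return H @ np.asarray(s_t) + noise
```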

B. Recursive Estimation of DOA Parameters

Based on the recursive EM and SAGE-inspired algorithms presented in Sections II and III, we develop a recursive procedure for estimating DOA parameters. The $t$th snapshot $x(t)$ corresponds to the observed data at time $t$. According to recursion (1), (4), (5), or (8), all elements in the parameter vector should be updated simultaneously. However, to avoid a complicated gain matrix $I_c^{-1}$ or $I_s^{-1}$, these procedures will only be applied to the DOA parameters $\theta$. The signal and noise parameters, denoted by $s(t)$ and $\nu$, are updated by computing their ML estimates once the new DOA parameters are available. Thus, the gradient vector and the augmented information matrices $I_c$, $I_s$ are calculated with respect to $\theta$.

Taking the first derivative of the log-likelihood (43) with respect to $\theta_i$, we can easily obtain the $i$th element of the gradient vector at time instant $t$

$$\frac{\partial l_t}{\partial \theta_i} = \frac{2}{\nu}\, \mathrm{Re}\!\left\{ s_i(t)^{*}\, d'(\theta_i)^H \left[ x(t) - H(\theta)\, s(t) \right] \right\} \qquad (44)$$

where $d'(\theta_i) = \partial d(\theta_i)/\partial\theta_i$.

The data augmentation scheme used for computing $I_c$ is obtained by decomposing the array output into its signal and noise parts. Formally, it can be expressed as

$$x(t) = \sum_{i=1}^{M} y_i(t). \qquad (45)$$

The augmented data associated with the $i$th signal

$$y_i(t) = d(\theta_i)\, s_i(t) + n_i(t) \qquad (46)$$

is complex normally distributed with mean $d(\theta_i)\, s_i(t)$ and covariance matrix $\beta_i \nu I$ with the constraint $\sum_{i=1}^{M} \beta_i = 1$. For simplicity, we choose $\beta_i = 1/M$. The log-likelihood associated with this augmentation scheme is given by

(47)

Since the signals are decoupled through the augmentation scheme (45), $I_c$ is a diagonal matrix when we only consider the DOA parameters $\theta$. By definition (3), the $i$th diagonal element of $I_c$ is the conditional expectation of the second derivative of the augmented log-likelihood

(48)

which is given by

(49)

In the recursive SAGE-inspired algorithm, the augmented data of the $i$th cycle is specified as

$$y^i(t) = d(\theta_i)\, s_i(t) + n(t). \qquad (50)$$

At each cycle, we consider one signal and the total noise component. The associated log-likelihood is given by

(51)

As the recursive SAGE is only applied to the DOA parameter, $I_s$ is a diagonal matrix. By (6) and (51), the $i$th diagonal element can be expressed as

(52)

More details about calculating $I_c$ and $I_s$ can be found in [6].

For both algorithms, once the estimate $\hat\theta^{(t)}$ is available, the signal and noise parameters are obtained by computing their ML estimates for given $\hat\theta^{(t)}$ and $x(t)$ as follows:

$$\hat s(t) = H(\hat\theta^{(t)})^{+}\, x(t) \qquad (53)$$

$$\hat\nu = \frac{1}{m}\, x(t)^H\, P^{\perp}(\hat\theta^{(t)})\, x(t) \qquad (54)$$

where $H^{+} = (H^H H)^{-1} H^H$ is the generalized left inverse of the matrix $H$, and $P^{\perp} = I - H H^{+}$ is the orthogonal complement of the projection matrix $P = H H^{+}$.
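In code, the updates (53) and (54) amount to a least-squares fit and a residual power estimate. A short sketch under the same assumptions as the previous examples (the $1/m$ normalization mirrors the reconstruction of (54) above):

```python
import numpy as np

def update_signal_and_noise(H, x_t):
    """ML estimates of the signal waveform and noise power for fixed
    DOAs, cf. (53) and (54); H is the steering matrix at the current
    DOA estimate and x_t the current snapshot."""
    H_pinv = np.linalg.pinv(H)                   # generalized left inverse H^+
    s_hat = H_pinv @ x_t                         # cf. (53)
    P_perp = np.eye(H.shape[0]) - H @ H_pinv     # orthogonal complement projector
    nu_hat = np.real(x_t.conj() @ (P_perp @ x_t)) / H.shape[0]   # cf. (54)
    return s_hat, nu_hat
```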

At the beginning of the recursion, the initial estimate $\hat\theta^{(0)}$ may be obtained by beamforming or subspace methods. The signal and noise parameters $\hat s(0)$ and $\hat\nu$ can be initialized by (53) and (54), respectively. In summary, the recursive EM (or SAGE-inspired) algorithm proceeds as follows.

Initialize $\hat\theta^{(0)}$, $\hat s(0)$, $\hat\nu$.
For $t = 0, 1, 2, \ldots$
    Calculate the gradient vector and augmented information matrix by equations (44) and (49) [or (52)].
    Update the DOA parameters by
    $$\hat\theta^{(t+1)} = \hat\theta^{(t)} + \gamma_{t+1}\, I_c(\hat\theta^{(t)})^{-1}\, \nabla_\theta l_t \qquad (55)$$
    or
    $$\hat\theta^{(t+1)} = \hat\theta^{(t)} + \gamma_{t+1}\, I_s(\hat\theta^{(t)})^{-1}\, \nabla_\theta l_t \qquad (56)$$
    in the unconstrained case. Use the projection operator $\Pi_G$ in the constrained case.
    Update the signal and noise parameters by equations (53) and (54).
end

Because of the simple structure of the augmented data, $I_c$ and $I_s$ are diagonal matrices. The associated inverse matrices can be easily obtained by inverting the diagonal elements of $I_c$ and $I_s$. Compared to the optimal procedure (40) with the gain matrix

(57)

where the matrix in (57) is diagonal with elements given in [23], recursions (55) and (56) are much easier to implement. Although the algorithms presented in this subsection are developed for a narrow band signal, they can be easily extended to the wide band case. From the asymptotic theory of the Fourier transform, we know that the Fourier transformed data at each frequency bin are independent of each other [3], [4]. In this case, the contribution of each frequency bin can be calculated by (44), (49), and (52). The gradient vector and the augmented information matrix are obtained by summing up contributions over frequencies of interest.
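Putting the pieces together, here is a hedged Python sketch of the recursive EM variant of the procedure above. It reuses the hypothetical `update_signal_and_noise` from the previous example. Since the extracted text does not preserve (49) explicitly, the diagonal information term below is written out under the stated assumptions ($\beta_i = 1/M$ noise splitting, first term neglected near convergence); it is illustrative, not a reproduction of the authors' implementation.

```python
import numpy as np

def d_and_deriv(theta_deg, n_sensors, spacing=0.5):
    """Steering vector d(theta) of a uniform linear array and its
    derivative d'(theta) with respect to the DOA."""
    th = np.deg2rad(theta_deg)
    k = np.arange(n_sensors)
    d = np.exp(2j * np.pi * spacing * k * np.sin(th))
    dd = d * (2j * np.pi * spacing * k * np.cos(th))   # elementwise chain rule
    return d, dd

def recursive_em_doa(theta0_deg, snapshots, alpha=1.0, delta_max=2.0):
    """Sketch of the recursive EM DOA procedure, cf. (55): update the
    DOAs with the diagonal augmented-information gain, then refresh
    the signal and noise estimates by (53)-(54)."""
    theta = np.asarray(theta0_deg, dtype=float)
    M = len(theta)
    for t, x in enumerate(snapshots, start=1):
        m = len(x)
        H = np.column_stack([d_and_deriv(th, m)[0] for th in theta])
        s_hat, nu_hat = update_signal_and_noise(H, x)
        resid = x - H @ s_hat
        gamma = t ** (-alpha)
        for i in range(M):
            _, dd = d_and_deriv(theta[i], m)
            # Gradient element, cf. (44).
            grad_i = (2.0 / nu_hat) * np.real(
                np.conj(s_hat[i]) * (dd.conj() @ resid))
            # Diagonal augmented information, cf. (49); the factor M
            # reflects the assumed beta_i = 1/M noise splitting, and a
            # small constant guards against division by zero.
            info_i = (2.0 * M / nu_hat) * (abs(s_hat[i]) ** 2
                                           * np.linalg.norm(dd) ** 2) + 1e-12
            step = gamma * grad_i / info_i
            # Simple variant of the outlier guard described in Sec. VI.
            theta[i] += float(np.clip(step, -delta_max, delta_max))
    return theta
```

A SAGE-inspired variant of this sketch would differ only in the information term (52), whose second-term coefficient is $1/M$ of the EM one, making the gain $M$ times larger.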

Comparing (49) and (52), the diagonal element of $I_c$ differs from that of $I_s$ only in the coefficient of the second term. When the iterate is close to the true parameter, the first term can be neglected. The gain matrix of the recursive SAGE-inspired algorithm is then $M$ times that of the recursive EM algorithm. It can be expected that both algorithms have the same convergence behavior if the step size of the recursive EM algorithm is chosen to be $M$ times that of the recursive SAGE-inspired algorithm.

C. Relation to the Concentrated Likelihood Function

One may notice that under the conditional signal model, the likelihood function (43) varies from snapshot to snapshot. The assumption made by the recursive EM and SAGE-inspired algorithms that all observations are i.i.d. is no longer satisfied. Do the procedures developed in the previous subsection really lead to the true DOA parameters? We shall show that the recursive EM and SAGE-inspired algorithms for DOA estimation do lead to the desired true parameters $\theta_0$.

It is well known in the literature [3] that the log-likelihood function (43) can be concentrated with respect to the signal and noise parameters. For one snapshot, the concentrated likelihood is given by

(58)

where

(59)

Let

(60)

be the gradient vector of the concentrated likelihood. It will be shown in the following that the gradient vector defined by (44) has the same direction as (60).

Result 1: Let the gradient vectors be defined by (44) and (60), respectively. Then

(61)

Proof: See Appendix E.

To investigate the convergence behavior, we need to define a function closely related to the concentrated likelihood function

(62)

where

(63)

Note that (63) is the covariance matrix of the array outputs when the signal is assumed to be a zero-mean stationary process. The notation in (63) is used to emphasize that it is the true covariance matrix. Although the recursive DOA estimation algorithms were derived under the deterministic signal model, the dynamics of the algorithms are governed by (62), in which the sample quantity is replaced by its expectation. Using (63) does not conflict with the deterministic signal model used in the derivation: the source signals are a stationary process, and when a particular signal sequence is seen as a realization, we have the deterministic signal model.

Result 2: The recursive EM and SAGE-inspired algorithms for DOA estimation are stochastic approximation procedures for finding zeros of

(64)

which may coincide with the true parameters $\theta_0$.


Fig. 1. MSE versus recursions. θ = [24° 32° 45°]. SNR = −10 dB. Equal step sizes for the recursive EM and SAGE-inspired algorithms.

VI. SIMULATIONS

We study the performance of the recursive EM and SAGE-inspired algorithms for DOA estimation by numerical experiments. A linear array of 15 sensors with equal interelement spacings of half a wavelength is used. Results from three numerical experiments will be presented.

In the first experiment, three sources of equal power located at [24° 32° 45°] are used to generate signals. The procedure starts from a fixed initial value. In the second experiment, a more critical situation with two closely located sources, [24° 28° 45°], is considered, with the initial estimate at the same distance from the true parameters. The signal-to-noise ratio (SNR), defined as the ratio between the signal power of each source and the sensor noise $\nu$, is chosen to be −10, 0, 10 dB. The maximum number of snapshots is 300. The step size for the recursive SAGE-inspired algorithm is chosen to be $\gamma_t = t^{-\alpha}$. The recursive EM algorithm uses two step sizes: the same as that of the SAGE-inspired algorithm, or three times that value. In the third experiment, we consider the scenario with $\theta_0$ = [24° 32° 45°] and SNR = 0 dB. In addition to the recursive EM and SAGE-inspired algorithms, a procedure using the gain matrix $I_F^{-1}$ of the optimal recursion (40) is applied to the same data.

The number of Monte Carlo trials performed in each experiment is 500. The estimation accuracy is measured by the mean-squared error (MSE) of the DOA estimates. The results are compared to the Cramér–Rao bound (CRB) for the stochastic signal model, which is the lowest bound for the ML estimator [24] and does not depend on realizations of the signals.

In order to avoid outliers, the recursion (55) or (56) is carried out only when the change in each DOA parameter at each stage is less than a fixed value $\Delta$. In our simulation, $\Delta$ is chosen to be 2°. If the change exceeds $\Delta$, we set the change in the DOA parameter to be 0.001° with the sign of the corresponding element of the gradient vector. Using such a mechanism is equivalent to replacing the diagonal element obtained by (49) or (52) by a large positive number. The condition for probability one convergence is still satisfied. Also, the iterate is kept bounded.
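The safeguard can be expressed in a few lines. A sketch, with hypothetical names as in the earlier examples:

```python
import numpy as np

def guard_step(step, grad, delta_max=2.0, fallback=0.001):
    """If a proposed DOA change exceeds delta_max (degrees), replace it
    by a small move of `fallback` degrees in the gradient direction;
    equivalent to substituting a large positive number for the diagonal
    of the augmented information matrix (Sec. VI)."""
    step = np.atleast_1d(np.asarray(step, dtype=float)).copy()
    grad = np.atleast_1d(np.asarray(grad, dtype=float))
    too_big = np.abs(step) > delta_max
    step[too_big] = fallback * np.sign(grad[too_big])
    return step
```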

Fig. 2. MSE versus recursions. θ = [24° 32° 45°]. SNR = −10 dB. EM step size three times that of the SAGE-inspired algorithm.

Fig. 3. MSE versus recursions. θ = [24° 32° 45°]. SNR = 0 dB. Equal step sizes for both algorithms.

Results of the first experiment are plotted in Figs. 1–6. The MSEs of both algorithms decrease with increasing number of recursions. Figs. 1, 3, and 5 show that when the same step sizes are used, the recursive SAGE-inspired algorithm converges faster than the recursive EM algorithm. When the step size of the recursive EM algorithm is three times as much as that of the recursive SAGE-inspired algorithm, their convergence rates are almost identical (see Figs. 2, 4, and 6). At SNR = 0 and 10 dB, the recursive EM algorithm has slightly lower MSE than the recursive SAGE-inspired algorithm. The results presented in Figs. 2, 4, and 6 are not surprising. Since the number of sources is $M = 3$, when the EM step size is three times the SAGE-inspired one, both algorithms should have similar convergence behavior, as predicted by (49) and (52). Comparing MSEs at SNR = −10, 0, 10 dB, we can easily see that MSEs become smaller at higher SNRs.

Figs. 7–12 show the performance of both algorithms in a more critical scenario, $\theta_0$ = [24° 28° 45°]. The distance of the initial estimate to the true parameters is the same as that


in the first experiment. Both procedures lead to convergent iterates at all SNRs. At higher SNRs, the estimation accuracy is better than that at lower SNRs. In general, both algorithms perform worse than in the first experiment. It can also be observed that the MSEs are larger, and thus, the convergence rates are slower. Comparing Figs. 7, 9, and 11 with Figs. 1, 3, and 5, we can observe that the influence of the closely located sources becomes more significant at higher SNRs. In Figs. 8, 10, and 12, it can also be observed that both algorithms have a similar convergence behavior when the EM step size is three times that of the SAGE-inspired algorithm.

Fig. 4. MSE versus recursions. θ = [24° 32° 45°]. SNR = 0 dB. EM step size three times that of the SAGE-inspired algorithm.

Fig. 5. MSE versus recursions. θ = [24° 32° 45°]. SNR = 10 dB. Equal step sizes for both algorithms.

Three stochastic approximation procedures, with gain matrices $I_c^{-1}$, $I_s^{-1}$, and $I_F^{-1}$, are compared in the third experiment. The initial estimates are given by [19° 36° 50°] and [22° 34° 47°]. From Fig. 13, we can observe that the algorithm with the same gain matrix as the optimal procedure (40) has a better convergence rate after the iterates are close enough to the true parameter. Note that using the gain matrix $I_F^{-1}$ does not mean the optimal procedure

because a step size of $t^{-\alpha}$ with $\alpha < 1$ rather than $1/t$ is used. So, we refer to this algorithm as the $I_F$-procedure. As pointed out in [17], $\gamma_t = 1/t$ is the best step size for (40). In practice, using such a fast decreasing step size may not provide sufficient changes to approach the true parameter. Fig. 14 presents results obtained by using a better initial estimate. The MSEs are, in general, lower than those in Fig. 13 for the same number of snapshots. This is not surprising since the algorithm requires fewer recursions to enter the neighborhood of the true parameter.

Fig. 6. MSE versus recursions. θ = [24° 32° 45°]. SNR = 10 dB. EM step size three times that of the SAGE-inspired algorithm.

Fig. 7. MSE versus recursions. θ = [24° 28° 45°]. SNR = −10 dB. Equal step sizes for both algorithms.

In summary, the estimates generated by the recursive EM and SAGE-inspired algorithms converge to the true parameters. Using the same step sizes, the recursive SAGE-inspired algorithm converges faster than the recursive EM algorithm. When the EM step size is $M$ times that of the SAGE-inspired algorithm, they have similar convergence behavior. Both methods perform better at higher SNRs or in a situation with widely separated sources. The convergence rate can be improved by using the gain matrix $I_F^{-1}$ at the expense of higher computational cost. The CRB computed for a stochastic


signal model is not achieved because 1) the algorithms presented in this paper are derived under the deterministic signal model, 2) the recursion procedure is only applied to the DOA parameter, 3) the convergence rates given in Section IV-B are asymptotic results, and 4) a step size of $t^{-\alpha}$ with $\alpha < 1$ rather than $1/t$ is chosen for a practical convergence rate. Compared to batch processing, the recursive procedure is fast and computationally efficient but has a poorer estimation performance.

Fig. 8. MSE versus recursions. θ = [24° 28° 45°]. SNR = −10 dB. EM step size three times that of the SAGE-inspired algorithm.

Fig. 9. MSE versus recursions. θ = [24° 28° 45°]. SNR = 0 dB. Equal step sizes for both algorithms.

VII. CONCLUSION

This paper is concerned with recursive parameter estimation using augmented data. The recursive EM and SAGE-inspired algorithms are formulated in a very general form. It was shown that the recursive SAGE-inspired algorithm is closely related to the SAGE algorithm. Under mild conditions, the sequence of estimates generated by the recursive EM and SAGE-inspired algorithms converges with probability one to a stationary

point of the likelihood function. The normalized error vector is asymptotically normally distributed with zero mean and a covariance matrix that can be obtained by solving a matrix equation. These results are valid for a broad class of problems.

Fig. 10. MSE versus recursions. θ = [24° 28° 45°]. SNR = 0 dB. EM step size three times that of the SAGE-inspired algorithm.

Fig. 11. MSE versus recursions. θ = [24° 28° 45°]. SNR = 10 dB. Equal step sizes for both algorithms.

Based on the recursive EM and SAGE-inspired algorithms, we developed recursive procedures for estimating DOA parameters. Because of the simple structure of the augmented data, the recursive procedures have a very simple implementation. It was proved that the convergence behavior of the proposed algorithms is governed by a function closely related to the concentrated likelihood function. Simulations showed that estimates generated by the recursive EM and SAGE-inspired algorithms achieve satisfactory accuracy over a wide range of SNRs. In general, higher SNR and better initial estimates lead to faster convergence. Both algorithms performed better in a scenario with widely separated signal sources than with closely located signal sources. The application to the direction-finding problem demonstrated that the


recursive EM and SAGE-inspired algorithms provide a computationally efficient method to find ML estimates.

Fig. 12. MSE versus recursions. θ = [24° 28° 45°]. SNR = 10 dB. EM step size three times that of the SAGE-inspired algorithm.

Fig. 13. MSE versus recursions. θ = [24° 32° 45°], initial estimate [19° 36° 50°]. SNR = 0 dB.

Fig. 14. MSE versus recursions. θ = [24° 32° 45°], initial estimate [22° 34° 47°]. SNR = 0 dB.

APPENDIX A
PROOF OF THEOREM 1

Note that

$$w(\theta_0) - w(\theta) = K(\theta_0, \theta) \qquad (65)$$

where

$$K(\theta_0, \theta) = \mathrm{E}\!\left[ \log \frac{f(x; \theta_0)}{f(x; \theta)} \right] \qquad (66)$$

is the Kullback–Leibler distance between $f(x; \theta_0)$ and $f(x; \theta)$. a) and b) follow immediately since it is well known that $K(\theta_0, \theta) \ge 0$, with equality if and only if $f(x; \theta) = f(x; \theta_0)$ for an identifiable model.

APPENDIX B
PROOF OF THEOREM 2

1) The convergence of $\hat\theta^{(t)}$ to the invariant set of (35) or (36) is guaranteed by decreasing step sizes and conditions a) and b). The step sizes are specified as $\gamma_t > 0$, $\gamma_t \to 0$, $\sum_t \gamma_t = \infty$, and $\sum_t \gamma_t^2 < \infty$. Therefore

(67)

Assumptions a) and b) yield the following inequality:

(68)

Note that, similar to (27), the observation can be decomposed as

(69)

By (67)–(69) and [17, Th. 2.1, p. 95], we conclude that asymptotically, $\hat\theta^{(t)}$ follows the trajectory of the solution to the ODE (35) in the constrained case, and (36) in the unconstrained case.

2) The stationary points of (35) satisfy the condition $\bar h(\theta) + z = 0$. As $\nabla_\theta w$ is the gradient of the Kullback–Leibler distance and the gain matrix is positive definite, we can show that the stationary points are asymptotically stable in the Liapunov sense. Thus, $\hat\theta^{(t)}$ converges to the set of stationary points.

More precisely, using $w(\theta_0) - w(\theta)$ as the Liapunov function, the derivative of the Liapunov function along the solution can be expressed as

(70)

The inequality on the right-hand side results from the positive definiteness of the gain matrix. The stationary points of (36) satisfy the condition $\bar h(\theta) = 0$. Using the same Liapunov function, it can also be shown that the stationary points are asymptotically stable.

APPENDIX C
PROOF OF LEMMA 1

Consider the first-order Taylor expansion

(71)

where the Jacobi matrix of $\bar h$ is given by

(72)

Then, the $i$th row of the Jacobi matrix is given by

(73)

where the subscript denotes the $i$th row. Since $\nabla_\theta w(\theta_0) = 0$, (73) can be simplified to

(74)

The equation above implies

(75)

Because $K(\theta_0)$ and $I_F(\theta_0)$ are positive definite, $B = -K(\theta_0)\, I_F(\theta_0)$ is negative definite; thus, it is a stable matrix.

APPENDIX D
PROOF OF THEOREM 3

We will show that under assumptions a), b), and c), recursions (33) and (34) satisfy the following conditions.

1) There are constants such that the growth bound required in [20] holds.
2) There are constants such that the moment bound required in [20] holds.
3) The noise condition required in [20] holds.
4) $\bar h(\theta) \approx B\,(\theta - \theta_0)$ as $\theta \to \theta_0$, where $B$ is a stable matrix.
5) The matrix $S$ is defined as the covariance of the observation noise at the limit point; then, apply results from [20].

Conditions 1), 2), and 4) can be easily verified by Lemma 1. Using (37)

(76)

where the remainder is a bounded matrix under assumptions a) and b). Thus, 1) is satisfied. Condition 2) can also be verified by (37). Condition 4) is a direct consequence of Lemma 1.

By definition of the observation noise, with assumptions a) and b), we can verify 3) by the following inequality:

(77)

The matrix $S$ is the covariance of the observation noise at the limit point $\theta_0$, since

(78)

We have shown that conditions 1)–5) are satisfied by recursions (1) and (4). The asymptotic normality of the normalized error follows from [20, Th. 5.8 and 5.10, pp. 291–293].

APPENDIX E
PROOF OF RESULT 1

Substituting (58) in (60), the $i$th element of the gradient can be written as

(79)

Note that the quantity appearing in (79) is the ML estimate for the noise parameter at the current DOA estimate. By this fact and the identity [15]

(80)

(79) can be further simplified to

(81)


Note that the quantity appearing above is the ML estimate for the signal parameters at the current DOA estimate. The first derivative of the steering matrix with respect to $\theta_i$ has nonzero elements only in the $i$th column

(82)

These observations lead to

(83)

Furthermore,

(84)

Equations (81), (83), and (84) yield the following:

(85)

Comparing the above result with (44), it can be concluded that

(86)

APPENDIX F
PROOF OF RESULT 2

Equation (62) can be rewritten as

(87)

Obviously, the function has a maximum at $\theta_0$. Consequently

(88)

Comparing functions (58) and (62), the gradient vector of the concentrated likelihood can be seen as a noisy observation of the gradient of (62). By Result 1, we know the two gradient vectors are equivalent, except for a constant, which can be absorbed into the step size. The inverses of $I_c$ and $I_s$ can be regarded as gain matrices that improve the convergence rate of the algorithm. Thus, the recursive EM and SAGE-inspired algorithms derived under a deterministic signal model are indeed stochastic approximation procedures for finding zeros of (64). For a good initial estimate, the recursive procedures converge to the desired parameters $\theta_0$.

ACKNOWLEDGMENT

The authors would like to thank A. O. Hero for a fruitful discussion. They would also like to thank Associate Editor C. B. Papadias and the anonymous reviewers for their constructive comments that significantly improved the manuscript.

REFERENCES

[1] T. M. Apostol, Mathematical Analysis, 2nd ed. Reading, MA: Addison-Wesley, 1975.
[2] A. Benveniste, M. Métivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations. New York: Springer-Verlag, 1990.
[3] J. F. Böhme, "Array processing," in Advances in Spectrum Analysis and Array Processing, S. Haykin, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1991, pp. 1–63.
[4] D. R. Brillinger, Time Series: Data Analysis and Theory. San Francisco, CA: Holden-Day, 1981.
[5] P. J. Chung and J. F. Böhme, "Recursive EM and SAGE algorithms," in Proc. IEEE Workshop Statist. Signal Process., Singapore, Aug. 6–8, 2001, pp. 540–542.
[6] P. J. Chung and J. F. Böhme, "Comparative convergence analysis of EM and SAGE algorithms in DOA estimation," IEEE Trans. Signal Process., vol. 49, no. 12, pp. 2940–2949, Dec. 2001.
[7] P. J. Chung and J. F. Böhme, "DOA estimation of multiple moving sources using recursive EM algorithms," in Proc. Sensor Array Multichannel Signal Process. Workshop, Washington, DC, Aug. 4–6, 2002, pp. 323–326.
[8] P. J. Chung and J. F. Böhme, "EM and SAGE algorithms for towed array data," in The Applications of Space-Time Adaptive Processing, R. Klemm, Ed. London, U.K.: IEE, 2004.
[9] A. P. Dempster, N. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc., vol. B39, pp. 1–38, 1977.
[10] V. Fabian, "On asymptotic normality in stochastic approximation," Ann. Math. Statist., vol. 39, pp. 854–866, 1968.
[11] V. Fabian, "On asymptotically efficient recursive estimation," Ann. Statist., vol. 6, no. 4, pp. 854–866, 1978.
[12] J. A. Fessler and A. O. Hero, "Space-alternating generalized expectation-maximization algorithm," IEEE Trans. Signal Process., vol. 42, no. 10, pp. 2664–2677, Oct. 1994.
[13] L. Frenkel and M. Feder, "Recursive expectation-maximization (EM) algorithms for time-varying parameters with applications to multiple target tracking," IEEE Trans. Signal Process., vol. 47, no. 2, pp. 306–320, Feb. 1999.
[14] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
[15] D. Kraus, "Approximative Maximum-Likelihood-Schätzung und verwandte Verfahren zur Ortung und Signalschätzung mit Sensorgruppen," Dr.-Ing. dissertation, Faculty Elect. Eng., Ruhr-Universität Bochum, Shaker Verlag, Aachen, Germany, 1993.
[16] H. J. Kushner and D. S. Clark, Stochastic Approximation for Constrained and Unconstrained Systems. New York: Springer-Verlag, 1978.
[17] H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. New York: Springer-Verlag, 1997.
[18] L. Ljung, "Analysis of recursive stochastic algorithms," IEEE Trans. Autom. Control, vol. AC-22, no. 4, pp. 551–575, Aug. 1977.
[19] D. Maiwald, "Breitbandverfahren zur Signalentdeckung und -ortung mit Sensorgruppen in Seismik- und Sonaranwendungen," Dr.-Ing. dissertation, Faculty Elect. Eng., Ruhr-Universität Bochum, Shaker Verlag, Aachen, Germany, 1995.
[20] G. Pflug, Optimization of Stochastic Models. Norwell, MA: Kluwer, 1996.
[21] B. T. Polyak and A. B. Juditsky, "Acceleration of stochastic approximation by averaging," SIAM J. Control Optim., vol. 30, pp. 838–855, 1992.
[22] J. Sacks, "Asymptotic distribution of stochastic approximation procedures," Ann. Math. Statist., vol. 29, pp. 373–405, 1958.
[23] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramér-Rao bound," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 5, pp. 720–741, May 1989.
[24] P. Stoica and A. Nehorai, "Performance study of conditional and unconditional direction-of-arrival estimation," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 10, pp. 1783–1795, Oct. 1990.
[25] D. M. Titterington, "Recursive parameter estimation using incomplete data," J. R. Statist. Soc. B, vol. 46, no. 2, pp. 257–267, 1984.
[26] D. M. Titterington, A. F. M. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distributions. New York: Wiley, 1985.
[27] D. Williams, Probability with Martingales. Cambridge, U.K.: Cambridge Univ. Press, 1991.
[28] B. Yang, "Projection approximation subspace tracking," IEEE Trans. Signal Process., vol. 43, no. 1, pp. 95–107, Jan. 1995.

Pei-Jung Chung received the Dr.-Ing. degree in 2002 from Ruhr-Universität Bochum, Bochum, Germany. From 1998 to 2002, she was with the Signal Theory Group, Department of Electrical Engineering and Information Sciences, Ruhr-Universität Bochum. From May 2002 to January 2004, she held a postdoctoral position at Carnegie Mellon University, Pittsburgh, PA. From February 2004 to July 2004, she was a postdoctoral researcher at the University of Michigan, Ann Arbor. She is now an Assistant Professor with the Department of Electronics Engineering, National Chiao Tung University, Hsin Chu, Taiwan, R.O.C. Her research interests include array processing, statistical signal processing, and its applications.

Johann F. Böhme (M'74–SM'88–F'90) received the Diplom degree in mathematics in 1966, the Dr.-Ing. degree in 1970, and the Habilitation degree in 1977, both in computer science, from the Technical University of Hannover, Hannover, Germany, the University of Erlangen, Erlangen, Germany, and the University of Bonn, Bonn, Germany, respectively.

From 1967 to 1974, he was with the sonar-research laboratory of Krupp Atlas Elektronik, Bremen, Germany. He then joined the University of Bonn until 1978 and the FGAN in Wachtberg-Werthhoven, Germany. Since 1980, he has been Professor of signal theory with the Department of Electrical Engineering and Information Sciences, Ruhr-Universität Bochum. His research interests are in the domain of statistical signal processing and its applications.

Dr. Böhme was the recipient of the 2003 IEEE Signal Processing Society Technical Achievement Award.
