**Convergence of a Block Coordinate Descent** **Method for Nondifferentiable Minimization**

P. Tseng^{1,2}

Communicated by O. L. Mangasarian

**Abstract.** We study the convergence properties of a (block) coordinate descent method applied to minimize a nondifferentiable (nonconvex) function $f(x_1, \ldots, x_N)$ with certain separability and regularity properties. Assuming that $f$ is continuous on a compact level set, the subsequence convergence of the iterates to a stationary point is shown when either $f$ is pseudoconvex in every pair of coordinate blocks from among $N-1$ coordinate blocks or $f$ has at most one minimum in each of $N-2$ coordinate blocks. If $f$ is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of $f$ and compactness of the level set may be relaxed further. These results are applied to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut, and an algorithm of Han. They are applied also to a problem of blind source separation.

**Key Words.** Block coordinate descent, nondifferentiable minimization,
stationary point, Gauss–Seidel method, convergence, quasiconvex func-
tions, pseudoconvex functions.

**1. Introduction**

A popular method for minimizing a real-valued continuously differentiable function $f$ of $n$ real variables, subject to bound constraints, is the (block) coordinate descent method. In this method, the coordinates are partitioned into $N$ blocks and, at each iteration, $f$ is minimized with respect to one of the coordinate blocks while the other coordinates are held fixed. This method, which is related closely to the Gauss–Seidel and SOR methods for equation solving (Ref. 1), was studied early by Hildreth (Ref. 2) and Warga (Ref. 3), and is described in various books on optimization (Refs. 1 and 4–10). Its applications include channel capacity computation (Refs. 11–12), image reconstruction (Ref. 7), dynamic programming (Refs. 13–15), and flow routing (Ref. 16). It may be applied also to the dual of a linearly constrained, strictly convex program to obtain various decomposition methods (see Refs. 6–7, 17–22, and references therein) and parallel SOR methods (Ref. 23).

^{1}This work was partially supported by National Science Foundation Grant CCR-9731273.

^{2}Professor, Department of Mathematics, University of Washington, Seattle, Washington.

Convergence of the (block) coordinate descent method typically requires that $f$ be strictly convex (or quasiconvex or hemivariate), differentiable, and, taking into account the bound constraints, have bounded level sets (e.g., Refs. 3–4 and 24–25). Zadeh (Ref. 26; also see Ref. 27) relaxed the strict convexity assumption to pseudoconvexity, which allows $f$ to have a nonunique minimum along coordinate directions. For certain classes of convex functions, the level sets need not be bounded (see Refs. 2, 6–7, 17, 19–22, and references therein). If $f$ is not (pseudo)convex, then an example of Powell (Ref. 28) shows that the method may cycle without approaching any stationary point of $f$. Nonetheless, convergence can be shown for special cases of non(pseudo)convex $f$, as when $f$ is quadratic (Ref. 29), or $f$ is strictly pseudoconvex in each of $N-2$ coordinate blocks (Ref. 27), or $f$ has a unique minimum in each coordinate block (Ref. 8, p. 159). If $f$ is not differentiable, the coordinate descent method may get stuck at a nonstationary point even when $f$ is convex (e.g., Ref. 4, p. 94). For this reason, it is generally perceived that the method is unsuitable when $f$ is nondifferentiable. However, an exception occurs when the nondifferentiable part of $f$ is separable. Such a structure for $f$ was first considered by Auslender (Ref. 4, p. 94) in the case where $f$ is strongly convex. This structure is implicit in a decomposition method and projection method of Han (Refs. 18, 30), for which $f$ is the convex dual functional associated with a certain linearly constrained convex program (see Ref. 22 for detailed discussions). This structure arises also in least-squares problems where an $\ell_1$-penalty is placed on a subset of the parameters in order to minimize the support (see Refs. 31–33 and references therein).
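The failure mode for a nonseparable nondifferentiable part is easy to reproduce numerically. Below is a minimal sketch (our own toy function, not one from the references): $f(x, y) = 3|x - y| - 2(x + y) + x^2 + y^2$ is convex, but its nondifferentiable part $|x - y|$ is not separable across coordinates, and exact coordinate descent started at the origin never moves, even though the origin is not a minimum.

```python
# Toy illustration (ours): coordinate descent stuck at a nonstationary
# point of a convex f whose nondifferentiable part |x - y| is NOT separable.

def f(x, y):
    return 3.0 * abs(x - y) - 2.0 * (x + y) + x**2 + y**2

def argmin_1d(g, lo=-5.0, hi=5.0, n=20001):
    # dense grid search; accurate enough for this illustration
    pts = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(pts, key=g)

x, y = 0.0, 0.0
for _ in range(10):                      # alternate exact coordinate minimizations
    x = argmin_1d(lambda t: f(t, y))
    y = argmin_1d(lambda t: f(x, t))

print((x, y), f(x, y))                   # stays at (0.0, 0.0) with value 0.0
print(f(1.0, 1.0))                       # yet f(1, 1) = -2.0 < f(0, 0)
```

Because $f$ decreases along the diagonal direction $(1, 1)$ but not along either axis, the origin is a coordinatewise minimum that is not a minimum; separability of the nondifferentiable part is precisely what rules this out in the sequel.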

Motivated by the preceding works, we consider in this paper the nondifferentiable (nonconvex) case where the nondifferentiable part of $f$ is separable. Specifically, we assume that $f$ has the following special form:

$$f(x_1, \ldots, x_N) = f_0(x_1, \ldots, x_N) + \sum_{k=1}^{N} f_k(x_k), \tag{1}$$

for some $f_0 : \Re^{n_1 + \cdots + n_N} \to \Re \cup \{\infty\}$ and some $f_k : \Re^{n_k} \to \Re \cup \{\infty\}$, $k = 1, \ldots, N$. Here, $N, n_1, \ldots, n_N$ are positive integers. We assume that $f$ is proper, i.e., $f \not\equiv \infty$. We will refer to each $x_k$, $k = 1, \ldots, N$, as a coordinate block of $x = (x_1, \ldots, x_N)$. We will show that each cluster point of the iterates generated by the (block) coordinate descent method is a stationary point of $f$, provided that $f_0$ has a certain smoothness property (see Lemma 3.1), $f$ is continuous on a compact level set, and either $f$ is pseudoconvex in every pair of coordinate blocks from among $N-1$ coordinate blocks, or $f$ has at most one minimum in each of $N-2$ coordinate blocks (see Theorem 4.1).

If $f$ is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of $f$ and compactness of the level set may be relaxed further (see Proposition 5.1). These results unify and extend some previous results in Refs. 4, 6, 8, 26–27. For example, previous results assumed that $f$ is pseudoconvex and that $f_1, \ldots, f_N$ are indicator functions for closed convex sets, whereas we assume only that $f$ is pseudoconvex in every pair of coordinate blocks from among $N-1$ coordinate blocks, with no additional assumption made on $f_1, \ldots, f_N$. Previous results also did not consider the case where $f$ is not continuous on its effective domain. Lastly, we apply our results to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut (Refs. 11–12), and an algorithm of Han (Ref. 30); see Examples 6.1–6.3. We also apply them to a problem of blind source separation described in Refs. 31, 33; see Example 6.4.

In our notation, $\Re^m$ denotes the space of $m$-dimensional real column vectors. For any $x, y \in \Re^m$, we denote by $\langle x, y \rangle$ the Euclidean inner product of $x$ and $y$, and by $\|x\|$ the Euclidean norm of $x$, i.e., $\|x\| = \sqrt{\langle x, x \rangle}$. For any set $S \subseteq \Re^m$, we denote by $\mathrm{int}(S)$ the interior of $S$ and denote $\mathrm{bdry}(S) = S \setminus \mathrm{int}(S)$.

For any $h : \Re^m \to \Re \cup \{\infty\}$, we denote by $\mathrm{dom}\, h$ the effective domain of $h$, i.e.,

$$\mathrm{dom}\, h = \{x \in \Re^m : h(x) < \infty\}.$$

For any $x \in \mathrm{dom}\, h$ and any $d \in \Re^m$, we denote the (lower) directional derivative of $h$ at $x$ in the direction $d$ by

$$h'(x; d) = \liminf_{\lambda \downarrow 0}\, [h(x + \lambda d) - h(x)]/\lambda.$$

We say that $h$ is quasiconvex if

$$h(x + \lambda d) \le \max\{h(x), h(x + d)\}, \quad \text{for all } x, d \text{ and } \lambda \in [0, 1];$$

$h$ is pseudoconvex if

$$h(x + d) \ge h(x), \quad \text{whenever } x \in \mathrm{dom}\, h \text{ and } h'(x; d) \ge 0;$$

see Ref. 34, p. 146; and $h$ is hemivariate if $h$ is not constant on any line segment belonging to $\mathrm{dom}\, h$ (Ref. 1). For any nonempty $I \subseteq \{1, \ldots, m\}$, we say that $h(x_1, \ldots, x_m)$ is pseudoconvex [respectively, has at most one minimum] in $(x_i)_{i \in I}$ if, for each choice of the remaining coordinates $(x_j)_{j \notin I}$, the function $(x_i)_{i \in I} \mapsto h(x_1, \ldots, x_m)$ is pseudoconvex [respectively, has at most one minimum point].

**2. Block Coordinate Descent Method**

We describe formally the block coordinate descent (BCD) method below.

**BCD Method.**

Initialization. Choose any $x^0 = (x_1^0, \ldots, x_N^0) \in \mathrm{dom}\, f$.

Iteration $r+1$, $r \ge 0$. Given $x^r = (x_1^r, \ldots, x_N^r) \in \mathrm{dom}\, f$, choose an index $s \in \{1, \ldots, N\}$ and compute a new iterate $x^{r+1} = (x_1^{r+1}, \ldots, x_N^{r+1}) \in \mathrm{dom}\, f$ satisfying

$$x_s^{r+1} \in \arg\min_{x_s} f(x_1^r, \ldots, x_{s-1}^r, x_s, x_{s+1}^r, \ldots, x_N^r), \tag{2}$$

$$x_j^{r+1} = x_j^r, \quad \forall j \ne s. \tag{3}$$

We note that the minimization in (2) is attained if $X^0 = \{x : f(x) \le f(x^0)\}$ is bounded and $f$ is lower semicontinuous (lsc) on $X^0$, so that $X^0$ is compact (Ref. 35). Alternatively, this minimization is attained if $f$ is convex, has a minimum point, and is hemivariate in each coordinate block (but the level sets of $f$ need not be bounded). To ensure convergence, we need further that each coordinate block is chosen sufficiently often in the method. In particular, we will choose the coordinate blocks according to the following rule (see, e.g., Refs. 7–8, 21, 25).

**Essentially Cyclic Rule.** There exists a constant $T \ge N$ such that every index $s \in \{1, \ldots, N\}$ is chosen at least once between the $r$th iteration and the $(r+T-1)$th iteration, for all $r$.

A well-known special case of this rule, for which $T = N$, is given below.

**Cyclic Rule.** Choose $s = k$ at iterations $k, k+N, k+2N, \ldots$, for $k = 1, \ldots, N$.
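In code, the BCD method with the cyclic rule can be sketched as follows. This is a minimal sketch: the exact block minimization in (2) is problem dependent, so it appears here as an assumed user-supplied callback `argmin_block`, and the smooth two-block quadratic instance at the bottom is our own illustration, not an example from the paper.

```python
# Minimal sketch of the BCD method (2)-(3) with the cyclic rule.
# `argmin_block(s, x)` is an assumed user-supplied exact minimizer of
# x_s -> f(x_1, ..., x_s, ..., x_N) with the other blocks held fixed.

def bcd(x0, argmin_block, N, num_cycles=100):
    x = list(x0)                       # x[s] is the (s+1)-th coordinate block
    for _ in range(num_cycles):
        for s in range(N):             # cyclic rule: s = 1, ..., N in order
            x[s] = argmin_block(s, x)  # step (2); step (3): other blocks kept

    return x

# Usage on f(x1, x2) = (x1 - 1)^2 + (x2 - x1)^2, N = 2 (our toy instance):
def argmin_block(s, x):
    if s == 0:
        return (1.0 + x[1]) / 2.0      # minimize (t - 1)^2 + (x2 - t)^2 over t
    return x[0]                        # minimize (t - x1)^2 over t

x = bcd([0.0, 0.0], argmin_block, N=2)
print(x)                               # approaches the minimizer (1, 1)
```

For this smooth strictly convex instance the iterates contract toward $(1, 1)$, halving the error each cycle.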

**3. Stationary Points of f**

We say that $z$ is a stationary point of $f$ if $z \in \mathrm{dom}\, f$ and $f'(z; d) \ge 0$ for all $d$.

We say that $z$ is a coordinatewise minimum point of $f$ if $z \in \mathrm{dom}\, f$ and

$$f(z + (0, \ldots, d_k, \ldots, 0)) \ge f(z), \quad \forall d_k \in \Re^{n_k}, \tag{4}$$

for all $k = 1, \ldots, N$. Here and throughout, we denote by $(0, \ldots, d_k, \ldots, 0)$ the vector in $\Re^{n_1 + \cdots + n_N}$ whose $k$th coordinate block is $d_k$ and whose other coordinates are zero. We say that $f$ is regular at $z \in \mathrm{dom}\, f$ if

$$f'(z; d) \ge 0, \quad \forall d = (d_1, \ldots, d_N) \text{ such that } f'(z; (0, \ldots, d_k, \ldots, 0)) \ge 0, \ k = 1, \ldots, N. \tag{5}$$
This notion of regularity is weaker than that used by Auslender (Ref. 4, p. 93), which entails

$$f'(z; d) = \sum_{k=1}^{N} f'(z; (0, \ldots, d_k, \ldots, 0)), \quad \text{for all } d = (d_1, \ldots, d_N).$$

For example, the function

$$f(x_1, x_2) = \varphi(x_1, x_2) + \varphi(-x_1, x_2) + \varphi(x_1, -x_2) + \varphi(-x_1, -x_2),$$

where

$$\varphi(a, b) = \max\{0,\, a + b - \sqrt{a^2 + b^2}\},$$

is regular at $z = (0, 0)$ in the sense of (5), but is not regular in the sense of Ref. 4, p. 93.
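A quick numerical check of this example (our own illustration, not part of the paper): since $f \ge 0 = f(0, 0)$ everywhere, condition (5) holds trivially at the origin, while the directional derivative along $d = (1, 1)$ equals $2 - \sqrt{2} > 0$ even though both coordinatewise directional derivatives vanish, so the additivity required by Auslender's notion fails.

```python
import math

def phi(a, b):
    return max(0.0, a + b - math.hypot(a, b))

def f(x1, x2):
    return phi(x1, x2) + phi(-x1, x2) + phi(x1, -x2) + phi(-x1, -x2)

def ddir(d1, d2, lam=1e-8):
    # f is positively homogeneous of degree 1, so the difference quotient
    # at a small lam equals f'(0; d) up to rounding.
    return (f(lam * d1, lam * d2) - f(0.0, 0.0)) / lam

print(ddir(1, 0), ddir(0, 1))   # both 0: coordinatewise derivatives vanish
print(ddir(1, 1))               # 2 - sqrt(2) > 0: not additive across blocks
```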

Since (4) implies

$$f'(z; (0, \ldots, d_k, \ldots, 0)) \ge 0, \quad \text{for all } d_k,$$

it follows that a coordinatewise minimum point $z$ of $f$ is a stationary point of $f$ whenever $f$ is regular at $z$. To ensure regularity of $f$ at $z$, we consider one of the following smoothness assumptions on $f_0$:

(A1) $\mathrm{dom}\, f_0$ is open and $f_0$ is Gâteaux-differentiable on $\mathrm{dom}\, f_0$.

(A2) $f_0$ is Gâteaux-differentiable on $\mathrm{int}(\mathrm{dom}\, f_0)$ and, for every $z \in \mathrm{dom}\, f \cap \mathrm{bdry}(\mathrm{dom}\, f_0)$, there exist $k \in \{1, \ldots, N\}$ and $d_k \in \Re^{n_k}$ such that $f(z + (0, \ldots, d_k, \ldots, 0)) < f(z)$.

Assumption A1 was considered essentially by Auslender (Ref. 4, Example 2 on p. 94). In contrast to Assumption A1, Assumption A2 allows $\mathrm{dom}\, f_0$ to include boundary points. We will see an application (Example 6.2) where A2 holds but not A1.

**Lemma 3.1.** Under A1, $f$ is regular at each $z \in \mathrm{dom}\, f$. Under A2, $f$ is regular at each coordinatewise minimum point $z$ of $f$.

**Proof.** Under A1, if $z = (z_1, \ldots, z_N) \in \mathrm{dom}\, f$, then $z \in \mathrm{dom}\, f_0$. Under A2, if $z = (z_1, \ldots, z_N)$ is a coordinatewise minimum point of $f$, then $z \notin \mathrm{bdry}(\mathrm{dom}\, f_0)$, so $z \in \mathrm{int}(\mathrm{dom}\, f_0)$. Thus, under either A1 or A2, $f_0$ is Gâteaux-differentiable at $z$. Fix any $d = (d_1, \ldots, d_N)$ such that

$$f'(z; (0, \ldots, d_k, \ldots, 0)) \ge 0, \quad k = 1, \ldots, N.$$

Then,

$$\begin{aligned}
f'(z; d) &= \langle \nabla f_0(z), d \rangle + \liminf_{\lambda \downarrow 0} \sum_{k=1}^{N} [f_k(z_k + \lambda d_k) - f_k(z_k)]/\lambda \\
&\ge \langle \nabla f_0(z), d \rangle + \sum_{k=1}^{N} \liminf_{\lambda \downarrow 0}\, [f_k(z_k + \lambda d_k) - f_k(z_k)]/\lambda \\
&= \langle \nabla f_0(z), d \rangle + \sum_{k=1}^{N} f_k'(z_k; d_k) \\
&= \sum_{k=1}^{N} f'(z; (0, \ldots, d_k, \ldots, 0)) \\
&\ge 0. \qquad \square
\end{aligned}$$

**4. Convergence Analysis: I**

Our first convergence result unifies and extends a result of Auslender (Ref. 4, p. 95) for the nondifferentiable convex case and some results of Grippo and Sciandrone (Ref. 27), Luenberger (Ref. 8, p. 159), and Zadeh (Ref. 26) for the differentiable case. In what follows, $r \equiv (N-1) \bmod N$ means $r = N-1, 2N-1, 3N-1, \ldots$.

**Theorem 4.1.** Assume that the level set $X^0 = \{x : f(x) \le f(x^0)\}$ is compact and that $f$ is continuous on $X^0$. Then, the sequence $\{x^r = (x_1^r, \ldots, x_N^r)\}_{r = 0, 1, \ldots}$ generated by the BCD method using the essentially cyclic rule is defined and bounded. Moreover, the following statements hold:

(a) If $f(x_1, \ldots, x_N)$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{1, \ldots, N\}$, and if $f$ is regular at every $x \in X^0$, then every cluster point of $\{x^r\}$ is a stationary point of $f$.

(b) If $f(x_1, \ldots, x_N)$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{1, \ldots, N-1\}$, if $f$ is regular at every $x \in X^0$, and if the cyclic rule is used, then every cluster point of $\{x^r\}_{r \equiv (N-1) \bmod N}$ is a stationary point of $f$.

(c) If $f(x_1, \ldots, x_N)$ has at most one minimum in $x_k$ for $k = 2, \ldots, N-1$, and if the cyclic rule is used, then every cluster point $z$ of $\{x^r\}_{r \equiv (N-1) \bmod N}$ is a coordinatewise minimum point of $f$. In addition, if $f$ is regular at $z$, then $z$ is a stationary point of $f$.

**Proof.** Since $X^0$ is compact, an induction argument on $r$ shows that $x^{r+1}$ is defined, $f(x^{r+1}) \le f(x^r)$, and $x^{r+1} \in X^0$ for all $r = 0, 1, \ldots$. Thus, $\{x^r\}$ is bounded. Consider any subsequence $\{x^r\}_{r \in R}$, with $R \subseteq \{0, 1, \ldots\}$, converging to some $z$. For each $j \in \{1, \ldots, T\}$, $\{x^{r-T+1+j}\}_{r \in R}$ is bounded, so by passing to a subsequence if necessary, we can assume that

$$\{x^{r-T+1+j}\}_{r \in R} \to z^j = (z_1^j, \ldots, z_N^j), \quad j = 1, \ldots, T.$$

Thus, $z^{T-1} = z$.

Since $\{f(x^r)\}$ converges monotonically and $f$ is continuous on $X^0$, we obtain that

$$f(x^0) \ge \lim_{r \to \infty} f(x^r) = f(z^1) = \cdots = f(z^T). \tag{6}$$

By further passing to a subsequence if necessary, we can assume that the index $s$ chosen at iteration $r-T+1+j$, $j \in \{1, \ldots, T\}$, is the same for all $r \in R$; we denote it by $s^j$.

For each $j \in \{1, \ldots, T\}$, since $s^j$ is chosen at iteration $r-T+1+j$ for $r \in R$, then (2) and (3) yield

$$f(x^{r-T+1+j}) \le f(x^{r-T+1+j} + (0, \ldots, d_{s^j}, \ldots, 0)), \quad \forall d_{s^j}, \ j = 1, \ldots, T,$$

$$x_k^{r-T+1+j} = x_k^{r-T+j}, \quad \forall k \ne s^j, \ j = 2, \ldots, T.$$

Then, the continuity of $f$ on $X^0$ yields in the limit that

$$f(z^j) \le f(z^j + (0, \ldots, d_{s^j}, \ldots, 0)), \quad \forall d_{s^j}, \ j = 1, \ldots, T; \qquad z_k^j = z_k^{j-1}, \quad \forall k \ne s^j, \ j = 2, \ldots, T. \tag{7}$$

Then, (6) and (7) yield

$$f(z^{j-1}) \le f(z^{j-1} + (0, \ldots, d_{s^j}, \ldots, 0)), \quad \forall d_{s^j}, \ j = 2, \ldots, T. \tag{8}$$

(a), (b) Suppose that $f$ is regular at every $x \in X^0$ and that $f(x_1, \ldots, x_N)$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{s^1\} \cup \cdots \cup \{s^{T-1}\}$. This holds under the assumption (a), or under the assumption (b) with $\{x^r\}_{r \in R}$ being any convergent subsequence of $\{x^r\}_{r \equiv (N-1) \bmod N}$. We claim that, for $j = 1, \ldots, T-1$,

$$f(z^j) \le f(z^j + (0, \ldots, d_k, \ldots, 0)), \quad \forall d_k, \ \forall k = s^1, \ldots, s^j. \tag{9}$$

By (7), (9) holds for $j = 1$. Suppose that (9) holds for $j = 1, \ldots, l-1$ for some $l \in \{2, \ldots, T-1\}$. We show that (9) holds for $j = l$. From (8), we have that $f(z^{l-1}) \le f(z^{l-1} + (0, \ldots, d_{s^l}, \ldots, 0))$ for all $d_{s^l}$, implying

$$f'(z^{l-1}; (0, \ldots, z_{s^l}^{l} - z_{s^l}^{l-1}, \ldots, 0)) \ge 0.$$

Also, since (9) holds for $j = l-1$, we have that, for each $k = s^1, \ldots, s^{l-1}$,

$$f'(z^{l-1}; (0, \ldots, d_k, \ldots, 0)) \ge 0, \quad \forall d_k.$$

Since by (6) $z^{l-1} \in X^0$, so that $f$ is regular at $z^{l-1}$, the above two relations imply

$$f'(z^{l-1}; (0, \ldots, d_k, \ldots, 0) + (0, \ldots, z_{s^l}^{l} - z_{s^l}^{l-1}, \ldots, 0)) \ge 0, \quad \forall d_k.$$

Since $f$ is pseudoconvex in $(x_k, x_{s^l})$, this yields [also using $z^l = z^{l-1} + (0, \ldots, z_{s^l}^{l} - z_{s^l}^{l-1}, \ldots, 0)$], for $k = s^1, \ldots, s^{l-1}$,

$$f(z^l + (0, \ldots, d_k, \ldots, 0)) \ge f(z^{l-1}) = f(z^l), \quad \forall d_k.$$

Since we have also that (7) holds with $j = l$, we see that (9) holds for $j = l$. By induction, (9) holds for all $j = 1, \ldots, T-1$.

Since $z^{T-1} = z$ and (9) holds for $j = T-1$, then (4) holds for $k = s^1, \ldots, s^{T-1}$. Since $z^{T-1} = z$ and (8) holds (in particular, for $j = T$), then (4) holds for $k = s^T$ also. Since

$$\{1, \ldots, N\} = \{s^1\} \cup \cdots \cup \{s^T\},$$

this implies that $z$ is a coordinatewise minimum point of $f$. Since $f$ is regular at $z$, then $z$ is in fact a stationary point of $f$.

(c) Suppose that $f(x_1, \ldots, x_N)$ has at most one minimum in $x_k$ for $k = s^2, \ldots, s^{T-1}$. This holds under the assumption (c), with $\{x^r\}_{r \in R}$ being any convergent subsequence of $\{x^r\}_{r \equiv (N-1) \bmod N}$. For each $j = 2, \ldots, T-1$, since (7) and (8) hold, then the function

$$d_{s^j} \mapsto f(z^j + (0, \ldots, d_{s^j}, \ldots, 0))$$

attains its minimum at both $d_{s^j} = 0$ and $d_{s^j} = z_{s^j}^{j-1} - z_{s^j}^{j}$. By assumption, the minimum point is unique, implying $0 = z_{s^j}^{j-1} - z_{s^j}^{j}$, or equivalently, $z^{j-1} = z^j$. Thus, $z^1 = z^2 = \cdots = z^{T-1} = z$ and (7) yields that (4) holds for $k = s^1, \ldots, s^{T-1}$. Since $z^{T-1} = z$ and (8) holds (in particular, for $j = T$), then (4) holds for $k = s^T$ also. Since

$$\{1, \ldots, N\} = \{s^1\} \cup \cdots \cup \{s^T\},$$

this implies that $z$ is a coordinatewise minimum point of $f$. If $f$ is regular at $z$, then $z$ is also a stationary point of $f$. $\square$

Notice that, if $f$ is pseudoconvex, then $f$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{1, \ldots, N\}$; if $f$ is quasiconvex and hemivariate in $x_k$, then $f$ has at most one minimum in $x_k$. The converses do not hold. For example, the 2-variable Rosenbrock function has a unique minimum point but is not quasiconvex. The following 3-variable quadratic function

$$f(x_1, x_2, x_3) = (1/2)x_1^2 + (1/2)x_2^2 + (1/2)x_3^2 + x_1 x_3 + x_2 x_3 - x_1 x_2$$

is convex in every pair of variables, but is not pseudoconvex. In particular, for $x = (0, 0, 1/2)$ and $d = (1, 1, -1)$, we have $f'(x; d) = 1/2 \ge 0$, while $f(x + d) = -7/8 < f(x) = 1/8$. This example generalizes to any quadratic function

$$f(x) = \langle x, Qx \rangle,$$

where $Q \in \Re^{N \times N}$ is symmetric, not positive semidefinite, but whose $2 \times 2$ principal submatrices are positive semidefinite. Then, for any $d$ satisfying $\langle d, Qd \rangle < 0$ and any $x$ satisfying

$$0 \le \langle x, Qd \rangle < -(1/2)\langle d, Qd \rangle,$$

we have that

$$f'(x; d) \ge 0, \quad \text{while } f(x + d) < f(x).$$
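As a sanity check, the stated numbers for the 3-variable quadratic counterexample can be verified directly (a small script of ours):

```python
# Numerical check of the 3-variable quadratic counterexample:
# f is convex in every pair of variables but not pseudoconvex.

def f(x1, x2, x3):
    return 0.5*x1**2 + 0.5*x2**2 + 0.5*x3**2 + x1*x3 + x2*x3 - x1*x2

def grad(x1, x2, x3):
    return (x1 + x3 - x2, x2 + x3 - x1, x3 + x1 + x2)

x = (0.0, 0.0, 0.5)
d = (1.0, 1.0, -1.0)
ddir = sum(gi * di for gi, di in zip(grad(*x), d))   # f'(x; d), f smooth

print(ddir)                                          # 0.5 >= 0
print(f(*(xi + di for xi, di in zip(x, d))))         # -0.875
print(f(*x))                                         # 0.125
```

So the directional derivative at $x$ along $d$ is nonnegative, yet $f(x + d) < f(x)$, violating pseudoconvexity.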

Thus, parts (a) and (c) of Theorem 4.1 may be viewed as extensions of two results of Grippo and Sciandrone (Ref. 27, Propositions 5.2, 5.3) for the case of $f_0$ being continuously differentiable and each $f_k$ being the indicator function of some closed convex set. In turn, the first of these results extended a result of Zadeh (Ref. 26) for which $f_k \equiv 0$ for all $k$. Part (b) makes a less restrictive assumption on $f$ than part (a), though its assumption on the BCD method is more restrictive. Part (b) is sharp in the sense that it is false if instead we assume only that $f$ is convex in every coordinate block. This is because the Powell 3-variable example (Ref. 28) is convex in each variable; see Ref. 27, Section 6 for further discussion of the example. We will see an application (Example 6.4) in which part (b) applies but neither part (a) nor part (c) does.

**5. Convergence Analysis: II**

The convergence analysis of the previous section assumes $f$ to be continuous on a bounded level set and makes no use of the special structure (1) of $f$. In this section, we show that this assumption can be relaxed by exploiting the special structure (1), provided that $f$ is quasiconvex and hemivariate in each coordinate block. More precisely, we will make the following assumptions on $f, f_0, f_1, \ldots, f_N$:

(B1) $f_0$ is continuous on $\mathrm{dom}\, f_0$.

(B2) For each $k \in \{1, \ldots, N\}$ and $(x_j)_{j \ne k}$, the function $x_k \mapsto f(x_1, \ldots, x_N)$ is quasiconvex and hemivariate.

(B3) $f_0, f_1, \ldots, f_N$ are lsc.

We will see some applications (Ref. 6, Section 3.4.3 and Examples 6.1–6.3) for which $f$ satisfies this weaker assumption although it is not strictly convex. In addition, we will make one of the following technical assumptions on $f_0$:

(C1) $\mathrm{dom}\, f_0$ is open and $f_0$ tends to $\infty$ at every boundary point of $\mathrm{dom}\, f_0$.

(C2) $\mathrm{dom}\, f_0 = Y_1 \times \cdots \times Y_N$, for some $Y_k \subseteq \Re^{n_k}$, $k = 1, \ldots, N$.

In contrast to Assumption C1, Assumption C2 allows $f_0$ to have a finite value on $\mathrm{bdry}(\mathrm{dom}\, f_0)$. We will see in Example 6.2 a nonseparable function $f_0$ that satisfies Assumptions B1–B3 and C2, but not C1. We show below that Assumptions B1–B3, together with either Assumption C1 or C2, ensure that every cluster point of the iterates generated by the BCD method is a coordinatewise minimum point of $f$. The proof of this result is patterned after an argument given by Bertsekas and Tsitsiklis (Ref. 6, pp. 220–221; also see Ref. 27), but is complicated by the fact that $f$ is not necessarily differentiable (or even continuous) on its effective domain.

**Proposition 5.1.** Suppose that $f, f_0, f_1, \ldots, f_N$ satisfy Assumptions B1–B3 and that $f_0$ satisfies either Assumption C1 or C2. Also, assume that the sequence $\{x^r = (x_1^r, \ldots, x_N^r)\}_{r = 0, 1, \ldots}$ generated by the BCD method using the essentially cyclic rule is defined. Then, either $\{f(x^r)\} \downarrow -\infty$, or else every cluster point $z = (z_1, \ldots, z_N)$ of $\{x^r\}$ is a coordinatewise minimum point of $f$.

**Proof.** Since $f(x^0) < \infty$ and $f(x^{r+1}) \le f(x^r)$ for all $r$, either $\{f(x^r)\} \downarrow -\infty$, or else $\{f(x^r)\}$ converges to some limit and $\{f(x^{r+1}) - f(x^r)\} \to 0$. Consider the latter case and let $z$ be any cluster point of $\{x^r\}$. Since $f$ is lsc by Assumption B3, we have

$$f(z) \le \lim_{r \to \infty} f(x^r) < \infty,$$

so $z \in \mathrm{dom}\, f$. We show below that $z$ satisfies (4) for $k = 1, \ldots, N$.

First, we claim that, for any infinite subsequence

$$\{x^r\}_{r \in R} \to z, \tag{10}$$

with $R \subseteq \{0, 1, \ldots\}$, there holds that

$$\{x^{r+1}\}_{r \in R} \to z. \tag{11}$$

We prove this by contradiction. Suppose that this were not true. Then, there exist an infinite subsequence $R'$ of $R$ and a scalar $\epsilon > 0$ such that

$$\|x^{r+1} - x^r\| \ge \epsilon, \quad \text{for all } r \in R'.$$

By further passing to a subsequence if necessary, we can assume that there is some nonzero vector $d$ for which

$$\{(x^{r+1} - x^r)/\|x^{r+1} - x^r\|\}_{r \in R'} \to d, \tag{12}$$

and that the same coordinate block, say $x_s$, is chosen at the $(r+1)$st iteration for all $r \in R'$. Moreover, (10) implies that $\{f_0(x^r)\}_{r \in R}$ and $\{f_k(x_k^r)\}_{r \in R}$, $k = 1, \ldots, N$, are bounded from below, which together with the convergence of $\{f(x^r)\} = \{f_0(x^r) + \sum_{k=1}^{N} f_k(x_k^r)\}$ implies that $\{f_0(x^r)\}_{r \in R}$ and $\{f_k(x_k^r)\}_{r \in R}$, $k = 1, \ldots, N$, are bounded. Hence, by further passing to a subsequence if necessary, we can assume that there is some scalar $\theta$ for which

$$\{f_0(x^r) + f_s(x_s^r)\}_{r \in R'} \to \theta. \tag{13}$$

Fix any $\lambda \in [0, \epsilon]$. Let

$$\hat z = z + \lambda d, \tag{14}$$

and for each $r \in R'$, let

$$\hat x^r = x^r + \lambda (x^{r+1} - x^r)/\|x^{r+1} - x^r\|. \tag{15}$$

Then, by (10), (12), and (14),

$$\{\hat x^r\}_{r \in R'} \to \hat z. \tag{16}$$

For each $r \in R'$, we see from (2) that $x^{r+1}$ is obtained from $x^r$ by minimizing $f$ with respect to $x_s$, while the other coordinates are held fixed. Since

$$\lambda/\|x^{r+1} - x^r\| \le \lambda/\epsilon \le 1,$$

$\hat x^r$ lies on the line segment joining $x^r$ with $x^{r+1}$; this, together with $f(x^{r+1}) \le f(x^r)$ and the quasiconvexity of $x_s \mapsto f(x_1^r, \ldots, x_{s-1}^r, x_s, x_{s+1}^r, \ldots, x_N^r)$, implies

$$f(\hat x^r) \le f(x^r), \quad \forall r \in R'.$$

Since $f$ is lsc, this and (16) imply $\hat z \in \mathrm{dom}\, f$. Also, this and (1) and the observation that $x^r$ and $\hat x^r$ differ only in their $s$th coordinate block imply

$$f_0(\hat x^r) + f_s(\hat x_s^r) \le f_0(x^r) + f_s(x_s^r), \quad \forall r \in R'.$$

This combined with (13) yields

$$\limsup_{r \to \infty,\, r \in R'} \{f_0(\hat x^r) + f_s(\hat x_s^r)\} \le \theta. \tag{17}$$

Also, since $\{f(x^{r+1}) - f(x^r)\}_{r \in R'} \to 0$, we have equivalently that

$$\{f_0(x^{r+1}) + f_s(x_s^{r+1}) - f_0(x^r) - f_s(x_s^r)\}_{r \in R'} \to 0,$$

so (13) implies

$$\{f_0(x^{r+1}) + f_s(x_s^{r+1})\}_{r \in R'} \to \theta. \tag{18}$$
Let

$$\delta = f_0(\hat z) + f_s(\hat z_s) - \theta.$$

Since $f_0$ and $f_s$ are lsc, we have from (16), (17) that $\delta \le 0$. We claim that in fact $\delta = 0$. Suppose that this were not true, so that $\delta < 0$. By (16) and the observation that, for all $r \in R'$, $\hat x^r$ and $x^r$ differ in only their $s$th coordinate block, we have

$$\{(x_1^r, \ldots, x_{s-1}^r, \hat z_s, x_{s+1}^r, \ldots, x_N^r)\}_{r \in R'} \to \hat z. \tag{19}$$

Moreover, the vector on the left-hand side of (19) is in $\mathrm{dom}\, f_0$ for all $r \in R'$ sufficiently large. Since $\hat z \in \mathrm{dom}\, f_0$, this is certainly true under Assumption C1; under Assumption C2, this is also true because $x^r \in \mathrm{dom}\, f_0$ for all $r$ and $\mathrm{dom}\, f_0$ has a product structure corresponding to the coordinate blocks. Then, (18) together with (19) and the continuity of $f_0$ on $\mathrm{dom}\, f_0$ implies that, for all $r \in R'$ sufficiently large, there holds that

$$f_0(x_1^r, \ldots, x_{s-1}^r, \hat z_s, x_{s+1}^r, \ldots, x_N^r) + f_s(\hat z_s) \le f_0(x^{r+1}) + f_s(x_s^{r+1}) + \delta/2,$$

or equivalently [via (1) and the observation that $x^r$ and $x^{r+1}$ differ in only their $s$th coordinate block],

$$f(x_1^r, \ldots, x_{s-1}^r, \hat z_s, x_{s+1}^r, \ldots, x_N^r) \le f(x^{r+1}) + \delta/2,$$

a contradiction to the fact that $x^{r+1}$ is obtained from $x^r$ by minimizing $f$ with respect to the $s$th coordinate block, while the other coordinates are held fixed. Hence, $\delta = 0$ and therefore

$$f_0(\hat z) + f_s(\hat z_s) = \theta.$$

Since the choice of $\lambda$ was arbitrary, we obtain [also using (14)]

$$f_0(z + \lambda d) + f_s(z_s + \lambda d_s) = \theta, \quad \forall \lambda \in [0, \epsilon],$$

where $d_s$ denotes the $s$th coordinate block of $d$. Since $x^r$ and $x^{r+1}$ differ in only their $s$th coordinate block for all $r \in R'$, all coordinate blocks of $d$, except $d_s$, are zero [see (12)], and the above relation, together with (1), shows that $f(z + \lambda d)$ is constant (and finite) for all $\lambda \in [0, \epsilon]$, a contradiction to Assumption B2, namely, that $f$ is hemivariate in the $s$th coordinate block. Hence, (11) holds.

Since (11) holds for any subsequence $\{x^r\}_{r \in R}$ of $\{x^r\}$ converging to $z$, we can apply (11) to the subsequence $\{x^{r+1}\}_{r \in R}$ to conclude that $\{x^{r+2}\}_{r \in R} \to z$, and so on, yielding

$$\{x^{r+j}\}_{r \in R} \to z, \quad \forall j = 0, 1, \ldots, T, \tag{20}$$

where $T$ is the bound specified in the essentially cyclic rule.

We claim that (20), together with Assumption C1 or C2, implies

$$f_0(z) + f_k(z_k) \le f_0(z_1, \ldots, z_{k-1}, x_k, z_{k+1}, \ldots, z_N) + f_k(x_k), \tag{21}$$

for all $x_k$ and all $k \in \{1, \ldots, N\}$. To see this, fix any $k \in \{1, \ldots, N\}$. Since the coordinate blocks are chosen according to the essentially cyclic rule, there exist some $j \in \{1, \ldots, T\}$ and an infinite subsequence $R' \subseteq R$ such that the coordinate block $x_k$ is chosen at the $(r+j)$th iteration for all $r \in R'$. Then, for each $r \in R'$, $x_k^{r+j}$ minimizes $f_0(x_1^{r+j}, \ldots, x_{k-1}^{r+j}, x_k, x_{k+1}^{r+j}, \ldots, x_N^{r+j}) + f_k(x_k)$ over all $x_k$ [see (1), (2), (3)], so that

$$f_0(x^{r+j}) + f_k(x_k^{r+j}) \le f_0(x_1^{r+j}, \ldots, x_{k-1}^{r+j}, x_k, x_{k+1}^{r+j}, \ldots, x_N^{r+j}) + f_k(x_k), \quad \forall x_k. \tag{22}$$

Fix any $x_k \in \mathrm{dom}\, f_k$ such that $(z_1, \ldots, z_{k-1}, x_k, z_{k+1}, \ldots, z_N) \in \mathrm{dom}\, f_0$. Suppose that Assumption C1 holds, so $\mathrm{dom}\, f_0$ is open. Since $(z_1, \ldots, z_{k-1}, x_k, z_{k+1}, \ldots, z_N) \in \mathrm{dom}\, f_0$, then (20) implies that

$$(x_1^{r+j}, \ldots, x_{k-1}^{r+j}, x_k, x_{k+1}^{r+j}, \ldots, x_N^{r+j}) \in \mathrm{dom}\, f_0,$$

for all $r \in R'$ sufficiently large. Passing to the limit as $r \to \infty$, $r \in R'$, and using the lsc property of $f_k$ and the continuity of $f_0$ on the open set $\mathrm{dom}\, f_0$, we obtain from (20) and (22) that (21) holds. Suppose instead that Assumption C2 holds, so

$$\mathrm{dom}\, f_0 = Y_1 \times \cdots \times Y_N, \quad \text{for some } Y_1 \subseteq \Re^{n_1}, \ldots, Y_N \subseteq \Re^{n_N}.$$

Then, the first quantity on the right-hand side of (22) is finite for all $r \in R'$. Passing to the limit as $r \to \infty$, $r \in R'$, and using the lsc property of $f_k$ and the continuity of $f_0$ on $\mathrm{dom}\, f_0$, we obtain from (20) and (22) that (21) holds. If $x_k \notin \mathrm{dom}\, f_k$ or $(z_1, \ldots, z_{k-1}, x_k, z_{k+1}, \ldots, z_N) \notin \mathrm{dom}\, f_0$, then the right-hand side of (21) has the extended value $\infty$, so (21) holds trivially. Since the above choice of $k$ was arbitrary, this shows that (21) holds for all $x_k$ and all $k \in \{1, \ldots, N\}$. Then, it follows from (1) that (4) holds for all $k = 1, \ldots, N$. $\square$

Proposition 5.1 extends a result of Grippo and Sciandrone (Ref. 27, Proposition 5.1) for the special case where each $f_k$ is the indicator function for some closed convex set and $f_0$ is continuously differentiable and (block) coordinatewise strictly pseudoconvex. In turn, the latter result is an extension of a result of Bertsekas and Tsitsiklis (Ref. 6, Proposition 3.9 in Section 3.3.5), which assumes further $f_0$ to be convex. As a corollary of Proposition 5.1, we obtain the following convergence result for the BCD method.

**Theorem 5.1.** Suppose that $f, f_0, f_1, \ldots, f_N$ satisfy Assumptions B1–B3 and that $f_0$ satisfies either Assumption C1 or C2. Also, assume that $\{x : f(x) \le f(x^0)\}$ is bounded. Then, the sequence $\{x^r\}$ generated by the BCD method using the essentially cyclic rule is defined and bounded, and every cluster point is a coordinatewise minimum point of $f$.

Theorem 5.1 extends a result of Auslender [see Theorem 1.2(a) in Ref. 4, p. 95] for the special case where $f_k$ is convex for all $k$, $\mathrm{dom}\, f_0 = Y_1 \times \cdots \times Y_N$ for some closed convex sets $Y_k \subseteq \Re^{n_k}$, $k = 1, \ldots, N$, and $f_0$ is strongly convex and continuous on $\mathrm{dom}\, f_0$.

**6. Applications**

We describe four interesting applications of the BCD method below. In all applications, the objective function $f$ is not necessarily strictly convex, nor differentiable everywhere on its effective domain.

**Example 6.1. Proximal Minimization Algorithm.** Let ψ:ℜ* ^{n}*> ℜ∪

{S} be a proper (i.e.,ψ兾≡S*) lsc function. Fix any scalar cH0, and consider*
*the proper lsc function f defined by*

*f (x, y) Gc兩兩xAy兩兩*^{2}Cψ*(x).*

Clearly, this function has the form (1) with
*f*0*(x, y)Gc兩兩xAy兩兩*^{2}, *f*1Gψ, *f*2≡0.

*Applying the BCD method to f yields a method whereby f (x, y) is alter-*
*nately minimized with respect to x and y. This method has the form*

*x** ^{rC1}*Garg min

*x* *c兩兩xAx** ^{r}*兩兩

^{2}Cψ

*(x),*

*r G0, 1, . . . ,*

*which is the proximal minimization algorithm with fixed parameter c for*
minimizing ψ; see Ref. 6, Section 3.4.3 and Refs. 36–37 and references
therein.

It is easily seen that f, f_0, f_1, f_2 satisfy Assumptions B1–B3 and that f_0 satisfies Assumptions A1 and C1. Moreover, f is regular everywhere on dom f. Then, by Proposition 5.1, if ψ is bounded below (so f is bounded below), then every cluster point z of the iterates generated by the above proximal minimization algorithm is a stationary point of ψ, i.e.,

ψ′(z; d) ≥ 0,  for all d.

Notice that Theorem 4.1 is not applicable here, since f need not be continuous on its level sets.
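A minimal sketch of this algorithm, under the illustrative assumption (ours, not the text's) that n = 1 and ψ(·) = |·|: the subproblem min_x c(x − x^r)² + |x| then has the closed-form soft-thresholding solution with threshold 1/(2c), obtained from the optimality condition 2c(x − x^r) + sign(x) ∋ 0.

```python
def prox_abs(z, c):
    # Exact minimizer of c*(x - z)^2 + |x|: soft thresholding at 1/(2c).
    t = 1.0 / (2.0 * c)
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def proximal_minimization(x0, c, iters):
    # x^{r+1} = argmin_x c*|x - x^r|^2 + psi(x), with psi = |.|.
    x = x0
    for _ in range(iters):
        x = prox_abs(x, c)  # each step solves the x-subproblem exactly
    return x
```

Starting from x^0 = 5 with c = 1, each iteration moves the iterate 1/2 closer to 0, the minimum point of ψ, and the iterates reach it in finitely many steps.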

**Example 6.2. Arimoto–Blahut Algorithm.** Let P_{ij}, i = 1, . . . , n, j = 1, . . . , m, be given nonnegative scalars satisfying

∑_j P_{ij} = 1,  for all i.

The P_{ij} may be viewed as probabilities. Consider the proper lsc function f defined by

f(x, y) = f_0(x, y) + f_1(x) + f_2(y),

where

f_0(x, y) = ∑_{j=1}^m ∑_{i=1}^n P_{ij} x_i φ(y_{ij}/x_i), if x ≥ 0, y > 0, and f_0(x, y) = ∞ otherwise,

f_1(x) = 0, if ∑_{i=1}^n x_i = 1, and f_1(x) = ∞ otherwise,

f_2(y) = 0, if ∑_{i=1}^n y_{ij} = 1 for all j = 1, . . . , m, and f_2(y) = ∞ otherwise,

with φ(t) = −log(t). In our notation, x is a vector in ℜ^n whose ith coordinate is x_i, and y is a vector in ℜ^{nm} whose ((i−1)m + j)th coordinate is y_{ij}. Applying the BCD method to f yields a method whereby f(x, y) is alternately minimized with respect to x and y. This in turn can be seen to be the Arimoto–Blahut algorithm for computing the capacity of a discrete memoryless communication channel (Refs. 11–12).

It can be verified that f, f_0, f_1, f_2 are convex and satisfy Assumptions B1–B3. Convexity of f_0 follows from observing that (a, b) ↦ aφ(b/a) is convex. Moreover, f has compact level sets and is continuous on each level set, and f_0 satisfies Assumptions A2 and C2. Notice that f is not strictly convex and f_0 does not satisfy Assumption A1 or C1. Thus, by Lemma 3.1 and Theorem 5.1 or Theorem 4.1(c), the sequence of iterates generated by the Arimoto–Blahut algorithm is bounded and each cluster point is a stationary point of f. By the convexity of f, this is in fact a minimum point of f. This result matches those obtained in Refs. 11–12. Analogous convergence results are obtained for variants of the Arimoto–Blahut algorithm, whereby we use, for example,

φ(t) = t log(t)  or  φ(t) = 1/t.
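The two exact block minimizations admit closed forms when φ(t) = −log(t): minimizing over y gives y_{ij} = P_{ij} x_i / ∑_{i'} P_{i'j} x_{i'}, and minimizing over x on the simplex gives x_i proportional to exp(∑_j P_{ij} log y_{ij}). The sketch below (function name and the starting point are our illustrative choices) assumes every column of P carries some probability mass, so the normalizations are well defined.

```python
import math

def arimoto_blahut(P, iters):
    # Alternating exact minimization of f(x, y) over y then x,
    # i.e. the Arimoto-Blahut iteration for channel capacity.
    # P[i][j]: probability of output j given input i; rows sum to 1.
    n, m = len(P), len(P[0])
    x = [1.0 / n] * n  # uniform starting input distribution
    for _ in range(iters):
        # y-step: y_ij = P_ij x_i / sum_i P_ij x_i for each output j
        y = [[0.0] * m for _ in range(n)]
        for j in range(m):
            s = sum(P[i][j] * x[i] for i in range(n))
            for i in range(n):
                y[i][j] = P[i][j] * x[i] / s
        # x-step: x_i proportional to exp(sum_j P_ij log y_ij);
        # terms with P_ij = 0 contribute nothing and are skipped
        w = [math.exp(sum(P[i][j] * math.log(y[i][j])
                          for j in range(m) if P[i][j] > 0))
             for i in range(n)]
        z = sum(w)
        x = [wi / z for wi in w]
    return x
```

For a binary symmetric channel, the capacity-achieving input distribution is uniform, and the iteration leaves the uniform starting point fixed.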

**Example 6.3. Han Algorithm.** Let f be the proper lsc convex function studied by Han [Ref. 30, (D′)],

f(x_1, . . . , x_N) = (1/2)‖x_1 + · · · + x_N − d‖² + ∑_{k=1}^N f_k(x_k),

where d is a given vector in ℜ^m and each f_k : ℜ^m → ℜ ∪ {∞} is a proper lsc convex function. Also see Ref. 18 for a special case where f_k is the support function of a closed convex set. Clearly, f is of the form (1) with

f_0(x_1, . . . , x_N) = (1/2)‖x_1 + · · · + x_N − d‖².

Han proposed in Ref. 30 an algorithm for minimizing f, which may be viewed as an instance of the BCD method using the cyclic rule, as was shown in Ref. 22.

It is easily seen that f, f_0, f_1, . . . , f_N satisfy Assumptions B1–B3 and that f_0 satisfies Assumptions A1 and C1. Thus, by Lemma 3.1 and Proposition 5.1 [also see the remark following (3)], if f has a minimum point, then the iterates generated by the Han algorithm are defined and every cluster point is a minimum point of f. This result matches Proposition 4.3 in Ref. 30. On the other hand, by using the convexity of the functions, stronger convergence results can be obtained; see Refs. 22 and 38.
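Updating block k with the other blocks fixed amounts to evaluating the proximal map of f_k at d − ∑_{l≠k} x_l. The following sketch (ours; names are illustrative) takes m = 1 and each f_k to be the indicator of an interval, so each proximal map reduces to clipping onto that interval.

```python
def han_bcd(d, boxes, iters):
    # Cyclic BCD for f(x_1,...,x_N) =
    #   (1/2)(x_1 + ... + x_N - d)^2 + sum_k f_k(x_k),
    # with each f_k the indicator of an interval boxes[k] = (lo, hi).
    x = [0.0] * len(boxes)
    for _ in range(iters):
        for k, (lo, hi) in enumerate(boxes):
            residual = d - sum(x) + x[k]       # d - sum_{l != k} x_l
            x[k] = min(max(residual, lo), hi)  # projection onto [lo, hi]
    return x
```

For instance, with d = 3 and two blocks constrained to [0, 1], the first sweep already reaches the minimum point (1, 1).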

**Example 6.4. Blind Source Separation.** In Ref. 33, Zibulevsky and Pearlmutter studied an optimization formulation of blind source separation, whereby an error term of the form

(1/2σ²)‖AS − X‖²_F + ∑_{j,t} f_j^t(s_j^t)

is minimized with respect to A ∈ ℜ^{m×n} and S = [s_j^t]_{j=1,...,n, t=1,...,T} ∈ ℜ^{n×T}. Here, X ∈ ℜ^{m×T} is the given data; ‖·‖_F denotes the Frobenius norm; σ > 0; and each f_j^t : ℜ → [0, ∞] is a proper convex function that is continuous on its effective domain and has bounded level sets. In Ref. 31, the particular choice f_j^t(·) = |·| is used. To ensure the existence of an optimal solution, it was suggested in Ref. 33 that constraints such as

‖A_i‖ ≤ 1,  i = 1, . . . , m,  (23)

be imposed, where A_i denotes the ith row of A. The objective function of this problem has the form (1) with N = 1 + nT,

f_0(A, s_1^1, . . . , s_n^T) = (1/2σ²)‖AS − X‖²_F,

f_1(A) = 0, if ‖A_i‖ ≤ 1, i = 1, . . . , m, and f_1(A) = ∞ else,

and f_j^t, j = 1, . . . , n, t = 1, . . . , T, as given. Notice that minimizing f with respect to A entails minimizing a convex quadratic function over the Cartesian product of m Euclidean balls, while minimizing f with respect to each s_j^t entails minimizing the sum of a convex quadratic function of one variable and a convex function of one variable. Thus, the BCD method applied to this f can be implemented fairly inexpensively. If we replace (23) by the single ball constraint

‖A‖_F ≤ ρ,

for some fixed ρ > 0, then minimizing f with respect to A can be solved efficiently using, e.g., the Moré–Sorensen method.
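For the choice f_j^t(·) = |·|, the one-dimensional subproblem in s_j^t has a closed form. With a_j the jth column of A and r the residual column X_{·t} minus the contributions of the other sources at time t, the subproblem is min_s (q/2)s² − bs + |s| with q = ‖a_j‖²/σ² and b = ⟨a_j, r⟩/σ², solved by soft thresholding. The sketch below is our own illustration; the function and variable names are not from the text.

```python
def update_entry(a_j, r, sigma):
    # Exact BCD update of one entry s_j^t when f_j^t(.) = |.|:
    # minimize (q/2)s^2 - b*s + |s| via soft thresholding.
    # a_j: jth column of A; r: residual column (lists of floats).
    q = sum(v * v for v in a_j) / sigma**2
    b = sum(u * v for u, v in zip(a_j, r)) / sigma**2
    if b > 1.0:
        return (b - 1.0) / q
    if b < -1.0:
        return (b + 1.0) / q
    return 0.0
```

When |b| ≤ 1, the absolute-value term dominates the quadratic pull and the entry is set exactly to zero, which is the mechanism producing sparse sources in Refs. 31 and 33.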

It is not difficult to see that f is continuous on its effective domain and has compact level sets. Moreover, f is convex in (s_1^1, . . . , s_n^T), f_1 is convex, and f_0 satisfies Assumption A1. Thus, by Lemma 3.1 and Theorem 4.1(b), the iterates generated by the BCD method using the cyclic rule are defined and every cluster point is a stationary point of f. Notice that f is not pseudoconvex in every pair of coordinate blocks and that f need not have at most one minimum in each s_j^t, so neither Theorem 5.1, nor part (a) of Theorem 4.1, nor part (c) of Theorem 4.1 is applicable here.

Instead of treating each s_j^t as a coordinate block, we can alternatively treat S = [s_j^t]_{j,t} as a coordinate block. However, minimizing f with respect to S is more difficult. In the case of f_j^t(·) = |·|, this would require solving a large convex quadratic programming problem. A comparison of a primal–dual interior-point method and the BCD method for solving such a problem is given in Ref. 32.

**References**

1. ORTEGA, J. M., and RHEINBOLDT, W. C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, 1970.

2. HILDRETH, C., A Quadratic Programming Procedure, Naval Research Logistics Quarterly, Vol. 4, pp. 79–85, 1957; see also Erratum, Naval Research Logistics Quarterly, Vol. 4, p. 361, 1957.

3. WARGA, J., Minimizing Certain Convex Functions, SIAM Journal on Applied Mathematics, Vol. 11, pp. 588–593, 1963.

4. AUSLENDER, A., Optimisation: Méthodes Numériques, Masson, Paris, France, 1976.

5. BERTSEKAS, D. P., Nonlinear Programming, 2nd Edition, Athena Scientific, Belmont, Massachusetts, 1999.

6. BERTSEKAS, D. P., and TSITSIKLIS, J. N., Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, New Jersey, 1989.

7. CENSOR, Y., and ZENIOS, S. A., Parallel Optimization: Theory, Algorithms, and Applications, Oxford University Press, Oxford, United Kingdom, 1997.

8. LUENBERGER, D. G., Linear and Nonlinear Programming, Addison–Wesley, Reading, Massachusetts, 1973.

9. POLAK, E., Computational Methods in Optimization: A Unified Approach, Academic Press, New York, NY, 1971.

10. ZANGWILL, W. I., Nonlinear Programming, Prentice-Hall, Englewood Cliffs, New Jersey, 1969.

11. ARIMOTO, S., An Algorithm for Computing the Capacity of Arbitrary DMCs, IEEE Transactions on Information Theory, Vol. 18, pp. 14–20, 1972.

12. BLAHUT, R., Computation of Channel Capacity and Rate Distortion Functions, IEEE Transactions on Information Theory, Vol. 18, pp. 460–473, 1972.

13. HOWSON, H. R., and SANCHO, N. G. F., A New Algorithm for the Solution of Multistate Dynamic Programming Problems, Mathematical Programming, Vol. 8, pp. 104–116, 1975.

14. KORSAK, A. J., and LARSON, R. E., A Dynamic Programming Successive Approximations Technique with Convergence Proofs, Automatica, Vol. 6, pp. 253–260, 1970.

15. ZUO, Z. Q., and WU, C. P., Successive Approximation Technique for a Class of Large-Scale NLP Problems and Its Application to Dynamic Programming, Journal of Optimization Theory and Applications, Vol. 62, pp. 515–527, 1989.

16. STERN, T. E., A Class of Decentralized Routing Algorithms Using Relaxation, IEEE Transactions on Communications, Vol. 25, pp. 1092–1102, 1977.

17. BREGMAN, L. M., The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming, USSR Computational Mathematics and Mathematical Physics, Vol. 7, pp. 200–217, 1967.

18. HAN, S. P., A Successive Projection Method, Mathematical Programming, Vol. 40, pp. 1–14, 1988.

19. KIWIEL, K. C., Free-Steering Relaxation Methods for Problems with Strictly Convex Costs and Linear Constraints, Mathematics of Operations Research, Vol. 22, pp. 326–349, 1997.

20. LUO, Z. Q., and TSENG, P., On the Convergence Rate of Dual Ascent Methods for Strictly Convex Minimization, Mathematics of Operations Research, Vol. 18, pp. 846–867, 1993.

21. TSENG, P., Dual Ascent Methods for Problems with Strictly Convex Costs and Linear Constraints: A Unified Approach, SIAM Journal on Control and Optimization, Vol. 28, pp. 214–242, 1990.

22. TSENG, P., Dual Coordinate Ascent Methods for Nonstrictly Convex Minimization, Mathematical Programming, Vol. 59, pp. 231–247, 1993.

23. MANGASARIAN, O. L., and DE LEONE, R., Parallel Successive Overrelaxation Methods for Symmetric Linear Complementarity Problems and Linear Programs, Journal of Optimization Theory and Applications, Vol. 54, pp. 437–446, 1987.

24. CEA, J., and GLOWINSKI, R., Sur des Méthodes d'Optimisation par Relaxation, Revue Française d'Automatique, Informatique et Recherche Opérationnelle, Vol. R3, pp. 5–32, 1973.

25. SARGENT, R. W. H., and SEBASTIAN, D. J., On the Convergence of Sequential Minimization Algorithms, Journal of Optimization Theory and Applications, Vol. 12, pp. 567–575, 1973.

26. ZADEH, N., A Note on the Cyclic Coordinate Ascent Method, Management Science, Vol. 16, pp. 642–644, 1970.

27. GRIPPO, L., and SCIANDRONE, M., On the Convergence of the Block Nonlinear Gauss–Seidel Method under Convex Constraints, Operations Research Letters, Vol. 26, pp. 127–136, 2000.

28. POWELL, M. J. D., On Search Directions for Minimization Algorithms, Mathematical Programming, Vol. 4, pp. 193–201, 1973.

29. LUO, Z. Q., and TSENG, P., Error Bounds and Convergence Analysis of Feasible Descent Methods: A General Approach, Annals of Operations Research, Vol. 46, pp. 157–178, 1993.

30. HAN, S. P., A Decomposition Method and Its Application to Convex Programming, Mathematics of Operations Research, Vol. 14, pp. 237–248, 1989.

31. BOFILL, P., and ZIBULEVSKY, M., Sparse Undetermined ICA: Estimating the Mixing Matrix and the Sources Separately, Technical Report UPC-DAC-2000-7, Universitat Politècnica de Catalunya, Barcelona, Spain, 1999.

32. SARDY, S., BRUCE, A., and TSENG, P., Block Coordinate Relaxation Methods for Nonparametric Wavelet Denoising, Journal of Computational and Graphical Statistics, Vol. 9, pp. 361–379, 2000.

33. ZIBULEVSKY, M., and PEARLMUTTER, B., Blind Source Separation by Sparse Decomposition, Technical Report CS99-1, Computer Science Department, University of New Mexico, Albuquerque, New Mexico, 1999.

34. MANGASARIAN, O. L., Nonlinear Programming, McGraw-Hill, New York, NY, 1969.

35. ROCKAFELLAR, R. T., Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.

36. MARTINET, B., Détermination Approchée d'un Point Fixe d'une Application Pseudo-Contractante: Cas de l'Application Prox, Comptes Rendus des Séances de l'Académie des Sciences, Vol. 274A, pp. 163–165, 1972.

37. ROCKAFELLAR, R. T., Augmented Lagrangians and Applications of the Proximal Point Algorithm in Convex Programming, Mathematics of Operations Research, Vol. 1, pp. 97–116, 1976.

38. BAUSCHKE, H. H., and LEWIS, A. S., Dykstra's Algorithm with Bregman Projections: A Convergence Proof, Optimization (to appear).