and the exponential multiplier method for convex symmetric cone programming

(1)

DOI 10.1007/s10589-008-9227-0

An entropy-like proximal algorithm

and the exponential multiplier method for convex symmetric cone programming

Jein-Shan Chen· Shaohua Pan

Received: 28 April 2006 / Revised: 6 December 2008 / Published online: 18 December 2008

Abstract We introduce an entropy-like proximal algorithm for the problem of min- imizing a closed proper convex function subject to symmetric cone constraints. The algorithm is based on a distance-like function that is an extension of the Kullback- Leiber relative entropy to the setting of symmetric cones. Like the proximal algorithms for convex programming with nonnegative orthant cone constraints, we show that, under some mild assumptions, the sequence generated by the proposed algorithm is bounded and every accumulation point is a solution of the considered problem. In addition, we also present a dual application of the proposed algorithm to the symmetric cone linear program, leading to a multiplier method which is shown to possess similar properties as the exponential multiplier method (Tseng and Bertsekas in Math. Program. 60:1–19,1993) holds.

Keywords Symmetric cone optimization· Proximal-like method · Entropy-like distance· Exponential multiplier method

1 Introduction

Symmetric cone programming provides a unified framework for linear programming, second-order cone programming and semidefinite programming, which arise from a

J.-S. Chen is a member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author’s work is partially supported by National Science Council of Taiwan.

J.-S. Chen (

⁾

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan e-mail:jschen@math.ntnu.edu.tw

S. Pan

School of Mathematical Sciences, South China University of Technology, Guangzhou 510640, China e-mail:shhpan@scut.edu.cn

(2)

wide range of applications in engineering, economics, management science, optimal control, combinatorial optimization, and other fields; see [1,16,28] and references therein. Recently, symmetric cone programming, especially symmetric cone linear programming (SCLP), has attracted the attention of some researchers with a focus on the development of interior point methods similar to those for linear programming;

see [9,10,23]. Although interior point methods were successfully applied for SCLPs, it is worthwhile to explore other solution methods for general convex symmetric cone optimization problems.

LetA = (V, ◦, ·, ·) be a Euclidean Jordan algebra, where (V, ·, ·) is a finite dimensional inner product space over the real fieldR and “◦” denotes the Jordan product which will be defined in the next section. LetK be the symmetric cone in V. In this paper, we consider the following convex symmetric cone programming (CSCP):

min f (x)

s.t. x_K0, (1)

where f : V → (−∞, ∞] is a closed proper convex function, and x _K 0 means x∈ K. In general, for any x, y ∈ V, we write x _K yif x− y ∈ K and write x _Ky if x−y ∈ int(K). A function is closed if and only if it is lower semi-continuous, and a function is proper if f (x) <∞ for at least one x ∈ V and f (x) > −∞ for all x ∈ V.

Notice that the CSCP is a special class of convex programs, and hence in princi- ple it can be solved via general convex programming methods. One such method is the proximal point algorithm for minimizing a convex function f (x) onRⁿ which generates a sequence{x^k}k∈Nvia the following iterative scheme:

x^k= argmin

x∈Rⁿ

f (x)+ 1

μkx − x^k⁻¹²₂

, (2)

where μk>0 and 2denotes the Euclidean norm inRⁿ. This method was first introduced by Martinet [17], based on the Moreau proximal approximation of f (see [18]), and further developed and studied by Rockafellar [20,21]. Later, several gen- eralizations of the proximal point algorithm have been considered where the usual quadratic proximal term in (2) is replaced by a nonquadratic distance-like function;

see, for example, [5,7,8,14,25]. Among others, the algorithms using an entropy-like distance [13,14,25,26] for minimizing a convex function f (x) subject to x∈ Rⁿ₊, generate the iterates by

x⁰∈ Rⁿ₊₊

x^k= argminx∈Rⁿ₊

f (x)+_μ¹_kdϕ(x, x^k⁻¹)

, (3)

where dϕ(·, ·) : Rⁿ₊₊× Rⁿ₊₊→ Rⁿ₊is the entropy-like distance defined by

d_ϕ(x, y)=

n i=1

y_iϕ(x_i/y_i) (4)

(3)

with ϕ satisfying certain conditions; see [13,14,25,26]. An important choice of ϕ is the function ϕ(t)= t ln t − t + 1, for which the corresponding dϕgiven by

d_ϕ(x, y)=

n i=1

x_iln xi− xiln yi+ yi− xi

(5)

is the popular Kullback-Leibler entropy from statistics and that is the “entropy” ter- minology stems from. One key feature of entropic proximal methods is that they generate a sequence staying in the interior ofRⁿ₊automatically, and thus eliminates the combinatorial nature of the problem. One of the main applications of such proximal methods is to the dual of smooth convex programs, yielding twice continuously differentiable nonquadratic augmented Lagrangians and thereby allowing the usage of Newton’s methods.

The main purpose of this paper is to propose an interior proximal-like method and the corresponding dual augmented Lagrangian method for the CSCP (1). Specifically, by using the Euclidean Jordan algebraic techniques, we extend the entropy-like proximal algorithm defined by (3)–(4) with ϕ(t)= t ln t − t + 1 to the solution of (1). For the proposed algorithm, we establish a global convergence estimate in terms of the objective value, and moreover present a dual application to the standard SCLP, which leads to an exponential multiplier method shown to possess properties analogous to the method proposed by [3,27] for convex programming over nonnegative orthant coneRⁿ₊.

The paper is organized as follows. Section2 reviews some basic concepts and materials on Euclidean Jordan algebras which are needed in the analysis of the algorithm. In Sect.3, we introduce a distance-like function H to measure how close between two points in the symmetric coneK and investigate some related properties. Furthermore, we outline a basic proximal-like algorithm with the measure func- tion H . The convergence analysis of the algorithm is the main content of Sect.4. In Sect.5, we consider a dual application of the algorithm to the SCLP and establish the convergence results for the corresponding multiplier method. We close this paper with some remarks in Sect.6.

2 Preliminaries on Euclidean Jordan algebra

This section recalls some concepts and results on Euclidean Jordan algebras that will be used in the subsequent sections. More detailed expositions of Euclidean Jordan algebras can be found in Koecher’s lecture notes [15] and Faraut and Korányi’s mono- graph [11].

Let V be a finite dimensional inner space endowed with a bilinear mapping (x, y)→ x ◦ y : V × V → V. For a given x ∈ V, let L(x) be the linear operator ofV defined by

L(x)y := x ◦ y for every y ∈ V.

The pair (V, ◦) is called a Jordan algebra if, for all x, y ∈ V, (i) x◦ y = y ◦ x,

(4)

(ii) x◦ (x²◦ y) = x²◦ (x ◦ y), where x²:= x ◦ x.

In a Jordan algebra (V, ◦), x ◦ y is said to be the Jordan product of x and y. Note that a Jordan algebra is not associative, i.e., x◦ (y ◦ z) = (x ◦ y) ◦ z may not hold in general. If for some element e∈ V, x ◦ e = e ◦ x = x for all x ∈ V, then e is called a unit element of the Jordan algebra (V, ◦). The unit element, if exists, is unique.

A Jordan algebra does not necessarily have a unit element. For x∈ V, let ζ(x) be the degree of the minimal polynomial of x, which can be equivalently defined as

ζ (x):= min

k: {e, x, x²,· · · , x^k} are linearly dependent . Then the rank of (V, ◦) is well defined by r := max{ζ(x) : x ∈ V}.

A Jordan algebra (V, ◦), with a unit element e ∈ V, defined over the real field R is called a Euclidean Jordan algebra or formally real Jordan algebra, if there exists a positive definite symmetric bilinear form onV which is associative; in other words, there exists onV an inner product denoted by ·, ·_Vsuch that for all x, y, z∈ V:

(iii) x ◦ y, z_V= y, x ◦ z_V.

In a Euclidean Jordan algebraA = (V, ◦, ·, ·_V), we define the set of squares as K :=

x²: x ∈ V .

By [11, Theorem III. 2.1],K is a symmetric cone. This means that K is a self-dual closed convex cone with nonempty interior and for any two elements x, y∈ int(K), there exists an invertible linear transformationT : V → V such that T (K) = K and T (x) = y.

Here are two popular examples of Euclidean Jordan algebras. LetSⁿbe the space of n×n real symmetric matrices with the inner product given by X, Y _Sⁿ:= Tr(XY ), where XY is the matrix multiplication of X and Y and Tr(XY ) is the trace of XY . Then, (Sⁿ,◦, ·, ·Sⁿ)is a Euclidean Jordan algebra with the Jordan product defined by

X◦ Y := (XY + Y X)/2, X, Y ∈ Sⁿ.

In this case, the unit element is the identity matrix I inSⁿand the cone of squaresK is the set of all positive semidefinite matrices inSⁿ. LetRⁿbe the Euclidean space of dimension n with the usual inner productx, yRⁿ= x^Ty. For any x= (x1, x2), y= (y1, y2)∈ R × Rⁿ⁻¹, define x◦ y := (x^Ty, x1y2+ y1x2)^T. Then (Rⁿ,◦, ·, ·_Rⁿ)is a Euclidean Jordan algebra, also called the quadratic forms algebra. In this algebra, the unit element e= (1, 0, . . . , 0)^T andK = {x = (x1, x₂)∈ R × Rⁿ⁻¹: x1≥ x22}.

Recall that an element c∈ V is said to be idempotent if c²= c. Two idempotents cand q are said to be orthogonal if c◦ q = 0. One says that {c1, c2, . . . , ck} is a complete system of orthogonal idempotents if

c²_j= cj, c_j◦ ci= 0 if j = i for all j, i = 1, 2, . . . , k, and

k j=1

c_j= e.

(5)

An idempotent is said to be primitive if it is nonzero and cannot be written as the sum of two other nonzero idempotents. We call a complete system of orthogonal primitive idempotents a Jordan frame. Then we have the following spectral decomposition theorem.

Theorem 2.1 [11, Theorem III. 1.2] Suppose thatA = (V, ◦, ·, ·_V)is a Euclidean Jordan algebra and the rank ofA is r. Then for any x ∈ V, there exists a Jordan frame {c1, c2, . . . , cr} and real numbers λ1(x), λ2(x), . . . , λr(x), arranged in the decreas- ing order λ1(x)≥ λ2(x)≥ · · · ≥ λr(x), such that x=_r

j=1λj(x)cj. The numbers λj(x)(counting multiplicities), which are uniquely determined by x, are called the eigenvalues,_r

j=1λj(x)cjthe spectral decomposition of x, and tr(x)=_r

j=1λj(x) the trace of x.

From [11, Proposition III.1.5], a Jordan algebra (V, ◦) with a unit element e ∈ V is Euclidean if and only if the symmetric bilinear form tr(x◦ y) is positive definite.

Therefore, we may define another inner product onV by

x, y := tr(x ◦ y) ∀x, y ∈ V.

By the associativity of tr(·) [11, Proposition II. 4.3], the inner product·, · is associa- tive, i.e., for all x, y, z∈ V, there holds that x, y ◦ z = y, x ◦ z. Thus, the operator L(x) for each x ∈ V is symmetric with respect to the inner product ·, · in the sense that

L(x)y, z = y, L(x)z ∀y, z ∈ V.

In the sequel, we let

x, x =

_r

j=1

λ²_j(x)

1/2

∀x ∈ V,

and denote by λmin(·) and λmax(·) the smallest and the largest eigenvalue of x, respectively. Then, by Lemma 13 and the proof of Lemma 14 in [23], we can prove the following lemma.

Lemma 2.1 Let x, y∈ V, then we can bound the minimum eigenvalue of x + y as follows:

λmin(x)+ λmin(y)≤ λmin(x+ y) ≤ λmin(x)+ λmax(y).

Let g: R → R be a scalar-valued function. Then, it is natural to define a vector- valued function associated with the Euclidean Jordan algebra (V, ◦, ·, ·_V)by

g^sc(x):= g(λ1(x))c₁+ g(λ2(x))c₂+ · · · + g(λr(x))c_r, (6) where x∈ V has the spectral decomposition x =_r

j=1λ_j(x)c_j. This function is called the Löwner operator in [24] and was shown to have the following important property.

(6)

Lemma 2.2 [24, Theorem 13] For any x=_r

j=1λ_j(x)c_j, let g^scbe defined by (6).

Then g^scis (continuously) differentiable at x if and only if g is (continuously) differ- entiable at all λj(x). Furthermore, the derivative of g^scat x, for any h∈ V, is given by

(g^sc)(x)h=

r j=1

g(λj(x))cj, hcj+

1≤j<l≤r

4[λi(x), λj(x)]gcj◦ (cl◦ h)

with

[λi(x), λ_j(x)]g:=g(λ_i(x))− g(λj(x))

λi(x)− λj(x) ∀ i, j = 1, 2, . . . , r and i = j.

In fact, the Jacobian (g^sc)(·) is a linear and symmetric operator, and can be written as

(g^sc)(x)=

r j=1

g(λ_j(x))Q(cj)+ 2

r i,j=1,i =j

[λi(x), λ_j(x)]gL(cj)L(ci), (7)

whereQ(x) := 2L²(x)− L(x²)for any x∈ V is called the quadratic representation ofV.

Finally, we recall the spectral function generated by a symmetric function. LetP denote the set of all permutations of r-dimensional vectors. A subset ofR^r is said to be symmetric if it remains unchanged under every permutation ofP. Let S be a symmetric set inR^r. A real-valued function f : S → R is said to be symmetric if for every permutation P ∈ P and each s ∈ S, there holds that f (P s) = f (s). For any x∈ V with the spectral decomposition x =_r

j=1λj(x)cj, define K:= {x ∈ V | λ(x) ∈ S} with λ(x) = (λ1(x), . . . , λr(x))^T being the spectral vector of x. Then F: K → R defined by

F (x):= f (λ(x)) (8)

is called the spectral function generated by f . From Theorem 41 of [2], F is (strictly) convex if f is (strictly) convex.

Unless otherwise stated, in the rest of this paper, the notationA = (V, ◦, ·, ·) represents a Euclidean Jordan algebra of rank r and dim(V) = n. For a closed proper convex function f: V → (−∞, +∞], we denote by domf := {x ∈ V | f (x) < +∞}

the domain of f . The subdifferential of f at x0∈ V is the convex set

∂f (x0)= {ξ ∈ V | f (x) ≥ f (x0)+ ξ, x − x0 ∀ x ∈ V} . (9) Sincex, y = tr(x ◦ y) for any x, y ∈ V, the above subdifferential set is equivalent to

∂f (x0)=

ξ∈ V | f (x) ≥ f (x0)+ tr

ξ◦ (x − x0)

, x∈ V

. (10)

For a sequence{x^k}k∈N, the notation N denotes the set of natural numbers.

(7)

3 Entropy-like proximal algorithm

To solve the CSCP (1), we suggest the following proximal-like minimization algorithm:

x⁰ _K0

x^k= argminx_K0

f (x)+_μ¹_kH (x, x^k⁻¹)

, (11)

where μk>0 and H: V × V → (−∞, +∞] is defined by

H (x, y):=

tr(x◦ ln x − x ◦ ln y + y − x) ∀ x ∈ int(K), y ∈ int(K),

+∞ otherwise. (12)

This algorithm is indeed a proximal-type one except that the classical quadratic term x − x^k⁻¹²₂is replaced by the distance-like function H to guarantee that{x^k}k∈N⊂ int(K), thus leading to an interior proximal-like method (see Proposition4.1).

By the definition of Löwner operator, clearly, the function H (x, y) is well-defined for all x, y∈ int(K). Moreover, the domain of x ∈ int(K) can be extended to x ∈ K by adopting the convention 0 ln 0≡ 0. The function H is a natural extension of the distance-like entropy function in (5), and is used to measure the “distance” between two points inK. In fact, H will become the entropy function dϕin (5) if the Euclid- ean Jordan algebraA is specified as (Rⁿ,◦, ·, ·_Rⁿ)with “◦” denoting the compo- nentwise product of two vectors inRⁿand·, ·_Rⁿthe usual Euclidean inner product.

As shown by Proposition3.1below, most of the important properties, but not all, of d_ϕ also hold for H .

The following two technical lemmas will be used to investigate the favorable properties of the distance measure H . Lemma3.1states an extension of Von Neu- mann inequality to Euclidean Jordan algebras, and Lemma3.2gives the properties of tr(x◦ ln x).

Lemma 3.1 [2] For any x, y ∈ V, we have tr(x ◦ y) ≤ _r

j=1λj(x)λj(y)= λ(x)^Tλ(y), where λ(x) and λ(y) are the spectral vectors of x and y, respectively.

Lemma 3.2 For any x∈ K, let (x) := tr(x ◦ ln x). Then, we have the following results.

(a) (x) is the spectral function generated by the symmetric entropy function φ (u)=_r

j=1u_jln uj ∀u ∈ R^r₊. (13) (b) (x) is continuously differentiable on int(K) with ∇(x) = ln x + e.

(c) The function (x) is strictly convex overK.

Proof (a) Suppose that x has the spectral decomposition x=_r

j=1λj(x)cj. Let g(t )= t ln t ∀ t ∈ R.

(8)

From Sect. 2, it follows that the vector-valued function x◦ln x is the Löwner function g^sc(x), i.e., g^sc(x)= x ◦ ln x. Clearly, g^scis well-defined for any x∈ K and

g^sc(x)= x ◦ ln x =

r j=1

λ_j(x)ln(λj(x))c_j.

Therefore,

(x)= tr(x ◦ ln x) = tr(g^sc(x))=

r j=1

λ_j(x)ln(λj(x))= φ(λ(x))

with φ: R^r₊→ R given by (13). Since the function φ is symmetric, (x) is the spectral function generated by the symmetric function φ in view of (8).

(b) From Lemma2.2, g^sc(x)= x ◦ ln x is continuously differentiable on int(K).

Thus, (x) is also continuously differentiable on int(K) because is the compo- sition of the trace function (clearly continuously differentiable) and g^sc. Now, it re- mains to find its gradient formula. From the fact that tr(x◦ y) = x, y, we have

(x)= tr(x ◦ ln x) = x, ln x.

Applying the chain rule for inner product of two functions, we then obtain

∇(x) = ln x + (∇ ln x)x = ln x + (ln x)x. (14) On the other hand, from formula (7) it follows that for any h∈ V,

(ln x)h=

r j=1

1

λ_j(x)cj, hcj+

1≤j<l≤r

ln(λj(x))− ln(λl(x))

λ_j(x)− λl(x) cj◦ (cl◦ h).

By this and the spectral decomposition of x, it is easy to compute that

(ln x)x=

r j=1

1

λj(x)cj, xcj+

1≤j<l≤r

λj(x)− λl(x) c_j◦ (cl◦ x)

=

r j=1

1

λ_j(x)λj(x)cj+

1≤j<l≤r

λ_j(x)− λl(x) λl(x)cj◦ cl

= e.

Combining with (14), we readily obtain the desired result.

(c) Note that the function φ in (13) is strictly convex overRⁿ₊. Therefore, the conclusion immediately follows from part (a) and Theorem 41 of [2]. Next we study the favorable properties of the distance-like function H . These properties play a crucial role in the convergence analysis of the algorithm defined by (11)–(12).

(9)

Proposition 3.1 Let H (x, y) be defined by (12). Then the following results hold.

(a) H (x, y) is continuous onK × int(K) and H(·, y) is strictly convex for any y ∈ int(K).

(b) For any fixed y∈ int(K), H(·, y) is continuously differentiable on int(K) with

∇xH (x, y)= ln x − ln y.

(c) H (x, y)≥ 0 for any x ∈ K and y ∈ int(K), and H(x, y) = 0 if and only if x = y.

(d) H (x, y)≥ d(λ(x), λ(y)) ≥ 0 for any x ∈ K, y ∈ int(K), where d(·, ·) is defined by

d(u, v)=

n i=1

u_iln ui− uiln vi+ vi− ui

∀ u ∈ R^r₊, v∈ R^r₊₊. (15)

(e) For fixed y∈ int(K), the level sets LH(x, γ ):= {x ∈ K | H(x, y) ≤ γ } are bounded for all γ ≥ 0, and for fixed x ∈ K, the level sets LH(y, γ ):= {y ∈ int(K) | H (x, y) ≤ γ } are bounded for all γ ≥ 0.

Proof (a) Since x◦ln x, x ◦ln y are continuous in x ∈ K and y ∈ int(K), and the trace function is also continuous, the function H is continuous overK×int(K). Notice that H (x, y)= (x) − tr(x ◦ ln y) + tr(y) − tr(x), (16)

(x)is strictly convex overK by Lemma3.2(c), and the other terms on the right hand side of (16) are clearly convex for any fixed y∈ int(K). Therefore, H(·, y) is strictly convex for any fixed y∈ int(K).

(b) From the expression of H (x, y) given by (16) and Lemma3.2(b), obviously, the function H (·, y) is continuously differentiable in int(K). Moreover,

∇xH (x, y)= ∇x(x)− ln y − e = ln x − ln y.

(c) From the definition of (x) and its gradient formula shown as in Lemma3.2(b),

(x)− (y) − (y), x− y

= tr(x ◦ ln x) − tr(y ◦ ln y) − ln y + e, x − y

= tr(x ◦ ln x) − tr(x ◦ ln y) − tr(x) + tr(y)

= H (x, y) (17)

for any x∈ K and y ∈ int(K). In addition, the strict convexity of implies that

(x)− (y) − (y), x− y ≥ 0

and the equality holds if and only if x= y. The two sides readily give the desired result.

(10)

(d) Using the definition of H and Lemma3.1, we have for all x∈ K and y ∈ int(K),

H (x, y)= tr(x ◦ ln x + y − x) − tr(x ◦ ln y)

≥ tr(x ◦ ln x + y − x) −

r j=1

λj(x)ln(λj(y))

=

r j=1

λ_j(x)ln(λj(x))+ λj(y)− λj(x) −

r j=1

λ_j(x)ln(λj(y))

=

r j=1

λ_j(x)ln(λj(x))− λj(x)ln(λj(y))+ λj(y)− λj(x)

= d(λ(x), λ(y)).

The nonnegativity of d(λ(x), λ(y)) is direct by the definition of d(·, ·) in (15).

(e) For fixed y∈ int(K), from part (d) we have LH(x, γ )⊆ {x ∈ K | d(λ(x), λ(y))≤ γ } for all γ ≥ 0. Since the sets {u ∈ R^r₊| d(u, v) ≤ γ } are bounded for all γ≥ 0 by [26, Lemma 2.3], we have from the continuity of λ(·) that the sets LH(x, γ ) are bounded for all γ≥ 0. Similarly, for fixed x ∈ Kⁿ, using Lemma 2.1(i) of [26]

the sets LH(y, γ )are bounded for all γ≥ 0.

Proposition 3.2 Let H (x, y) be defined by (12). Suppose {x^k}k∈N ⊆ K and {y^k}k∈N⊂ int(K) are bounded sequences such that H (x^k, y^k)→ 0. Then, as k → ∞, we have

(a) λj(x^k)− λj(y^k)→ 0 for all j = 1, 2, . . . , r.

(b) tr(x^k− y^k)→ 0.

Proof (a) From Proposition 3.1(d), H (x^k, y^k) ≥ d(λ(x^k), λ(y^k))≥ 0. Hence, H (x^k, y^k)→ 0 implies d(λ(x^k), λ(y^k))→ 0. By the definition of d(·, ·) given by (15),

d(λ(x^k), λ(y^k))=

r j=1

λ_j(y^k)ϕ

λ_j(x^k)/λ_j(y^k)

with ϕ(t)= t ln t − t + 1 (t ≥ 0). Since ϕ(t) ≥ 0 for any t ≥ 0, each term of the above sum is nonnegative, and consequently, d(λ(x^k), λ(y^k))→ 0 implies

λ_j(y^k)ϕ

λ_j(x^k)/λ_j(y^k)

→ 0, j = 1, 2, . . . , r.

This is equivalent to saying that

λ_j(x^k)ln(λj(x^k))− λj(x^k)ln(λj(y^k))+ λj(y^k)− λj(x^k)→ 0, j = 1, 2, . . . , r.

Since{λj(x^k)} and {λj(y^k)} are bounded, using Lemma A.1 of [6] then yields that λ_j(x^k)− λj(y^k)→ 0 for all j = 1, 2, . . . , r.

(11)

(b) Since tr(x^k−y^k)=_r

j=1(λ_j(x^k)−λj(y^k)), the result follows from part (a). To close this section, we present two useful relations for the function H , which can be easily verified by using the definition of H and recalling the nonnegativity of H .

Proposition 3.3 Let H (x, y) be defined by (12). For all x, y∈ int(K) and z ∈ K, we have

(a) H (z, x)− H (z, y) = tr(z ◦ ln y − z ◦ ln x + x − y).

(b) tr((z− y) ◦ (ln y − ln x)) = H (z, x) − H(z, y) − H(y, x) ≤ H (z, x) − H (z, y).

4 Convergence analysis of the algorithm

The convergence analysis of the entropy-like proximal algorithm defined by (11)–

(12) is similar to that of the proximal point method [12] for convex minimization problems and the proximal-like algorithm [4] using Bregman functions. In this section, we will present a global convergence rate estimate for the algorithm in terms of function values. For this purpose, we need to make the following assumptions for the CSCP (1):

(A) inf{f (x) | x _K0} := f∗>−∞ and domf ∩ int(K) = ∅.

In addition, we denote byX^∗:= {x ∈ K | f (x) = f∗} the solution set of (1).

First, we show that the algorithm in (11)–(12) is well defined, i.e., it generates a sequence{x^k}k∈N⊂ int(K). This is a direct consequence of Proposition4.1below.

For the proof of Proposition4.1, we need the following technical lemma.

Lemma 4.1 For any x_K0, y _K0 and tr(x◦ y) = 0, we always have x = 0.

Proof From the self-duality ofK and [11, Proposition I. 1.4], we have that u∈ int(K) ⇐⇒ u, v > 0, ∀ 0 = v ∈ K,

which together with tr(x◦ y) = x, y immediately implies x = 0. Proposition 4.1 For any fixed y ∈ int(K) and μ > 0, let Fμ(·, y) := f (·) + μ⁻¹H (·, y). Then, under assumption (A), we have the following results.

(a) The function Fμ(·, y) has bounded level sets.

(b) There exists a unique x(y, μ)∈ int(K) such that x(y, μ):= argmin

x_K0

Fμ(x, y)

and

μ⁻¹(ln y− ln x(y, μ)) ∈ ∂f (x(y, μ)). (18)

(12)

Proof (a) Since Fμ(·, y) is convex, to show that Fμ(·, y) has bounded level sets, it suffices to show that for any ν≥ f_∗>−∞, the level set L(ν) := { x | Fμ(x, y)≤ ν}

is bounded. Let ν:= (ν − f_∗)μ. Clearly, we have L(ν)⊂ LH(x, ν). Moreover, by Proposition3.1(e), LH(x, ν)is bounded. Therefore, L(ν) is bounded.

(b) From part (a), Fμ(·, y) has bounded level sets, which in turn implies the existence of the minimum point x(y, μ). Also, the strict convexity of Fμ(x, y) by Proposition 3.1(a) guarantees the uniqueness. Under assumption (A), using Proposition 3.1(b) and the optimality condition for the minimization problem argmin_x

K0Fμ(x, y)gives that 0∈ ∂f (x(y, μ)) + μ⁻¹

ln x(y, μ)− ln y

+ ∂δ(x(y, μ)|K), (19) where δ(z|K) = 0 if z _K0 and+∞ otherwise. We will show that ∂δ(x(y, μ)|K) = {0} and hence the desired result follows. Notice that for any x^k∈ int(K) with x^k→ x∈ bd(K),

ln x^k =

_r

j=1

[ln(λj(x^k))]²

1/2

→ +∞,

where the second relation is due to the continuity of λj(·) and ln t. Consequently,

∇xH (x^k, y) ^k ^k

This by [19, pp. 251] means that H (·, y) is essentially smooth, and then ∂xH (x, y)=

∅ for all x ∈ bd(K) by [19, Theorem 26.1]. Together with (19), we must have x(y, μ)∈ int(K). Furthermore, from [19, pp. 226], it follows that

∂δ(z|K) = {v ∈ V | v _K0, tr(v◦ z) = 0}.

Using Lemma4.1then yields ∂δ(x(y, μ)|K) = {0}. Thus, the proof is completed. Next we establish several important properties for the algorithm defined by (11)–

(12), from which our convergence result will follow.

Proposition 4.2 Let{x^k}k∈Nbe the sequence generated by the algorithm defined by (11)–(12), and let σm:=_m

k=1μ_k. Then, the following results hold.

(a) The sequence{f (x^k)}k∈Nis nonincreasing.

(b) μk(f (x^k)− f (x)) ≤ H(x, x^k⁻¹)− H(x, x^k)for all x∈ K.

(c) σm(f (x^m)− f (x)) ≤ H (x, x⁰)− H(x, x^m)for all x∈ K.

(d) {H (x, x^k)}k∈N is nonincreasing for any x∈ X^∗, if the optimal setX^∗ = ∅.

(e) H (x^k, x^k⁻¹)→ 0 and tr(x^k− x^k⁻¹)→ 0, if the optimal set X^∗ = ∅.

Proof (a) From the definition of x^kin (11), it follows that

f (x^k)+ μ⁻¹_k H (x^k, x^k⁻¹)≤ f (x^k⁻¹)+ μ⁻¹_k H (x^k⁻¹, x^k⁻¹).

(13)

Since H (x^k, x^k⁻¹)≥ 0 and H(x^k⁻¹, x^k⁻¹)= 0 (see Proposition3.1(c)), we obtain f (x^k)≤ f (x^k⁻¹) ∀ k ∈ N,

and therefore the sequence{f (x^k)}k∈Nis nonincreasing.

(b) From (18),−μ⁻¹_k (ln x^k− ln x^k⁻¹)∈ ∂f (x^k). Plugging this into the formula of

∂f (x^k)given by (10), we have f (x)≥ f (x^k)− μ⁻¹_k tr

(x− x^k)◦ (ln x^k− ln x^k⁻¹)

∀ x ∈ V.

Consequently, for any x∈ K, μ_k· (f (x^k)− f (x)) ≤ tr

(x− x^k)◦ (ln x^k− ln x^k⁻¹)

= H(x, x^k⁻¹)− H(x, x^k)− H (x^k, x^k⁻¹)

≤ H(x, x^k⁻¹)− H(x, x^k), (20) where the equality holds due to Proposition3.3(b), and the last inequality follows from the nonnegativity of H (x, y).

(c) First, summing up the inequalities in part (b) over k= 1, 2, 3, . . . , m yields that

m k=1

μ_kf (x^k)− σmf (x)≤ H(x, x⁰)− H (x, x^m) ∀ x ∈ K. (21)

On the other hand, since f (x^m)≤ f (x^k)by part (a), multiplying this inequality by μ_kand summing up the inequalities over k= 1, 2, 3, . . . , m then gives that

m k=1

μkf (x^k)≥

m k=1

μkf (x^m)= σmf (x^m). (22)

Now, combining (22) with (22), we readily obtain the desired result.

(d) Since f (x^k)− f (x) ≥ 0 for all x ∈ X^∗, the result immediately follows from part (b).

(e) From part (d), the sequence{H(x, x^k)}k∈N is nonincreasing for any x∈ X^∗. This together with H (x, x^k)≥ 0 implies that {H(x, x^k)}k∈Nis convergent. Thus,

H (x, x^k⁻¹)− H(x, x^k)→ 0 ∀ x ∈ X^∗. (23) On the other hand, from (20) and part (d), we have

0≤ μk(f (x^k)− f (x)) ≤ H(x, x^k⁻¹)− H (x, x^k)− H (x^k, x^k⁻¹), ∀ x ∈ X^∗, which in turn implies

H (x^k, x^k⁻¹)≤ H(x, x^k⁻¹)− H (x, x^k).

Using (23) and the nonnegativity of H (x^k, x^k⁻¹), the first result is then obtained.

Since {x^k}k∈N ⊂ {y ∈ int(K) | H (x, y) ≤ H (x, x⁰)} with x ∈ X^∗, the sequence

(14)

{x^k}k∈N is bounded. Thus, the second result follows by the first result and Propo-

sition3.2(b).

By now, we have proved that the algorithm in (11)–(12) is well-defined and sat- isfies some favorable properties. With these properties, we are ready to establish the convergence results of the algorithm which is one of the main purposes of this paper.

Proposition 4.3 Let{x^k}k∈N be the sequence generated by the algorithm in (11)–

(12), and let σm:=_m

k=1μk. Then, under assumptions (A), the following results hold.

(a) f (x^m)− f (x) ≤ σm⁻¹H (x, x⁰)for all x∈ K.

(b) If σm→ +∞, then limm→+∞f (x^m)= f_∗.

(c) If the optimal setX^∗ = ∅, then the sequence {x^k}k∈Nis bounded, and if, in addi- tion, σm→ +∞, every accumulation point is a solution of the CSCP (1).

Proof (a) This result follows by Proposition 4.2(c) and the nonnegativity of H (x, x^m).

(b) Since σm→ +∞, passing the limit to the inequality in part (a), we have lim sup

m→+∞f (x^m)≤ f (x) ∀ x ∈ K, which particularly implies

lim sup

m→+∞f (x^m)≤ inf{f (x) : x ∈ K}.

This together with the fact that f (x^m)≥ f_∗= inf{f (x) : x ∈ K} yields the result.

(c) From Proposition3.1(d), H (x,·) has bounded level sets for every x ∈ K. Also, from Proposition4.2(d),{H(x, x^k)}k∈N is nonincreasing for all x∈ X^∗ifX^∗ = ∅.

Thus, the sequence{x^k}k∈Nis bounded and therefore has an accumulation point. Let ˆx ∈ K be an accumulation point of {x^k}k∈N. Then{x^k^j} → ˆx for some kj → +∞.

Since f is lower semi-continuous, we have f (ˆx) = lim infk_j→+∞f (x^k^j)(see [22, p. 8]). On the other hand, we also have f (x^k^j)→ f∗, hence f (ˆx) = f∗which says

that ˆx is a solution of (1).

Proposition4.3(a) indicates an estimate for global rate of convergence which is similar to the one obtained for the proximal-like algorithms of convex minimization over nonnegative orthant cone. Analogously, as remarked in [6, Remark 4.1], global convergence of{x^k} itself to a solution of (1) can be achieved under the assumption that{x^k} ⊂ int(K) has a limit point in int(K). Nonetheless, this assumption is rather stringent.

5 Exponential multiplier method for SCLP

In this section, we give a dual application of the entropy-like proximal algorithm in (11)–(12), leading to an exponential multiplier method for the symmetric cone linear

(15)

program

(SCLP) min

−b^Ty: c −

m i=1

y_ia_i_K0, y∈ R^m

,

where c∈ V, ai∈ V and b ∈ R^m. For the sake of notation, we denote the feasible set of (SCLP) byF := {y ∈ R^m: A(y) _K0}, where A : R^m→ V is a linear operator defined by

A(y):= c −

m i=1

yiai ∀y ∈ R^m. (24)

In addition, we make the following assumptions for the problem (SCLP):

(A1) The optimal solution set of (SCLP) is nonempty and bounded;

(A2) Slater’s condition holds, i.e., there exists a ˆy ∈ R^msuch that A(ˆy) _K0.

Notice that the Lagrangian function associated with (SCLP) is defined as follows:

L(y, x)=

−b^Ty− tr[x ◦ A(y)] if x ∈ K,

+∞ otherwise.

Therefore, the dual problem corresponding to the SCLP is given by (DSCLP) max

x_K0 inf

y∈R^mL(y, x), which can be explicitly written as

(DSCLP)

⎧⎪

⎨

⎪⎩

max−tr(c ◦ x) s.t. tr(ai◦ x) = bi,

x_K0.

i= 1, 2, . . . , m,

From the standard convex analysis arguments in [19], it follows that under assumption (A2) there is no duality gap between (SCLP) and (DSCLP), and furthermore, the solution set of (DSCLP) is nonempty and compact.

To solve the problem (SCLP), we introduce the following multiplier-type al- gorithm: given x⁰∈ int(K), generate the sequence {y^k}k∈N ⊆ R^m and {x^k}k∈N⊂ int(K) by

y^k= argmin

y∈R^m

−b^Ty+ μ⁻¹_k tr

exp

−μkA(y)+ ln x^k⁻¹

, (25)

x^k= exp

−μkA(y^k)+ ln x^k⁻¹

, (26)

where the parameters μk satisfy μk> ¯μ > 0 for all k ∈ N. The algorithm can be viewed as a natural extension of the exponential multiplier method used in convex programs over nonnegative orthant cones (see, e.g., [3,27]) to symmetric cones. In

(16)

the rest of this section, we will show that the algorithm defined by (25)–(26) possesses similar properties.

We first present two technical lemmas, where Lemma5.1collects some properties of the Löwner operator exp(z) and Lemma5.2characterizes the recession cone ofF.

Lemma 5.1

(a) For any z∈ V, there always holds that exp(z) ∈ int(K).

(b) The function exp(z) is continuously differentiable everywhere with (exp(z))e= ∇z(exp(z))e= exp(z) ∀ z ∈ V.

(b) The function tr[exp(_m

i=1yiai)] is continuously differentiable everywhere with

∇ytr

exp

_m

i=1

y_ia_i

= A^T

exp

_m

i=1

y_ia_i

,

where y∈ R^mand ai∈ V for all i = 1, 2, . . . , m, and A^T : V → R^mbe a linear transformation defined by A^T(x)= (a1, x, . . . , am, x)^T for any x∈ V.

Proof (a) The result is clear since for any z∈ V with z =r

j=1λ_j(z)c_j, all eigen- values of exp(z), given by exp(λj(z))for j= 1, 2, . . . , r, are positive.

(b) By Lemma2.2, clearly, exp(z) is continuously differentiable everywhere and

(exp(z))h=

r j=1

exp(λj(z))cj, hcj

+

1≤j<l≤r

4exp(λj(z))− exp(λl(z))

λj(z)− λl(z) c_j◦ (cl◦ h) for any h∈ V. From this formula, we particularly have

(exp(z))e=

r j=1

exp(λj(z))cj, ecj

+

1≤j<l≤r

4exp(λj(z))− exp(λl(z))

λ_j(z)− λl(z) c_j◦ (cl◦ e)

=

r j=1

exp(λj(z))c_j= exp(z).

(c) The first part is direct by the continuous differentiability of trace function and part (b), and the second part follows from the differential chain rule. Lemma 5.2 LetF_∞denote the recession cone of the feasible setF. Then,

F_∞=

d∈ R^m: A(d) − c _K0 .

(17)

Proof Assume that d∈ R^msuch that A(d)− c _K0. If d= 0, clearly, d ∈ F_∞. If d = 0, we take any y ∈ F. From the definition of the linear operator A, for any τ > 0,

A(y+ τd) = c −

m i=1

(yi+ τdi)ai= A(y) + τ(A(d) − c) _K0.

This, by the definition of recession direction [19, pp. 61], shows that d∈ F∞. Thus, we prove that{d ∈ R^m: A(d) − c _K0} ⊆ F_∞.

Now take any d∈ F_∞ and y∈ F. Then A(y + τd) _K 0 for any τ > 0, and therefore, λmin[A(y + τd)] ≥ 0. This must imply λmin(A(d)− c) ≥ 0. If not, using the fact that

λ_min[A(y + τd)] = λmin[A(y) + τ(A(d) − c)]

≤ λmin(τ (A(d)− c)) + λmax(A(y))

= τλmin(A(d)− c) + λmax(A(y))

where the inequality is due to Lemma2.1, and letting τ→ +∞, we then have λ_min[A(y + τd)] → −∞,

contradicting the fact that λmin[A(y + τd)] ≥ 0. Consequently, A(d) − c _K 0. To- gether with the above discussions, we show thatF∞= {d ∈ R^m| A(d) − c _K0}. Next we establish the convergence of the algorithm in (25)–(26). We first prove that the sequence generated is well-defined, which is implied by the following lemma.

Lemma 5.3 For any y∈ R^mand μ > 0, let F: R^m→ R be defined by F (y):= −b^Ty+ μ⁻¹tr

exp(−μA(y) + ln x^k⁻¹)

. (27)

Then under assumption (A1) the minimum set of F is nonempty and bounded.

Proof We prove that F is coercive by contradiction. Suppose not, i.e., some level set of F is not bounded. Then, there exists a sequence{y^k} ⊆ R^msuch that

y^k → +∞, lim

k→+∞

y^k

y^k= d = 0 and F (y^k)≤ δ (28) for some δ ∈ R. Since exp(−μA(y) + ln x^k⁻¹)∈ int(K) for any y ∈ R^m by Lemma5.1(a),

tr

exp

− μA(y) + ln x^k⁻¹

=

r j=1

exp

λ_j

− μA(y) + ln x^k⁻¹

>0.

(18)

Therefore, F (y^k)≤ δ implies that the following two inequalities hold

−b^Ty^k< δ, (29)

r j=1

exp

λ_j

− μA(y^k)+ ln x^k⁻¹

≤ μ(δ + b^Ty^k). (30)

Due to the nonnegativity of the exponential function, (30) is equivalent to saying that exp

λ_j

− μA(y^k)+ ln x^k⁻¹

≤ μ(δ + b^Ty^k) ∀ j = 1, 2, . . . , r. (31)

Dividing byy^k on the both sides and using the monotonicity of ln t (t > 0) then gives

λ_j

−μA(y^k)+ ln x^k⁻¹

− ln(y^k)

≤ ln

μ(δ+ b^Ty^k)/y^k

≤μ(δ+ b^Ty^k)

y^k − 1 ∀ j = 1, 2, . . . , r.

where the last inequality is due to ln t≤ t − 1 (t > 0). Now dividing y^k on the both sides again and using the homogeneity of the function λj(·) yields

λj

−μA(y^k)

y^k +ln x^k⁻¹ y^k

−ln(y^k)

y^k ≤μ(δ+ b^Ty^k) y^k² − 1

y^k ∀ j = 1, 2, . . . , r.

Passing to the limit k→ +∞ on the both sides and noting that y^k → +∞, we get

λ_j

μ

m i=1

d_ia_i

≤ 0 ∀ j = 1, 2, . . . , r,

which, by the homogeneity of λj(·) again, implies

μλ_j(A(d)− c) = μλj

−

m i=1

d_ia_i

≥ 0 ∀ j = 1, 2, . . . , r.

Consequently, A(d)− c _K 0. From Lemma5.2,F∞= {d ∈ R^m: A(d) − c _K0}.

This together with−b^Td≤ 0 shows that there exists a nonzero d ∈ R^m such that d∈ F∞but−b^Td≤ 0, obviously contradicting assumption (A1). Thus, we complete

the proof.

To analyze the convergence of the algorithm defined by (25)–(26), we also need the following lemma which states that the sequence{x^k}k∈N generated by (25)–(26) is exactly the one given by the algorithm in (11)–(12) when applied to the dual problem (DSCLP).