### An entropy-like proximal algorithm and the exponential multiplier method for symmetric cone programming

Jein-Shan Chen ^{1}
Department of Mathematics
National Taiwan Normal University

Taipei 11677, Taiwan

Shaohua Pan^{2}

School of Mathematical Sciences, South China University of Technology

Guangzhou 510640, China

April 28, 2006 (revised May 16, 2007)

Abstract. We introduce an entropy-like proximal algorithm for the problem of minimizing a closed proper convex function subject to a symmetric cone constraint. The algorithm is based on a distance-like function that extends the Kullback-Leibler relative entropy to the setting of symmetric cones. As with the proximal algorithm for convex programming with nonnegative orthant constraints, we show that, under some mild assumptions, the sequence generated by the proposed algorithm is bounded and every accumulation point is a solution of the considered problem. In addition, we present a dual application of the proposed algorithm to the symmetric cone linear program, leading to a multiplier method which is shown to enjoy properties similar to the exponential multiplier method in [29].

Key words. Symmetric cone optimization, proximal-like method, entropy-like distance, exponential multiplier method.

### 1 Introduction

Symmetric cone programming (SCP) provides a unified framework for various mathematical programming problems, including linear programming (LP), second-order cone programming (SOCP), and semidefinite programming (SDP). Such programming problems arise

1Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author’s work is partially supported by National Science Council of Taiwan. E-mail: jschen@math.ntnu.edu.tw.

2E-mail:shhpan@scut.edu.cn.

naturally in a wide range of applications in engineering, economics, optimal control, management science, combinatorial optimization, and other fields; see [1, 16, 30] and the references therein. Recently, the symmetric cone optimization problem, especially symmetric cone linear programming (SCLP), has attracted the attention of researchers, with a focus on the development of interior point methods similar to those used in linear programming; see [9, 10, 25]. Although interior-point methods have been successfully applied to the SCLP, it is worthwhile to explore other solution methods for more general convex symmetric cone optimization problems.

Let $\mathbb{A} = (\mathbb{V}, \circ, \langle \cdot, \cdot \rangle)$ be a Euclidean Jordan algebra, where $(\mathbb{V}, \langle \cdot, \cdot \rangle)$ is a finite-dimensional inner product space over the real field $\mathbb{R}$ and "$\circ$" denotes the Jordan product, which will be defined in the next section. Let $\mathcal{K}$ be the symmetric cone in $\mathbb{V}$ (introduced in Section 2). In this paper, we consider the following convex symmetric cone programming (CSCP) problem:

$$\min \ f(x) \quad \text{s.t.} \quad x \succeq_{\mathcal{K}} 0, \tag{1}$$

where $f : \mathbb{V} \to (-\infty, \infty]$ is a closed proper convex function, and $x \succeq_{\mathcal{K}} 0$ means $x \in \mathcal{K}$. In general, for any $x, y \in \mathbb{V}$, we write $x \succeq_{\mathcal{K}} y$ if $x - y \in \mathcal{K}$ and $x \succ_{\mathcal{K}} y$ if $x - y \in \mathrm{int}(\mathcal{K})$. A function is closed if and only if it is lower semicontinuous (l.s.c. for short), and a function $f$ is proper if $f(x) < \infty$ for at least one $x \in \mathbb{V}$ and $f(x) > -\infty$ for all $x \in \mathbb{V}$.

Notice that the CSCP is indeed a special class of convex programs, and therefore it can in principle be solved via general convex programming methods. One such method is the proximal point algorithm for minimizing a convex function $f(x)$ on $\mathbb{R}^n$, which generates a sequence $\{x^k\}_{k \in N}$ by the iterative scheme

$$x^k = \operatorname*{argmin}_{x \in \mathbb{R}^n} \left\{ f(x) + \frac{1}{\mu_k} \|x - x^{k-1}\|^2 \right\}, \tag{2}$$

where $\{\mu_k\}$ is a sequence of positive numbers and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$.
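As a toy illustration (ours, not the paper's), one proximal step of (2) for the scalar convex function $f(x) = x^2$ has a closed form obtained from the first-order optimality condition, and the iterates contract toward the minimizer $x^* = 0$:

```python
# Hypothetical 1-D illustration of the proximal point scheme (2).
# Each iterate solves argmin_x { f(x) + (1/mu) * (x - x_prev)**2 }.
# For f(x) = x**2 the optimality condition 2*x + (2/mu)*(x - x_prev) = 0
# gives the closed form below, so no numerical solver is needed.

def prox_step(x_prev, mu):
    return x_prev / (1.0 + mu)

x = 5.0                      # x^0
for k in range(60):
    x = prox_step(x, mu=1.0)
print(x)                     # contracts toward the minimizer x* = 0
```

The quadratic proximal term regularizes each subproblem without changing the set of minimizers of $f$.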
This method was originally introduced by Martinet [17], based on the Moreau proximal approximation of $f$ (see [18]), and then further developed and studied by Rockafellar [22, 23]. Later, several generalizations of the proximal algorithm were considered in which the usual quadratic proximal term in (2) is replaced by nonquadratic distance-like functions; see, e.g., [5, 7, 8, 14, 27]. Among others, the proximal algorithms using an entropy-like distance [27, 14, 13, 28], designed for minimizing a convex function $f(x)$ subject to nonnegative orthant constraints $x \in \mathbb{R}^n_+$, generate the iterates by

$$x^0 \in \mathbb{R}^n_{++}, \qquad x^k = \operatorname*{argmin}_{x \in \mathbb{R}^n_+} \left\{ f(x) + \frac{1}{\mu_k}\, d_\varphi(x, x^{k-1}) \right\}, \tag{3}$$

where $d_\varphi(\cdot, \cdot) : \mathbb{R}^n_{++} \times \mathbb{R}^n_{++} \to \mathbb{R}_+$ is the entropy-like distance defined by

$$d_\varphi(x, y) = \sum_{i=1}^n y_i\, \varphi(x_i / y_i) \tag{4}$$

with $\varphi$ satisfying certain conditions; see [13, 14, 27, 28]. An important choice is $\varphi(t) = t \ln t - t + 1$, for which the corresponding $d_\varphi$, given by

$$d_\varphi(x, y) = \sum_{i=1}^n \big[\, x_i \ln x_i - x_i \ln y_i + y_i - x_i \,\big], \tag{5}$$

is the well-known Kullback-Leibler entropy from statistics, which is where the "entropy" terminology stems from. One key feature of entropic proximal methods is that they automatically generate a sequence staying in the interior of $\mathbb{R}^n_+$, i.e., $\{x^k\}_{k \in N} \subset \mathbb{R}^n_{++}$, and hence eliminate the combinatorial nature of the problem. One of the main applications of these proximal methods is to the dual of smooth convex programs, yielding twice continuously differentiable nonquadratic augmented Lagrangians and thus allowing the use of Newton's method.
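A small sketch (our own illustration, not from the paper) of the entropy-like distance (4)-(5) and of one entropic proximal step (3): for a linear objective $f(x) = c^T x$ the subproblem's optimality condition $c + \frac{1}{\mu}(\ln x - \ln y) = 0$ gives a closed-form update that is automatically strictly positive.

```python
import numpy as np

# Kullback-Leibler distance (5); requires x >= 0 componentwise and y > 0.
def d_phi(x, y):
    return np.sum(x * np.log(x) - x * np.log(y) + y - x)

# One entropic proximal step (3) for f(x) = c.x: the first-order condition
# c + (1/mu)*(ln x - ln y) = 0 yields x = y * exp(-mu*c) > 0 componentwise.
def entropic_prox_step(y, c, mu):
    return y * np.exp(-mu * c)

c = np.array([1.0, -0.5, 2.0])
x = np.ones(3)                  # x^0 in the interior of R^3_+
for k in range(5):
    x = entropic_prox_step(x, c, mu=0.1)
print(x)                        # every component stays strictly positive
print(d_phi(x, x))              # 0.0: the distance vanishes iff the points coincide
```

This positivity of the iterates is exactly the "interior" behavior the paper later reproduces on general symmetric cones.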

The main motivation of this paper is to develop unified proximal-type algorithms and their corresponding dual augmented Lagrangian methods for convex symmetric cone optimization problems. Specifically, by using Euclidean Jordan algebraic techniques, we introduce an interior proximal-type algorithm for the CSCP, which can be viewed as an extension of the entropy-like proximal algorithm defined by (3)-(4) with $\varphi(t) = t \ln t - t + 1$. For the proposed algorithm, we establish its convergence properties and also present a dual application to the SCLP, leading to an exponential multiplier method which is shown to possess properties analogous to the method proposed in [3, 29] for convex programming over the nonnegative orthant.

The paper is organized as follows. Section 2 reviews some basic concepts and materials on Euclidean Jordan algebras which are needed in the analysis of the algorithm. In Section 3, we introduce a distance-like function $H$ to measure how close two points in the symmetric cone $\mathcal{K}$ are and establish some of its properties. In addition, we outline a basic proximal-like algorithm based on the measure $H$. The convergence analysis of the basic algorithm is given in Section 4. In Section 5, we consider a dual application of the algorithm to the SCLP and establish convergence results for the resulting multiplier method. We close with some final remarks in Section 6.

### 2 Preliminaries on Euclidean Jordan Algebra

In this section, we recall some concepts and results on Euclidean Jordan algebras that will be used in the subsequent sections. More detailed expositions of Euclidean Jordan algebras can be found in Koecher's lecture notes [15] and the monograph by Faraut and Korányi [11].

Let $\mathbb{V}$ be a finite-dimensional vector space endowed with a bilinear mapping $(x, y) \mapsto x \circ y$ from $\mathbb{V} \times \mathbb{V}$ into $\mathbb{V}$. For a given $x \in \mathbb{V}$, let $L(x)$ be the linear operator on $\mathbb{V}$ defined by

$$L(x)y := x \circ y \quad \text{for every } y \in \mathbb{V}.$$

The pair $(\mathbb{V}, \circ)$ is called a Jordan algebra if, for all $x, y \in \mathbb{V}$,

(i) $x \circ y = y \circ x$,

(ii) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$, where $x^2 := x \circ x$.

In a Jordan algebra $(\mathbb{V}, \circ)$, $x \circ y$ is called the Jordan product of $x$ and $y$. Note that a Jordan algebra is not necessarily associative, i.e., $x \circ (y \circ z) = (x \circ y) \circ z$ may fail in general.

If for some element $e \in \mathbb{V}$, $x \circ e = e \circ x = x$ for all $x \in \mathbb{V}$, then $e$ is called a unit element of the Jordan algebra $(\mathbb{V}, \circ)$. The unit element, if it exists, is unique. A Jordan algebra does not necessarily have a unit element. For $x \in \mathbb{V}$, let $\zeta(x)$ be the degree of the minimal polynomial of $x$, which can be equivalently defined as

$$\zeta(x) := \min\big\{ k : \{e, x, x^2, \cdots, x^k\} \text{ are linearly dependent} \big\}.$$

Then the rank of $(\mathbb{V}, \circ)$ is well defined by $r := \max\{ \zeta(x) : x \in \mathbb{V} \}$.

A Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$, defined over the real field $\mathbb{R}$, is called a Euclidean Jordan algebra or formally real Jordan algebra if there exists a positive definite symmetric bilinear form on $\mathbb{V}$ which is associative; in other words, there exists on $\mathbb{V}$ an inner product, denoted by $\langle \cdot, \cdot \rangle_{\mathbb{V}}$, such that for all $x, y, z \in \mathbb{V}$:

(iii) $\langle x \circ y, z \rangle_{\mathbb{V}} = \langle y, x \circ z \rangle_{\mathbb{V}}$.

In a Euclidean Jordan algebra $\mathbb{A} = (\mathbb{V}, \circ, \langle \cdot, \cdot \rangle_{\mathbb{V}})$, we define the set of squares as

$$\mathcal{K} := \big\{ x^2 : x \in \mathbb{V} \big\}.$$

By [11, Theorem III.2.1], $\mathcal{K}$ is a symmetric cone. This means that $\mathcal{K}$ is a self-dual closed convex cone with nonempty interior, and for any two elements $x, y \in \mathrm{int}(\mathcal{K})$ there exists an invertible linear transformation $T : \mathbb{V} \to \mathbb{V}$ such that $T(\mathcal{K}) = \mathcal{K}$ and $T(x) = y$.

Here are two popular examples of Euclidean Jordan algebras. Let $\mathcal{S}^n$ be the space of $n \times n$ real symmetric matrices with the inner product given by $\langle X, Y \rangle_{\mathcal{S}^n} := \mathrm{Tr}(XY)$, where $XY$ is the usual matrix product of $X$ and $Y$ and $\mathrm{Tr}(XY)$ is the trace of the matrix $XY$. Then $(\mathcal{S}^n, \circ, \langle \cdot, \cdot \rangle_{\mathcal{S}^n})$ is a Euclidean Jordan algebra with the Jordan product given by

$$X \circ Y := (XY + YX)/2, \qquad X, Y \in \mathcal{S}^n.$$

In this case, the unit element is the identity matrix $I$ in $\mathcal{S}^n$, and the cone of squares $\mathcal{K}$ is the set of all positive semidefinite matrices (denoted $\mathcal{S}^n_+$) in $\mathcal{S}^n$. Next, let $\mathbb{R}^n$ be the Euclidean space of dimension $n$ with the usual inner product $\langle x, y \rangle_{\mathbb{R}^n} = x^T y$. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, define

$$x \circ y := (x^T y,\ x_1 y_2 + y_1 x_2)^T.$$

Then $(\mathbb{R}^n, \circ, \langle \cdot, \cdot \rangle_{\mathbb{R}^n})$ is a Euclidean Jordan algebra, also called the quadratic forms algebra. In this algebra, the unit element is $e = (1, 0)^T$ and the set of squares is the second-order cone $\mathcal{K} = \{ x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} : \|x_2\| \le x_1 \}$.
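A small numerical check (ours, not the paper's) of the $\mathcal{S}^n$ example: the Jordan product $X \circ Y = (XY + YX)/2$ is commutative and satisfies the Jordan identity (ii), even though it is not associative in general.

```python
import numpy as np

# Jordan product on symmetric matrices: X o Y = (XY + YX)/2.
def jprod(X, Y):
    return (X @ Y + Y @ X) / 2.0

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); X = (A + A.T) / 2   # random symmetric X
B = rng.standard_normal((3, 3)); Y = (B + B.T) / 2   # random symmetric Y
X2 = jprod(X, X)                                     # X^2 = X o X

print(np.allclose(jprod(X, Y), jprod(Y, X)))                        # commutative
print(np.allclose(jprod(X, jprod(X2, Y)), jprod(X2, jprod(X, Y))))  # Jordan identity
```

Both checks hold exactly (up to floating-point roundoff) for every symmetric pair, which is why no special choice of $X$ and $Y$ is needed.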

Recall that an element $c \in \mathbb{V}$ is said to be idempotent if $c^2 = c$. Two idempotents $c$ and $q$ are said to be orthogonal if $c \circ q = 0$. One says that $\{c_1, c_2, \cdots, c_k\}$ is a complete system of orthogonal idempotents if

$$c_j^2 = c_j, \qquad c_j \circ c_i = 0 \ \text{ if } j \ne i, \ \ j, i = 1, 2, \cdots, k, \qquad \text{and} \quad \sum_{j=1}^k c_j = e.$$

An idempotent is said to be primitive if it is nonzero and cannot be written as the sum of two other nonzero idempotents. We call a complete system of orthogonal primitive idempotents a Jordan frame. We then have the second version of the spectral decomposition theorem.

**Theorem 2.1** [11, Theorem III.1.2] *Suppose that $\mathbb{A} = (\mathbb{V}, \circ, \langle \cdot, \cdot \rangle_{\mathbb{V}})$ is a Euclidean Jordan algebra and the rank of $\mathbb{A}$ is $r$. Then for any $x \in \mathbb{V}$, there exist a Jordan frame $\{c_1, c_2, \cdots, c_r\}$ and real numbers $\lambda_1(x), \lambda_2(x), \cdots, \lambda_r(x)$, arranged in decreasing order $\lambda_1(x) \ge \lambda_2(x) \ge \cdots \ge \lambda_r(x)$, such that $x = \sum_{j=1}^r \lambda_j(x) c_j$.*

The numbers $\lambda_j(x)$ (counting multiplicities), which are uniquely determined by $x$, are called the eigenvalues. We let $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and the largest eigenvalue, respectively. The trace of $x$, denoted $\mathrm{tr}(x)$, is given by $\mathrm{tr}(x) := \sum_{j=1}^r \lambda_j(x)$.
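In the $\mathcal{S}^n$ algebra, Theorem 2.1 is the ordinary eigendecomposition: the Jordan frame consists of the rank-one projectors $c_j = v_j v_j^T$ built from an orthonormal eigenbasis. A quick numerical illustration (ours):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); X = (A + A.T) / 2   # random symmetric matrix
lam, V = np.linalg.eigh(X)                           # eigenvalues and eigenvectors
frame = [np.outer(V[:, j], V[:, j]) for j in range(4)]  # Jordan frame c_j = v_j v_j^T

print(np.allclose(sum(l * c for l, c in zip(lam, frame)), X))  # x = sum_j lambda_j c_j
print(np.allclose(sum(frame), np.eye(4)))                      # frame sums to the unit e
print(np.isclose(np.trace(X), lam.sum()))                      # tr(x) = sum of eigenvalues
```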

From [11, Proposition III.1.5], a Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$ is Euclidean if and only if the symmetric bilinear form $\mathrm{tr}(x \circ y)$ is positive definite. Hence we may define another inner product on $\mathbb{V}$ by

$$\langle x, y \rangle := \mathrm{tr}(x \circ y), \quad \forall\, x, y \in \mathbb{V}.$$

By the associativity of $\mathrm{tr}(\cdot)$ [11, Proposition II.4.3], the inner product $\langle \cdot, \cdot \rangle$ is associative, i.e., for all $x, y, z \in \mathbb{V}$ there holds $\langle x, y \circ z \rangle = \langle y, x \circ z \rangle$. Thus, the linear operator $L(x)$ for each $x \in \mathbb{V}$ is symmetric with respect to the inner product $\langle \cdot, \cdot \rangle$ in the sense that

$$\langle L(x)y, z \rangle = \langle y, L(x)z \rangle, \quad \forall\, y, z \in \mathbb{V}.$$

In addition, we let $\|\cdot\|$ be the norm on $\mathbb{V}$ induced by the inner product $\langle \cdot, \cdot \rangle$, i.e.,

$$\|x\| := \sqrt{\langle x, x \rangle} = \Big( \sum_{j=1}^r \lambda_j^2(x) \Big)^{1/2}, \quad \forall\, x \in \mathbb{V}.$$

Now we can present two important inequalities for the eigenvalue functions $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$.

**Lemma 2.1** *Let $x, y \in \mathbb{V}$. Then the minimum eigenvalue of $x + y$ can be bounded as follows:*

$$\lambda_{\min}(x) + \lambda_{\min}(y) \le \lambda_{\min}(x + y) \le \lambda_{\min}(x) + \lambda_{\max}(y).$$
Proof. The first inequality can be found in [25, Lemma 14], so we only prove the second one. From [25, Lemma 13], we know that

$$\lambda_{\min}(x) = \min_{u \in \mathbb{V}} \frac{\langle u, x \circ u \rangle}{\langle u, u \rangle} \quad \text{and} \quad \lambda_{\max}(x) = \max_{u \in \mathbb{V}} \frac{\langle u, x \circ u \rangle}{\langle u, u \rangle}.$$

Hence, there holds

$$\lambda_{\min}(x + y) = \min_{u \in \mathbb{V}} \frac{\langle u, (x + y) \circ u \rangle}{\langle u, u \rangle} = \min_{u \in \mathbb{V}} \frac{\langle u, x \circ u \rangle + \langle u, y \circ u \rangle}{\langle u, u \rangle} \le \min_{u \in \mathbb{V}} \left\{ \frac{\langle u, x \circ u \rangle}{\langle u, u \rangle} + \max_{u \in \mathbb{V}} \frac{\langle u, y \circ u \rangle}{\langle u, u \rangle} \right\} = \lambda_{\min}(x) + \lambda_{\max}(y).$$

Consequently, the second inequality follows. $\Box$
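A numerical sanity check (ours) of Lemma 2.1 in the $\mathcal{S}^n$ algebra, where the eigenvalues of Theorem 2.1 are the ordinary matrix eigenvalues:

```python
import numpy as np

# lambda_min(x) + lambda_min(y) <= lambda_min(x+y) <= lambda_min(x) + lambda_max(y)
lmin = lambda M: np.linalg.eigvalsh(M)[0]    # eigvalsh returns ascending eigenvalues
lmax = lambda M: np.linalg.eigvalsh(M)[-1]

rng = np.random.default_rng(2)
for _ in range(100):
    A = rng.standard_normal((5, 5)); X = (A + A.T) / 2
    B = rng.standard_normal((5, 5)); Y = (B + B.T) / 2
    assert lmin(X) + lmin(Y) <= lmin(X + Y) + 1e-10
    assert lmin(X + Y) <= lmin(X) + lmax(Y) + 1e-10
print("Lemma 2.1 verified on 100 random symmetric pairs")
```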

Let $\phi : \mathbb{R} \to \mathbb{R}$ be a scalar-valued function. Then it is natural to define a vector-valued function associated with the Euclidean Jordan algebra $(\mathbb{V}, \circ, \langle \cdot, \cdot \rangle_{\mathbb{V}})$ by

$$\phi^{sc}(x) := \phi(\lambda_1(x)) c_1 + \phi(\lambda_2(x)) c_2 + \cdots + \phi(\lambda_r(x)) c_r, \tag{6}$$

where $x \in \mathbb{V}$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$. This function $\phi^{sc}$ is called the Löwner operator and enjoys the following property [26].
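A sketch (ours) of the Löwner operator (6) in the $\mathcal{S}^n$ algebra: apply the scalar function $\phi$ to the eigenvalues and rebuild the matrix from the Jordan frame. For instance, $\phi(t) = t^2$ recovers the matrix square, and $\phi = \ln$ inverts $\phi = \exp$.

```python
import numpy as np

def lowner(phi, X):
    # phi^{sc}(x) = sum_j phi(lambda_j(x)) c_j with c_j = v_j v_j^T,
    # i.e. V diag(phi(lam)) V^T for an orthonormal eigenbasis V.
    lam, V = np.linalg.eigh(X)
    return (V * phi(lam)) @ V.T

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)); X = (A + A.T) / 2

print(np.allclose(lowner(lambda t: t**2, X), X @ X))  # phi(t)=t^2 gives X^2
Y = lowner(np.exp, X)                                 # matrix exponential of X
print(np.allclose(lowner(np.log, Y), X))              # log inverts exp
```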

**Lemma 2.2** [26, Theorem 13] *For any $x = \sum_{j=1}^r \lambda_j(x) c_j$, let $\phi^{sc}$ be defined as in (6). Then $\phi^{sc}$ is (continuously) differentiable at $x$ if and only if, for each $j \in \{1, 2, \cdots, r\}$, $\phi$ is (continuously) differentiable at $\lambda_j(x)$. The derivative of $\phi^{sc}$ at $x$, for any $h \in \mathbb{V}$, is given by*

$$(\phi^{sc})'(x) h = \sum_{j=1}^r [\phi^{[1]}(\lambda(x))]_{jj} \langle c_j, h \rangle c_j + \sum_{1 \le j < l \le r} 4\, [\phi^{[1]}(\lambda(x))]_{jl}\, c_j \circ (c_l \circ h)$$

*with*

$$[\phi^{[1]}(\lambda(x))]_{ij} := \begin{cases} \dfrac{\phi(\lambda_i(x)) - \phi(\lambda_j(x))}{\lambda_i(x) - \lambda_j(x)} & \text{if } \lambda_i(x) \ne \lambda_j(x), \\[2mm] \phi'(\lambda_i(x)) & \text{if } \lambda_i(x) = \lambda_j(x), \end{cases} \qquad i, j = 1, 2, \cdots, r.$$

In fact, the Jacobian $(\phi^{sc})'(\cdot)$ is a linear and symmetric operator, and can be written as

$$(\phi^{sc})'(x) = \sum_{j=1}^r \phi'(\lambda_j(x))\, Q(c_j) + 2 \sum_{i, j = 1,\, i \ne j}^r [\phi^{[1]}(\lambda(x))]_{ij}\, L(c_j) L(c_i), \tag{7}$$

where $Q(x) := 2 L^2(x) - L(x^2)$ for any $x \in \mathbb{V}$ is called the quadratic representation of $\mathbb{V}$.

We next introduce real-valued spectral functions on the Euclidean Jordan algebra. Let $\mathcal{P}$ be the set of all permutations of $r$-dimensional vectors. A subset of $\mathbb{R}^r$ is said to be symmetric if it remains unchanged under every permutation in $\mathcal{P}$ (we adopt the same notation used in [2]).

**Definition 2.1** *Let $S \subseteq \mathbb{R}^r$ be a symmetric set. A real-valued function $f : S \to \mathbb{R}$ is symmetric if for every permutation $P \in \mathcal{P}$ and each $s \in S$, we have $f(Ps) = f(s)$.*

For any $x \in \mathbb{V}$ with the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$, we define

$$K := \{ x \in \mathbb{V} \mid \lambda(x) \in S \},$$

where $\lambda(x) = (\lambda_1(x), \cdots, \lambda_r(x))^T$ is the spectral vector of $x$ and $S \subseteq \mathbb{R}^r$ is a symmetric set. Let $f : S \to \mathbb{R}$ be a symmetric function. Then the function $F : K \to \mathbb{R}$ given by

$$F(x) := f(\lambda(x)) \tag{8}$$

is the spectral function generated by $f$. Moreover, from [2, Theorem 41], we know that $F$ is (strictly) convex if $f$ is (strictly) convex.

Unless otherwise stated, in the sequel $\mathbb{A} = (\mathbb{V}, \circ, \langle \cdot, \cdot \rangle)$ represents a Euclidean Jordan algebra of rank $r$ with $\dim(\mathbb{V}) = n$. For a closed proper convex function $f : \mathbb{V} \to (-\infty, +\infty]$, we denote the domain of $f$ by $\mathrm{dom}(f) := \{ x \in \mathbb{V} \mid f(x) < +\infty \}$. The subdifferential of $f$ at $x_0 \in \mathbb{V}$ is the convex set

$$\partial f(x_0) = \{ \xi \in \mathbb{V} \mid f(x) \ge f(x_0) + \langle \xi, x - x_0 \rangle \ \ \forall\, x \in \mathbb{V} \}. \tag{9}$$

Since $\langle x, y \rangle = \mathrm{tr}(x \circ y)$ for any $x, y \in \mathbb{V}$, the above subdifferential set can equivalently be written as

$$\partial f(x_0) = \{ \xi \in \mathbb{V} \mid f(x) \ge f(x_0) + \mathrm{tr}(\xi \circ (x - x_0)) \ \ \forall\, x \in \mathbb{V} \}. \tag{10}$$

For a sequence $\{x^k\}_{k \in N}$, the notation $N$ denotes the set of natural numbers.

### 3 An entropy-like proximal algorithm for SCP

To solve the convex symmetric cone optimization problem CSCP, we suggest in this section an entropy-like proximal minimization algorithm defined as follows:

$$x^0 \succ_{\mathcal{K}} 0, \qquad x^k = \operatorname*{argmin}_{x \,\succeq_{\mathcal{K}}\, 0} \left\{ f(x) + \frac{1}{\mu_k} H(x, x^{k-1}) \right\}, \tag{11}$$

where $\mu_k > 0$ and $H : \mathbb{V} \times \mathbb{V} \to (-\infty, +\infty]$ is defined by

$$H(x, y) := \begin{cases} \mathrm{tr}(x \circ \ln x - x \circ \ln y + y - x) & \forall\, x \in \mathcal{K},\ y \in \mathrm{int}(\mathcal{K}), \\ +\infty & \text{otherwise}. \end{cases} \tag{12}$$

This algorithm is indeed of proximal type, except that, instead of using the classical quadratic term $\|x - x^{k-1}\|^2$, we adopt the distance-like function $H$ to guarantee that the generated sequence $\{x^k\}_{k \in N}$ satisfies $x^k \in \mathrm{int}(\mathcal{K})$ for all $k$, thus leading to an "interior" proximal method (see Proposition 4.1).

By the definition of the Löwner operator in Section 2, it is clear that the function $H(x, y)$ is well defined for all $x, y \in \mathrm{int}(\mathcal{K})$. Moreover, the domain in $x$ can be extended from $\mathrm{int}(\mathcal{K})$ to $\mathcal{K}$ by adopting the convention $0 \ln 0 \equiv 0$. The function $H$ is a natural extension of the distance-like entropy function given in (5) and is used to measure the "distance" between two points in $\mathcal{K}$. In fact, $H$ reduces to the entropy function $d_\varphi$ of (5) if the Euclidean Jordan algebra $\mathbb{A}$ is specified as $(\mathbb{R}^n, \circ, \langle \cdot, \cdot \rangle_{\mathbb{R}^n})$ with "$\circ$" denoting the componentwise product of two vectors in $\mathbb{R}^n$ and $\langle \cdot, \cdot \rangle_{\mathbb{R}^n}$ being the standard dot product $\langle a, b \rangle_{\mathbb{R}^n} = \sum_{i=1}^n a_i b_i$ for every $a, b \in \mathbb{R}^n$. As shown by Proposition 3.1 below, most of the important properties, though not all, that hold for $d_\varphi(\cdot, \cdot)$ also hold for $H(\cdot, \cdot)$.
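Our own illustration of (12) in the $\mathcal{S}^n$ algebra: since $\mathrm{tr}(X \circ \ln X) = \mathrm{Tr}(X \log X)$ with $\log$ the matrix logarithm, $H$ becomes a matrix relative entropy, and for diagonal matrices it collapses to the vector entropy $d_\varphi$ of (5).

```python
import numpy as np

def logm_sym(X):
    # matrix logarithm of a symmetric positive definite matrix via (6)
    lam, V = np.linalg.eigh(X)
    return (V * np.log(lam)) @ V.T

def H(X, Y):
    # H(X, Y) = tr(X log X - X log Y + Y - X) as in (12), S^n case
    return np.trace(X @ logm_sym(X) - X @ logm_sym(Y) + Y - X)

x = np.array([1.0, 2.0, 0.5]); y = np.array([2.0, 1.0, 1.0])
d_phi = np.sum(x * np.log(x) - x * np.log(y) + y - x)   # vector entropy (5)

print(np.isclose(H(np.diag(x), np.diag(y)), d_phi))  # diagonal case matches (5)
print(np.isclose(H(np.diag(y), np.diag(y)), 0.0))    # H(y, y) = 0
```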

In what follows, we present several technical lemmas that will be used in establishing some favorable properties of the distance measure $H$. We start with the extension of the von Neumann inequality to Euclidean Jordan algebras, whose proof can be found in [2].

**Lemma 3.1** *For any $x, y \in \mathbb{V}$, we have $\mathrm{tr}(x \circ y) \le \sum_{j=1}^r \lambda_j(x) \lambda_j(y) = \lambda(x)^T \lambda(y)$, where $\lambda(x)$ and $\lambda(y)$ are the spectral vectors of $x$ and $y$, respectively.*
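A numerical check (ours) of Lemma 3.1 in $\mathcal{S}^n$, where $\mathrm{tr}(x \circ y) = \mathrm{Tr}(XY)$ and the spectral vectors pair eigenvalues sorted in the same order:

```python
import numpy as np

rng = np.random.default_rng(4)
for _ in range(100):
    A = rng.standard_normal((4, 4)); X = (A + A.T) / 2
    B = rng.standard_normal((4, 4)); Y = (B + B.T) / 2
    lhs = np.trace(X @ Y)
    # identically-sorted spectra maximize the pairing, as in Lemma 3.1
    rhs = np.dot(np.sort(np.linalg.eigvalsh(X)), np.sort(np.linalg.eigvalsh(Y)))
    assert lhs <= rhs + 1e-9
print("von Neumann trace inequality verified")
```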

**Lemma 3.2** *For any $x \in \mathcal{K}$, let $\Phi(x) := \mathrm{tr}(x \circ \ln x)$. Then we have the following results.*

*(a) $\Phi(x)$ is the spectral function generated by the symmetric entropy function*

$$\widehat{\Phi}(u) = \sum_{j=1}^r u_j \ln u_j, \quad \forall\, u \in \mathbb{R}^r_+. \tag{13}$$

*(b) $\Phi(x)$ is continuously differentiable on $\mathrm{int}(\mathcal{K})$ with $\nabla \Phi(x) = \ln x + e$.*

*(c) The function $\Phi(x)$ is strictly convex over $\mathcal{K}$.*

Proof. (a) Suppose that $x$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$, and let $\phi(t) = t \ln t$. Then, from Section 2, we know that the vector-valued function $x \circ \ln x$ is exactly the Löwner function $\phi^{sc}(x)$, i.e., $\phi^{sc}(x) = x \circ \ln x$. Clearly, $\phi^{sc}$ is well defined for any $x \in \mathcal{K}$ and

$$\phi^{sc}(x) = x \circ \ln x = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(x))\, c_j.$$

Therefore,

$$\Phi(x) = \mathrm{tr}(\phi^{sc}(x)) = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(x)) = \widehat{\Phi}(\lambda(x)),$$

where $\widehat{\Phi} : \mathbb{R}^r_+ \to \mathbb{R}$ is given by (13). By Definition 2.1, the function $\widehat{\Phi}$ is clearly symmetric. Consequently, $\Phi(x)$ is the spectral function generated by the symmetric function $\widehat{\Phi}$ in view of (8).

(b) From Lemma 2.2, we know that $\phi^{sc}(x) = x \circ \ln x$ is continuously differentiable on $\mathrm{int}(\mathcal{K})$. Thus, $\Phi(x)$ is also continuously differentiable on $\mathrm{int}(\mathcal{K})$, because $\Phi$ is the composition of the trace function (clearly continuously differentiable) and $\phi^{sc}$. It remains to find the gradient formula. From the fact that $\mathrm{tr}(x \circ y) = \langle x, y \rangle$, we have

$$\Phi(x) = \mathrm{tr}(x \circ \ln x) = \langle x, \ln x \rangle.$$

Applying the chain rule for the inner product of two functions, we obtain

$$\nabla \Phi(x) = \ln x + (\nabla \ln x) x = \ln x + (\ln x)' x. \tag{14}$$
On the other hand, from the formula (7) it follows that for any $h \in \mathbb{V}$,

$$(\ln x)' h = \sum_{j=1}^r \frac{1}{\lambda_j(x)} \langle c_j, h \rangle c_j + \sum_{1 \le j < l \le r} \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)}\, c_j \circ (c_l \circ h).$$

By this, it is easy to compute that

$$(\ln x)' x = \sum_{j=1}^r \frac{1}{\lambda_j(x)} \langle c_j, x \rangle c_j + \sum_{1 \le j < l \le r} \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)}\, c_j \circ (c_l \circ x) = \sum_{j=1}^r \frac{1}{\lambda_j(x)}\, \lambda_j(x)\, c_j + \sum_{1 \le j < l \le r} \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)}\, \lambda_l(x)\, c_j \circ c_l = e,$$

where the last equality uses $c_j \circ c_l = 0$ for $j \ne l$ and $\sum_{j=1}^r c_j = e$. This together with equation (14) yields the desired result.

(c) Note that the function $\widehat{\Phi}$ in (13) is strictly convex over $\mathbb{R}^r_+$. Therefore, the conclusion immediately follows from part (a) and [2, Theorem 41]. $\Box$

Now we are in a position to summarize some favorable properties of the distance-like function $H$. These properties play an important role in the convergence analysis of the algorithm (11)-(12), as will be seen in the next section.

**Proposition 3.1** *Let $H(x, y)$ be given as in (12). Then the following results hold.*

*(a) $H(x, y)$ is continuous on $\mathcal{K} \times \mathrm{int}(\mathcal{K})$, and $H(\cdot, y)$ is strictly convex for any $y \in \mathrm{int}(\mathcal{K})$.*

*(b) For any fixed $y \in \mathrm{int}(\mathcal{K})$, $H(\cdot, y)$ is continuously differentiable on $\mathrm{int}(\mathcal{K})$ with*

$$\nabla_x H(x, y) = \ln x - \ln y.$$

*(c) $H(x, y) \ge 0$ for any $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$; moreover, $H(x, y) = 0$ if and only if $x = y$.*

*(d) $H(x, y) \ge d(\lambda(x), \lambda(y))$ for any $x \in \mathcal{K}$, $y \in \mathrm{int}(\mathcal{K})$, where $d(\cdot, \cdot)$ is defined by*

$$d(u, v) := \sum_{i=1}^r \big[\, u_i \ln u_i - u_i \ln v_i + v_i - u_i \,\big], \quad \forall\, u \in \mathbb{R}^r_+,\ v \in \mathbb{R}^r_{++}. \tag{15}$$

*(e) The level sets of $H(\cdot, y)$ and $H(x, \cdot)$ are bounded for any fixed $y \in \mathrm{int}(\mathcal{K})$ and $x \in \mathcal{K}$, respectively.*

Proof. (a) Since $x \circ \ln x$ and $x \circ \ln y$ are continuous in $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$, and the trace function is also continuous, $H$ is continuous over $\mathcal{K} \times \mathrm{int}(\mathcal{K})$. On the other hand, from the definition of $H$, it follows that

$$H(x, y) = \Phi(x) - \mathrm{tr}(x \circ \ln y) + \mathrm{tr}(y) - \mathrm{tr}(x). \tag{16}$$

Notice that $\Phi(x)$ is strictly convex over $\mathcal{K}$ by Lemma 3.2 (c), and the other terms in the above expression are clearly convex in $x$ for any fixed $y \in \mathrm{int}(\mathcal{K})$. Therefore, $H(\cdot, y)$ is strictly convex for any fixed $y \in \mathrm{int}(\mathcal{K})$.

(b) From the expression (16) of $H(x, y)$ and Lemma 3.2 (b), the function $H(\cdot, y)$ is clearly continuously differentiable on $\mathrm{int}(\mathcal{K})$. Moreover,

$$\nabla_x H(x, y) = \nabla_x \Phi(x) - \ln y - e = \ln x - \ln y.$$

(c) From the definition of $\Phi(x)$ and its gradient formula in Lemma 3.2 (b),

$$\Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y \rangle = \mathrm{tr}(x \circ \ln x) - \mathrm{tr}(y \circ \ln y) - \langle \ln y + e, x - y \rangle = \mathrm{tr}(x \circ \ln x) - \mathrm{tr}(x \circ \ln y) - \mathrm{tr}(x) + \mathrm{tr}(y) = H(x, y) \tag{17}$$

for any $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$. On the other hand, the strict convexity of $\Phi$ shown in Lemma 3.2 (c) implies that

$$\Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y \rangle \ge 0,$$

with equality if and only if $x = y$. These two facts readily give the desired result.

(d) First, from Lemma 3.1 it follows that

$$\mathrm{tr}(x \circ \ln y) \le \sum_{j=1}^r \lambda_j(x)\, \lambda_j(\ln y) = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)), \quad \forall\, x \in \mathbb{V},\ y \in \mathrm{int}(\mathcal{K}).$$

Applying this inequality, we obtain, for all $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$,

$$\begin{aligned} H(x, y) &= \mathrm{tr}(x \circ \ln x + y - x) - \mathrm{tr}(x \circ \ln y) \\ &\ge \mathrm{tr}(x \circ \ln x + y - x) - \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)) \\ &= \sum_{j=1}^r \big[\, \lambda_j(x) \ln(\lambda_j(x)) + \lambda_j(y) - \lambda_j(x) \,\big] - \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)) \\ &= \sum_{j=1}^r \big[\, \lambda_j(x) \ln(\lambda_j(x)) - \lambda_j(x) \ln(\lambda_j(y)) + \lambda_j(y) - \lambda_j(x) \,\big] \\ &= d(\lambda(x), \lambda(y)). \end{aligned}$$

(e) For a fixed $y \in \mathrm{int}(\mathcal{K})$ and $\gamma \ge 0$, let $L_\gamma := \{ x \in \mathcal{K} \mid H(x, y) \le \gamma \}$. By part (d), we have $L_\gamma \subseteq \{ x \in \mathcal{K} \mid d(\lambda(x), \lambda(y)) \le \gamma \}$. Since $d(\cdot, v)$ has bounded level sets for any $v \in \mathbb{R}^r_{++}$ (see [13, Proposition 4]) and $\lambda(\cdot)$ is continuous, $L_\gamma$ is bounded for every $\gamma \ge 0$. Similarly, $H(x, \cdot)$ has bounded level sets for any $x \in \mathcal{K}$. $\Box$

**Proposition 3.2** *Let $H(x, y)$ be given as in (12). Suppose $\{x^k\}_{k \in N} \subseteq \mathcal{K}$ and $\{y^k\}_{k \in N} \subset \mathrm{int}(\mathcal{K})$ are bounded sequences such that $H(x^k, y^k) \to 0$. Then, as $k \to +\infty$, we have*

*(a) $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \cdots, r$.*

*(b) $\mathrm{tr}(x^k - y^k) \to 0$.*

Proof. (a) By Proposition 3.1 (d) and the nonnegativity of $d(\cdot, \cdot)$,

$$H(x^k, y^k) \ge d(\lambda(x^k), \lambda(y^k)) \ge 0.$$

Hence, $H(x^k, y^k) \to 0$ implies $d(\lambda(x^k), \lambda(y^k)) \to 0$. By the definition of $d(\cdot, \cdot)$ in (15),

$$d(\lambda(x^k), \lambda(y^k)) = \sum_{j=1}^r \lambda_j(y^k)\, \varphi\big( \lambda_j(x^k) / \lambda_j(y^k) \big)$$

with $\varphi(t) = t \ln t - t + 1$ $(t \ge 0)$. Since $\varphi(t) \ge 0$ for any $t \ge 0$, each term of the above sum is nonnegative. Therefore, $d(\lambda(x^k), \lambda(y^k)) \to 0$ implies that

$$\lambda_j(y^k)\, \varphi\big( \lambda_j(x^k) / \lambda_j(y^k) \big) \to 0, \quad j = 1, 2, \cdots, r,$$

which is equivalent to

$$\lambda_j(x^k) \ln(\lambda_j(x^k)) - \lambda_j(x^k) \ln(\lambda_j(y^k)) + \lambda_j(y^k) - \lambda_j(x^k) \to 0, \quad j = 1, 2, \cdots, r.$$

Since $\{\lambda_j(x^k)\}$ and $\{\lambda_j(y^k)\}$ are bounded, using [6, Lemma A.1] yields $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \cdots, r$.

(b) Since $\mathrm{tr}(x^k - y^k) = \sum_{j=1}^r \big( \lambda_j(x^k) - \lambda_j(y^k) \big)$, the desired result follows from part (a). $\Box$

Finally, we present two useful relations for the function $H$, which can easily be verified by direct substitution, using the definition of $H$ in (12) and recalling the nonnegativity of $H$ (see Proposition 3.1 (c)).

**Proposition 3.3** *Let $H(x, y)$ be given in (12). For all $x, y \in \mathrm{int}(\mathcal{K})$ and $z \in \mathcal{K}$, we have*

*(a) $H(z, x) - H(z, y) = \mathrm{tr}(z \circ \ln y - z \circ \ln x + x - y)$.*

*(b) $\mathrm{tr}\big( (z - y) \circ (\ln y - \ln x) \big) = H(z, x) - H(z, y) - H(y, x) \le H(z, x) - H(z, y)$.*

### 4 Convergence analysis of the algorithm

The analysis of the entropy-like proximal algorithm for the CSCP defined by (11)-(12) is similar to that developed by Güler [12] for quadratic proximal methods in convex minimization, and to its extension by Chen and Teboulle [4] for the proximal-like algorithm using Bregman functions. In what follows, we present a global convergence rate estimate for the entropy-like proximal algorithm (11)-(12) in terms of function values.

We make the following assumptions on the problem CSCP:

(A1) $\inf \{ f(x) \mid x \succeq_{\mathcal{K}} 0 \} := f_* > -\infty$;

(A2) $\mathrm{dom}(f) \cap \mathrm{int}(\mathcal{K}) \ne \emptyset$, where $\mathrm{dom}(f) := \{ x \in \mathbb{V} \mid f(x) < \infty \}$.

In addition, we denote the optimal solution set of the CSCP by $X^* := \{ x \in \mathcal{K} \mid f(x) = f_* \}$.

Before proving the convergence results, we have to show that the algorithm given in (11)-(12) is well defined, i.e., that it generates a sequence $\{x^k\}_{k \in N} \subset \mathrm{int}(\mathcal{K})$. The following technical lemma is needed for the proof.

**Lemma 4.1** *For any $x \succeq_{\mathcal{K}} 0$ and $y \succ_{\mathcal{K}} 0$ with $\mathrm{tr}(x \circ y) = 0$, we always have $x = 0$.*

Proof. It is clear that $\mathrm{tr}(x \circ y) = 0$ implies $\langle x, y \rangle = 0$. From the self-duality of $\mathcal{K}$ and [11, Proposition I.1.4], we know that

$$u \in \mathrm{int}(\mathcal{K}) \iff \langle u, v \rangle > 0 \quad \forall\, 0 \ne v \in \mathcal{K}.$$

This immediately implies the desired result that $x = 0$. $\Box$

**Proposition 4.1** *Consider the proximal algorithm defined as in (11)-(12). For any $y \in \mathrm{int}(\mathcal{K})$ and $\mu > 0$, we have the following results.*

*(a) Under assumption (A1), the function $f(\cdot) + \mu^{-1} H(\cdot, y)$ has bounded level sets.*

*(b) If, in addition, assumption (A2) holds, then there exists a unique $x(y) \in \mathrm{int}(\mathcal{K})$ satisfying*

$$x(y) = \operatorname*{argmin}_{x \,\succeq_{\mathcal{K}}\, 0} \left\{ f(x) + \frac{1}{\mu} H(x, y) \right\}, \tag{18}$$

*and the minimum is attained at $x(y) \in \mathrm{int}(\mathcal{K})$ satisfying*

$$\mu^{-1} \big( \ln y - \ln x(y) \big) \in \partial f(x(y)), \tag{19}$$

*where $\partial f$ is the subdifferential of $f$ given as in (9).*

Proof. (a) Fix $y \in \mathrm{int}(\mathcal{K})$ and $\mu > 0$, and let $F_y : \mathbb{V} \to (-\infty, +\infty]$ be given by

$$F_y(x) := f(x) + \mu^{-1} H(x, y).$$

By Proposition 3.1 (e), the level sets

$$L_\gamma := \{ x \in \mathcal{K} \mid H(x, y) \le \gamma \} \tag{20}$$

are bounded for all $\gamma \ge 0$ and any fixed $y \in \mathrm{int}(\mathcal{K})$. Together with assumption (A1), which bounds $f$ from below on $\mathcal{K}$, this implies that $F_y$ has bounded level sets: if some level set of $F_y$ were unbounded, then some $L_\gamma$ would be unbounded as well.

(b) From part (a), $F_y$ has bounded level sets, which implies the existence of the minimum point $x(y)$. Moreover, the strict convexity of $F_y$ from Proposition 3.1 (a) guarantees uniqueness. Under assumption (A2), using Proposition 3.1 (b) and the optimality condition for (18) gives

$$0 \in \partial f(x(y)) + \mu^{-1} \big( \ln x(y) - \ln y \big) + \partial \delta(x(y) \,|\, \mathcal{K}), \tag{21}$$

where $\delta(z \,|\, \mathcal{K}) = 0$ if $z \succeq_{\mathcal{K}} 0$ and $+\infty$ otherwise. We will show that $\partial \delta(x(y) \,|\, \mathcal{K}) = \{0\}$, from which the desired result follows. Observe that for any $x^k \in \mathrm{int}(\mathcal{K})$ with $x^k \to x \in \mathrm{bd}(\mathcal{K})$,

$$\| \ln x^k \| = \Big( \sum_{j=1}^r \big[ \ln(\lambda_j(x^k)) \big]^2 \Big)^{1/2} \to +\infty,$$

where the limit is due to the continuity of $\lambda_j(\cdot)$ and $\ln t$. Consequently,

$$\| \nabla_x H(x^k, y) \| = \| \ln x^k - \ln y \| \ge \| \ln x^k \| - \| \ln y \| \to +\infty.$$

By [21, pp. 251], this means that $H(\cdot, y)$ is essentially smooth, and hence $\partial_x H(x, y) = \emptyset$ for all $x \in \mathrm{bd}(\mathcal{K})$ by [21, Theorem 26.1]. Thus, from (21), we must have $x(y) \in \mathrm{int}(\mathcal{K})$ (otherwise $\partial_x H(x(y), y) = \emptyset$, contradicting (21)). Furthermore, from [21, pp. 226] it follows that

$$\partial \delta(z \,|\, \mathcal{K}) = \{ v \in \mathbb{V} \mid v \preceq_{\mathcal{K}} 0,\ \mathrm{tr}(v \circ z) = 0 \}.$$

Using Lemma 4.1, we then obtain $\partial \delta(x(y) \,|\, \mathcal{K}) = \{0\}$, which completes the proof. $\Box$
The next result provides the key properties of the algorithm (11)-(12) from which our convergence result will follow.

**Proposition 4.2** *Let $\{x^k\}_{k \in N}$ be the sequence generated by the algorithm defined as in (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then, the following results hold.*

*(a) The sequence $\{f(x^k)\}_{k \in N}$ is nonincreasing.*

*(b) $\mu_k \big( f(x^k) - f(x) \big) \le H(x, x^{k-1}) - H(x, x^k)$ for all $x \in \mathcal{K}$.*

*(c) $\sigma_m \big( f(x^m) - f(x) \big) \le H(x, x^0) - H(x, x^m)$ for all $x \in \mathcal{K}$.*

*(d) $\{H(x, x^k)\}_{k \in N}$ is nonincreasing for any $x \in X^*$, if the optimal set $X^* \ne \emptyset$.*

*(e) $H(x^k, x^{k-1}) \to 0$ and $\mathrm{tr}(x^k - x^{k-1}) \to 0$, if the optimal set $X^* \ne \emptyset$.*

Proof. (a) By the definition of $x^k$ in (11), we know that

$$f(x^k) + \mu_k^{-1} H(x^k, x^{k-1}) \le f(x^{k-1}) + \mu_k^{-1} H(x^{k-1}, x^{k-1}).$$

Since $H(x^k, x^{k-1}) \ge 0$ and $H(x^{k-1}, x^{k-1}) = 0$ (see Proposition 3.1 (c)), we obtain

$$f(x^k) \le f(x^{k-1}) \quad \forall\, k \in N.$$

(b) From (19), we have $-\mu_k^{-1} (\ln x^k - \ln x^{k-1}) \in \partial f(x^k)$. Plugging this into the formula for $\partial f(x^k)$ given in (10), we obtain

$$f(x) \ge f(x^k) - \mu_k^{-1} \, \mathrm{tr}\big( (x - x^k) \circ (\ln x^k - \ln x^{k-1}) \big) \quad \forall\, x \in \mathcal{K}.$$

This yields

$$\mu_k \big( f(x^k) - f(x) \big) \le \mathrm{tr}\big( (x - x^k) \circ (\ln x^k - \ln x^{k-1}) \big) = H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}) \le H(x, x^{k-1}) - H(x, x^k), \tag{22}$$

where the equality and the last inequality hold due to Proposition 3.3 (b).

(c) First, summing the inequalities of part (b) over $k = 1, 2, \cdots, m$ yields

$$\sum_{k=1}^m \mu_k f(x^k) - \sigma_m f(x) \le H(x, x^0) - H(x, x^m) \quad \forall\, x \in \mathcal{K}. \tag{23}$$

On the other hand, $f(x^m) \le f(x^k)$ for $k \le m$ by part (a). Multiplying this inequality by $\mu_k$ gives $\mu_k f(x^m) \le \mu_k f(x^k)$, and summing up leads to

$$\sum_{k=1}^m \mu_k f(x^k) \ge \sum_{k=1}^m \mu_k f(x^m) = \sigma_m f(x^m). \tag{24}$$

Now, combining (23) and (24) yields the desired result.

(d) Since $f(x^k) - f(x) = f(x^k) - f_* \ge 0$ for all $x \in X^*$, the result immediately follows from part (b).

(e) From part (d), the sequence $\{H(x, x^k)\}_{k \in N}$ is nonincreasing for any $x \in X^*$. This together with $H(x, x^k) \ge 0$ implies that $\{H(x, x^k)\}_{k \in N}$ is convergent. Thus,

$$H(x, x^{k-1}) - H(x, x^k) \to 0 \quad \forall\, x \in X^*. \tag{25}$$

On the other hand, from (22) in the proof of part (b) together with part (d), we have

$$0 \le \mu_k \big( f(x^k) - f(x) \big) \le H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}), \quad \forall\, x \in X^*.$$

Hence,

$$H(x^k, x^{k-1}) \le H(x, x^{k-1}) - H(x, x^k).$$

Using (25) and the nonnegativity of $H(x^k, x^{k-1})$, the first result is obtained. Since $\{x^k\}_{k \in N} \subset \{ y \in \mathrm{int}(\mathcal{K}) \mid H(x, y) \le H(x, x^0) \}$ for $x \in X^*$, the sequence $\{x^k\}_{k \in N}$ is bounded. Thus, the second result follows from the first result and Proposition 3.2 (b). $\Box$

We have now shown that the algorithm (11)-(12) is well-defined and satisfies several favorable properties. With these properties, we are ready to establish the convergence results of the proposed algorithm, which is one of the main purposes of this paper.

*Proposition 4.3* *Let $\{x^k\}_{k \in \mathbb{N}}$ be the sequence generated by the algorithm (11)-(12), and let $\sigma_m := \sum_{k=1}^{m} \mu_k$. Then, under assumptions (A1) and (A2), the following results hold.*

*(a) $f(x^m) - f(x) \le \sigma_m^{-1} H(x, x^0)$ for all $x \in K$.*

*(b) If $\sigma_m \to +\infty$, then $\lim_{m \to +\infty} f(x^m) = f_*$.*

*(c) If the optimal set $X^* \ne \emptyset$, the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded and every accumulation point is a solution of the problem CSCP.*

*Proof.* (a) This result follows from Proposition 4.2 (c) and the nonnegativity of $H(x, x^m)$.

(b) Since $\sigma_m \to +\infty$, passing to the limit in part (a) gives
$$\limsup_{m \to +\infty} f(x^m) \le f(x) \quad \forall\, x \in K.$$
This together with the fact $f(x^m) \ge f_*$ yields the desired result.

(c) From Proposition 3.1 (d), $H(x, \cdot)$ has bounded level sets for every $x \in K$. Also, from Proposition 4.2 (d), $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for all $x \in X^*$ if $X^* \ne \emptyset$. Thus, the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded and therefore has an accumulation point. Let $\hat{x} \in K$ be an accumulation point of $\{x^k\}_{k \in \mathbb{N}}$. Then $\{x^{k_j}\} \to \hat{x}$ for some $k_j \to +\infty$. Since $f$ is lower semicontinuous, we have $f(\hat{x}) \le \liminf_{k_j \to +\infty} f(x^{k_j})$ (see [24, page 8]). On the other hand, we also have $f(x^{k_j}) \to f_*$, so $f(\hat{x}) \le f_*$; since also $f(\hat{x}) \ge f_*$, we conclude $f(\hat{x}) = f_*$, which says that $\hat{x}$ is a solution of CSCP. $\Box$

Proposition 4.3 (a) provides an estimate of the global rate of convergence, similar to the one obtained by proximal-like algorithms in convex minimization. Analogously, as remarked in [6, Remark 4.1], global convergence of $\{x^k\}$ itself to an optimal solution of (1) can be achieved under the assumption that $\{x^k\} \subset \mathrm{int}(K)$ has a limit point in $\mathrm{int}(K)$. Nonetheless, this assumption is rather stringent.
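To make the iteration (11)-(12) concrete, here is a small sketch of the entropy-like proximal step in the nonnegative-orthant instance of CSCP, assuming $H$ reduces there to the Kullback-Leibler distance. The choice $f(x) = \tfrac{1}{2}\|x - a\|^2$, the data, and the helper names are illustrative, not from the paper; the subproblem decouples into scalar equations $t - a_i + \mu^{-1}\ln(t/x_i^{k-1}) = 0$, solved below by Newton's method in $u = \ln t$.

```python
import math

def prox_step(xprev, a, mu):
    # One entropy-proximal step for f(x) = 0.5*||x - a||^2 over R^n_+.
    # Coordinatewise optimality condition of (11):
    #     t - a_i + (1/mu) * ln(t / xprev_i) = 0,   t > 0.
    # Substituting u = ln t gives the increasing convex equation
    #     h(u) = e^u - a_i + (u - ln xprev_i)/mu = 0,
    # which Newton's method solves reliably (h' = e^u + 1/mu >= 1/mu).
    out = []
    for si, ai in zip(xprev, a):
        ls = math.log(si)
        u = ls
        for _ in range(100):
            h = math.exp(u) - ai + (u - ls) / mu
            if abs(h) < 1e-10:
                break
            u -= h / (math.exp(u) + 1.0 / mu)
        out.append(math.exp(u))
    return out

def f(x, a):
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))

a = [1.5, -2.0, 0.7]      # unconstrained minimizer leaves the cone
x = [1.0, 1.0, 1.0]       # x^0 in int(K)
vals = [f(x, a)]
for _ in range(100):      # mu_k = 1 for all k, so sigma_m -> +infinity
    x = prox_step(x, a, mu=1.0)
    vals.append(f(x, a))

# f(x^k) is nonincreasing, as in Proposition 4.2 (a), and the iterates
# stay in int(K) while approaching the solution max(a, 0)
assert all(v1 <= v0 + 1e-12 for v0, v1 in zip(vals, vals[1:]))
print(x)
```

The iterates remain strictly positive at every step yet converge to a boundary solution, which is the behavior the interior proximal scheme is designed to produce.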

### 5 The exponential multiplier method for SCLP

In this section, we present a dual application of the entropy-like proximal algorithm (11)-(12), leading to an exponential multiplier method for the symmetric cone linear program
$$\text{(SCLP)} \qquad \min\Big\{ -b^T y \ :\ c - \sum_{i=1}^{m} y_i a_i \succeq_K 0,\ y \in \mathbb{R}^m \Big\}, \qquad (26)$$
where $c, a_i \in \mathbb{V}$ for $i = 1, 2, \ldots, m$ and $b \in \mathbb{R}^m$. For convenience, we denote the feasible set of (SCLP) by $S = \{y \in \mathbb{R}^m : A(y) \succeq_K 0\}$, where $A : \mathbb{R}^m \to \mathbb{V}$ is defined by
$$A(y) := c - \sum_{i=1}^{m} y_i a_i. \qquad (27)$$

We make the following assumptions for the problem (SCLP):

(S1) The optimal solution set of (SCLP) is nonempty and bounded;

(S2) Slater's condition holds, i.e., there exists $\hat{y} \in \mathbb{R}^m$ such that $A(\hat{y}) \succ_K 0$.

The Lagrangian function associated with (SCLP) is given as follows:
$$L(y, x) = \begin{cases} -b^T y - \mathrm{tr}\,[x \circ A(y)] & \text{if } x \in K, \\ +\infty & \text{otherwise.} \end{cases}$$

Therefore, the dual problem corresponding to (SCLP) is given by
$$\text{(DSCLP)} \qquad \max_{x \succeq_K 0}\ \inf_{y \in \mathbb{R}^m} L(y, x),$$
which can be written explicitly as
$$\text{(DSCLP)} \qquad \begin{array}{ll} \max & -\mathrm{tr}(c \circ x) \\ \text{s.t.} & \mathrm{tr}(a_i \circ x) = b_i, \quad i = 1, 2, \ldots, m, \\ & x \succeq_K 0. \end{array}$$

By standard convex analysis arguments from [21], we know that under assumption (S2) there is no duality gap between (SCLP) and (DSCLP). Moreover, the set of optimal solutions of (DSCLP) is nonempty and compact.

To solve the problem (SCLP), we introduce the following multiplier-type algorithm:

*Algorithm 5.1* *Given $x^0 \in \mathrm{int}(K)$, generate the sequences $\{y^k\}_{k \in \mathbb{N}} \subseteq \mathbb{R}^m$ and $\{x^k\}_{k \in \mathbb{N}} \subset \mathrm{int}(K)$ by*
$$y^k = \mathop{\mathrm{argmin}}_{y \in \mathbb{R}^m} \Big\{ -b^T y + \mu_k^{-1}\, \mathrm{tr}\big[\exp\big(-\mu_k A(y) + \ln x^{k-1}\big)\big] \Big\}, \qquad (28)$$
$$x^k = \exp\big(-\mu_k A(y^k) + \ln x^{k-1}\big), \qquad (29)$$
*where the penalty parameters $\mu_k$ satisfy $\mu_k > \bar{\mu} > 0$ for all $k \in \mathbb{N}$.*

This algorithm can be viewed as a natural extension of the exponential multiplier method for convex programs over the nonnegative orthant (see, e.g., [3, 29]) to symmetric cones. In the rest of this section, we show that the algorithm (28)-(29) shares similar properties.
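For intuition, consider the orthant instance $K = \mathbb{R}^2_+$ with a single variable $y$ ($m = 1$), where $\exp$ and $\ln$ act componentwise and (29) becomes the classical multiplicative update $x^k = x^{k-1} \circ \exp(-\mu_k A(y^k))$ of the exponential multiplier method. The data below (maximize $y$ subject to $y \le 2$ and $y \le 3$) and the constant $\mu_k$ are illustrative; the inner problem (28) is strictly convex in the scalar $y$ and is solved by Newton's method.

```python
import math

# SCLP data with K = R^2_+, m = 1:  min -b*y  s.t.  c - y*a >= 0,
# i.e. maximize y subject to y <= 2 and y <= 3 (optimal y* = 2).
b, c, a = 1.0, [2.0, 3.0], [1.0, 1.0]

def A(y):                       # A(y) = c - y*a, componentwise
    return [ci - y * ai for ci, ai in zip(c, a)]

x = [1.0, 1.0]                  # x^0 in int(K)
y, mu = 0.0, 1.0
for _ in range(30):
    # inner step (28): minimize F(y) = -b*y + (1/mu)*sum_i x_i*exp(-mu*A_i(y));
    # F'(y) = -b + sum_i x_i*a_i*exp(-mu*A_i(y)) is increasing, so Newton works.
    for _ in range(100):
        w = [xi * math.exp(-mu * Ai) for xi, Ai in zip(x, A(y))]
        g = -b + sum(wi * ai for wi, ai in zip(w, a))
        if abs(g) < 1e-14:
            break
        gprime = mu * sum(wi * ai * ai for wi, ai in zip(w, a))
        y -= g / gprime
    # multiplier update (29), componentwise exp/ln on the orthant
    x = [xi * math.exp(-mu * Ai) for xi, Ai in zip(x, A(y))]

print(y, x)
```

The multiplier of the active constraint $y \le 2$ tends to $1$ while the inactive one decays geometrically to $0$, mirroring the complementarity structure of (DSCLP).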

We first present two technical lemmas. One collects some properties of the Löwner operator $\exp(z)$, while the other characterizes the recession cone of the feasible set $S$.

*Lemma 5.1* *(a) For any $z \in \mathbb{V}$, there always holds $\exp(z) \in \mathrm{int}(K)$.*

*(b) The function $\exp(z)$ is continuously differentiable everywhere with*
$$(\exp(z))'\, e = \nabla_z(\exp(z))\, e = \exp(z) \quad \forall\, z \in \mathbb{V}.$$

*(c) The function $\mathrm{tr}\big[\exp\big(\sum_{i=1}^{m} y_i a_i\big)\big]$ is continuously differentiable everywhere with*
$$\nabla_y\, \mathrm{tr}\Big[\exp\Big(\sum_{i=1}^{m} y_i a_i\Big)\Big] = A^T \exp\Big(\sum_{i=1}^{m} y_i a_i\Big),$$
*where $y \in \mathbb{R}^m$ and $a_i \in \mathbb{V}$ for $i = 1, 2, \ldots, m$, and $A^T \in \mathbb{R}^{m \times n}$ is the matrix corresponding to the linear transformation that maps $x \in \mathbb{V}$ to the $m$-vector whose $i$th component is $\langle a_i, x \rangle$.*

*Proof.* (a) The result is clear since for any $z \in \mathbb{V}$ with $z = \sum_{j=1}^{r} \lambda_j(z) c_j$, all eigenvalues of $\exp(z)$, given by $\exp(\lambda_j(z))$ for $j = 1, 2, \ldots, r$, are strictly positive.

(b) By Theorem 2.2, $\exp(z)$ is clearly continuously differentiable everywhere and
$$(\exp(z))'\, h = \sum_{j=1}^{r} \exp(\lambda_j(z)) \langle c_j, h \rangle c_j + \sum_{1 \le j < l \le r} 4\, \frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)}\; c_j \circ (c_l \circ h)$$
for any $h \in \mathbb{V}$. From this formula, we have in particular that
$$(\exp(z))'\, e = \sum_{j=1}^{r} \exp(\lambda_j(z)) \langle c_j, e \rangle c_j + \sum_{1 \le j < l \le r} 4\, \frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)}\; c_j \circ (c_l \circ e) = \sum_{j=1}^{r} \exp(\lambda_j(z))\, c_j = \exp(z).$$

(c) From the continuous differentiability of the trace function and part (b), we immediately obtain the first part. Notice that
$$\mathrm{tr}\Big[\exp\Big(\sum_{i=1}^{m} y_i a_i\Big)\Big] = \Big\langle \exp\Big(\sum_{i=1}^{m} y_i a_i\Big),\ e \Big\rangle.$$
Therefore, applying the chain rule readily yields the desired result. $\Box$
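A quick numerical sanity check of the identity $(\exp(z))'\, e = \exp(z)$ in the Jordan algebra of $2 \times 2$ symmetric matrices ($x \circ y = (xy + yx)/2$, $e = I$): the sketch below builds $\exp(z) = \sum_j \exp(\lambda_j(z)) c_j$ from the spectral decomposition, and since $z$ and $e$ commute, the finite difference of $\exp$ along $e$ must reproduce $\exp(z)$. The helper `loewner_exp` and the chosen $z$ are illustrative.

```python
import math

def loewner_exp(z):
    # z = [[p, q], [q, s]] symmetric 2x2; spectral decomposition
    # z = lam1*c1 + lam2*c2, then exp(z) = e^lam1 * c1 + e^lam2 * c2.
    p, q, s = z[0][0], z[0][1], z[1][1]
    mean, rad = (p + s) / 2.0, math.hypot((p - s) / 2.0, q)
    lam1, lam2 = mean + rad, mean - rad
    # unit eigenvector for lam1 is proportional to (q, lam1 - p); q != 0 here
    n = math.hypot(q, lam1 - p)
    v = (q / n, (lam1 - p) / n)
    c1 = [[v[0] * v[0], v[0] * v[1]], [v[0] * v[1], v[1] * v[1]]]
    c2 = [[1.0 - c1[0][0], -c1[0][1]], [-c1[1][0], 1.0 - c1[1][1]]]
    return [[math.exp(lam1) * c1[i][j] + math.exp(lam2) * c2[i][j]
             for j in range(2)] for i in range(2)]

z = [[0.3, 0.8], [0.8, -0.5]]
h = 1e-6
zh = [[z[i][j] + (h if i == j else 0.0) for j in range(2)] for i in range(2)]
ez, ezh = loewner_exp(z), loewner_exp(zh)
deriv = [[(ezh[i][j] - ez[i][j]) / h for j in range(2)] for i in range(2)]
# the directional derivative of exp along e = I equals exp(z)
assert all(abs(deriv[i][j] - ez[i][j]) < 1e-4
           for i in range(2) for j in range(2))
```

Because $z + he = z + hI$ commutes with $z$, one has $\exp(z + hI) = e^h \exp(z)$, so the finite difference is $(e^h - 1)/h \cdot \exp(z) \approx \exp(z)$, matching Lemma 5.1 (b).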

*Lemma 5.2* *Let $S_\infty$ denote the recession cone of the feasible set $S$. Then*
$$S_\infty = \{d \in \mathbb{R}^m : A(d) - c \succeq_K 0\}.$$

*Proof.* Assume that $d \in \mathbb{R}^m$ satisfies $A(d) - c \succeq_K 0$. If $d = 0$, clearly $d \in S_\infty$. If $d \ne 0$, take any $y \in S$. Then $A(y) \succeq_K 0$, and moreover, for any $\tau > 0$,
$$A(y + \tau d) = c - \sum_{i=1}^{m} (y_i + \tau d_i) a_i = A(y) + \tau (A(d) - c) \succeq_K 0.$$
This, by the definition of a recession direction [21, pp. 61], shows that $d \in S_\infty$. Therefore, $\{d \in \mathbb{R}^m : A(d) - c \succeq_K 0\} \subseteq S_\infty$.

Now take any $d \in S_\infty$ and $y \in S$. Then $A(y + \tau d) \succeq_K 0$ for any $\tau > 0$, and therefore $\lambda_{\min}[A(y + \tau d)] \ge 0$. This implies that $\lambda_{\min}(A(d) - c) \ge 0$. If not, using the fact that
$$\lambda_{\min}[A(y + \tau d)] = \lambda_{\min}[A(y) + \tau (A(d) - c)] \le \lambda_{\min}(\tau (A(d) - c)) + \lambda_{\max}(A(y)) = \tau \lambda_{\min}(A(d) - c) + \lambda_{\max}(A(y)),$$
where the inequality is due to Lemma 2.1, and letting $\tau \to +\infty$, we obtain
$$\lambda_{\min}[A(y + \tau d)] \to -\infty,$$
which contradicts $\lambda_{\min}[A(y + \tau d)] \ge 0$. Thus, we have $A(d) - c \succeq_K 0$. Combining this with the discussion above, we get $S_\infty = \{d \in \mathbb{R}^m : A(d) - c \succeq_K 0\}$. $\Box$
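In the orthant case the characterization is easy to check numerically: $A(d) - c = -\sum_i d_i a_i \succeq 0$ componentwise exactly when $y + \tau d$ stays feasible for every feasible $y$ and all $\tau > 0$. A small illustration with hypothetical data:

```python
# Feasibility map A(y) = c - y1*a1 - y2*a2 over K = R^3_+ (componentwise).
c  = [4.0, 3.0, 5.0]
a1 = [1.0, 0.0, 2.0]
a2 = [0.0, 1.0, -1.0]

def A(y):
    return [ci - y[0] * a1i - y[1] * a2i for ci, a1i, a2i in zip(c, a1, a2)]

def feasible(y):
    return all(v >= 0 for v in A(y))

d = [-1.0, 0.0]                           # A(d) - c = -d1*a1 - d2*a2 = a1
Ad_minus_c = [Ai - ci for Ai, ci in zip(A(d), c)]
assert all(v >= 0 for v in Ad_minus_c)    # so d lies in S_infinity

y = [1.0, 2.0]
assert feasible(y)
# the whole ray y + tau*d stays in S, as Lemma 5.2 predicts
assert all(feasible([y[0] + t * d[0], y[1] + t * d[1]])
           for t in range(0, 1000, 7))
```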

We now turn to the convergence of the algorithm (28)-(29). We first prove that the sequence generated by (28)-(29) is well-defined, which follows from the lemma below.

*Lemma 5.3* *For any $\mu > 0$ and $x^{k-1} \in \mathrm{int}(K)$, let $F : \mathbb{R}^m \to \mathbb{R}$ be defined by*
$$F(y) := -b^T y + \mu^{-1}\, \mathrm{tr}\big[\exp\big(-\mu A(y) + \ln x^{k-1}\big)\big]. \qquad (30)$$
*Then under assumption (S1) the set of minimizers of $F$ is nonempty and bounded.*

*Proof.* By assumption (S1), it suffices to prove that $F$ has bounded level sets. Suppose not; then there exist $\delta \in \mathbb{R}$ and a sequence $\{y^k\} \subseteq \mathbb{R}^m$ such that
$$\|y^k\| \to +\infty, \qquad \lim_{k \to +\infty} \frac{y^k}{\|y^k\|} = d \ne 0, \qquad \text{and} \qquad F(y^k) \le \delta. \qquad (31)$$
Since $\exp(-\mu A(y) + \ln x^{k-1}) \in \mathrm{int}(K)$ for any $y \in \mathbb{R}^m$ by Lemma 5.1 (a), we have
$$\mathrm{tr}\Big[\exp\big(-\mu A(y) + \ln x^{k-1}\big)\Big] = \sum_{j=1}^{r} \exp\Big[\lambda_j\big(-\mu A(y) + \ln x^{k-1}\big)\Big] > 0.$$

Thus, $F(y^k) \le \delta$ implies that the following two inequalities hold:
$$-b^T y^k < \delta, \qquad \sum_{j=1}^{r} \exp\Big[\lambda_j\big(-\mu A(y^k) + \ln x^{k-1}\big)\Big] \le \mu(\delta + b^T y^k).$$

From this, we immediately obtain that
$$-b^T d \le 0, \qquad (32)$$
$$\exp\Big[\lambda_j\big(-\mu A(y^k) + \ln x^{k-1}\big)\Big] \le \mu(\delta + b^T y^k) \quad \forall\, j = 1, 2, \ldots, r. \qquad (33)$$
Using the monotonicity of $\ln t$ ($t > 0$) and the homogeneity of $\lambda_j(\cdot)$, we can rewrite (33) as
$$\lambda_j\left( \frac{-\mu A(y^k)}{\|y^k\|} + \frac{\ln x^{k-1}}{\|y^k\|} \right) \le \frac{\ln\big(\mu(\delta + b^T y^k)\big)}{\|y^k\|} \le \frac{\mu(\delta + b^T y^k) - 1}{\|y^k\|} \quad \forall\, j = 1, 2, \ldots, r,$$
where the second inequality is due to $\ln t \le t - 1$ ($t > 0$). Passing to the limit on both sides of the inequality and using the homogeneity of $\lambda_j(\cdot)$ again, we get
$$\lambda_j(A(d) - c) \ge 0 \quad \forall\, j = 1, 2, \ldots, r,$$
which in turn means that
$$A(d) - c \succeq_K 0. \qquad (34)$$

From Lemma 5.2, we know that $\{d \in \mathbb{R}^m : A(d) - c \succeq_K 0\} = S_\infty$. Combining this with (32), we have shown that there exists a nonzero $d \in S_\infty$ with $-b^T d \le 0$. This clearly contradicts assumption (S1). Hence, the proof is complete. $\Box$

To analyze the convergence of the algorithm (28)-(29), we also need the following result, which states that the sequence $\{x^k\}_{k \in \mathbb{N}}$ generated by (28)-(29) is exactly the sequence produced by the algorithm (11)-(12) when applied to the dual problem (DSCLP).