An entropy-like proximal algorithm and the exponential multiplier method for symmetric cone programming
Jein-Shan Chen 1
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
Shaohua Pan 2
School of Mathematical Sciences, South China University of Technology, Guangzhou 510640, China
April 28, 2006 (revised May 16, 2007)
Abstract. We introduce an entropy-like proximal algorithm for the problem of minimizing a closed proper convex function subject to a symmetric cone constraint. The algorithm is based on a distance-like function that extends the Kullback-Leibler relative entropy to the setting of symmetric cones. As with the proximal algorithm for convex programming over the nonnegative orthant, we show that, under some mild assumptions, the sequence generated by the proposed algorithm is bounded and every accumulation point is a solution of the considered problem. In addition, we present a dual application of the proposed algorithm to the symmetric cone linear program, leading to a multiplier method which is shown to enjoy properties similar to those of the exponential multiplier method in [29].
Key words. Symmetric cone optimization, proximal-like method, entropy-like distance, exponential multiplier method.
1 Introduction
Symmetric cone programming (SCP) provides a unified framework for various mathematical programming problems, including linear programming (LP), second-order cone programming (SOCP), and semidefinite programming (SDP). Such problems arise
1 Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is partially supported by the National Science Council of Taiwan. E-mail: jschen@math.ntnu.edu.tw.
2 E-mail: shhpan@scut.edu.cn.
naturally in a wide range of applications in engineering, economics, optimal control, management science, combinatorial optimization, and other fields; see [1, 16, 30] and the references therein. The symmetric cone optimization problem, especially the symmetric cone linear program (SCLP), has recently attracted the attention of researchers, with a focus on the development of interior point methods similar to those used in linear programming; see [9, 10, 25]. Although interior-point methods have been successfully applied to solve the SCLP, it is worthwhile to explore other solution methods for more general convex symmetric cone optimization problems.
Let $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle)$ be a Euclidean Jordan algebra, where $(\mathbb{V}, \langle\cdot,\cdot\rangle)$ is a finite-dimensional inner product space over the real field $\mathbb{R}$ and "$\circ$" denotes the Jordan product, which will be defined in the next section. Let $\mathcal{K}$ be the symmetric cone in $\mathbb{V}$ (introduced in Section 2). In this paper, we consider the following convex symmetric cone program (CSCP):
$$\min\ f(x) \quad \text{s.t.}\quad x \succeq_{\mathcal{K}} 0, \tag{1}$$
where $f : \mathbb{V} \to (-\infty, \infty]$ is a closed proper convex function, and $x \succeq_{\mathcal{K}} 0$ means $x \in \mathcal{K}$. In general, for any $x, y \in \mathbb{V}$, we write $x \succeq_{\mathcal{K}} y$ if $x - y \in \mathcal{K}$ and $x \succ_{\mathcal{K}} y$ if $x - y \in \mathrm{int}(\mathcal{K})$. A function is closed if and only if it is lower semicontinuous (l.s.c. for short), and a function $f$ is proper if $f(x) < \infty$ for at least one $x \in \mathbb{V}$ and $f(x) > -\infty$ for all $x \in \mathbb{V}$.
Notice that the CSCP is indeed a special class of convex programs, and therefore it can in principle be solved via general convex programming methods. One such method is the proximal point algorithm for minimizing a convex function $f(x)$ on $\mathbb{R}^n$, which generates a sequence $\{x^k\}_{k\in\mathbb{N}}$ by the iterative scheme
$$x^k = \mathop{\rm argmin}_{x\in\mathbb{R}^n}\left\{ f(x) + \frac{1}{\mu_k}\|x - x^{k-1}\|^2 \right\}, \tag{2}$$
where $\{\mu_k\}$ is a sequence of positive numbers and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$. This method was originally introduced by Martinet [17], based on the Moreau proximal approximation of $f$ (see [18]), and then further developed and studied by Rockafellar [22, 23]. Later, several generalizations of the proximal algorithm were considered in which the usual quadratic proximal term in (2) is replaced by a nonquadratic distance-like function; see, e.g., [5, 7, 8, 14, 27]. Among others, the proximal algorithms using an entropy-like distance [13, 14, 27, 28], designed for minimizing a convex function $f(x)$ subject to the nonnegative orthant constraint $x \in \mathbb{R}^n_+$, generate the iterates by
$$x^0 \in \mathbb{R}^n_{++}, \qquad x^k = \mathop{\rm argmin}_{x\in\mathbb{R}^n_+}\left\{ f(x) + \frac{1}{\mu_k}\, d_\varphi(x, x^{k-1}) \right\}, \tag{3}$$
where $d_\varphi(\cdot,\cdot) : \mathbb{R}^n_{++}\times\mathbb{R}^n_{++} \to \mathbb{R}_+$ is the entropy-like distance defined by
$$d_\varphi(x, y) = \sum_{i=1}^n y_i\,\varphi(x_i/y_i) \tag{4}$$
with $\varphi$ satisfying certain conditions; see [13, 14, 27, 28]. An important choice of $\varphi$ is $\varphi(t) = t\ln t - t + 1$, for which the corresponding $d_\varphi$, given by
$$d_\varphi(x, y) = \sum_{i=1}^n \left[\, x_i\ln x_i - x_i\ln y_i + y_i - x_i \,\right], \tag{5}$$
is the well-known Kullback-Leibler relative entropy from statistics, which is where the "entropy" terminology stems from. One key feature of entropic proximal methods is that they automatically generate a sequence staying in the interior of $\mathbb{R}^n_+$, i.e., $\{x^k\}_{k\in\mathbb{N}} \subset \mathbb{R}^n_{++}$, and hence they eliminate the combinatorial nature of the problem. One of the main applications of these proximal methods is to the dual of smooth convex programs, yielding twice continuously differentiable nonquadratic augmented Lagrangians and thus allowing the use of Newton's method.
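To illustrate how the scheme (3)-(4) behaves, the following minimal sketch (our own code, not from the references) runs the entropic proximal iteration with $\varphi(t) = t\ln t - t + 1$ on a toy problem over $\mathbb{R}^2_+$. It assumes a smooth convex $f$ and solves each subproblem through the substitution $x = e^z$, a device of ours that keeps the iterates strictly positive.

```python
import numpy as np
from scipy.optimize import minimize

def d_phi(x, y):
    # Entropy-like distance (4)-(5) with phi(t) = t*ln(t) - t + 1.
    return float(np.sum(x * np.log(x / y) + y - x))

def entropic_prox_step(f, x_prev, mu):
    # One step of (3): argmin over x > 0 of f(x) + (1/mu) * d_phi(x, x_prev).
    # The change of variables x = exp(z) makes the problem unconstrained;
    # BFGS then finds the unique interior stationary point of the objective.
    obj = lambda z: f(np.exp(z)) + d_phi(np.exp(z), x_prev) / mu
    return np.exp(minimize(obj, np.log(x_prev), method="BFGS").x)

# Toy problem: f(x) = ||x - a||^2 with a = (1, -0.5), so the minimizer
# over R^2_+ is (1, 0), a point on the boundary of the orthant.
a = np.array([1.0, -0.5])
f = lambda x: float(np.sum((x - a) ** 2))
x = np.ones(2)
for _ in range(50):
    x = entropic_prox_step(f, x, mu=1.0)
print(x)  # all iterates stay in R^2_{++}; x approaches (1, 0)
```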
The main motivation of this paper is to develop unified proximal-type algorithms and their corresponding dual augmented Lagrangian methods for convex symmetric cone optimization problems. Specifically, using Euclidean Jordan algebraic techniques, we introduce an interior proximal-type algorithm for the CSCP which can be viewed as an extension of the entropy-like proximal algorithm defined by (3)-(4) with $\varphi(t) = t\ln t - t + 1$.
For the proposed algorithm, we establish its convergence properties, and we also present a dual application to the SCLP, leading to an exponential multiplier method which is shown to possess properties analogous to the method proposed in [3, 29] for convex programming over the nonnegative orthant.
The paper is organized as follows. Section 2 reviews some basic concepts and materials on Euclidean Jordan algebras which are needed in the analysis of the algorithm. In Section 3, we introduce a distance-like function $H$ to measure how close two points in the symmetric cone $\mathcal{K}$ are, and we establish some of its properties. In addition, we outline a basic proximal-like algorithm built on the measure function $H$. The convergence analysis of the basic algorithm is given in Section 4. In Section 5, we consider a dual application of the algorithm to the SCLP and establish convergence results for the resulting multiplier method. We close with some final remarks in Section 6.
2 Preliminaries on Euclidean Jordan Algebras
In this section, we recall some concepts and results on Euclidean Jordan algebras that will be used in the subsequent sections. More detailed expositions of Euclidean Jordan algebras can be found in Koecher's lecture notes [15] and the monograph by Faraut and Korányi [11].
Let $\mathbb{V}$ be a finite-dimensional vector space endowed with a bilinear mapping $(x, y) \mapsto x \circ y$ from $\mathbb{V}\times\mathbb{V}$ into $\mathbb{V}$. For a given $x \in \mathbb{V}$, let $L(x)$ be the linear operator on $\mathbb{V}$ defined by
$$L(x)y := x \circ y \quad \text{for every } y \in \mathbb{V}.$$
The pair $(\mathbb{V}, \circ)$ is called a Jordan algebra if, for all $x, y \in \mathbb{V}$,
(i) $x \circ y = y \circ x$,
(ii) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$, where $x^2 := x \circ x$.
In a Jordan algebra $(\mathbb{V}, \circ)$, $x \circ y$ is called the Jordan product of $x$ and $y$. Note that a Jordan algebra is not associative in general, i.e., $x \circ (y \circ z) = (x \circ y) \circ z$ may fail to hold. If for some element $e \in \mathbb{V}$ we have $x \circ e = e \circ x = x$ for all $x \in \mathbb{V}$, then $e$ is called a unit element of the Jordan algebra $(\mathbb{V}, \circ)$. The unit element, if it exists, is unique; a Jordan algebra does not necessarily have one. For $x \in \mathbb{V}$, let $\zeta(x)$ be the degree of the minimal polynomial of $x$, which can be equivalently defined as
$$\zeta(x) := \min\left\{ k : \{e, x, x^2, \cdots, x^k\} \text{ are linearly dependent} \right\}.$$
Then the rank of $(\mathbb{V}, \circ)$ is well defined by $r := \max\{\zeta(x) : x \in \mathbb{V}\}$.
A Jordan algebra $(\mathbb{V}, \circ)$ over the real field $\mathbb{R}$ with a unit element $e \in \mathbb{V}$ is called a Euclidean Jordan algebra, or formally real Jordan algebra, if there exists a positive definite symmetric bilinear form on $\mathbb{V}$ which is associative; in other words, there exists an inner product on $\mathbb{V}$, denoted by $\langle\cdot,\cdot\rangle_{\mathbb{V}}$, such that for all $x, y, z \in \mathbb{V}$:
(iii) $\langle x \circ y, z\rangle_{\mathbb{V}} = \langle y, x \circ z\rangle_{\mathbb{V}}$.
In a Euclidean Jordan algebra $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$, we define the set of squares as
$$\mathcal{K} := \left\{ x^2 : x \in \mathbb{V} \right\}.$$
By [11, Theorem III.2.1], $\mathcal{K}$ is a symmetric cone. This means that $\mathcal{K}$ is a self-dual closed convex cone with nonempty interior, and for any two elements $x, y \in \mathrm{int}(\mathcal{K})$ there exists an invertible linear transformation $T : \mathbb{V} \to \mathbb{V}$ such that $T(\mathcal{K}) = \mathcal{K}$ and $T(x) = y$.
Here are two popular examples of Euclidean Jordan algebras. Let $\mathbb{S}^n$ be the space of $n\times n$ real symmetric matrices with the inner product given by $\langle X, Y\rangle_{\mathbb{S}^n} := \mathrm{Tr}(XY)$, where $XY$ is the usual matrix product of $X$ and $Y$ and $\mathrm{Tr}(XY)$ is the trace of the matrix $XY$. Then $(\mathbb{S}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{S}^n})$ is a Euclidean Jordan algebra with the Jordan product given by
$$X \circ Y := (XY + YX)/2, \qquad X, Y \in \mathbb{S}^n.$$
In this case, the unit element is the identity matrix $I$ in $\mathbb{S}^n$, and the cone of squares $\mathcal{K}$ is the set $\mathbb{S}^n_+$ of all positive semidefinite matrices in $\mathbb{S}^n$. Next, let $\mathbb{R}^n$ be the Euclidean space of dimension $n$ with the usual inner product $\langle x, y\rangle_{\mathbb{R}^n} = x^T y$. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R}\times\mathbb{R}^{n-1}$, define
$$x \circ y := (x^T y,\; x_1 y_2 + y_1 x_2).$$
Then $(\mathbb{R}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{R}^n})$ is a Euclidean Jordan algebra, also called the algebra of quadratic forms. In this algebra, the unit element is $e = (1, 0)$ and the set of squares is $\mathcal{K} = \{x = (x_1, x_2) \in \mathbb{R}\times\mathbb{R}^{n-1} : \|x_2\| \le x_1\}$, the second-order cone.
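As a quick illustration of these two algebras, the sketch below (our own code) implements both Jordan products and checks the unit elements numerically.

```python
import numpy as np

def jordan_sym(X, Y):
    # Jordan product on S^n: X o Y = (XY + YX) / 2.
    return (X @ Y + Y @ X) / 2

def jordan_soc(x, y):
    # Jordan product of the quadratic forms algebra on R x R^{n-1}:
    # x o y = (x^T y, x_1 y_2 + y_1 x_2).
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3)); X = (X + X.T) / 2
assert np.allclose(jordan_sym(X, np.eye(3)), X)   # I is the unit element of S^3

x = rng.standard_normal(4)
e = np.array([1.0, 0.0, 0.0, 0.0])                # unit element e = (1, 0)
assert np.allclose(jordan_soc(x, e), x)
assert np.isclose(jordan_soc(x, x)[0], x @ x)     # (x o x)_1 = ||x||^2
```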
Recall that an element $c \in \mathbb{V}$ is said to be idempotent if $c^2 = c$. Two idempotents $c$ and $q$ are said to be orthogonal if $c \circ q = 0$. One says that $\{c_1, c_2, \cdots, c_k\}$ is a complete system of orthogonal idempotents if
$$c_j^2 = c_j, \qquad c_j \circ c_i = 0 \ \text{ for } j \neq i,\ \ j, i = 1, 2, \cdots, k, \qquad \text{and} \qquad \sum_{j=1}^k c_j = e.$$
An idempotent is said to be primitive if it is nonzero and cannot be written as the sum of two nonzero idempotents. We call a complete system of orthogonal primitive idempotents a Jordan frame. Then we have the second version of the spectral decomposition theorem.
Theorem 2.1 [11, Theorem III.1.2] Suppose that $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$ is a Euclidean Jordan algebra of rank $r$. Then for any $x \in \mathbb{V}$, there exist a Jordan frame $\{c_1, c_2, \cdots, c_r\}$ and real numbers $\lambda_1(x), \lambda_2(x), \cdots, \lambda_r(x)$, arranged in decreasing order $\lambda_1(x) \ge \lambda_2(x) \ge \cdots \ge \lambda_r(x)$, such that $x = \sum_{j=1}^r \lambda_j(x)\, c_j$.

The numbers $\lambda_j(x)$ (counting multiplicities), which are uniquely determined by $x$, are called the eigenvalues of $x$. We let $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and the largest eigenvalue, respectively. The trace of $x$, denoted by $\mathrm{tr}(x)$, is given by $\mathrm{tr}(x) := \sum_{j=1}^r \lambda_j(x)$.
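In the algebra $\mathbb{S}^n$, Theorem 2.1 is just the usual symmetric eigendecomposition, with the Jordan frame formed by the rank-one eigenprojections. The following sketch (our own code) verifies the decomposition and the trace identity numerically.

```python
import numpy as np

def spectral_decomposition(X):
    # Theorem 2.1 in S^n: X = sum_j lambda_j * c_j, where the Jordan frame
    # consists of the eigenprojections c_j = u_j u_j^T.
    lam, U = np.linalg.eigh(X)
    lam, U = lam[::-1], U[:, ::-1]   # decreasing order, as in Theorem 2.1
    frames = [np.outer(U[:, j], U[:, j]) for j in range(len(lam))]
    return lam, frames

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)); X = (X + X.T) / 2
lam, frames = spectral_decomposition(X)
assert np.allclose(sum(l * c for l, c in zip(lam, frames)), X)
assert np.isclose(lam.sum(), np.trace(X))   # tr(x) = sum of the eigenvalues
```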
Since, by [11, Proposition III.1.5], a Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$ is Euclidean if and only if the symmetric bilinear form $\mathrm{tr}(x \circ y)$ is positive definite, we may define another inner product on $\mathbb{V}$ by
$$\langle x, y\rangle := \mathrm{tr}(x \circ y), \qquad \forall\, x, y \in \mathbb{V}.$$
By the associativity of $\mathrm{tr}(\cdot)$ [11, Proposition II.4.3], the inner product $\langle\cdot,\cdot\rangle$ is associative, i.e., for all $x, y, z \in \mathbb{V}$ it holds that $\langle x, y \circ z\rangle = \langle y, x \circ z\rangle$. Thus, for each $x \in \mathbb{V}$, the linear operator $L(x)$ is symmetric with respect to the inner product $\langle\cdot,\cdot\rangle$ in the sense that
$$\langle L(x)y, z\rangle = \langle y, L(x)z\rangle, \qquad \forall\, y, z \in \mathbb{V}.$$
In addition, we let $\|\cdot\|$ be the norm on $\mathbb{V}$ induced by the inner product $\langle\cdot,\cdot\rangle$, i.e.,
$$\|x\| := \sqrt{\langle x, x\rangle} = \Big(\sum_{j=1}^r \lambda_j^2(x)\Big)^{1/2}, \qquad \forall\, x \in \mathbb{V}.$$
Now we can present two important inequalities for the eigenvalue functions $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$.
Lemma 2.1 Let $x, y \in \mathbb{V}$. Then the minimum eigenvalue of $x + y$ can be bounded as follows:
$$\lambda_{\min}(x) + \lambda_{\min}(y) \;\le\; \lambda_{\min}(x + y) \;\le\; \lambda_{\min}(x) + \lambda_{\max}(y).$$
Proof. The first inequality can be found in [25, Lemma 14], so we only prove the second one. From [25, Lemma 13], we know
$$\lambda_{\min}(x) = \min_{0\neq u\in\mathbb{V}} \frac{\langle u, x \circ u\rangle}{\langle u, u\rangle} \qquad \text{and} \qquad \lambda_{\max}(x) = \max_{0\neq u\in\mathbb{V}} \frac{\langle u, x \circ u\rangle}{\langle u, u\rangle}.$$
Hence, there holds
$$\lambda_{\min}(x + y) = \min_{0\neq u\in\mathbb{V}} \frac{\langle u, (x + y) \circ u\rangle}{\langle u, u\rangle} = \min_{0\neq u\in\mathbb{V}} \frac{\langle u, x \circ u\rangle + \langle u, y \circ u\rangle}{\langle u, u\rangle} \;\le\; \min_{0\neq u\in\mathbb{V}} \frac{\langle u, x \circ u\rangle}{\langle u, u\rangle} + \max_{0\neq u\in\mathbb{V}} \frac{\langle u, y \circ u\rangle}{\langle u, u\rangle} = \lambda_{\min}(x) + \lambda_{\max}(y).$$
Consequently, the second inequality follows. □
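A quick numerical sanity check of Lemma 2.1 in $\mathbb{S}^4$ (our own test code):

```python
import numpy as np

rng = np.random.default_rng(2)
lmin = lambda M: np.linalg.eigvalsh(M)[0]    # smallest eigenvalue
lmax = lambda M: np.linalg.eigvalsh(M)[-1]   # largest eigenvalue
for _ in range(1000):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    assert lmin(A) + lmin(B) <= lmin(A + B) + 1e-10   # first inequality
    assert lmin(A + B) <= lmin(A) + lmax(B) + 1e-10   # second inequality
```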
Let $\phi : \mathbb{R} \to \mathbb{R}$ be a scalar-valued function. Then it is natural to define a vector-valued function associated with the Euclidean Jordan algebra $(\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$ by
$$\phi^{\rm sc}(x) := \phi(\lambda_1(x))\,c_1 + \phi(\lambda_2(x))\,c_2 + \cdots + \phi(\lambda_r(x))\,c_r, \tag{6}$$
where $x \in \mathbb{V}$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x)\, c_j$. The function $\phi^{\rm sc}$ is called the Löwner operator and enjoys the following property [26].
Lemma 2.2 [26, Theorem 13] For any $x = \sum_{j=1}^r \lambda_j(x)\, c_j$, let $\phi^{\rm sc}$ be defined as in (6). Then $\phi^{\rm sc}$ is (continuously) differentiable at $x$ if and only if, for each $j \in \{1, 2, \cdots, r\}$, $\phi$ is (continuously) differentiable at $\lambda_j(x)$. The derivative of $\phi^{\rm sc}$ at $x$, for any $h \in \mathbb{V}$, is given by
$$(\phi^{\rm sc})'(x)h = \sum_{j=1}^r [\phi^{[1]}(\lambda(x))]_{jj}\,\langle c_j, h\rangle\, c_j + \sum_{1\le j<l\le r} 4\,[\phi^{[1]}(\lambda(x))]_{jl}\; c_j \circ (c_l \circ h)$$
with
$$[\phi^{[1]}(\lambda(x))]_{ij} := \begin{cases} \dfrac{\phi(\lambda_i(x)) - \phi(\lambda_j(x))}{\lambda_i(x) - \lambda_j(x)} & \text{if } \lambda_i(x) \neq \lambda_j(x), \\[2mm] \phi'(\lambda_i(x)) & \text{if } \lambda_i(x) = \lambda_j(x), \end{cases} \qquad i, j = 1, 2, \cdots, r.$$
In fact, the Jacobian $(\phi^{\rm sc})'(\cdot)$ is a linear and symmetric operator, and it can be written as
$$(\phi^{\rm sc})'(x) = \sum_{j=1}^r \phi'(\lambda_j(x))\, Q(c_j) + 2 \sum_{i,j=1,\, i\neq j}^r [\phi^{[1]}(\lambda(x))]_{ij}\, L(c_j) L(c_i), \tag{7}$$
where $Q(x) := 2L^2(x) - L(x^2)$, for any $x \in \mathbb{V}$, is the quadratic representation of $x$.
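In $\mathbb{S}^n$, the Löwner operator (6) simply applies $\phi$ to the eigenvalues while keeping the eigenprojections fixed. A minimal sketch (our own code), with exp/log as the pair of scalar functions used repeatedly later in the paper:

```python
import numpy as np

def lowner(phi, X):
    # Lowner operator (6) on S^n: apply phi to the eigenvalues of X,
    # keeping the Jordan frame (eigenprojections) unchanged.
    lam, U = np.linalg.eigh(X)
    return U @ np.diag(phi(lam)) @ U.T

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
X = A @ A.T + np.eye(3)                      # a point in int(K) = S^3_{++}
logX = lowner(np.log, X)
assert np.allclose(lowner(np.exp, logX), X)  # exp^sc and ln^sc are inverses
```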
We next introduce real-valued spectral functions on the Euclidean Jordan algebra. Let $\mathcal{P}$ be the set of all permutations of $r$-dimensional vectors. A subset of $\mathbb{R}^r$ is said to be symmetric if it remains unchanged under every permutation in $\mathcal{P}$ (we adopt the same notation as in [2]).
Definition 2.1 Let $S \subseteq \mathbb{R}^r$ be a symmetric set. A real-valued function $f : S \to \mathbb{R}$ is symmetric if, for every permutation $P \in \mathcal{P}$ and each $s \in S$, we have $f(Ps) = f(s)$.
For any $x \in \mathbb{V}$ with the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x)\, c_j$, we define
$$\mathcal{K}_S := \{x \in \mathbb{V} \mid \lambda(x) \in S\},$$
where $\lambda(x) = (\lambda_1(x), \cdots, \lambda_r(x))^T$ is the spectral vector of $x$ and $S \subseteq \mathbb{R}^r$ is a symmetric set. Let $f : S \to \mathbb{R}$ be a symmetric function. Then the function $F : \mathcal{K}_S \to \mathbb{R}$ given by
$$F(x) := f(\lambda(x)) \tag{8}$$
is the spectral function generated by $f$. Moreover, from [2, Theorem 41], we know that $F$ is (strictly) convex if $f$ is (strictly) convex.
Unless otherwise stated, in the sequel $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle)$ represents a Euclidean Jordan algebra of rank $r$ with $\dim(\mathbb{V}) = n$. For a closed proper convex function $f : \mathbb{V} \to (-\infty, +\infty]$, we denote the domain of $f$ by $\mathrm{dom}(f) := \{x \in \mathbb{V} \mid f(x) < +\infty\}$. The subdifferential of $f$ at $x_0 \in \mathbb{V}$ is the convex set
$$\partial f(x_0) = \{\xi \in \mathbb{V} \mid f(x) \ge f(x_0) + \langle \xi, x - x_0\rangle \ \ \forall\, x \in \mathbb{V}\}. \tag{9}$$
Since $\langle x, y\rangle = \mathrm{tr}(x \circ y)$ for any $x, y \in \mathbb{V}$, the above subdifferential set can equivalently be written as
$$\partial f(x_0) = \{\xi \in \mathbb{V} \mid f(x) \ge f(x_0) + \mathrm{tr}(\xi \circ (x - x_0)) \ \ \forall\, x \in \mathbb{V}\}. \tag{10}$$
Finally, for a sequence $\{x^k\}_{k\in\mathbb{N}}$, the symbol $\mathbb{N}$ denotes the set of natural numbers.
3 An entropy-like proximal algorithm for SCP
To solve the convex symmetric cone optimization problem CSCP, in this section we propose an entropy-like proximal minimization algorithm defined as follows:
$$x^0 \succ_{\mathcal{K}} 0, \qquad x^k = \mathop{\rm argmin}_{x\,\succeq_{\mathcal{K}}\,0}\left\{ f(x) + \frac{1}{\mu_k} H(x, x^{k-1}) \right\}, \tag{11}$$
where $\mu_k > 0$ and $H : \mathbb{V}\times\mathbb{V} \to (-\infty, +\infty]$ is defined by
$$H(x, y) := \begin{cases} \mathrm{tr}(x \circ \ln x - x \circ \ln y + y - x) & \forall\, x \in \mathcal{K},\ y \in \mathrm{int}(\mathcal{K}), \\ +\infty & \text{otherwise.} \end{cases} \tag{12}$$
This algorithm is indeed of proximal type, except that, instead of the classical quadratic term $\|x - x^{k-1}\|^2$, we adopt the distance-like function $H$ to guarantee that the generated sequence $\{x^k\}_{k\in\mathbb{N}}$ satisfies $x^k \in \mathrm{int}(\mathcal{K})$ for all $k$, thus leading to an "interior" proximal method (see Proposition 4.1).
By the definition of the Löwner operator in Section 2, the function $H(x, y)$ is clearly well defined for all $x, y \in \mathrm{int}(\mathcal{K})$. Moreover, its domain in $x$ can be extended from $\mathrm{int}(\mathcal{K})$ to $\mathcal{K}$ by adopting the convention $0\ln 0 \equiv 0$. The function $H$ is a natural extension of the entropy-like distance in (5) and is used to measure the "distance" between two points in $\mathcal{K}$. In fact, $H$ reduces to the entropy function $d_\varphi$ in (5) when the Euclidean Jordan algebra $\mathbb{A}$ is specified as $(\mathbb{R}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{R}^n})$ with "$\circ$" denoting the componentwise product of two vectors in $\mathbb{R}^n$ and $\langle\cdot,\cdot\rangle_{\mathbb{R}^n}$ being the standard dot product $\langle a, b\rangle_{\mathbb{R}^n} = \sum_{i=1}^n a_i b_i$ for $a, b \in \mathbb{R}^n$. As shown in Proposition 3.1 below, most, though not all, of the important properties of $d_\varphi(\cdot,\cdot)$ also hold for $H(\cdot,\cdot)$.
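For concreteness, in the algebra $\mathbb{S}^n$ the function (12) reads $H(X, Y) = \mathrm{tr}(X\ln X - X\ln Y + Y - X)$, with matrix logarithms taken spectrally; up to the linear terms this is the quantum (von Neumann) relative entropy. A minimal sketch (our own code) that also checks Proposition 3.1(c) numerically:

```python
import numpy as np

def H(X, Y):
    # H(X, Y) = tr(X ln X - X ln Y + Y - X) on S^n, with 0*ln(0) = 0,
    # for X positive semidefinite and Y positive definite.
    lamX, _ = np.linalg.eigh(X)
    lamY, UY = np.linalg.eigh(Y)
    tr_xlogx = np.sum(np.where(lamX > 0,
                               lamX * np.log(np.where(lamX > 0, lamX, 1.0)),
                               0.0))
    logY = UY @ np.diag(np.log(lamY)) @ UY.T
    return tr_xlogx - np.trace(X @ logY) + np.trace(Y) - np.trace(X)

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)); X = A @ A.T + 0.1 * np.eye(3)
B = rng.standard_normal((3, 3)); Y = B @ B.T + 0.1 * np.eye(3)
print(H(X, Y) >= 0, np.isclose(H(X, X), 0.0))   # True True, as in Prop. 3.1(c)
```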
In what follows, we present several technical lemmas that will be used to establish some favorable properties of the distance measure $H$. We start with the extension of the von Neumann trace inequality to Euclidean Jordan algebras, whose proof can be found in [2].
Lemma 3.1 For any $x, y \in \mathbb{V}$, we have $\mathrm{tr}(x \circ y) \le \sum_{j=1}^r \lambda_j(x)\lambda_j(y) = \lambda(x)^T\lambda(y)$, where $\lambda(x)$ and $\lambda(y)$ are the spectral vectors of $x$ and $y$, respectively.
Lemma 3.2 For any $x \in \mathcal{K}$, let $\Phi(x) := \mathrm{tr}(x \circ \ln x)$. Then we have the following results.
(a) $\Phi(x)$ is the spectral function generated by the symmetric entropy function
$$\widehat{\Phi}(u) = \sum_{j=1}^r u_j \ln u_j, \qquad \forall\, u \in \mathbb{R}^r_+. \tag{13}$$
(b) $\Phi(x)$ is continuously differentiable on $\mathrm{int}(\mathcal{K})$ with $\nabla\Phi(x) = \ln x + e$.
(c) $\Phi(x)$ is strictly convex over $\mathcal{K}$.
Proof. (a) Suppose that $x$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x)\, c_j$, and let $\phi(t) = t\ln t$ for $t \in \mathbb{R}_+$. Then, from Section 2, the vector-valued function $x \circ \ln x$ is exactly the Löwner operator $\phi^{\rm sc}(x)$, i.e., $\phi^{\rm sc}(x) = x \circ \ln x$. Clearly, $\phi^{\rm sc}$ is well defined for any $x \in \mathcal{K}$ and
$$\phi^{\rm sc}(x) = x \circ \ln x = \sum_{j=1}^r \lambda_j(x)\ln(\lambda_j(x))\, c_j.$$
Therefore,
$$\Phi(x) = \mathrm{tr}(\phi^{\rm sc}(x)) = \sum_{j=1}^r \lambda_j(x)\ln(\lambda_j(x)) = \widehat{\Phi}(\lambda(x)),$$
where $\widehat{\Phi} : \mathbb{R}^r_+ \to \mathbb{R}$ is given by (13). By Definition 2.1, the function $\widehat{\Phi}$ is clearly symmetric. Consequently, $\Phi(x)$ is the spectral function generated by the symmetric function $\widehat{\Phi}$ in view of (8).
(b) From Lemma 2.2, we know that $\phi^{\rm sc}(x) = x \circ \ln x$ is continuously differentiable on $\mathrm{int}(\mathcal{K})$. Thus, $\Phi(x)$ is also continuously differentiable on $\mathrm{int}(\mathcal{K})$, because $\Phi$ is the composition of the trace function (clearly continuously differentiable) with $\phi^{\rm sc}$. It remains to find its gradient formula. From the fact that $\mathrm{tr}(x \circ y) = \langle x, y\rangle$, we have
$$\Phi(x) = \mathrm{tr}(x \circ \ln x) = \langle x, \ln x\rangle.$$
Applying the chain rule to this inner product of two functions, we obtain
$$\nabla\Phi(x) = \ln x + (\nabla\ln x)\,x = \ln x + (\ln x)'x. \tag{14}$$
On the other hand, from Lemma 2.2 it follows that for any $h \in \mathbb{V}$,
$$(\ln x)'h = \sum_{j=1}^r \frac{1}{\lambda_j(x)}\langle c_j, h\rangle\, c_j + \sum_{1\le j<l\le r} 4\,\frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)}\; c_j \circ (c_l \circ h).$$
By this, it is easy to compute that
$$(\ln x)'x = \sum_{j=1}^r \frac{1}{\lambda_j(x)}\langle c_j, x\rangle\, c_j + \sum_{1\le j<l\le r} 4\,\frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)}\; c_j \circ (c_l \circ x) = \sum_{j=1}^r \frac{1}{\lambda_j(x)}\,\lambda_j(x)\, c_j + \sum_{1\le j<l\le r} 4\,\frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)}\,\lambda_l(x)\; c_j \circ c_l = e,$$
where the last equality holds because $\langle c_j, x\rangle = \lambda_j(x)$, $c_j \circ c_l = 0$ for $j \neq l$, and $\sum_{j=1}^r c_j = e$. This together with equation (14) yields the desired result.
(c) Note that the function $\widehat{\Phi}$ in (13) is strictly convex over $\mathbb{R}^r_+$. Therefore, the conclusion follows immediately from part (a) and [2, Theorem 41]. □
Now we are in a position to summarize some favorable properties of the distance-like function $H$. These properties play an important role in the convergence analysis of the algorithm (11)-(12), as will be seen in the next section.
Proposition 3.1 Let $H(x, y)$ be given as in (12). Then the following results hold.
(a) $H(x, y)$ is continuous on $\mathcal{K}\times\mathrm{int}(\mathcal{K})$, and $H(\cdot, y)$ is strictly convex for any $y \in \mathrm{int}(\mathcal{K})$.
(b) For any fixed $y \in \mathrm{int}(\mathcal{K})$, $H(\cdot, y)$ is continuously differentiable on $\mathrm{int}(\mathcal{K})$ with
$$\nabla_x H(x, y) = \ln x - \ln y.$$
(c) $H(x, y) \ge 0$ for any $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$; moreover, $H(x, y) = 0$ if and only if $x = y$.
(d) $H(x, y) \ge d(\lambda(x), \lambda(y))$ for any $x \in \mathcal{K}$, $y \in \mathrm{int}(\mathcal{K})$, where $d(\cdot,\cdot)$ is defined by
$$d(u, v) := \sum_{i=1}^r \left[\, u_i\ln u_i - u_i\ln v_i + v_i - u_i \,\right], \qquad \forall\, u \in \mathbb{R}^r_+,\ v \in \mathbb{R}^r_{++}. \tag{15}$$
(e) The level sets of $H(\cdot, y)$ and $H(x, \cdot)$ are bounded for any fixed $y \in \mathrm{int}(\mathcal{K})$ and $x \in \mathcal{K}$, respectively.
Proof. (a) Since $x \circ \ln x$ and $x \circ \ln y$ are continuous in $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$, and the trace function is also continuous, $H$ is continuous over $\mathcal{K}\times\mathrm{int}(\mathcal{K})$. On the other hand, from the definition of $H$ it follows that
$$H(x, y) = \Phi(x) - \mathrm{tr}(x \circ \ln y) + \mathrm{tr}(y) - \mathrm{tr}(x). \tag{16}$$
Notice that $\Phi(x)$ is strictly convex over $\mathcal{K}$ by Lemma 3.2(c), and the other terms in the above expression are clearly convex in $x$ for any fixed $y \in \mathrm{int}(\mathcal{K})$. Therefore, $H(\cdot, y)$ is strictly convex for any fixed $y \in \mathrm{int}(\mathcal{K})$.
(b) From the expression (16) of $H(x, y)$ and Lemma 3.2(b), the function $H(\cdot, y)$ is clearly continuously differentiable on $\mathrm{int}(\mathcal{K})$. Moreover,
$$\nabla_x H(x, y) = \nabla\Phi(x) - \ln y - e = \ln x - \ln y.$$
(c) From the definition of $\Phi(x)$ and its gradient formula in Lemma 3.2(b),
$$\Phi(x) - \Phi(y) - \langle\nabla\Phi(y), x - y\rangle = \mathrm{tr}(x \circ \ln x) - \mathrm{tr}(y \circ \ln y) - \langle \ln y + e,\, x - y\rangle = \mathrm{tr}(x \circ \ln x) - \mathrm{tr}(x \circ \ln y) - \mathrm{tr}(x) + \mathrm{tr}(y) = H(x, y) \tag{17}$$
for any $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$. On the other hand, the strict convexity of $\Phi$ established in Lemma 3.2(c) implies that
$$\Phi(x) - \Phi(y) - \langle\nabla\Phi(y), x - y\rangle \ge 0,$$
with equality if and only if $x = y$. These two facts readily give the desired result.
(d) First, from Lemma 3.1 it follows that
$$\mathrm{tr}(x \circ \ln y) \le \sum_{j=1}^r \lambda_j(x)\,\lambda_j(\ln y) = \sum_{j=1}^r \lambda_j(x)\ln(\lambda_j(y)), \qquad \forall\, x \in \mathbb{V},\ y \in \mathrm{int}(\mathcal{K}).$$
Applying this inequality, we obtain for all $x \in \mathcal{K}$ and $y \in \mathrm{int}(\mathcal{K})$,
$$H(x, y) = \mathrm{tr}(x \circ \ln x + y - x) - \mathrm{tr}(x \circ \ln y) \ge \mathrm{tr}(x \circ \ln x + y - x) - \sum_{j=1}^r \lambda_j(x)\ln(\lambda_j(y)) = \sum_{j=1}^r \left[\, \lambda_j(x)\ln(\lambda_j(x)) + \lambda_j(y) - \lambda_j(x) \,\right] - \sum_{j=1}^r \lambda_j(x)\ln(\lambda_j(y)) = \sum_{j=1}^r \left[\, \lambda_j(x)\ln(\lambda_j(x)) - \lambda_j(x)\ln(\lambda_j(y)) + \lambda_j(y) - \lambda_j(x) \,\right] = d(\lambda(x), \lambda(y)).$$
(e) For a fixed $y \in \mathrm{int}(\mathcal{K})$ and $\gamma \ge 0$, let $L_\gamma := \{x \in \mathcal{K} \mid H(x, y) \le \gamma\}$. By part (d), we have $L_\gamma \subseteq \{x \in \mathcal{K} \mid d(\lambda(x), \lambda(y)) \le \gamma\}$. Since $d(\cdot, v)$ has bounded level sets for any $v \in \mathbb{R}^r_{++}$ (see [13, Proposition 4]) and $\lambda(\cdot)$ is continuous, $L_\gamma$ is bounded for all $\gamma \ge 0$. Similarly, $H(x, \cdot)$ has bounded level sets for any $x \in \mathcal{K}$. □
Proposition 3.2 Let $H(x, y)$ be given as in (12). Suppose $\{x^k\}_{k\in\mathbb{N}} \subseteq \mathcal{K}$ and $\{y^k\}_{k\in\mathbb{N}} \subset \mathrm{int}(\mathcal{K})$ are bounded sequences such that $H(x^k, y^k) \to 0$. Then, as $k \to +\infty$, we have
(a) $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \cdots, r$;
(b) $\mathrm{tr}(x^k - y^k) \to 0$.
Proof. (a) By Proposition 3.1(d) and the nonnegativity of $d(\cdot,\cdot)$,
$$H(x^k, y^k) \ge d(\lambda(x^k), \lambda(y^k)) \ge 0.$$
Hence, $H(x^k, y^k) \to 0$ implies $d(\lambda(x^k), \lambda(y^k)) \to 0$. By the definition of $d(\cdot,\cdot)$ in (15),
$$d(\lambda(x^k), \lambda(y^k)) = \sum_{j=1}^r \lambda_j(y^k)\,\varphi\big(\lambda_j(x^k)/\lambda_j(y^k)\big)$$
with $\varphi(t) = t\ln t - t + 1$ $(t \ge 0)$. Since $\varphi(t) \ge 0$ for any $t \ge 0$, each term of the above sum is nonnegative. Therefore, $d(\lambda(x^k), \lambda(y^k)) \to 0$ implies that
$$\lambda_j(y^k)\,\varphi\big(\lambda_j(x^k)/\lambda_j(y^k)\big) \to 0, \qquad j = 1, 2, \cdots, r,$$
which is equivalent to
$$\lambda_j(x^k)\ln(\lambda_j(x^k)) - \lambda_j(x^k)\ln(\lambda_j(y^k)) + \lambda_j(y^k) - \lambda_j(x^k) \to 0, \qquad j = 1, 2, \cdots, r.$$
Since $\{\lambda_j(x^k)\}$ and $\{\lambda_j(y^k)\}$ are bounded, using [6, Lemma A.1] yields $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \cdots, r$.
(b) Since $\mathrm{tr}(x^k - y^k) = \sum_{j=1}^r \big(\lambda_j(x^k) - \lambda_j(y^k)\big)$, the desired result follows from part (a). □
Finally, we present two useful relations for the function $H$; they can easily be verified by direct substitution, using the definition of $H$ in (12) and recalling the nonnegativity of $H$ (see Proposition 3.1(c)).
Proposition 3.3 Let $H(x, y)$ be given as in (12). For all $x, y \in \mathrm{int}(\mathcal{K})$ and $z \in \mathcal{K}$, we have
(a) $H(z, x) - H(z, y) = \mathrm{tr}(z \circ \ln y - z \circ \ln x + x - y)$;
(b) $\mathrm{tr}\big((z - y) \circ (\ln y - \ln x)\big) = H(z, x) - H(z, y) - H(y, x) \le H(z, x) - H(z, y)$.
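A short numerical verification of the three-point identity in Proposition 3.3(b) on random matrices in $\mathbb{S}^3_{++}$ (our own test code, using scipy's matrix logarithm):

```python
import numpy as np
from scipy.linalg import logm

def H(X, Y):
    # H(X, Y) = tr(X ln X - X ln Y + Y - X) for X, Y in S^n_{++}.
    return np.trace(X @ logm(X) - X @ logm(Y) + Y - X).real

rng = np.random.default_rng(5)
mk = lambda: (lambda A: A @ A.T + 0.1 * np.eye(3))(rng.standard_normal((3, 3)))
X, Y, Z = mk(), mk(), mk()
# For symmetric matrices, tr((Z - Y) o (ln Y - ln X)) with the Jordan
# product equals the ordinary trace tr((Z - Y)(ln Y - ln X)).
lhs = np.trace((Z - Y) @ (logm(Y) - logm(X))).real
assert np.isclose(lhs, H(Z, X) - H(Z, Y) - H(Y, X))   # Proposition 3.3(b)
assert lhs <= H(Z, X) - H(Z, Y) + 1e-10               # since H(Y, X) >= 0
```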
4 Convergence analysis of the algorithm
The analysis of the entropy-like proximal algorithm (11)-(12) for the CSCP is similar to that developed by Güler [12] for quadratic proximal methods in convex minimization and to its extension by Chen and Teboulle [4] to proximal-like algorithms using Bregman functions. In what follows, we present a global convergence rate estimate for the entropy-like proximal algorithm (11)-(12) in terms of function values.
We make the following assumptions on the problem CSCP:
(A1) $\inf\{f(x) \mid x \succeq_{\mathcal{K}} 0\} := f_* > -\infty$;
(A2) $\mathrm{dom}(f) \cap \mathrm{int}(\mathcal{K}) \neq \emptyset$, where $\mathrm{dom}(f) := \{x \in \mathbb{V} \mid f(x) < \infty\}$.
In addition, we denote the optimal solution set of the CSCP by $X^* := \{x \in \mathcal{K} \mid f(x) = f_*\}$.
Before proving the convergence results, we must show that the algorithm (11)-(12) is well defined, i.e., that it generates a sequence $\{x^k\}_{k\in\mathbb{N}} \subset \mathrm{int}(\mathcal{K})$. The following technical lemma is needed for the proof.
Lemma 4.1 If $x \succeq_{\mathcal{K}} 0$, $y \succ_{\mathcal{K}} 0$ and $\mathrm{tr}(x \circ y) = 0$, then $x = 0$.
Proof. Clearly, $\mathrm{tr}(x \circ y) = 0$ implies $\langle x, y\rangle = 0$. From the self-duality of $\mathcal{K}$ and [11, Proposition I.1.4], we know that
$$u \in \mathrm{int}(\mathcal{K}) \iff \langle u, v\rangle > 0 \quad \forall\, 0 \neq v \in \mathcal{K}.$$
This immediately implies the desired result $x = 0$. □
Proposition 4.1 Consider the proximal algorithm defined in (11)-(12). For any $y \in \mathrm{int}(\mathcal{K})$ and $\mu > 0$, the following results hold.
(a) Under assumption (A1), the function $f(\cdot) + \mu^{-1}H(\cdot, y)$ has bounded level sets.
(b) If, in addition, assumption (A2) holds, then there exists a unique
$$x(y) = \mathop{\rm argmin}_{x\,\succeq_{\mathcal{K}}\,0}\left\{ f(x) + \frac{1}{\mu}H(x, y) \right\}, \tag{18}$$
and the minimum is attained at $x(y) \in \mathrm{int}(\mathcal{K})$ satisfying
$$\mu^{-1}\big(\ln y - \ln x(y)\big) \in \partial f(x(y)), \tag{19}$$
where $\partial f$ is the subdifferential of $f$ given as in (9).
Proof. (a) Fix $y \in \mathrm{int}(\mathcal{K})$ and $\mu > 0$, and let $F_y : \mathbb{V} \to (-\infty, +\infty]$ be given by $F_y(x) := f(x) + \mu^{-1}H(x, y)$. By Proposition 3.1(e), the level sets
$$L_\gamma := \{ x \in \mathcal{K} \mid H(x, y) \le \gamma\} \tag{20}$$
are bounded for all $\gamma \ge 0$ and any fixed $y \in \mathrm{int}(\mathcal{K})$. Since $f(x) \ge f_*$ on $\mathcal{K}$ by assumption (A1), every level set of $F_y$ is contained in some $L_\gamma$, and hence $F_y$ has bounded level sets.
(b) From part (a), $F_y$ has bounded level sets, which implies the existence of a minimum point $x(y)$; the strict convexity of $F_y$ from Proposition 3.1(a) guarantees its uniqueness. Under assumption (A2), using Proposition 3.1(b) and the optimality condition for (18) gives
$$0 \in \partial f(x(y)) + \mu^{-1}\big(\ln x(y) - \ln y\big) + \partial\delta(x(y)\,|\,\mathcal{K}), \tag{21}$$
where $\delta(z\,|\,\mathcal{K}) = 0$ if $z \succeq_{\mathcal{K}} 0$ and $+\infty$ otherwise. We will show that $\partial\delta(x(y)\,|\,\mathcal{K}) = \{0\}$, whence the desired result follows. We observe that for any $x^k \in \mathrm{int}(\mathcal{K})$ with $x^k \to x \in \mathrm{bd}(\mathcal{K})$,
$$\|\ln x^k\| = \Big(\sum_{j=1}^r [\ln(\lambda_j(x^k))]^2\Big)^{1/2} \to +\infty,$$
since, by the continuity of $\lambda_j(\cdot)$ and of $\ln t$, we have $\lambda_{\min}(x^k) \to \lambda_{\min}(x) = 0$. Consequently,
$$\|\nabla_x H(x^k, y)\| = \|\ln x^k - \ln y\| \ge \|\ln x^k\| - \|\ln y\| \to +\infty.$$
By [21, pp. 251], this means that $H(\cdot, y)$ is essentially smooth, and hence $\partial_x H(x, y) = \emptyset$ for all $x \in \mathrm{bd}(\mathcal{K})$ by [21, Theorem 26.1]. Thus, from (21), we must have $x(y) \in \mathrm{int}(\mathcal{K})$, since otherwise $\partial_x H(x(y), y) = \emptyset$, which contradicts (21). Furthermore, from [21, pp. 226] it follows that
$$\partial\delta(z\,|\,\mathcal{K}) = \{v \in \mathbb{V} \mid v \preceq_{\mathcal{K}} 0,\ \mathrm{tr}(v \circ z) = 0\}.$$
Using Lemma 4.1 with $x(y) \in \mathrm{int}(\mathcal{K})$, we then obtain $\partial\delta(x(y)\,|\,\mathcal{K}) = \{0\}$. Thus, the proof is completed. □

The next result provides the key properties of the algorithm (11)-(12), from which our convergence results will follow.
Proposition 4.2 Let $\{x^k\}_{k\in\mathbb{N}}$ be the sequence generated by the algorithm (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then the following results hold.
(a) The sequence $\{f(x^k)\}_{k\in\mathbb{N}}$ is nonincreasing.
(b) $\mu_k(f(x^k) - f(x)) \le H(x, x^{k-1}) - H(x, x^k)$ for all $x \in \mathcal{K}$.
(c) $\sigma_m(f(x^m) - f(x)) \le H(x, x^0) - H(x, x^m)$ for all $x \in \mathcal{K}$.
(d) $\{H(x, x^k)\}_{k\in\mathbb{N}}$ is nonincreasing for any $x \in X^*$, provided the optimal set $X^* \neq \emptyset$.
(e) $H(x^k, x^{k-1}) \to 0$ and $\mathrm{tr}(x^k - x^{k-1}) \to 0$, provided the optimal set $X^* \neq \emptyset$.
Proof. (a) By the definition of $x^k$ in (11), we know that
$$f(x^k) + \mu_k^{-1}H(x^k, x^{k-1}) \le f(x^{k-1}) + \mu_k^{-1}H(x^{k-1}, x^{k-1}).$$
Since $H(x^k, x^{k-1}) \ge 0$ and $H(x^{k-1}, x^{k-1}) = 0$ (see Proposition 3.1(c)), we obtain $f(x^k) \le f(x^{k-1})$ for all $k \in \mathbb{N}$.
(b) From (19), we have $-\mu_k^{-1}(\ln x^k - \ln x^{k-1}) \in \partial f(x^k)$. Plugging this into the characterization (10) of $\partial f(x^k)$, we obtain
$$f(x) \ge f(x^k) - \mu_k^{-1}\,\mathrm{tr}\big((x - x^k) \circ (\ln x^k - \ln x^{k-1})\big) \qquad \forall\, x \in \mathcal{K}.$$
This yields
$$\mu_k\big(f(x^k) - f(x)\big) \le \mathrm{tr}\big((x - x^k) \circ (\ln x^k - \ln x^{k-1})\big) = H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}) \le H(x, x^{k-1}) - H(x, x^k), \tag{22}$$
where both the equality and the last inequality follow from Proposition 3.3(b).
(c) First, summing the inequalities of part (b) over $k = 1, 2, \cdots, m$ yields
$$\sum_{k=1}^m \mu_k f(x^k) - \sigma_m f(x) \le H(x, x^0) - H(x, x^m) \qquad \forall\, x \in \mathcal{K}. \tag{23}$$
On the other hand, $f(x^m) \le f(x^k)$ for $k \le m$ by part (a). Multiplying this inequality by $\mu_k$ and summing up leads to
$$\sum_{k=1}^m \mu_k f(x^k) \ge \sum_{k=1}^m \mu_k f(x^m) = \sigma_m f(x^m). \tag{24}$$
Now, combining (23) and (24) yields the desired result.
(d) Since $f(x^k) - f(x) = f(x^k) - f_* \ge 0$ for all $x \in X^*$, the result follows immediately from part (b).
(e) From part (d), the sequence $\{H(x, x^k)\}_{k\in\mathbb{N}}$ is nonincreasing for any $x \in X^*$. Together with $H(x, x^k) \ge 0$, this implies that $\{H(x, x^k)\}_{k\in\mathbb{N}}$ is convergent, and thus
$$H(x, x^{k-1}) - H(x, x^k) \to 0 \qquad \forall\, x \in X^*. \tag{25}$$
On the other hand, from (22) in the proof of part (b) together with part (d), we have
$$0 \le \mu_k\big(f(x^k) - f(x)\big) \le H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}) \qquad \forall\, x \in X^*,$$
and hence
$$H(x^k, x^{k-1}) \le H(x, x^{k-1}) - H(x, x^k).$$
Using (25) and the nonnegativity of $H(x^k, x^{k-1})$, the first result follows. Moreover, since $\{x^k\}_{k\in\mathbb{N}} \subset \{y \in \mathrm{int}(\mathcal{K}) \mid H(x, y) \le H(x, x^0)\}$ for $x \in X^*$, the sequence $\{x^k\}_{k\in\mathbb{N}}$ is bounded. Thus, the second result follows from the first one and Proposition 3.2(b). □
By now, we have proved that the algorithm (11)-(12) is well defined and satisfies several favorable properties. With these properties, we are ready to establish the convergence results for the proposed algorithm, which is one of the main purposes of this paper.
Proposition 4.3 Let $\{x^k\}_{k\in\mathbb{N}}$ be the sequence generated by the algorithm (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then, under assumptions (A1) and (A2), the following results hold.
(a) $f(x^m) - f(x) \le \sigma_m^{-1}H(x, x^0)$ for all $x \in \mathcal{K}$.
(b) If $\sigma_m \to +\infty$, then $\lim_{m\to+\infty} f(x^m) = f_*$.
(c) If the optimal set $X^* \neq \emptyset$, the sequence $\{x^k\}_{k\in\mathbb{N}}$ is bounded and every accumulation point is a solution of the problem CSCP.
Proof. (a) This follows from Proposition 4.2(c) and the nonnegativity of $H(x, x^m)$.
(b) Since $\sigma_m \to +\infty$, passing to the limit in part (a) gives
$$\limsup_{m\to+\infty} f(x^m) \le f(x) \qquad \forall\, x \in \mathcal{K}.$$
Together with the fact $f(x^m) \ge f_*$, this yields the desired result.
(c) From Proposition 3.1(e), $H(x, \cdot)$ has bounded level sets for every $x \in \mathcal{K}$. Also, from Proposition 4.2(d), $\{H(x, x^k)\}_{k\in\mathbb{N}}$ is nonincreasing for all $x \in X^*$ if $X^* \neq \emptyset$. Thus, the sequence $\{x^k\}_{k\in\mathbb{N}}$ is bounded and therefore has an accumulation point. Let $\hat{x} \in \mathcal{K}$ be an accumulation point of $\{x^k\}_{k\in\mathbb{N}}$, say $x^{k_j} \to \hat{x}$ for some $k_j \to +\infty$. Since $f$ is lower semicontinuous, we have $f(\hat{x}) \le \liminf_{k_j\to+\infty} f(x^{k_j})$ (see [24, page 8]). On the other hand, we also have $f(x^{k_j}) \to f_*$; hence $f(\hat{x}) = f_*$, which says that $\hat{x}$ is a solution of the CSCP. □
Proposition 4.3(a) provides an estimate of the global rate of convergence similar to the ones obtained by proximal-like algorithms in convex minimization. Analogously, as remarked in [6, Remark 4.1], global convergence of $\{x^k\}$ itself to an optimal solution of (1) can be achieved under the assumption that $\{x^k\} \subset \mathrm{int}(\mathcal{K})$ has a limit point in $\mathrm{int}(\mathcal{K})$. Nonetheless, this assumption is rather stringent.
5 The exponential multiplier method for SCLP
In this section, we present a dual application of the entropy-like proximal algorithm (11)-(12), leading to an exponential multiplier method for the symmetric cone linear program
$$\text{(SCLP)} \qquad \min\Big\{ -b^T y \ :\ c - \sum_{i=1}^m y_i a_i \succeq_{\mathcal{K}} 0,\ y \in \mathbb{R}^m \Big\}, \tag{26}$$
where $c, a_i \in \mathbb{V}$ for $i = 1, 2, \cdots, m$ and $b \in \mathbb{R}^m$. For convenience, we denote the feasible set of (SCLP) by $S = \{y \in \mathbb{R}^m : \mathcal{A}(y) \succeq_{\mathcal{K}} 0\}$, where $\mathcal{A} : \mathbb{R}^m \to \mathbb{V}$ is defined by
$$\mathcal{A}(y) := c - \sum_{i=1}^m y_i a_i. \tag{27}$$
We make the following assumptions on the problem (SCLP):
(S1) the optimal solution set of (SCLP) is nonempty and bounded;
(S2) Slater's condition holds, i.e., there exists $\hat{y} \in \mathbb{R}^m$ such that $\mathcal{A}(\hat{y}) \succ_{\mathcal{K}} 0$.
The Lagrangian function associated with (SCLP) is given by
$$L(y, x) = \begin{cases} -b^T y - \mathrm{tr}\big[x \circ \mathcal{A}(y)\big] & \text{if } x \in \mathcal{K}, \\ +\infty & \text{otherwise.} \end{cases}$$
Therefore, the dual problem corresponding to (SCLP) is given by
$$\text{(DSCLP)} \qquad \max_{x\,\succeq_{\mathcal{K}}\,0}\ \inf_{y\in\mathbb{R}^m} L(y, x),$$
which can be written explicitly as
$$\text{(DSCLP)} \qquad \begin{array}{ll} \max & -\mathrm{tr}(c \circ x) \\ \text{s.t.} & \mathrm{tr}(a_i \circ x) = b_i, \quad i = 1, 2, \cdots, m, \\ & x \succeq_{\mathcal{K}} 0. \end{array}$$
By standard convex analysis arguments from [21], we know that under assumption (S2) there is no duality gap between (SCLP) and (DSCLP). Moreover, the set of optimal solutions of (DSCLP) is nonempty and compact.
To solve the problem (SCLP), we introduce the following multiplier-type algorithm.

Algorithm 5.1 Given $x^0 \in \mathrm{int}(\mathcal{K})$, generate the sequences $\{y^k\}_{k\in\mathbb{N}} \subseteq \mathbb{R}^m$ and $\{x^k\}_{k\in\mathbb{N}} \subset \mathrm{int}(\mathcal{K})$ by
$$y^k = \mathop{\rm argmin}_{y\in\mathbb{R}^m}\left\{ -b^T y + \mu_k^{-1}\,\mathrm{tr}\Big[\exp\big(-\mu_k\mathcal{A}(y) + \ln x^{k-1}\big)\Big] \right\}, \tag{28}$$
$$x^k = \exp\big(-\mu_k\mathcal{A}(y^k) + \ln x^{k-1}\big), \tag{29}$$
where the penalty parameters satisfy $\mu_k > \bar{\mu} > 0$ for all $k \in \mathbb{N}$.
This algorithm can be viewed as a natural extension to symmetric cones of the exponential multiplier method for convex programs over the nonnegative orthant (see, e.g., [3, 29]). In the rest of this section, we show that the algorithm (28)-(29) enjoys similar properties; a small numerical illustration follows.
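The following minimal sketch (our own code, not part of the paper's analysis) runs the iteration (28)-(29) on a tiny SCLP over $\mathbb{S}^2$, using scipy's matrix exponential and logarithm and a generic smooth optimizer for the subproblem (28); the data $b$, $c$, $a$ are invented for illustration.

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.optimize import minimize

def Acal(y, c, a):
    # A(y) = c - sum_i y_i a_i, as in (27).
    return c - sum(yi * ai for yi, ai in zip(y, a))

def exp_multiplier_method(b, c, a, x0, mu=1.0, iters=20):
    # Exponential multiplier iteration (28)-(29) on S^n.
    x = x0
    for _ in range(iters):
        lx = logm(x).real
        F = lambda y: -b @ y + np.trace(expm(-mu * Acal(y, c, a) + lx)).real / mu
        y = minimize(F, np.zeros(len(b)), method="BFGS").x   # subproblem (28)
        x = expm(-mu * Acal(y, c, a) + lx).real              # update (29)
    return y, x

# Tiny invented instance: min -y s.t. I - y*I >= 0 in S^2, i.e. y <= 1,
# whose optimal value is attained at y = 1.
b = np.array([1.0]); c = np.eye(2); a = [np.eye(2)]
y, x = exp_multiplier_method(b, c, a, x0=np.eye(2))
print(y)  # close to [1.]; x stays in int(S^2_+), as Lemma 5.1(a) predicts
```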
We first present two technical lemmas: one collects some properties of the Löwner operator $\exp(z)$, while the other characterizes the recession cone of the set $S$.
Lemma 5.1 (a) For any $z \in \mathbb{V}$, there always holds $\exp(z) \in \mathrm{int}(\mathcal{K})$.
(b) The function $\exp(z)$ is continuously differentiable everywhere with
$$(\exp(z))'e = \nabla_z(\exp(z))\,e = \exp(z) \qquad \forall\, z \in \mathbb{V}.$$
(c) The function $\mathrm{tr}\big[\exp\big(\sum_{i=1}^m y_i a_i\big)\big]$ is continuously differentiable everywhere with
$$\nabla_y\,\mathrm{tr}\Big[\exp\Big(\sum_{i=1}^m y_i a_i\Big)\Big] = \mathcal{A}^T \exp\Big(\sum_{i=1}^m y_i a_i\Big),$$
where $y \in \mathbb{R}^m$, $a_i \in \mathbb{V}$ for $i = 1, 2, \cdots, m$, and $\mathcal{A}^T \in \mathbb{R}^{m\times n}$ is the matrix corresponding to the linear transformation that maps $x \in \mathbb{V}$ to the $m$-vector whose $i$th component is $\langle a_i, x\rangle$.
Proof. (a) The result is clear since for any $z \in \mathbb{V}$ with $z = \sum_{j=1}^r \lambda_j(z)\, c_j$, all eigenvalues of $\exp(z)$, given by $\exp(\lambda_j(z))$ for $j = 1, 2, \cdots, r$, are strictly positive.
(b) By Lemma 2.2, $\exp(z)$ is clearly continuously differentiable everywhere and
$$(\exp(z))'h = \sum_{j=1}^r \exp(\lambda_j(z))\,\langle c_j, h\rangle\, c_j + \sum_{1\le j<l\le r} 4\,\frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)}\; c_j \circ (c_l \circ h)$$
for any $h \in \mathbb{V}$. From this formula, we have in particular that
$$(\exp(z))'e = \sum_{j=1}^r \exp(\lambda_j(z))\,\langle c_j, e\rangle\, c_j + \sum_{1\le j<l\le r} 4\,\frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)}\; c_j \circ (c_l \circ e) = \sum_{j=1}^r \exp(\lambda_j(z))\, c_j = \exp(z),$$
where the second equality holds because $\langle c_j, e\rangle = \mathrm{tr}(c_j) = 1$ and $c_j \circ (c_l \circ e) = c_j \circ c_l = 0$ for $j \neq l$.
(c) The continuous differentiability follows from that of the trace function together with part (b). Notice that
$$\mathrm{tr}\Big[\exp\Big(\sum_{i=1}^m y_i a_i\Big)\Big] = \Big\langle \exp\Big(\sum_{i=1}^m y_i a_i\Big),\ e \Big\rangle.$$
Therefore, applying the chain rule readily yields the desired result. □
Lemma 5.2 Let $S^\infty$ denote the recession cone of the feasible set $S$. Then
$$S^\infty = \{d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\}.$$
Proof. Assume that $d \in \mathbb{R}^m$ satisfies $\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0$. If $d = 0$, clearly $d \in S^\infty$. If $d \neq 0$, take any $y \in S$. Then $\mathcal{A}(y) \succeq_{\mathcal{K}} 0$ and, for any $\tau > 0$,
$$\mathcal{A}(y + \tau d) = c - \sum_{i=1}^m (y_i + \tau d_i)a_i = \mathcal{A}(y) + \tau\big(\mathcal{A}(d) - c\big) \succeq_{\mathcal{K}} 0.$$
This, by the definition of a recession direction [21, pp. 61], shows that $d \in S^\infty$. Therefore, $\{d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\} \subseteq S^\infty$.
Now take any $d \in S^\infty$ and $y \in S$. Then $\mathcal{A}(y + \tau d) \succeq_{\mathcal{K}} 0$ for any $\tau > 0$, and therefore $\lambda_{\min}[\mathcal{A}(y + \tau d)] \ge 0$. This implies that $\lambda_{\min}(\mathcal{A}(d) - c) \ge 0$. Indeed, if not, then using
$$\lambda_{\min}[\mathcal{A}(y + \tau d)] = \lambda_{\min}\big[\mathcal{A}(y) + \tau(\mathcal{A}(d) - c)\big] \le \tau\,\lambda_{\min}(\mathcal{A}(d) - c) + \lambda_{\max}(\mathcal{A}(y)),$$
where the inequality is due to Lemma 2.1, and letting $\tau \to +\infty$, we would obtain $\lambda_{\min}[\mathcal{A}(y + \tau d)] \to -\infty$, which contradicts $\lambda_{\min}[\mathcal{A}(y + \tau d)] \ge 0$. Thus $\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0$. Combining the above discussions, we get $S^\infty = \{d \in \mathbb{R}^m \mid \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\}$. □
We now turn to the convergence of the algorithm (28)-(29). We first prove that the sequence generated by (28)-(29) is well defined, which is implied by the following lemma.
Lemma 5.3 For any $\mu > 0$, let $F : \mathbb{R}^m \to \mathbb{R}$ be defined by
$$F(y) := -b^T y + \mu^{-1}\,\mathrm{tr}\big[\exp\big(-\mu\mathcal{A}(y) + \ln x^{k-1}\big)\big]. \tag{30}$$
Then, under assumption (S1), the set of minimizers of $F$ is nonempty and bounded.
Proof. By assumption (S1), it suffices to prove that $F(y)$ has bounded level sets. Suppose not. Then there exist $\delta \in \mathbb{R}$ and a sequence $\{y^k\} \subseteq \mathbb{R}^m$ such that
$$\|y^k\| \to +\infty, \qquad \lim_{k\to+\infty} \frac{y^k}{\|y^k\|} = d \neq 0, \qquad \text{and} \qquad F(y^k) \le \delta. \tag{31}$$
Since $\exp(-\mu\mathcal{A}(y) + \ln x^{k-1}) \in \mathrm{int}(\mathcal{K})$ for any $y \in \mathbb{R}^m$ by Lemma 5.1(a), we have
$$\mathrm{tr}\Big[\exp\big(-\mu\mathcal{A}(y) + \ln x^{k-1}\big)\Big] = \sum_{j=1}^r \exp\Big[\lambda_j\big(-\mu\mathcal{A}(y) + \ln x^{k-1}\big)\Big] > 0.$$
Thus, $F(y^k) \le \delta$ implies that the following two inequalities hold:
$$-b^T y^k < \delta, \qquad \sum_{j=1}^r \exp\Big[\lambda_j\big(-\mu\mathcal{A}(y^k) + \ln x^{k-1}\big)\Big] \le \mu(\delta + b^T y^k).$$
From this, we immediately obtain that
$$-b^T d \le 0, \tag{32}$$
$$\exp\Big[\lambda_j\big(-\mu\mathcal{A}(y^k) + \ln x^{k-1}\big)\Big] \le \mu(\delta + b^T y^k), \qquad \forall\, j = 1, 2, \cdots, r. \tag{33}$$
Using the monotonicity of $\ln t$ $(t > 0)$ and the positive homogeneity of $\lambda_j(\cdot)$, we can rewrite (33) as
$$\lambda_j\left( \frac{-\mu\mathcal{A}(y^k)}{\|y^k\|} + \frac{\ln x^{k-1}}{\|y^k\|} \right) \le \frac{\ln\big(\mu(\delta + b^T y^k)\big)}{\|y^k\|}, \qquad \forall\, j = 1, 2, \cdots, r.$$
Since $b^T y^k \le \|b\|\,\|y^k\|$, the numerator on the right-hand side grows at most logarithmically in $\|y^k\|$, so the right-hand side tends to $0$. Passing to the limit on both sides, noting that $\mathcal{A}(y^k)/\|y^k\| \to \mathcal{A}(d) - c$ and $\ln x^{k-1}/\|y^k\| \to 0$, and using the continuity of the eigenvalue maps, we get
$$\lambda_j\big(-\mu(\mathcal{A}(d) - c)\big) \le 0, \qquad \forall\, j = 1, 2, \cdots, r,$$
i.e., $\lambda_j(\mathcal{A}(d) - c) \ge 0$ for all $j$, which in turn means that
$$\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0. \tag{34}$$
From Lemma 5.2, we know that $\{d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\} = S^\infty$. Combining this with (32), we have thus exhibited a nonzero $d \in S^\infty$ with $-b^T d \le 0$, i.e., a recession direction of $S$ along which the objective $-b^T y$ is nonincreasing. This clearly contradicts assumption (S1). Hence, the proof is completed. □
To analyze the convergence of the algorithm (28)-(29), we also need the following result, which states that the sequence $\{x^k\}_{k\in\mathbb{N}}$ generated by (28)-(29) is exactly the sequence produced by the algorithm (11)-(12) when applied to the dual problem (DSCLP).