
A CLASS OF INTERIOR PROXIMAL-LIKE ALGORITHMS FOR CONVEX SECOND-ORDER CONE PROGRAMMING

SHAOHUA PAN AND JEIN-SHAN CHEN

Abstract. We propose a class of interior proximal-like algorithms for the second-order cone program, which is to minimize a closed proper convex function subject to general second-order cone constraints. The class of methods uses a distance measure generated by a twice continuously differentiable strictly convex function on (0, +∞), and includes as a special case the entropy-like proximal algorithm [Eggermont, Linear Algebra Appl., 130 (1990), pp. 25–42], which was originally proposed for minimizing a convex function subject to nonnegativity constraints. In particular, we consider an approximate version of these methods, allowing the inexact solution of subproblems. As with the entropy-like proximal algorithm for convex programming with nonnegativity constraints, under some mild assumptions we establish the global convergence of the proposed algorithm expressed in terms of the objective values, and we show that the generated sequence is bounded and that every accumulation point is a solution of the considered problem. Preliminary numerical results are reported for two approximate entropy-like proximal algorithms, and numerical comparisons are made with the merit function approach [Chen and Tseng, Math. Program., 104 (2005), pp. 293–327], which verify the effectiveness of the proposed method.

Key words. proximal method, measure of distance, second-order cone, second-order cone-convexity

AMS subject classifications. 65K05, 90C30

DOI. 10.1137/070685683

1. Introduction. We consider the following convex second-order cone programming (CSOCP) problem:

min f(ζ)

subject to (s.t.) Aζ + b ⪰_K 0,  (1)

where f : R^m → (−∞, +∞] is a closed proper convex function; A is an n × m matrix with n ≥ m; b is a vector in R^n; x ⪰_K 0 means x ∈ K; and K is the Cartesian product of second-order cones (SOCs), also called Lorentz cones [14]. In other words,

K = K^{n1} × K^{n2} × · · · × K^{nN},  (2)

where N, n1, . . . , nN ≥ 1, n1 + n2 + · · · + nN = n, and

K^{ni} := { (x1, x2) ∈ R × R^{ni−1} | x1 ≥ ‖x2‖ },

with ‖·‖ denoting the Euclidean norm and K^1 denoting the set of nonnegative reals R+. The CSOCP, as an extension of the standard second-order cone programming, has a wide range of applications from engineering, control, and finance to robust optimization and combinatorial optimization; see [1, 21, 23] and the references therein.
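Concretely, membership in K can be checked block by block from the defining inequality x1 ≥ ‖x2‖. The helper names below (`in_soc`, `in_cartesian_soc`) are our own illustrative sketch, not anything from the paper:

```python
import math

def in_soc(x):
    """Check x = (x1, x2) in K^n, i.e., x1 >= ||x2|| (Euclidean norm of the tail)."""
    x1, x2 = x[0], x[1:]
    return x1 >= math.sqrt(sum(v * v for v in x2))

def in_cartesian_soc(x, dims):
    """Check membership in K^{n1} x ... x K^{nN} block by block."""
    i = 0
    for n in dims:
        if not in_soc(x[i:i + n]):
            return False
        i += n
    return True

# (2, 1, 1) lies in K^3 since 2 >= ||(1, 1)||, and 0.5 lies in K^1 = R_+
print(in_cartesian_soc([2.0, 1.0, 1.0, 0.5], [3, 1]))  # True
```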

Received by the editors March 19, 2007; accepted for publication (in revised form) April 25, 2008; published electronically August 13, 2008.

http://www.siam.org/journals/siopt/19-2/68568.html

School of Mathematical Sciences, South China University of Technology, Guangzhou 510640, China (shhpan@cut.edu.cn). This author's work is partially supported by the Doctoral Starting-up Foundation (B13B6050640) of GuangDong Province.

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan (jschen@math.ntnu.edu.tw). Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. This author's work is partially supported by the National Science Council of Taiwan.



Recently, second-order cone programming (SOCP) and the SOC complementarity problem have received much attention in optimization. There exist many methods for solving the CSOCP, including the smoothing methods [10, 15], the smoothing-regularization method [17], the semismooth Newton method [22], and the merit function approach [8]. All of these methods are proposed by using some SOC complementarity function or merit function to reformulate the KKT optimality conditions of the CSOCP as a nonsmooth (or smoothing) system of equations or an unconstrained minimization problem. Notice that the CSOCP is a typical convex programming problem with extensive applications. But, to the best of our knowledge, there are few convex programming methods developed for (or extended to) the CSOCP except the interior point method [33]. Hence, it is worthwhile to explore other types of convex programming methods for the CSOCP that differ from the aforementioned ones.

One such method is the proximal point algorithm for minimizing a convex function f(ζ) over R^m, which generates a sequence {ζ^k} by the following iterative scheme:

ζ^k = argmin_{ζ∈R^m} { f(ζ) + (1/(2μ_k)) ‖ζ − ζ^{k−1}‖² },  (3)

where {μ_k} is a sequence of positive numbers. The method was originally introduced by Martinet [24] with the Moreau proximal approximation of f (see [25]), and was further developed by Rockafellar [30, 31]. Later, some researchers [5, 13, 32] proposed and studied nonquadratic proximal point algorithms by replacing the quadratic distance in (3) with a Bregman distance or an entropy-like distance.
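As a toy illustration of scheme (3) (our own example, not an algorithm from the paper): for f(ζ) = |ζ| on R, each subproblem has the closed-form soft-thresholding solution, so the iteration can be run exactly:

```python
def prox_abs(z, mu):
    """Closed-form minimizer of |w| + (1/(2*mu)) * (w - z)**2 (soft-thresholding)."""
    if z > mu:
        return z - mu
    if z < -mu:
        return z + mu
    return 0.0

zeta = 5.0
for k in range(10):
    zeta = prox_abs(zeta, 1.0)  # one proximal step of (3) with mu_k = 1
print(zeta)  # 0.0, the minimizer of |.|
```

Each step shrinks the iterate toward the minimizer by at most μ_k, reaching 0 after five steps here.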

The entropy-like proximal algorithm was designed for minimizing a convex function f(ζ) subject to nonnegativity constraints ζ ≥ 0. In [12], Eggermont first introduced the Kullback–Leibler relative entropy,¹ defined by

d(ζ, ξ) = Σ_{i=1}^m [ ζ_i ln(ζ_i/ξ_i) + ξ_i − ζ_i ]  ∀ζ ≥ 0, ξ > 0,

and established the following entropy-like proximal point algorithm:

ζ⁰ > 0,  ζ^k = argmin_{ζ>0} { f(ζ) + μ_k^{−1} d(ζ^{k−1}, ζ) }.  (4)

Later, Teboulle [32] proposed to replace the usual Kullback–Leibler relative entropy with a new type of distance-like function, called the ϕ-divergence, to define the entropy-like proximal map. Let ϕ : R → (−∞, +∞] be a closed proper convex function satisfying certain conditions (see [18, 32]). The ϕ-divergence induced by ϕ is defined as

dϕ(ζ, ξ) := Σ_{i=1}^m ξ_i ϕ(ζ_i/ξ_i).  (5)

Based on the ϕ-divergence, Iusem et al. [18, 19] generalized Eggermont's algorithm as

ζ⁰ > 0,  ζ^k = argmin_{ζ>0} { f(ζ) + μ_k^{−1} dϕ(ζ, ζ^{k−1}) },  (6)

1The convention of 0 ln 0 = 0 is used throughout this paper.


and they obtained convergence theorems under weaker assumptions. Clearly, when

ϕ(t) = −ln t + t − 1  (t > 0),

we have that dϕ(ζ, ξ) = d(ξ, ζ), and consequently the algorithm reduces to Eggermont's.
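This identity is easy to confirm numerically; the snippet below (an illustrative check of ours, not part of the paper) evaluates both sides on a fixed pair of positive vectors:

```python
import math

def d(a, b):
    """Kullback-Leibler relative entropy: sum of a_i*ln(a_i/b_i) + b_i - a_i."""
    return sum(ai * math.log(ai / bi) + bi - ai for ai, bi in zip(a, b))

def d_phi(z, x):
    """phi-divergence (5) with phi(t) = -ln t + t - 1."""
    phi = lambda t: -math.log(t) + t - 1.0
    return sum(xi * phi(zi / xi) for zi, xi in zip(z, x))

z, x = [1.0, 2.5, 0.3], [0.7, 1.2, 2.0]
assert abs(d_phi(z, x) - d(x, z)) < 1e-12  # d_phi(z, x) = d(x, z)
print("d_phi(z, x) == d(x, z)")
```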

Observing that the proximal-like algorithm (6) associated with ϕ(t) = −ln t + t − 1 inherits the features of the interior point method as well as the proximal point method, Auslender [2] extended the algorithm to general linearly constrained convex minimization problems and variational inequalities on polyhedra. Is it then possible to extend the algorithm to nonpolyhedral symmetric conic optimization problems and establish the corresponding convergence results? In this paper, we explore its extension to the setting of SOCs and establish a class of interior proximal-like algorithms for the CSOCP. We should mention that the algorithm (6) with the entropy function t ln t − t + 1 (t ≥ 0) was recently extended to convex semidefinite programming [11].

For simplicity, in the rest of this paper we focus on the case where K = K^n. All of the analysis can be carried over to the general case where K has the direct product structure in (2). It is known that K^n is a closed convex cone whose interior is given by

int(K^n) := { (x1, x2) ∈ R × R^{n−1} | x1 > ‖x2‖ }.

For any x, y in R^n, we write x ⪰_{K^n} y if x − y ∈ K^n, and write x ≻_{K^n} y if x − y ∈ int(K^n). In other words, x ⪰_{K^n} 0 if and only if x ∈ K^n, and x ≻_{K^n} 0 if and only if x ∈ int(K^n). We denote by F the constraint set of the CSOCP, i.e.,

F := { ζ ∈ R^m | Aζ + b ⪰_{K^n} 0 }.  (7)

It is not difficult to verify that F is convex, and its interior int(F) is given by

int(F) := { ζ ∈ R^m | Aζ + b ≻_{K^n} 0 }.

The proximal-like algorithm that we propose for the CSOCP is defined as follows:

ζ⁰ ∈ int(F),  ζ^k = argmin_{ζ∈int(F)} { f(ζ) + μ_k^{−1} D(Aζ + b, Aζ^{k−1} + b) },  (8)

where D : R^n × R^n → (−∞, +∞] is a closed proper convex function generated by a class of twice continuously differentiable strictly convex functions on (0, +∞); its specific expression is given in section 3. The class of distance measures, as will be shown in section 3, includes as a special case the natural extension of dϕ(x, y) with ϕ(t) = −ln t + t − 1 to the SOCs. For the proximal-like algorithm (8), we particularly consider an approximate version which allows an inexact minimization of the subproblem (8), and we establish its global convergence under some mild assumptions. Numerical results are reported for two approximate entropy-like proximal algorithms, which verify the effectiveness of the proposed proximal method. In addition, numerical comparisons with the merit function approach [8] indicate that the condition number of the Hessian matrix ∇²f(ζ) has a great influence on the numerical performance of both the proximal-like algorithm and the merit function approach; however, the former seems to have no direct relation with the density of the test problems, whereas the latter tends to require more function evaluations as the density increases.

The outline of this paper is as follows. In section 2, we review some basic concepts and properties associated with SOCs. In section 3, we state the definition of D(x, y) and present some specific examples. Some favorable properties of D(x, y) are investigated in section 4. In section 5, we describe an approximate proximal-like algorithm allowing inexact minimization in (8) and establish the global convergence of the algorithm. In section 6, we report our numerical experience with the proposed proximal-like algorithm by solving some convex SOCPs. Finally, we conclude this paper in section 7.

Throughout this paper, I represents an identity matrix of suitable dimension, and R^n denotes the space of n-dimensional real column vectors. For a differentiable function h on R, we denote by h′, h″, and h‴ its first, second, and third derivatives, respectively. Given a set S, we denote by S̄, int(S), and bd(S) the closure, the interior, and the boundary of S, respectively. Note that a function is closed if and only if it is lower semicontinuous, and a function f is proper if f(ζ) < ∞ for at least one ζ ∈ R^m and f(ζ) > −∞ for all ζ ∈ R^m. For a closed proper convex function f : R^m → (−∞, +∞], we denote its domain by domf := { ζ ∈ R^m | f(ζ) < ∞ } and the subdifferential of f at ζ by

∂f(ζ) := { w ∈ R^m | f(ξ) ≥ f(ζ) + ⟨w, ξ − ζ⟩ ∀ξ ∈ R^m }.

If f is differentiable at ζ, the notation ∇f(ζ) represents the gradient of f at ζ.

2. Preliminaries. This section recalls some basic concepts and preliminary results related to SOCs that will be used in the subsequent analysis. For any x = (x1, x2) ∈ R × R^{n−1} and y = (y1, y2) ∈ R × R^{n−1}, we define their Jordan product as

x ∘ y := ( ⟨x, y⟩, y1x2 + x1y2 ).  (9)

We write x² to mean x ∘ x and write x + y to mean the usual componentwise addition of vectors. Then ∘, +, and e = (1, 0, . . . , 0)ᵀ ∈ R^n have the following basic properties (see [14, 15]): (1) e ∘ x = x for all x ∈ R^n. (2) x ∘ y = y ∘ x for all x, y ∈ R^n. (3) x ∘ (x² ∘ y) = x² ∘ (x ∘ y) for all x, y ∈ R^n. (4) (x + y) ∘ z = x ∘ z + y ∘ z for all x, y, z ∈ R^n. The Jordan product is not associative. For example, for n = 3, let x = (1, −1, 1) and y = z = (1, 0, 1); then (x ∘ y) ∘ z = (4, −1, 4) ≠ x ∘ (y ∘ z) = (4, −2, 4).

However, it is power associative, i.e., x ∘ (x ∘ x) = (x ∘ x) ∘ x for all x ∈ R^n. Thus, we may, without fear of ambiguity, write x^m for the product of m copies of x and x^{m+n} = x^m ∘ x^n for all positive integers m and n. We stipulate that x⁰ = e. Besides, K^n is not closed under the Jordan product: for example, x = (1, 1, 0), y = (1, 0, 1) ∈ K³, but x ∘ y = (1, 1, 1) ∉ K³ since 1 < ‖(1, 1)‖.
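These properties are simple to verify numerically; the following sketch (our own code) recomputes the non-associativity example above and checks power associativity for the same x:

```python
def jordan(x, y):
    """Jordan product (9): x o y = (<x, y>, y1*x2 + x1*y2)."""
    ip = sum(a * b for a, b in zip(x, y))
    return [ip] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

x, y = [1.0, -1.0, 1.0], [1.0, 0.0, 1.0]
z = y
print(jordan(jordan(x, y), z))  # [4.0, -1.0, 4.0]
print(jordan(x, jordan(y, z)))  # [4.0, -2.0, 4.0]  -> not associative
assert jordan(x, jordan(x, x)) == jordan(jordan(x, x), x)  # power associative
```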

For each x = (x1, x2) ∈ R × R^{n−1}, the determinant and the trace of x are defined by

det(x) = x1² − ‖x2‖²,  tr(x) = 2x1.  (10)

In general, det(x ∘ y) ≠ det(x) det(y) unless x2 = αy2 for some α ∈ R. A vector x = (x1, x2) ∈ R × R^{n−1} is said to be invertible if det(x) ≠ 0. If x is invertible, then there exists a unique y = (y1, y2) ∈ R × R^{n−1} satisfying x ∘ y = y ∘ x = e. We call


this y the inverse of x and denote it by x⁻¹. In fact, we have that

x⁻¹ = (1/(x1² − ‖x2‖²)) (x1, −x2) = (1/det(x)) (tr(x)e − x).  (11)

Hence, x ∈ int(K^n) if and only if x⁻¹ ∈ int(K^n), and x⁻¹ is well-defined if x ∈ int(K^n).
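A quick numerical check of formula (11) (illustrative code of ours, not from the paper):

```python
def jordan(x, y):
    """Jordan product: x o y = (<x, y>, y1*x2 + x1*y2)."""
    ip = sum(a * b for a, b in zip(x, y))
    return [ip] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

def inverse(x):
    """(11): x^{-1} = (x1, -x2) / det(x), with det(x) = x1^2 - ||x2||^2."""
    det = x[0] ** 2 - sum(v * v for v in x[1:])
    assert det != 0.0, "x must be invertible (det(x) != 0)"
    return [x[0] / det] + [-v / det for v in x[1:]]

x, e = [2.0, 1.0, 0.5], [1.0, 0.0, 0.0]
prod = jordan(x, inverse(x))
assert all(abs(p - c) < 1e-12 for p, c in zip(prod, e))
print("x o x^{-1} = e")
```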

In the following, we recall from [15] that each x = (x1, x2) ∈ R × R^{n−1} admits a spectral factorization associated with K^n of the form

x = λ1(x)·u_x^(1) + λ2(x)·u_x^(2),

where λi(x) and u_x^(i) for i = 1, 2 are the spectral values and the associated spectral vectors of x, respectively, given by

λi(x) = x1 + (−1)^i ‖x2‖,

u_x^(i) = (1/2)( 1, (−1)^i x2/‖x2‖ )  if x2 ≠ 0;
u_x^(i) = (1/2)( 1, (−1)^i x̄2 )  if x2 = 0,  (12)

with x̄2 being any vector in R^{n−1} such that ‖x̄2‖ = 1. If x2 ≠ 0, then the factorization is unique. The spectral decomposition, along with the Jordan algebra associated with the SOC, has some basic properties whose proofs can be found in [14, 15]. Here we list four of them that will often be used in the subsequent sections.
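The factorization (12) can be verified directly; `spectral` below is our illustrative helper, not code from the paper:

```python
import math

def spectral(x):
    """(12): lambda_i(x) = x1 + (-1)^i ||x2||, u_x^(i) = 0.5 * (1, (-1)^i x2/||x2||)."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    # if x2 = 0, any unit vector may stand in for x2/||x2||; take (1, 0, ..., 0)
    w = [v / norm for v in x2] if norm > 0 else [1.0] + [0.0] * (len(x2) - 1)
    lam = [x1 - norm, x1 + norm]
    u = [[0.5] + [-0.5 * v for v in w], [0.5] + [0.5 * v for v in w]]
    return lam, u

x = [1.0, 3.0, 4.0]
lam, u = spectral(x)
print(lam)  # [-4.0, 6.0]
rebuilt = [lam[0] * a + lam[1] * b for a, b in zip(u[0], u[1])]
assert all(abs(r - v) < 1e-12 for r, v in zip(rebuilt, x))  # x = l1*u1 + l2*u2
```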

Property 2.1. For any x = (x1, x2) ∈ R × R^{n−1} with the spectral values λ1(x), λ2(x) and spectral vectors u_x^(1), u_x^(2) given as in (12), the following results hold:

(a) u_x^(1) and u_x^(2) are orthogonal under the Jordan product and have length 1/√2, i.e., u_x^(1) ∘ u_x^(2) = 0, ‖u_x^(1)‖ = ‖u_x^(2)‖ = 1/√2.

(b) u_x^(1) and u_x^(2) are idempotent under the Jordan product, i.e., u_x^(i) ∘ u_x^(i) = u_x^(i) for i = 1, 2.

(c) The determinant, the trace, and the Euclidean norm of x can be expressed in terms of λ1(x), λ2(x):

det(x) = λ1(x)λ2(x),  tr(x) = λ1(x) + λ2(x),  ‖x‖² = ( [λ1(x)]² + [λ2(x)]² )/2.

(d) λ1(x), λ2(x) are nonnegative (positive) if and only if x ∈ K^n (x ∈ int(K^n)).

Lemma 2.1.

(a) For any x ∈ R^n, x ⪰_{K^n} 0 ⟺ ⟨x, y⟩ ≥ 0 for any y ⪰_{K^n} 0.

(b) For any x ∈ R^n, x ≻_{K^n} 0 ⟺ ⟨x, y⟩ > 0 for any y ⪰_{K^n} 0 with y ≠ 0.

(c) For any x, y ∈ R^n, let λi(x) and λi(y) for i = 1, 2 be their spectral values. Then

λ1(x)λ2(y) + λ2(x)λ1(y) ≤ tr(x ∘ y) ≤ λ1(x)λ1(y) + λ2(x)λ2(y).

Proof. Part (a) follows directly from the self-duality of K^n; we next consider parts (b) and (c).

(b) Let x = (x1, x2), y = (y1, y2) ∈ R × R^{n−1}. The necessity follows from

⟨x, y⟩ = x1y1 + x2ᵀy2 ≥ x1y1 − ‖x2‖‖y2‖ ≥ x1y1 − y1‖x2‖ = y1(x1 − ‖x2‖) > 0,

where the first inequality is by Cauchy–Schwarz, the second is due to y ⪰_{K^n} 0, and the final strict inequality holds since x ≻_{K^n} 0 and y ⪰_{K^n} 0 with y ≠ 0 (which forces y1 > 0). Next, we prove the sufficiency. First, from ⟨x, y⟩ > 0 for any y ⪰_{K^n} 0 with y ≠ 0, we deduce that x1 > 0 by setting y = e. If x2 = 0, then the conclusion follows. If x2 ≠ 0, then we set y = (1, −x2/‖x2‖). Clearly, y ⪰_{K^n} 0, y ≠ 0, and 0 < ⟨x, y⟩ = x1 − ‖x2‖ = λ1(x). By Property 2.1(d), we then have x ≻_{K^n} 0.

(c) For any x = (x1, x2), y = (y1, y2) ∈ R × R^{n−1}, by (12) we can compute that

λ1(x)λ2(y) + λ2(x)λ1(y) = 2x1y1 − 2‖x2‖‖y2‖ ≤ 2(x1y1 + x2ᵀy2) = tr(x ∘ y),
λ1(x)λ1(y) + λ2(x)λ2(y) = 2x1y1 + 2‖x2‖‖y2‖ ≥ 2(x1y1 + x2ᵀy2) = tr(x ∘ y).

Combining the two inequalities above yields the desired result.

For any h : R → R, the following vector-valued function was considered in [6, 15]:

hsoc(x) = h(λ1(x))·u_x^(1) + h(λ2(x))·u_x^(2)  ∀x = (x1, x2) ∈ R × R^{n−1}.  (13)

If h is defined only on a subset of R, then hsoc is defined on the corresponding subset of R^n. The definition in (13) is unambiguous whether x2 ≠ 0 or x2 = 0. For the vector-valued function hsoc induced by h, we have the following results.
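In code, (13) amounts to applying h to the two spectral values and recombining with the spectral vectors; the sketch below (ours, with h = exp as an arbitrary choice) also checks the trace identity tr[hsoc(x)] = h(λ1(x)) + h(λ2(x)):

```python
import math

def h_soc(h, x):
    """(13): h^soc(x) = h(lambda_1(x)) u_x^(1) + h(lambda_2(x)) u_x^(2)."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    w = [v / norm for v in x2] if norm > 0 else [1.0] + [0.0] * (len(x2) - 1)
    h1, h2 = h(x1 - norm), h(x1 + norm)  # h at the two spectral values
    return [0.5 * (h1 + h2)] + [0.5 * (h2 - h1) * v for v in w]

x = [2.0, 1.0, 0.0]            # spectral values 1 and 3
y = h_soc(math.exp, x)
# tr[h^soc(x)] = 2*y[0] equals exp(1) + exp(3)
assert abs(2.0 * y[0] - (math.exp(1.0) + math.exp(3.0))) < 1e-9
assert h_soc(lambda t: t, x) == x  # the identity map reproduces x
```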

Lemma 2.2. Given a function h : IR → R, let hsoc : S → R^n be the vector-valued function induced by h as in (13), where IR ⊆ R and S ⊆ R^n. Then, the following results hold:

(a) For any x ∈ S, λi[hsoc(x)] = h(λi(x)) for i = 1, 2, and tr[hsoc(x)] = Σ_{i=1}^2 h(λi(x)).

(b) If h is continuously differentiable on IR, then hsoc is continuously differentiable on the set S, and its transposed Jacobian at x = (x1, x2) ∈ S is given by the formula

∇hsoc(x) = h′(x1) I  (14)

if x2 = 0, and otherwise

∇hsoc(x) = [ b , c x2ᵀ/‖x2‖ ; c x2/‖x2‖ , aI + (b − a) x2x2ᵀ/‖x2‖² ],  (15)

where

a = ( h(λ2(x)) − h(λ1(x)) ) / ( λ2(x) − λ1(x) ),  b = ( h′(λ2(x)) + h′(λ1(x)) )/2,  c = ( h′(λ2(x)) − h′(λ1(x)) )/2.

(c) If h is continuously differentiable on IR, then tr[hsoc(x)] is continuously differentiable on the set S, and its gradient ∇tr[hsoc(x)] = 2∇hsoc(x)·e = 2(h′)soc(x).

(d) If h is (strictly) convex on IR, then tr[hsoc(x)] is (strictly) convex on the set S.

Proof. (a) The proof is direct by the definition of hsoc and the spectral values.

(b) The conclusion follows directly from [15, Proposition 5.2] or [6, Proposition 4].

(c) Since tr[hsoc(x)] = 2⟨hsoc(x), e⟩, by part (b) tr[hsoc(x)] is obviously continuously differentiable. Applying the chain rule for the inner product of two functions yields

∇tr[hsoc(x)] = 2∇hsoc(x)·e,

where ∇hsoc(x) is given by (14)–(15). By a simple computation, it is easy to verify that

∇hsoc(x)·e = h′(λ1(x)) u_x^(1) + h′(λ2(x)) u_x^(2) = (h′)soc(x).

Combining the last two equalities immediately gives the second part of the conclusion.

(d) The proof is similar to that of [26, Lemma 3.2(d)], and so we omit it.

To close this section, we review the definitions of SOC-convexity and SOC-monotonicity. These two concepts, like matrix-convexity and matrix-monotonicity in semidefinite programming, play an important role in the solution methods for SOCPs.

Definition 2.1 (see [7]). Given a function h : IR → R, let hsoc : S → R^n be the vector-valued function defined as in (13), where IR ⊆ R and S ⊆ R^n. Then,

(a) h is said to be SOC-monotone of order n on IR if for any x, y ∈ S,

x ⪰_{K^n} y ⟹ hsoc(x) ⪰_{K^n} hsoc(y).

(b) h is said to be SOC-convex of order n on IR if for any x, y ∈ S and 0 ≤ β ≤ 1,

hsoc(βx + (1 − β)y) ⪯_{K^n} βhsoc(x) + (1 − β)hsoc(y).  (16)

We say that h is SOC-convex (respectively, SOC-monotone) on IR if h is SOC-convex of all orders n (respectively, SOC-monotone of all orders n) on IR. A function h is said to be SOC-concave on IR whenever −h is SOC-convex on IR. When h is continuous on IR, the condition in (16) can be replaced by the special midpoint condition

hsoc((x + y)/2) ⪯_{K^n} (1/2)( hsoc(x) + hsoc(y) ).  (17)

Obviously, the set of SOC-monotone functions and the set of SOC-convex functions are both closed under positive linear combinations and under pointwise limits.

3. Distance-like functions in SOCs. In this section, we present the definition of the distance-like function D(x, y) involved in the proximal-like algorithm (8) and some specific examples. Let φ : R → (−∞, +∞] be a closed proper convex function with domφ = [0, +∞), and assume that

(C.1) φ is strictly convex on its domain.

(C.2) φ is twice continuously differentiable on int(domφ), with lim_{t→0⁺} φ″(t) = +∞.

(C.3) φ′(t)t − φ(t) is convex on int(domφ).

(C.4) φ′ is SOC-concave on int(domφ).

In what follows, we denote by Φ the class of functions satisfying Conditions C.1–C.4.

Given φ ∈ Φ, let φsoc and (φ′)soc be the vector-valued functions given as in (13). We define D(x, y) involved in the proximal-like algorithm (8) by

D(x, y) := tr( φsoc(y) − φsoc(x) − (φ′)soc(x) ∘ (y − x) )  ∀x ∈ int(K^n), y ∈ K^n;
D(x, y) := +∞  otherwise.  (18)

This function, as will be shown in the next section, possesses some favorable properties. In particular, D(x, y) ≥ 0 for any x, y ∈ int(K^n), and D(x, y) = 0 if and only if x = y. Hence, D(x, y) can be used to measure the distance between two points in int(K^n).


In the following, we concentrate on examples of the distance-like function D(x, y). For this purpose, we first give another characterization of Condition C.3.

Lemma 3.1. Let φ : R → (−∞, +∞] be a closed proper function with domφ = [0, +∞). If φ is thrice continuously differentiable on int(domφ), then φ satisfies Condition C.3 if and only if its derivative φ′ is exponentially convex,² i.e.,

φ′(t1t2) ≤ (1/2)( φ′(t1²) + φ′(t2²) )  ∀t1, t2 > 0.  (19)

Proof. Since φ is thrice continuously differentiable on int(domφ), φ satisfies Condition C.3 if and only if

φ″(t) + tφ‴(t) ≥ 0  (∀t > 0).

Observe that this inequality is also equivalent to

tφ″(t) + t²φ‴(t) ≥ 0  (∀t > 0),

and hence substituting t = exp(θ) for θ ∈ R into the inequality yields

exp(θ)φ″(exp(θ)) + exp(2θ)φ‴(exp(θ)) ≥ 0  ∀θ ∈ R.

Since the left-hand side of this inequality is exactly [φ′(exp(θ))]″, this means that φ′(exp(·)) is convex on R. Consequently, the first part of the conclusion follows.

Note that the convexity of φ′(exp(·)) on R is equivalent to saying that, for any θ1, θ2 ∈ R,

φ′(exp(rθ1 + (1 − r)θ2)) ≤ rφ′(exp(θ1)) + (1 − r)φ′(exp(θ2)),  r ∈ [0, 1],

which, by letting t1 = exp(θ1) and t2 = exp(θ2), can be rewritten as

φ′(t1^r t2^{1−r}) ≤ rφ′(t1) + (1 − r)φ′(t2)  ∀t1, t2 > 0 and r ∈ [0, 1].

This is clearly equivalent to the statement in (19) due to the continuity of φ′.

Remark 3.1. Exponential convexity was also used in the definition of the self-regular function [27], in which the authors denote by Ω the set of functions that are twice continuously differentiable and exponentially convex on (0, +∞). By Lemma 3.1, clearly, if h ∈ Ω, then the function ∫₀ᵗ h(θ)dθ necessarily satisfies Condition C.3. For example, ln t belongs to Ω, and so ∫₀ᵗ ln θ dθ = t ln t − t satisfies Condition C.3.
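Inequality (19) can be spot-checked numerically. Below (an illustrative check of ours) we use φ′(t) = 2t, i.e. φ(t) = t², where the slack in (19) reduces to the AM–GM gap (t1 − t2)², and φ′(t) = ln t, where (19) holds with equality:

```python
import math

def econvex_gap(dphi, t1, t2):
    """Slack in (19): 0.5*(dphi(t1^2) + dphi(t2^2)) - dphi(t1*t2), nonnegative
    when dphi is exponentially convex."""
    return 0.5 * (dphi(t1 * t1) + dphi(t2 * t2)) - dphi(t1 * t2)

for t1, t2 in [(0.3, 1.7), (1.5, 2.5), (4.0, 0.2)]:
    assert econvex_gap(lambda t: 2.0 * t, t1, t2) >= 0.0   # = (t1 - t2)^2
    assert abs(econvex_gap(math.log, t1, t2)) < 1e-12      # equality for ln
print("(19) verified on sample points")
```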

For characterizations of SOC-concavity, interested readers may refer to [7, 9]. Here, we present a lemma which states that the composition of two SOC-concave functions is SOC-concave under some conditions. By this lemma, we may conveniently obtain new SOC-concave functions from existing ones.

Lemma 3.2. Let g : JR → R and h : IR → JR, where JR ⊆ R and IR ⊆ R. If g is SOC-concave and SOC-monotone on JR and h is SOC-concave on IR, then their composition g(h(·)) is also SOC-concave on IR. If, in addition, h is SOC-monotone on IR, then g(h(·)) is also SOC-monotone on IR.

Proof. For ease of notation, let gsoc : S′ → R^n and hsoc : S → S′ be the vector-valued functions associated with g and h, respectively, where S ⊆ R^n and S′ ⊆ R^n.

²That is, the function φ′(exp(·)) : R → R is convex on R.


Define ĝ(t) := g(h(t)). Then, for any x ∈ S, it follows from (13) that

gsoc(hsoc(x)) = gsoc( h(λ1(x))u_x^(1) + h(λ2(x))u_x^(2) ) = g(h(λ1(x)))u_x^(1) + g(h(λ2(x)))u_x^(2) = ĝsoc(x).  (20)

We next prove that ĝ(t) is SOC-concave on IR. For any x, y ∈ S and 0 ≤ β ≤ 1, from the SOC-concavity of h(t) it follows that

hsoc(βx + (1 − β)y) ⪰_{K^n} βhsoc(x) + (1 − β)hsoc(y).

Using the SOC-monotonicity and SOC-concavity of g, we then obtain that

gsoc( hsoc(βx + (1 − β)y) ) ⪰_{K^n} gsoc( βhsoc(x) + (1 − β)hsoc(y) ) ⪰_{K^n} βgsoc(hsoc(x)) + (1 − β)gsoc(hsoc(y)).

This together with (20) implies that, for any x, y ∈ S and 0 ≤ β ≤ 1,

ĝsoc(βx + (1 − β)y) ⪰_{K^n} βĝsoc(x) + (1 − β)ĝsoc(y).

Consequently, the function ĝ(t), i.e., g(h(·)), is SOC-concave on IR. The second part of the conclusion is obvious.

Proposition 3.1. (a) The function h(t) = t^r, with 0 ≤ r ≤ 1, is both SOC-concave and SOC-monotone on [0, +∞).

(b) h(t) = −t^{−r}, with 0 ≤ r ≤ 1, is SOC-concave and SOC-monotone on (0, +∞).

(c) For all u ≤ 0, h(t) = 1/(u − t) is SOC-concave as well as SOC-monotone on (0, +∞).

(d) The function ln t is SOC-concave and SOC-monotone on (0, +∞).

Proof. (a) The proof is given in [7, Proposition 3.7], and we omit it here.

(b) The conclusion follows directly from [9, Corollary 4.2].

(c) Let g(t) = −1/t and ĥ(t) = t − u. Then, h(t) = 1/(u − t) is exactly the composition of the two functions, i.e., h(t) = g(ĥ(t)). From part (b), g(t) is SOC-monotone and SOC-concave on (0, +∞), whereas by [7, Proposition 3.1(b)] ĥ(t) is SOC-monotone and SOC-concave on (0, +∞). Thus, applying Lemma 3.2, we readily obtain the conclusion.

(d) The proof can be found in [9]. In view of the importance of ln t, we present here a different proof, following the same lines as [3]. Noting that

ln t = ∫_{−∞}^0 ( 1/(u − t) − u/(u² + 1) ) du  (t > 0),

we have for any x ∈ int(K^n) that

ln x = ∫_{−∞}^0 ( (ue − x)⁻¹ − (u/(u² + 1)) e ) du.  (21)

For any x = (x1, x2), y = (y1, y2) ∈ int(K^n) and any 0 ≤ β ≤ 1, let

w = ln(βx + (1 − β)y) − β ln x − (1 − β) ln y.

Then, by the definition of SOC-concavity, proving the SOC-concavity of ln t on (0, +∞) is equivalent to showing that w ∈ K^n. From (21) and (11), it follows that

w = ∫_{−∞}^0 [ (ue − βx − (1 − β)y)⁻¹ − β(ue − x)⁻¹ − (1 − β)(ue − y)⁻¹ ] du := (w1, w2),

where

w1 = ∫_{−∞}^0 [ (u − βx1 − (1 − β)y1)/det(ue − βx − (1 − β)y) − β(u − x1)/det(ue − x) − (1 − β)(u − y1)/det(ue − y) ] du ∈ R,

w2 = ∫_{−∞}^0 [ (βx2 + (1 − β)y2)/det(ue − βx − (1 − β)y) − βx2/det(ue − x) − (1 − β)y2/det(ue − y) ] du ∈ R^{n−1}.

However, by Proposition 3.1(c) and Definition 2.1, for every u < 0,

(ue − βx − (1 − β)y)⁻¹ − β(ue − x)⁻¹ − (1 − β)(ue − y)⁻¹ ∈ K^n,

which implies that

(u − βx1 − (1 − β)y1)/det(ue − βx − (1 − β)y) − β(u − x1)/det(ue − x) − (1 − β)(u − y1)/det(ue − y) ≥ 0

and

‖ (βx2 + (1 − β)y2)/det(ue − βx − (1 − β)y) − βx2/det(ue − x) − (1 − β)y2/det(ue − y) ‖
≤ (u − βx1 − (1 − β)y1)/det(ue − βx − (1 − β)y) − β(u − x1)/det(ue − x) − (1 − β)(u − y1)/det(ue − y).

As a consequence, w1 ≥ 0 and

‖w2‖ ≤ ∫_{−∞}^0 ‖ (βx2 + (1 − β)y2)/det(ue − βx − (1 − β)y) − βx2/det(ue − x) − (1 − β)y2/det(ue − y) ‖ du
≤ ∫_{−∞}^0 [ (u − βx1 − (1 − β)y1)/det(ue − βx − (1 − β)y) − β(u − x1)/det(ue − x) − (1 − β)(u − y1)/det(ue − y) ] du = w1.

This shows that w ∈ K^n, and consequently ln t is SOC-concave on (0, +∞). By a similar argument, we can prove that ln t is SOC-monotone on (0, +∞).

From Lemma 3.2 and Proposition 3.1, we obtain the following corollary, which in particular shows that the modified logarithmic barrier function is SOC-concave.

Corollary 3.1. (a) The modified logarithmic barrier function ln(α + t), for α > 0, is both SOC-concave and SOC-monotone on (−α, +∞).

(b) For any α > 0 and β > 0, the functions ln(α + βt^r), with 0 ≤ r ≤ 1, are SOC-concave and SOC-monotone on [0, +∞).

(c) For any u > 0, the functions t/(u + t) are SOC-concave and SOC-monotone on (0, +∞).

(d) For all u > 0, the functions −1/√(u + t) are SOC-concave and SOC-monotone on (−u, +∞).

Proof. (a) The result follows from Proposition 3.1(d), [7, Proposition 3.1], and Lemma 3.2 by letting g : (0, +∞) → R be g(t) = ln t and h : (−α, +∞) → (0, +∞) be h(t) = α + t.

(b) Let g : (0, +∞) → R be g(t) = ln t and h : [0, +∞) → (0, +∞) be h(t) = α + βt^r. The result follows from Proposition 3.1(a), Proposition 3.1(d), and Lemma 3.2.

(c) Let g : (−1, 0) → (0, 1) be g(t) = 1 + t and h : (0, +∞) → (−1, 0) be h(t) = −u/(u + t). Then, we obtain the result from Proposition 3.1(c), [7, Proposition 3.1], and Lemma 3.2. This result also extends the conclusion of [7, Proposition 3.4].

(d) Let g : (0, +∞) → (0, +∞) be g(t) = √t and h : (−u, +∞) → (0, +∞) be h(t) = u + t. Then, from Lemma 3.2 it follows that g(h(t)) = √(u + t) is SOC-concave and SOC-monotone on (−u, +∞). Using Lemma 3.2 again with g(t) = −1/t and h(t) = √(u + t), we obtain the desired result.

Now we present several examples of D(x, y) to close this section. From these examples, we may see that the conditions required by φ ∈ Φ are not overly strict, and that distance-like functions in SOCs can be constructed by selecting from a class of univariate convex functions.

Example 3.1. Let φ(t) = t ln t − t + 1 if t ≥ 0, and φ(t) = +∞ if t < 0. It is easy to verify that φ satisfies Conditions C.1–C.3. Also, by Proposition 3.1(d), Condition C.4 holds. From formula (13), it follows that, for any y ∈ K^n and x ∈ int(K^n),

φsoc(y) = y ∘ ln y − y + e  and  (φ′)soc(x) = ln x.

Consequently, the distance-like function induced by φ is given by

D1(x, y) = tr( y ∘ ln y − y ∘ ln x + x − y )  ∀x ∈ int(K^n), y ∈ K^n.

This function is precisely the natural extension of the entropy-like distance dϕ(·, ·), with ϕ(t) = −ln t + t − 1, to the SOCs. In addition, comparing D1(x, y) with the distance-like function H(x, y) in Example 3.1 of [26], we note that D1(x, y) = H(y, x), but the proximal-like algorithms corresponding to them are completely different.
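D1 is straightforward to evaluate through the spectral machinery of section 2. The sketch below (our own code, not from the paper) computes ln x as (ln)soc(x), evaluates D1 via the trace formula, and confirms D1(x, x) = 0 and D1(x, y) > 0 for a sample pair:

```python
import math

def jordan(x, y):
    """Jordan product: x o y = (<x, y>, y1*x2 + x1*y2)."""
    ip = sum(a * b for a, b in zip(x, y))
    return [ip] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

def ln_soc(x):
    """ln x via the spectral decomposition (13) with h = ln; needs x in int(K^n)."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    w = [v / norm for v in x2] if norm > 0 else [1.0] + [0.0] * (len(x2) - 1)
    l1, l2 = math.log(x1 - norm), math.log(x1 + norm)
    return [0.5 * (l1 + l2)] + [0.5 * (l2 - l1) * v for v in w]

def D1(x, y):
    """D1(x, y) = tr(y o ln y - y o ln x + x - y), with tr(z) = 2*z[0]."""
    z = [p - q + r - s for p, q, r, s in
         zip(jordan(y, ln_soc(y)), jordan(y, ln_soc(x)), x, y)]
    return 2.0 * z[0]

x, y = [2.0, 0.5, 0.0], [3.0, 0.0, 1.0]  # both in int(K^3)
assert abs(D1(x, x)) < 1e-12
assert D1(x, y) > 0.0
print("D1(x, x) = 0 and D1(x, y) > 0")
```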

Example 3.2. Let φ(t) = t ln t + (1 + t) ln(1 + t) − (1 + t) ln 2 if t ≥ 0, and φ(t) = +∞ if t < 0. By computation, we can show that φ satisfies Conditions C.1–C.3. Furthermore, from Proposition 3.1(d) and Corollary 3.1(a), we learn that φ also satisfies Condition C.4. This means that φ ∈ Φ. For any y ∈ K^n and x ∈ int(K^n), we can compute that

φsoc(y) = y ∘ ln y + (e + y) ∘ ln(e + y) − (ln 2)(e + y),
(φ′)soc(x) = (2 − ln 2)e + ln x + ln(e + x).

Therefore, the distance-like function generated by such a φ is given by

D2(x, y) = tr( y ∘ (ln y − ln x) + (e + y) ∘ ln(e + y) − (e + y) ∘ ln(e + x) − 2(y − x) )

for any x ∈ int(K^n) and y ∈ K^n. It should be pointed out that D2(x, y) is not the extension of dϕ(·, ·), with ϕ(t) = φ(t) given by [18], to the SOCs.

Example 3.3. Take φ(t) = t^{(2r+3)/2} + t², with 0 ≤ r < 1/2, if t ≥ 0, and φ(t) = +∞ if t < 0. It is easy to verify that φ satisfies Conditions C.1–C.3. Furthermore, from Proposition 3.1(a) it follows that φ satisfies Condition C.4. Thus, φ ∈ Φ. By a simple computation,

φsoc(y) = y^{(2r+3)/2} + y²  ∀y ∈ K^n  and  (φ′)soc(x) = ((2r+3)/2) x^{(2r+1)/2} + 2x  ∀x ∈ int(K^n).

Hence, the distance-like function induced by φ has the following expression:

D3(x, y) = tr( ((2r+1)/2) x^{(2r+3)/2} + x² − y ∘ ( ((2r+3)/2) x^{(2r+1)/2} + 2x ) + y^{(2r+3)/2} + y² ).

Example 3.4. Let φ(t) = t^{a+1} + at ln t − at, with 0 < a ≤ 1, if t ≥ 0, and φ(t) = +∞ if t < 0. It is easily shown that φ satisfies Conditions C.1–C.3. By Proposition 3.1(a) and Proposition 3.1(d), φ′ is SOC-concave on (0, +∞). Hence, φ ∈ Φ. For any y ∈ K^n and x ∈ int(K^n),

φsoc(y) = y^{a+1} + ay ∘ ln y − ay  and  (φ′)soc(x) = (a + 1)x^a + a ln x.

Consequently, the distance-like function induced by φ has the following expression:

D4(x, y) = tr( ax^{a+1} + ax − y ∘ ( (a + 1)x^a + a ln x ) + y^{a+1} + ay ∘ ln y − ay ).

4. Properties of distance-like functions. In what follows, we study some favorable properties of the function D(x, y). We begin with two technical lemmas that will be used in the subsequent analysis. The first of them is a direct consequence of Lemma 2.2 and the definition of Φ.

Lemma 4.1. Given φ ∈ Φ, let φsoc and (φ′)soc be the vector-valued functions given as in (13). Then, we have the following results:

(a) φsoc(x) and (φ′)soc(x) are well-defined on K^n and int(K^n), respectively, and

λi[φsoc(x)] = φ(λi(x)),  λi[(φ′)soc(x)] = φ′(λi(x)),  i = 1, 2.

(b) φsoc(x) and (φ′)soc(x) are continuously differentiable on int(K^n), with the transposed Jacobian at x given as in formulas (14)–(15).

(c) tr[φsoc(x)] and tr[(φ′)soc(x)] are continuously differentiable on int(K^n), and

∇tr[φsoc(x)] = 2∇φsoc(x)·e = 2(φ′)soc(x),
∇tr[(φ′)soc(x)] = 2∇(φ′)soc(x)·e = 2(φ″)soc(x).  (22)

(d) The function tr[φsoc(x)] is strictly convex on int(K^n).

Lemma 4.2. Given φ ∈ Φ and a fixed point z ∈ R^n, let φ_z : int(K^n) → R be given by

φ_z(x) := tr( −z ∘ (φ′)soc(x) ).  (23)

Then, the function φ_z(x) possesses the following properties:

(a) φ_z(x) is continuously differentiable on int(K^n), with ∇φ_z(x) = −2∇(φ′)soc(x)·z.

(b) φ_z(x) is convex over int(K^n) when z ∈ K^n; furthermore, it is strictly convex over int(K^n) when z ∈ int(K^n).


Proof. (a) Since φ_z(x) = −2⟨(φ′)soc(x), z⟩ for any x ∈ int(K^n), we have that φ_z(x) is continuously differentiable on int(K^n) by Lemma 4.1(c). Moreover, applying the chain rule for the inner product of two functions readily yields ∇φ_z(x) = −2∇(φ′)soc(x)·z.

(b) By the continuous differentiability of φ_z(x), to prove the convexity of φ_z on int(K^n), it suffices to prove the following inequality:

φ_z((x + y)/2) ≤ (1/2)( φ_z(x) + φ_z(y) )  ∀x, y ∈ int(K^n).  (24)

By Condition C.4, φ′ is SOC-concave on (0, +∞). Therefore, we have that

−(φ′)soc((x + y)/2) ⪯_{K^n} −(1/2)( (φ′)soc(x) + (φ′)soc(y) ),

i.e.,

(φ′)soc((x + y)/2) − (1/2)(φ′)soc(x) − (1/2)(φ′)soc(y) ⪰_{K^n} 0.

Using Lemma 2.1(a) and the fact that z ∈ K^n, we then obtain that

⟨ z, (φ′)soc((x + y)/2) − (1/2)(φ′)soc(x) − (1/2)(φ′)soc(y) ⟩ ≥ 0,  (25)

which in turn implies that

⟨ −z, (φ′)soc((x + y)/2) ⟩ ≤ (1/2)⟨ −z, (φ′)soc(x) ⟩ + (1/2)⟨ −z, (φ′)soc(y) ⟩.

The last inequality is exactly the one in (24). Hence, φ_z is convex on int(K^n) for z ∈ K^n.

To prove the second part of the conclusion, we need only prove that the inequality in (25) holds strictly for any x, y ∈ int(K^n) with x ≠ y. By Lemma 2.1(b), this is equivalent to proving that the vector (φ′)soc((x+y)/2) − (1/2)(φ′)soc(x) − (1/2)(φ′)soc(y) is nonzero, since

(φ′)soc((x + y)/2) − (1/2)(φ′)soc(x) − (1/2)(φ′)soc(y) ∈ K^n  and  z ∈ int(K^n).

From Condition C.4, it follows that φ′ is concave on (0, +∞), since SOC-concavity implies concavity. This, together with the strict monotonicity of φ′, implies that φ′ is strictly concave on (0, +∞). Using Lemma 2.2(d), we then have that tr[(φ′)soc(x)] is strictly concave on int(K^n). This means that, for any x, y ∈ int(K^n) with x ≠ y,

tr[(φ′)soc((x + y)/2)] − (1/2)tr[(φ′)soc(x)] − (1/2)tr[(φ′)soc(y)] > 0.  (26)

In addition, we note that the first component of (φ′)soc((x+y)/2) − (1/2)(φ′)soc(x) − (1/2)(φ′)soc(y) is

( φ′(λ1((x+y)/2)) + φ′(λ2((x+y)/2)) )/2 − ( φ′(λ1(x)) + φ′(λ2(x)) )/4 − ( φ′(λ1(y)) + φ′(λ2(y)) )/4,

which, by Property 2.1(c), can be rewritten as

(1/2)tr[(φ′)soc((x + y)/2)] − (1/4)tr[(φ′)soc(x)] − (1/4)tr[(φ′)soc(y)].

This together with (26) shows that (φ′)soc((x+y)/2) − (1/2)(φ′)soc(x) − (1/2)(φ′)soc(y) is nonzero for any x, y ∈ int(K^n) with x ≠ y. Consequently, φ_z is strictly convex on int(K^n).

Now we are in a position to study the properties of the distance-like function D(x, y).

Proposition 4.1. Given φ ∈ Φ, let D(x, y) be defined as in (18). Then,

(a) D(x, y) ≥ 0 for any x ∈ int(K^n) and y ∈ K^n, and D(x, y) = 0 if and only if x = y;

(b) for any fixed y ∈ K^n, D(·, y) is continuously differentiable on int(K^n), with

∇_x D(x, y) = 2∇(φ′)soc(x)·(x − y);  (27)

(c) for any fixed y ∈ K^n, the function D(·, y) is convex over int(K^n), and for any fixed y ∈ int(K^n), D(·, y) is strictly convex over int(K^n);

(d) for any fixed y ∈ int(K^n), the function D(·, y) is essentially smooth;

(e) for any fixed y ∈ K^n, the level sets L_D(y, γ) := { x ∈ int(K^n) : D(x, y) ≤ γ } are bounded for all γ ≥ 0.

Proof. (a) By Lemma 4.1(c), for any x ∈ int(K^n) and y ∈ K^n, we can rewrite D(x, y) as

D(x, y) = tr[φsoc(y)] − tr[φsoc(x)] − ⟨ ∇tr[φsoc(x)], y − x ⟩.

Notice that tr[φsoc(x)] is strictly convex on int(K^n) by Lemma 4.1(d); hence D(x, y) ≥ 0 for any x ∈ int(K^n) and y ∈ K^n, and D(x, y) = 0 if and only if x = y.

(b) By Lemma 4.1(b) and Lemma 4.1(c), the functions tr[φsoc(x)] and ⟨(φ′)soc(x), x⟩ are continuously differentiable on int(K^n). Noting that, for any x ∈ int(K^n) and y ∈ K^n,

D(x, y) = tr[φsoc(y)] − tr[φsoc(x)] − 2⟨ (φ′)soc(x), y − x ⟩,

we then have the continuous differentiability of D(·, y) on int(K^n). Furthermore,

∇_x D(x, y) = −∇tr[φsoc(x)] − 2∇(φ′)soc(x)·(y − x) + 2(φ′)soc(x)
            = −2(φ′)soc(x) + 2∇(φ′)soc(x)·(x − y) + 2(φ′)soc(x)
            = 2∇(φ′)soc(x)·(x − y).

(c) By the definition of φ_y given as in (23), D(x, y) can be rewritten as

D(x, y) = tr[ (φ′)soc(x) ∘ x − φsoc(x) ] + φ_y(x) + tr[φsoc(y)].

Thus, to prove the (strict) convexity of D(·, y) on int(K^n), it suffices to show that

tr[ (φ′)soc(x) ∘ x − φsoc(x) ] + φ_y(x)

is (strictly) convex on int(K^n). Let ψ : (0, +∞) → R be the function defined by

ψ(t) := φ′(t)t − φ(t).  (28)
