Journal of Optimization Theory and Applications, vol. 138, pp. 95-113, 2008

### Proximal-like algorithm using the quasi D-function for convex second-order cone programming

Shaohua Pan^{1}

School of Mathematical Sciences South China University of Technology

Guangzhou 510640, China

Jein-Shan Chen ^{2}
Department of Mathematics
National Taiwan Normal University

Taipei 11677, Taiwan

July 28, 2006

(revised on January 30, 2007)

Abstract In this paper, we present a measure of distance in the second-order cone based on a class of continuously differentiable strictly convex functions on $\mathbb{R}_{++}$. Since the distance function has some favorable properties similar to those of the D-function [8], we refer to it as a quasi D-function. Then, a proximal-like algorithm using the quasi D-function is proposed and applied to the second-order cone programming problem, which is to minimize a closed proper convex function with general second-order cone constraints. Like the proximal point algorithm using the D-function [5, 8], under some mild assumptions we establish the global convergence of the algorithm expressed in terms of function values, and show that the sequence generated by the proposed algorithm is bounded and that every accumulation point is a solution to the considered problem.

Key words. Quasi D-function, Bregman function, proximal-like method, convex second-order cone programming.

AMS subject classifications 90C30

^{1}The author's work is partially supported by the Doctoral Starting-up Foundation (B13B6050640) of GuangDong Province. E-mail: shhpan@scut.edu.cn.

^{2}Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is partially supported by National Science Council of Taiwan. E-mail: jschen@math.ntnu.edu.tw.

### 1 Introduction

We consider the following convex second-order cone programming (CSOCP):

$$
\begin{array}{ll}
\min & f(\zeta) \\
\text{s.t.} & A\zeta + b \succeq_{\mathcal{K}^n} 0,
\end{array}
$$

where $A$ is an $n \times m$ matrix with $n \ge m$, $b \in \mathbb{R}^n$, $f : \mathbb{R}^m \to (-\infty, +\infty]$ is a closed proper convex function, and $\mathcal{K}^n$ is the second-order cone (SOC for short) given by

$$\mathcal{K}^n := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \mid \|x_2\| \le x_1 \right\}, \tag{1}$$

and $x \succeq_{\mathcal{K}^n} 0$ means $x \in \mathcal{K}^n$. Note that a function is closed if and only if it is lower semi-continuous (l.s.c. for short), and a function is proper if $f(\zeta) < \infty$ for at least one $\zeta \in \mathbb{R}^m$ and $f(\zeta) > -\infty$ for all $\zeta \in \mathbb{R}^m$. The CSOCP, as an extension of the standard second-order cone programming (SOCP) (see Sec. 4), has applications in a broad range of fields from engineering, control and finance to robust optimization and combinatorial optimization; see [1, 3, 6, 16, 17] and references therein.

Recently, the SOCP has received much attention in optimization, particularly in the context of solution methods. In this paper, we focus on the solution of the more general CSOCP. Note that the CSOCP is a special class of convex programs, and therefore it can be solved via general convex programming methods. One of these methods is the proximal point algorithm for minimizing a convex function $f(\zeta)$ defined on $\mathbb{R}^m$, which replaces the problem $\min_{\zeta \in \mathbb{R}^m} f(\zeta)$ by a sequence of minimization problems with strictly convex objectives, generating a sequence $\{\zeta^k\}$ defined by

$$\zeta^k = \operatorname*{argmin}_{\zeta \in \mathbb{R}^m} \left\{ f(\zeta) + \frac{1}{2\mu_k} \|\zeta - \zeta^{k-1}\|^2 \right\}, \tag{2}$$

where $\{\mu_k\}$ is a sequence of positive numbers and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^m$.
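As a small illustration of iteration (2), the following sketch runs the classical proximal point method on the one-dimensional function $f(\zeta) = |\zeta|$, whose proximal step has the well-known closed form of soft-thresholding. The choice of $f$, the constant parameter $\mu_k = 0.5$, and the names `prox_abs` and `proximal_point` are assumptions made for illustration only; they do not appear in the paper.

```python
def prox_abs(v, mu):
    """Minimizer of |z| + (1/(2*mu)) * (z - v)**2, i.e. soft-thresholding of v by mu."""
    if v > mu:
        return v - mu
    if v < -mu:
        return v + mu
    return 0.0


def proximal_point(z0, mu, steps):
    """Run iteration (2) for f(z) = |z| with a constant parameter mu_k = mu."""
    iterates = [z0]
    for _ in range(steps):
        iterates.append(prox_abs(iterates[-1], mu))
    return iterates


# Hypothetical starting point and step parameter.
trajectory = proximal_point(z0=2.0, mu=0.5, steps=6)
print(trajectory)
```

In this toy case the iterates move toward the minimizer $\zeta = 0$ by $\mu$ per step and then stay there, which mirrors the fixed-point property of the proximal map at a minimizer.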
The method is due to Martinet [18], who introduced the above proximal minimization problem based on the Moreau proximal approximation [19] of $f$. The proximal point algorithm was then further developed and studied by Rockafellar [21, 22]. Later, several researchers [5, 8, 10, 11, 23] proposed and investigated nonquadratic proximal point algorithms for convex programming with nonnegative constraints, by replacing the quadratic distance in (2) with other distance-like functions. Among others, Censor and Zenios [8] replaced the method (2) by a method of the form

$$\zeta^k = \operatorname*{argmin}_{\zeta \in \mathbb{R}^m} \left\{ f(\zeta) + \frac{1}{\mu_k} D(\zeta, \zeta^{k-1}) \right\}, \tag{3}$$

where $D(\cdot, \cdot)$, called a D-function, is a measure of distance based on a Bregman function. Recall that a differentiable function $\varphi$ is called a Bregman function [4, 9] if it satisfies the properties listed in Definition 1.1 below, and the induced D-function is given as follows:

$$D(\zeta, \xi) := \varphi(\zeta) - \varphi(\xi) - \langle \nabla\varphi(\xi), \zeta - \xi \rangle, \tag{4}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^m$ and $\nabla\varphi$ denotes the gradient of $\varphi$.

*Definition 1.1* Let $S \subseteq \mathbb{R}^m$ be an open set and $\bar{S}$ be its closure. Then $\varphi : \bar{S} \to \mathbb{R}$ is called a Bregman function with zone $S$ if the following properties hold:

(i) $\varphi$ is continuously differentiable on $S$;

(ii) $\varphi$ is strictly convex and continuous on $\bar{S}$;

(iii) For each $\gamma \in \mathbb{R}$, the level sets $L_D(\xi, \gamma) = \{\zeta \in \bar{S} : D(\zeta, \xi) \le \gamma\}$ and $L_D(\zeta, \gamma) = \{\xi \in S : D(\zeta, \xi) \le \gamma\}$ are bounded for any $\xi \in S$ and $\zeta \in \bar{S}$, respectively;

(iv) If $\{\xi^k\} \subset S$ converges to $\xi^*$, then $D(\xi^*, \xi^k) \to 0$;

(v) If $\{\zeta^k\}$ and $\{\xi^k\}$ are sequences such that $\xi^k \to \xi^* \in \bar{S}$, $\{\zeta^k\}$ is bounded, and $D(\zeta^k, \xi^k) \to 0$, then $\zeta^k \to \xi^*$.
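To make formula (4) concrete, here is a minimal sketch that evaluates the D-function for two familiar kernels: the half squared norm, which recovers the quadratic distance of (2), and the separable entropy kernel $\sum_i (\zeta_i \ln \zeta_i - \zeta_i)$. The helper names and the sample points are hypothetical and chosen only for illustration.

```python
import math


def bregman_distance(phi, grad_phi, zeta, xi):
    """D(zeta, xi) = phi(zeta) - phi(xi) - <grad phi(xi), zeta - xi>, as in (4)."""
    inner = sum(g * (z - x) for g, z, x in zip(grad_phi(xi), zeta, xi))
    return phi(zeta) - phi(xi) - inner


def half_sq_norm(z):
    return 0.5 * sum(v * v for v in z)


def half_sq_norm_grad(z):
    return list(z)


def entropy_kernel(z):
    # Uses the convention 0 * ln 0 = 0.
    return sum(v * math.log(v) - v if v > 0 else 0.0 for v in z)


def entropy_kernel_grad(z):
    # Only defined for strictly positive entries (the zone of the kernel).
    return [math.log(v) for v in z]


zeta, xi = [1.0, 2.0], [3.0, 5.0]
d_quad = bregman_distance(half_sq_norm, half_sq_norm_grad, zeta, xi)
d_ent = bregman_distance(entropy_kernel, entropy_kernel_grad, zeta, xi)
print(d_quad, d_ent)
```

With the half squared norm the result equals $\tfrac12\|\zeta - \xi\|^2$, and with the entropy kernel it equals $\sum_i [\zeta_i \ln(\zeta_i/\xi_i) - \zeta_i + \xi_i]$.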

The Bregman proximal minimization (BPM) method described in (3) was further extended by Kiwiel [15] with generalized Bregman functions, called B-functions. Compared with Bregman functions, these functions are possibly nondifferentiable and infinite on the boundary of their domain. For the detailed definition of B-functions and the convergence of the BPM method using B-functions, please refer to [15].

The main purpose of this paper is to extend the BPM method (3) so that it can be used to deal with the CSOCP. Specifically, we define a measure of distance in the second-order cone $\mathcal{K}^n$ by a class of continuously differentiable strictly convex functions on $\mathbb{R}_{++}$ which are in fact special B-functions in $\mathbb{R}$ (see Property 3.1). The distance measure, including the entropy-like distance in $\mathcal{K}^n$ given by [7] as a special case, is shown to have some favorable properties similar to those of a Bregman distance, and hence we refer to it as a quasi Bregman distance or quasi D-function. The specific definition is given in Section 3. Then, a proximal-like algorithm using the quasi D-function is proposed and applied to solving the CSOCP. Like the proximal point algorithm (3), we establish, under some mild assumptions, the global convergence of the algorithm expressed in terms of function values, and show that the sequence generated is bounded and each accumulation point is a solution of the CSOCP.

The rest of this paper is organized as follows. In Section 2, we review some basic concepts and properties associated with the SOC. In Section 3, we define a quasi D-function in $\mathcal{K}^n$ and explore the relations among the quasi D-function, the D-function, and the double-regularized distance function. In Section 4, we present a proximal-like algorithm using the quasi D-function, apply it to solving the CSOCP, and analyze the convergence of the algorithm. Finally, we close this paper in Section 5.

Some words about our notation. We use $\mathbb{R}_+$ and $\mathbb{R}_{++}$ to denote the set of nonnegative real numbers and the set of positive real numbers, respectively, and $I$ to represent an identity matrix of suitable dimension. For a differentiable function $\phi$ on $\mathbb{R}$, $\phi'$ represents its derivative. Given a set $S$, we use $\bar{S}$, ${\rm int}(S)$ and ${\rm bd}(S)$ to denote the closure, the interior and the boundary of $S$, respectively. For a closed proper convex function $f : \mathbb{R}^m \to (-\infty, +\infty]$, we denote the domain of $f$ by ${\rm dom}(f) := \{\zeta \in \mathbb{R}^m \mid f(\zeta) < \infty\}$ and the subdifferential of $f$ at $\bar\zeta$ by $\partial f(\bar\zeta) := \{w \in \mathbb{R}^m \mid f(\zeta) \ge f(\bar\zeta) + \langle w, \zeta - \bar\zeta \rangle, \ \forall \zeta \in \mathbb{R}^m\}$. If $f$ is differentiable at $\zeta$, we use $\nabla f(\zeta)$ to denote its gradient at $\zeta$. For any $x, y$ in $\mathbb{R}^n$, we write $x \succeq_{\mathcal{K}^n} y$ if $x - y \in \mathcal{K}^n$, and write $x \succ_{\mathcal{K}^n} y$ if $x - y \in {\rm int}(\mathcal{K}^n)$. In other words, we have $x \succeq_{\mathcal{K}^n} 0$ if and only if $x \in \mathcal{K}^n$, and $x \succ_{\mathcal{K}^n} 0$ if and only if $x \in {\rm int}(\mathcal{K}^n)$.

### 2 Preliminaries

In this section, we review some basic concepts and properties related to $\mathcal{K}^n$ that will be used in the subsequent analysis. For any $x = (x_1, x_2), y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we define their Jordan product as

$$x \circ y := (\langle x, y \rangle, \ y_1 x_2 + x_1 y_2). \tag{5}$$

We write $x + y$ to mean the usual componentwise addition of vectors and $x^2$ to mean $x \circ x$. Then $\circ$, $+$ and $e = (1, 0, \cdots, 0)^T \in \mathbb{R}^n$ have the following basic properties [12, 13]: (1) $e \circ x = x$ for all $x \in \mathbb{R}^n$. (2) $x \circ y = y \circ x$ for all $x, y \in \mathbb{R}^n$. (3) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$ for all $x, y \in \mathbb{R}^n$. (4) $(x + y) \circ z = x \circ z + y \circ z$ for all $x, y, z \in \mathbb{R}^n$. Note that the Jordan product is not associative, but it is power associative, i.e., $x \circ (x \circ x) = (x \circ x) \circ x$ for all $x \in \mathbb{R}^n$. Thus, we may, without fear of ambiguity, write $x^m$ for the product of $m$ copies of $x$, and $x^{m+n} = x^m \circ x^n$ for all positive integers $m$ and $n$. We define $x^0 = e$. Besides, we should point out that $\mathcal{K}^n$ is not closed under the Jordan product.

For each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, the determinant and the trace of $x$ are defined by

$$\det(x) = x_1^2 - \|x_2\|^2, \qquad {\rm tr}(x) = 2x_1. \tag{6}$$

In general, $\det(x \circ y) \ne \det(x)\det(y)$ unless $x$ and $y$ are collinear, i.e., $x = \alpha y$ for some $\alpha \in \mathbb{R}$. A vector $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ is said to be invertible if $\det(x) \ne 0$. If $x$ is invertible, then there exists a unique $y \in \mathbb{R}^n$ satisfying $x \circ y = y \circ x = e$. We call this $y$ the inverse of $x$ and denote it by $x^{-1}$. In fact, we have

$$x^{-1} = \frac{1}{x_1^2 - \|x_2\|^2}(x_1, -x_2) = \frac{1}{\det(x)}\left({\rm tr}(x)e - x\right).$$

Therefore, $x \in {\rm int}(\mathcal{K}^n)$ if and only if $x^{-1} \in {\rm int}(\mathcal{K}^n)$. Moreover, if $x \in {\rm int}(\mathcal{K}^n)$, then $x^{-k} = (x^k)^{-1}$ is also well-defined. For any $x \in \mathcal{K}^n$, it is known that there exists a unique vector in $\mathcal{K}^n$, denoted by $x^{1/2}$, such that $(x^{1/2})^2 = x^{1/2} \circ x^{1/2} = x$.
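The Jordan product (5), the determinant and trace (6), and the inverse formula translate directly into code. The sketch below represents a vector $x = (x_1, x_2)$ as a plain Python list; the function names and the sample vector are illustrative assumptions.

```python
def jordan(x, y):
    """Jordan product (5): x∘y = (<x, y>, y1*x2 + x1*y2)."""
    head = sum(a * b for a, b in zip(x, y))
    tail = [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]
    return [head] + tail


def det(x):
    """det(x) = x1^2 - ||x2||^2, as in (6)."""
    return x[0] ** 2 - sum(v * v for v in x[1:])


def tr(x):
    """tr(x) = 2*x1, as in (6)."""
    return 2.0 * x[0]


def inverse(x):
    """x^{-1} = (x1, -x2) / det(x); only defined when det(x) != 0."""
    d = det(x)
    if d == 0:
        raise ValueError("x is not invertible: det(x) = 0")
    return [x[0] / d] + [-v / d for v in x[1:]]


x = [3.0, 1.0, 2.0]      # hypothetical point in int(K^3): 3 > sqrt(1 + 4)
e = [1.0, 0.0, 0.0]      # identity element of the Jordan product
x_inv = inverse(x)
print(jordan(x, x_inv))  # should reproduce e up to rounding
```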

Next we introduce the definition of spectral factorization. Let $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$; then $x$ can be decomposed as

$$x = \lambda_1(x) u_x^{(1)} + \lambda_2(x) u_x^{(2)}, \tag{7}$$

where $\lambda_i(x)$ and $u_x^{(i)}$ are the spectral value and the associated spectral vector given by

$$\lambda_i(x) = x_1 + (-1)^i \|x_2\|, \qquad
u_x^{(i)} = \begin{cases}
\dfrac{1}{2}\left(1, \ (-1)^i \dfrac{x_2}{\|x_2\|}\right) & \text{if } x_2 \ne 0, \\[3mm]
\dfrac{1}{2}\left(1, \ (-1)^i \bar{w}_2\right) & \text{if } x_2 = 0,
\end{cases} \tag{8}$$

for $i = 1, 2$, with $\bar{w}_2$ being any vector in $\mathbb{R}^{n-1}$ satisfying $\|\bar{w}_2\| = 1$. If $x_2 \ne 0$, the factorization is unique. In the sequel, for any $x \in \mathbb{R}^n$, we write $\lambda(x) := (\lambda_1(x), \lambda_2(x))$, where $\lambda_1(x), \lambda_2(x)$ are the spectral values of $x$.
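The factorization (7)-(8) can be sketched in a few lines. The code below assumes $n \ge 2$ and, when $x_2 = 0$, picks the first coordinate axis as the unit vector $\bar{w}_2$; that pick and the function name `spectral` are arbitrary choices for illustration.

```python
import math


def spectral(x):
    """Return ((lam1, lam2), (u1, u2)) from (7)-(8); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    if norm > 0.0:
        w = [v / norm for v in x2]
    else:
        # Any unit vector is allowed here; the first axis is an arbitrary pick.
        w = [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * v for v in w]
    u2 = [0.5] + [0.5 * v for v in w]
    return (x1 - norm, x1 + norm), (u1, u2)


x = [3.0, 0.0, 4.0]   # hypothetical vector; it lies outside K^3 since 4 > 3
(lam1, lam2), (u1, u2) = spectral(x)
rebuilt = [lam1 * a + lam2 * b for a, b in zip(u1, u2)]
print((lam1, lam2), rebuilt)
```

Here the smaller spectral value is negative, which is consistent with $x \notin \mathcal{K}^n$.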

The spectral decomposition along with the Jordan algebra associated with SOC has some basic properties as below, whose proofs can be found in [12, 13].

*Property 2.1* For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with the spectral values $\lambda_1(x), \lambda_2(x)$ and spectral vectors $u_x^{(1)}, u_x^{(2)}$ given as above, we have

(a) $u_x^{(1)}$ and $u_x^{(2)}$ are orthogonal under the Jordan product and have length $1/\sqrt{2}$, i.e., $u_x^{(1)} \circ u_x^{(2)} = 0$ and $\|u_x^{(1)}\| = \|u_x^{(2)}\| = 1/\sqrt{2}$.

(b) $u_x^{(1)}$ and $u_x^{(2)}$ are idempotent under the Jordan product, i.e., $u_x^{(i)} \circ u_x^{(i)} = u_x^{(i)}$ for $i = 1, 2$.

(c) The determinant, the trace and the Euclidean norm of $x$ can be represented by $\lambda_1(x), \lambda_2(x)$:

$$\det(x) = \lambda_1(x)\lambda_2(x), \qquad {\rm tr}(x) = \lambda_1(x) + \lambda_2(x), \qquad \|x\|^2 = \frac{\lambda_1^2(x) + \lambda_2^2(x)}{2}.$$

(d) $\lambda_1(x), \lambda_2(x)$ are nonnegative (positive) if and only if $x \in \mathcal{K}^n$ ($x \in {\rm int}(\mathcal{K}^n)$).

Finally, for any function $g : \mathbb{R} \to \mathbb{R}$, one can define a corresponding function $g^{\rm soc}(x)$ in $\mathbb{R}^n$ by applying $g$ to the spectral values of the spectral decomposition of $x \in \mathbb{R}^n$ with respect to $\mathcal{K}^n$. In [3, 13], the following vector-valued function was considered:

$$g^{\rm soc}(x) = g(\lambda_1(x)) u_x^{(1)} + g(\lambda_2(x)) u_x^{(2)}, \quad \forall x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}. \tag{9}$$

If $g$ is defined only on a subset of $\mathbb{R}$, then $g^{\rm soc}$ is defined on the corresponding subset of $\mathbb{R}^n$. The definition in (9) is unambiguous whether $x_2 \ne 0$ or $x_2 = 0$.
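A short sketch of definition (9) follows: a scalar function is lifted to the SOC setting by acting on the spectral values. As a sanity check, lifting $g(t) = t^2$ should agree with $x \circ x$, and lifting $\exp$ after $\ln$ should return an interior point unchanged. All names and sample vectors are assumptions for illustration.

```python
import math


def spectral(x):
    """Spectral values and vectors of x as in (7)-(8); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    # For x2 = 0 any unit vector may be used; the first axis is an arbitrary pick.
    w = [v / norm for v in x2] if norm > 0.0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * v for v in w]
    u2 = [0.5] + [0.5 * v for v in w]
    return (x1 - norm, x1 + norm), (u1, u2)


def soc_apply(g, x):
    """g^soc(x) = g(lam1) * u1 + g(lam2) * u2, as in (9)."""
    (lam1, lam2), (u1, u2) = spectral(x)
    return [g(lam1) * a + g(lam2) * b for a, b in zip(u1, u2)]


def jordan(x, y):
    """Jordan product (5), used here only to cross-check the square."""
    head = sum(a * b for a, b in zip(x, y))
    return [head] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]


x = [3.0, 0.0, 4.0]
squared = soc_apply(lambda t: t * t, x)            # compare with jordan(x, x)
y = [3.0, 1.0, 2.0]                                # lies in int(K^3)
round_trip = soc_apply(math.exp, soc_apply(math.log, y))
print(squared, jordan(x, x), round_trip)
```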

*Lemma 2.1* ([13, Proposition 5.2] or [3, Proposition 4]) Given a function $g : \mathbb{R} \to \mathbb{R}$, let $g^{\rm soc}(x)$ be the vector-valued function defined by (9). If $g$ is differentiable (respectively, continuously differentiable), then $g^{\rm soc}(x)$ is also differentiable (respectively, continuously differentiable), and its Jacobian at $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ is given by the formula

$$\nabla g^{\rm soc}(x) = \begin{cases}
g'(x_1) I & \text{if } x_2 = 0, \\[2mm]
\begin{bmatrix}
b & c\,\dfrac{x_2^T}{\|x_2\|} \\[3mm]
c\,\dfrac{x_2}{\|x_2\|} & aI + (b - a)\dfrac{x_2 x_2^T}{\|x_2\|^2}
\end{bmatrix} & \text{if } x_2 \ne 0,
\end{cases} \tag{10}$$

where

$$a = \frac{g(\lambda_2(x)) - g(\lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \qquad b = \frac{g'(\lambda_2(x)) + g'(\lambda_1(x))}{2}, \qquad c = \frac{g'(\lambda_2(x)) - g'(\lambda_1(x))}{2}. \tag{11}$$

### 3 Quasi D-functions in SOC and their properties

In this section, we present a class of distance measures on the SOC and discuss its relations with the D-function and the double-regularized Bregman distance [24]. To this end, we need a class of functions $\phi : \mathbb{R}_+ \to \mathbb{R}$ satisfying Property 3.1 below, in which the function $d : \mathbb{R}_+ \times \mathbb{R}_{++} \to \mathbb{R}$ is defined by

$$d(s, t) = \phi(s) - \phi(t) - \phi'(t)(s - t), \quad \forall s \in \mathbb{R}_+, \ t \in \mathbb{R}_{++}. \tag{12}$$

*Property 3.1* (a) $\phi$ is continuously differentiable on $\mathbb{R}_{++}$;

(b) $\phi$ is strictly convex and continuous on $\mathbb{R}_+$;

(c) For each $\gamma \in \mathbb{R}$, the level sets $\{s \in \mathbb{R}_+ \mid d(s, t) \le \gamma\}$ and $\{t \in \mathbb{R}_{++} \mid d(s, t) \le \gamma\}$ are bounded for any $t \in \mathbb{R}_{++}$ and $s \in \mathbb{R}_+$, respectively;

(d) If $\{t^k\} \subset \mathbb{R}_{++}$ is a sequence such that $\lim_{k \to +\infty} t^k = 0$, then for all $s \in \mathbb{R}_{++}$, $\lim_{k \to +\infty} \phi'(t^k)(s - t^k) = -\infty$.

A function $\phi$ satisfying (d) is said in [14] to be boundary coercive. If we set $\phi(x) = +\infty$ when $x \notin \mathbb{R}_+$, then $\phi$ becomes a closed proper strictly convex function on $\mathbb{R}$. Furthermore, by [15, Lemma 2.4 (d)] and Property 3.1 (c), it is not difficult to see that $\phi(x)$ and $\sum_{i=1}^n \phi(x_i)$ are B-functions on $\mathbb{R}$ and $\mathbb{R}^n$, respectively. Unless otherwise stated, in the rest of this paper, we always assume that $\phi$ satisfies Property 3.1.

From the discussions in Section 2, clearly, the following vector-valued functions

$$\phi^{\rm soc}(x) = \phi(\lambda_1(x)) u_x^{(1)} + \phi(\lambda_2(x)) u_x^{(2)} \tag{13}$$

and

$$(\phi')^{\rm soc}(x) = \phi'(\lambda_1(x)) u_x^{(1)} + \phi'(\lambda_2(x)) u_x^{(2)} \tag{14}$$

are well-defined over $\mathcal{K}^n$ and ${\rm int}(\mathcal{K}^n)$, respectively. In view of this, we define

$$H(x, y) := \begin{cases}
{\rm tr}\left[\phi^{\rm soc}(x) - \phi^{\rm soc}(y) - (\phi')^{\rm soc}(y) \circ (x - y)\right] & \forall x \in \mathcal{K}^n, \ y \in {\rm int}(\mathcal{K}^n), \\
+\infty & \text{otherwise}.
\end{cases} \tag{15}$$

In what follows, we will show that the function $H : \mathbb{R}^n \times \mathbb{R}^n \to (-\infty, +\infty]$ enjoys some favorable properties similar to those of the D-function. In particular, we prove that $H(x, y) \ge 0$ for any $x \in \mathcal{K}^n$, $y \in {\rm int}(\mathcal{K}^n)$, and moreover, $H(x, y) = 0$ if and only if $x = y$. Consequently, it can be regarded as a distance measure on the SOC.

We first start with two technical lemmas that will be used in the subsequent analysis.

*Lemma 3.1* For any $x = (x_1, x_2), y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we have that

$${\rm tr}(x \circ y) \le \langle \lambda(x), \lambda(y) \rangle,$$

where $\lambda(x) = (\lambda_1(x), \lambda_2(x))$ and $\lambda(y) = (\lambda_1(y), \lambda_2(y))$, and the inequality holds with equality if and only if $x_2 = \alpha y_2$ for some $\alpha > 0$.

Proof. From equations (5)-(6) and the Cauchy-Schwarz inequality,

$${\rm tr}(x \circ y) = 2\langle x, y \rangle = 2x_1 y_1 + 2x_2^T y_2 \le 2x_1 y_1 + 2\|x_2\| \cdot \|y_2\|.$$

On the other hand, from the definition of the spectral values given by (8),

$$\begin{aligned}
\langle \lambda(x), \lambda(y) \rangle &= \lambda_1(x)\lambda_1(y) + \lambda_2(x)\lambda_2(y) \\
&= (x_1 - \|x_2\|)(y_1 - \|y_2\|) + (x_1 + \|x_2\|)(y_1 + \|y_2\|) \\
&= 2x_1 y_1 + 2\|x_2\| \cdot \|y_2\|.
\end{aligned}$$

Combining the two displays, we immediately obtain the inequality. In addition, we note that the inequality becomes an equality if and only if $x_2^T y_2 = \|x_2\| \cdot \|y_2\|$, which is equivalent to saying that $x_2 = \alpha y_2$ for some $\alpha > 0$. $\Box$

*Lemma 3.2* Let $\phi^{\rm soc}(x)$ and $(\phi')^{\rm soc}(x)$ be given as in (13) and (14), respectively. Then,

(a) $\phi^{\rm soc}(x)$ is continuously differentiable on ${\rm int}(\mathcal{K}^n)$ with the gradient $\nabla\phi^{\rm soc}(x)$ satisfying $\nabla\phi^{\rm soc}(x)e = (\phi')^{\rm soc}(x)$.

(b) ${\rm tr}[\phi^{\rm soc}(x)] = \sum_{i=1}^2 \phi[\lambda_i(x)]$ and ${\rm tr}[(\phi')^{\rm soc}(x)] = \sum_{i=1}^2 \phi'[\lambda_i(x)]$.

(c) ${\rm tr}[\phi^{\rm soc}(x)]$ is continuously differentiable on ${\rm int}(\mathcal{K}^n)$ with $\nabla {\rm tr}[\phi^{\rm soc}(x)] = 2\nabla\phi^{\rm soc}(x)e$.

(d) ${\rm tr}[\phi^{\rm soc}(x)]$ is strictly convex and continuous on $\mathcal{K}^n$.

(e) If $\{y^k\} \subset {\rm int}(\mathcal{K}^n)$ is a sequence such that $\lim_{k \to +\infty} y^k = \bar{y} \in {\rm bd}(\mathcal{K}^n)$, then

$$\lim_{k \to +\infty} \langle \nabla {\rm tr}[\phi^{\rm soc}(y^k)], x - y^k \rangle = -\infty \quad \text{for all } x \in {\rm int}(\mathcal{K}^n).$$

In other words, the function ${\rm tr}[\phi^{\rm soc}(x)]$ is boundary coercive.

Proof. (a) The first part follows directly from Lemma 2.1. Now we prove the second part. If $x_2 \ne 0$, then by formulas (10)-(11) it is easy to compute that

$$\nabla\phi^{\rm soc}(x)e = \begin{bmatrix}
\dfrac{\phi'(\lambda_2(x)) + \phi'(\lambda_1(x))}{2} \\[3mm]
\dfrac{\phi'(\lambda_2(x)) - \phi'(\lambda_1(x))}{2}\,\dfrac{x_2}{\|x_2\|}
\end{bmatrix}.$$

In addition, using equations (8) and (14), we can prove that the vector on the right-hand side is exactly $(\phi')^{\rm soc}(x)$. Therefore, $\nabla\phi^{\rm soc}(x)e = (\phi')^{\rm soc}(x)$. If $x_2 = 0$, then using (10) and (8), we can also prove that $\nabla\phi^{\rm soc}(x)e = (\phi')^{\rm soc}(x)$.

(b) The result follows directly from Property 2.1 (c) and equations (13)-(14).

(c) From part (a) and the fact that ${\rm tr}[\phi^{\rm soc}(x)] = {\rm tr}[\phi^{\rm soc}(x) \circ e] = 2\langle \phi^{\rm soc}(x), e \rangle$, clearly, ${\rm tr}[\phi^{\rm soc}(x)]$ is continuously differentiable on ${\rm int}(\mathcal{K}^n)$. Applying the chain rule for the inner product of two functions immediately yields that $\nabla {\rm tr}[\phi^{\rm soc}(x)] = 2\nabla\phi^{\rm soc}(x)e$.

(d) It is clear that ${\rm tr}[\phi^{\rm soc}(x)]$ is continuous on $\mathcal{K}^n$. We next prove that it is strictly convex on $\mathcal{K}^n$. For any $x, y \in \mathcal{K}^n$ with $x \ne y$ and $\alpha, \beta \in (0, 1)$ with $\alpha + \beta = 1$, we have that

$$\begin{aligned}
\lambda_1(\alpha x + \beta y) &= \alpha x_1 + \beta y_1 - \|\alpha x_2 + \beta y_2\| \ge \alpha\lambda_1(x) + \beta\lambda_1(y), \\
\lambda_2(\alpha x + \beta y) &= \alpha x_1 + \beta y_1 + \|\alpha x_2 + \beta y_2\| \le \alpha\lambda_2(x) + \beta\lambda_2(y),
\end{aligned}$$

which implies that

$$\alpha\lambda_1(x) + \beta\lambda_1(y) \le \lambda_1(\alpha x + \beta y) \le \lambda_2(\alpha x + \beta y) \le \alpha\lambda_2(x) + \beta\lambda_2(y).$$

On the other hand,

$$\lambda_1(\alpha x + \beta y) + \lambda_2(\alpha x + \beta y) = 2\alpha x_1 + 2\beta y_1 = [\alpha\lambda_1(x) + \beta\lambda_1(y)] + [\alpha\lambda_2(x) + \beta\lambda_2(y)].$$

The last two relations imply that there exists $\rho \in [0, 1]$ such that

$$\begin{aligned}
\lambda_1(\alpha x + \beta y) &= \rho[\alpha\lambda_1(x) + \beta\lambda_1(y)] + (1 - \rho)[\alpha\lambda_2(x) + \beta\lambda_2(y)], \\
\lambda_2(\alpha x + \beta y) &= (1 - \rho)[\alpha\lambda_1(x) + \beta\lambda_1(y)] + \rho[\alpha\lambda_2(x) + \beta\lambda_2(y)].
\end{aligned}$$

Thus, from Property 2.1, it follows that

$$\begin{aligned}
{\rm tr}[\phi^{\rm soc}(\alpha x + \beta y)] &= \phi[\lambda_1(\alpha x + \beta y)] + \phi[\lambda_2(\alpha x + \beta y)] \\
&= \phi\big[\rho(\alpha\lambda_1(x) + \beta\lambda_1(y)) + (1 - \rho)(\alpha\lambda_2(x) + \beta\lambda_2(y))\big] \\
&\quad + \phi\big[(1 - \rho)(\alpha\lambda_1(x) + \beta\lambda_1(y)) + \rho(\alpha\lambda_2(x) + \beta\lambda_2(y))\big] \\
&\le \rho\phi(\alpha\lambda_1(x) + \beta\lambda_1(y)) + (1 - \rho)\phi(\alpha\lambda_2(x) + \beta\lambda_2(y)) \\
&\quad + (1 - \rho)\phi(\alpha\lambda_1(x) + \beta\lambda_1(y)) + \rho\phi(\alpha\lambda_2(x) + \beta\lambda_2(y)) \\
&= \phi(\alpha\lambda_1(x) + \beta\lambda_1(y)) + \phi(\alpha\lambda_2(x) + \beta\lambda_2(y)) \\
&< \alpha\phi(\lambda_1(x)) + \beta\phi(\lambda_1(y)) + \alpha\phi(\lambda_2(x)) + \beta\phi(\lambda_2(y)) \\
&= \alpha\,{\rm tr}[\phi^{\rm soc}(x)] + \beta\,{\rm tr}[\phi^{\rm soc}(y)],
\end{aligned}$$

where the first equality and the last one follow from Lemma 3.2 (b), and the two inequalities are due to the strict convexity of $\phi$ on $\mathbb{R}_+$. From the definition of strict convexity, we thus prove that the conclusion holds.

(e) From part (a) and part (c), we can readily obtain the following equality:

$$\nabla {\rm tr}[\phi^{\rm soc}(x)] = 2(\phi')^{\rm soc}(x), \quad \forall x \in {\rm int}(\mathcal{K}^n). \tag{16}$$

Using this relation and Lemma 3.1, we then have that

$$\begin{aligned}
\langle \nabla {\rm tr}[\phi^{\rm soc}(y^k)], x - y^k \rangle &= 2\langle (\phi')^{\rm soc}(y^k), x - y^k \rangle \\
&= {\rm tr}[(\phi')^{\rm soc}(y^k) \circ (x - y^k)] \\
&= {\rm tr}[(\phi')^{\rm soc}(y^k) \circ x] - {\rm tr}[(\phi')^{\rm soc}(y^k) \circ y^k] \\
&\le \sum_{i=1}^2 \phi'[\lambda_i(y^k)]\lambda_i(x) - {\rm tr}[(\phi')^{\rm soc}(y^k) \circ y^k].
\end{aligned} \tag{17}$$

In addition, by Property 2.1 (a)-(b), for any $y \in {\rm int}(\mathcal{K}^n)$, we can compute that

$$\begin{aligned}
(\phi')^{\rm soc}(y) \circ y &= \left[\phi'(\lambda_1(y)) u_y^{(1)} + \phi'(\lambda_2(y)) u_y^{(2)}\right] \circ \left[\lambda_1(y) u_y^{(1)} + \lambda_2(y) u_y^{(2)}\right] \\
&= \phi'(\lambda_1(y))\lambda_1(y) u_y^{(1)} + \phi'(\lambda_2(y))\lambda_2(y) u_y^{(2)},
\end{aligned} \tag{18}$$

which implies that

$${\rm tr}[(\phi')^{\rm soc}(y^k) \circ y^k] = \sum_{i=1}^2 \phi'[\lambda_i(y^k)]\lambda_i(y^k). \tag{19}$$

Combining (17) with (19) immediately yields that

$$\langle \nabla {\rm tr}[\phi^{\rm soc}(y^k)], x - y^k \rangle \le \sum_{i=1}^2 \phi'[\lambda_i(y^k)][\lambda_i(x) - \lambda_i(y^k)]. \tag{20}$$

Note that $\lambda_2(\bar{y}) \ge \lambda_1(\bar{y}) = 0$ and $\lambda_2(x) \ge \lambda_1(x) > 0$ since $\bar{y} \in {\rm bd}(\mathcal{K}^n)$ and $x \in {\rm int}(\mathcal{K}^n)$. Hence, if $\lambda_2(\bar{y}) = 0$, then by Property 3.1 (d) and the continuity of $\lambda_i(\cdot)$ for $i = 1, 2$,

$$\lim_{k \to +\infty} \phi'[\lambda_i(y^k)][\lambda_i(x) - \lambda_i(y^k)] = -\infty, \quad i = 1, 2,$$

which means that

$$\lim_{k \to +\infty} \sum_{i=1}^2 \phi'[\lambda_i(y^k)][\lambda_i(x) - \lambda_i(y^k)] = -\infty. \tag{21}$$

If $\lambda_2(\bar{y}) > 0$, then $\lim_{k \to +\infty} \phi'[\lambda_2(y^k)][\lambda_2(x) - \lambda_2(y^k)]$ is finite and

$$\lim_{k \to +\infty} \phi'[\lambda_1(y^k)][\lambda_1(x) - \lambda_1(y^k)] = -\infty,$$

and therefore the result in (21) also holds in this case. Combining (21) with (20), we prove that the conclusion holds. $\Box$

Using the relation in (16), we have that for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$,

$${\rm tr}\left[(\phi')^{\rm soc}(y) \circ (x - y)\right] = 2\left\langle (\phi')^{\rm soc}(y), x - y \right\rangle = \left\langle \nabla {\rm tr}[\phi^{\rm soc}(y)], x - y \right\rangle.$$

As a consequence, the function $H(x, y)$ in (15) can be rewritten as

$$H(x, y) = \begin{cases}
{\rm tr}[\phi^{\rm soc}(x)] - {\rm tr}[\phi^{\rm soc}(y)] - \langle \nabla {\rm tr}[\phi^{\rm soc}(y)], x - y \rangle & \forall x \in \mathcal{K}^n, \ y \in {\rm int}(\mathcal{K}^n), \\
+\infty & \text{otherwise}.
\end{cases} \tag{22}$$
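Representation (22) suggests a direct way to evaluate $H$: compute the spectral values of $x$ and $y$, form the gradient $2(\phi')^{\rm soc}(y)$ from (16), and subtract. The sketch below does this for a generic kernel passed in as a pair of callables; the entropy kernel $\phi(t) = t\ln t - t$ is used only as a sample, and all names are illustrative assumptions. Membership in the cone is tested with exact comparisons, so points extremely close to the boundary may be misclassified by rounding.

```python
import math


def spectral(x):
    """Spectral values and vectors of x as in (7)-(8); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    w = [v / norm for v in x2] if norm > 0.0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * v for v in w]
    u2 = [0.5] + [0.5 * v for v in w]
    return (x1 - norm, x1 + norm), (u1, u2)


def quasi_d(phi, dphi, x, y):
    """H(x, y) from (22): finite for x in K^n and y in int(K^n), +inf otherwise."""
    (lx1, lx2), _ = spectral(x)
    (ly1, ly2), (v1, v2) = spectral(y)
    if lx1 < 0.0 or ly1 <= 0.0:
        return math.inf
    grad = [2.0 * (dphi(ly1) * a + dphi(ly2) * b) for a, b in zip(v1, v2)]  # (16)
    inner = sum(g * (p - q) for g, p, q in zip(grad, x, y))
    return phi(lx1) + phi(lx2) - phi(ly1) - phi(ly2) - inner


def scalar_d(phi, dphi, s, t):
    """d(s, t) from (12)."""
    return phi(s) - phi(t) - dphi(t) * (s - t)


def phi(t):
    # Sample kernel t*ln(t) - t with the convention 0*ln(0) = 0.
    return t * math.log(t) - t if t > 0.0 else 0.0


dphi = math.log

x, y = [2.0, 1.0, 0.0], [3.0, 0.0, 1.0]   # spectral values (1, 3) and (2, 4)
h = quasi_d(phi, dphi, x, y)
lower = scalar_d(phi, dphi, 1.0, 2.0) + scalar_d(phi, dphi, 3.0, 4.0)
print(h, lower)
```

For this pair the value of $H$ exceeds the separable lower bound $\sum_i d(\lambda_i(x), \lambda_i(y))$, in line with the nonnegativity claim made for $H$ above.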

Using this representation, we next investigate several important properties of $H(x, y)$.

*Proposition 3.1* Let $H(x, y)$ be the function defined as in (15) or (22). Then,

(a) $H(x, y)$ is continuous on $\mathcal{K}^n \times {\rm int}(\mathcal{K}^n)$, and for any $y \in {\rm int}(\mathcal{K}^n)$, the function $H(\cdot, y)$ is strictly convex on $\mathcal{K}^n$.

(b) For any given $y \in {\rm int}(\mathcal{K}^n)$, $H(\cdot, y)$ is continuously differentiable on ${\rm int}(\mathcal{K}^n)$ with

$$\nabla_x H(x, y) = \nabla {\rm tr}[\phi^{\rm soc}(x)] - \nabla {\rm tr}[\phi^{\rm soc}(y)] = 2\left[(\phi')^{\rm soc}(x) - (\phi')^{\rm soc}(y)\right]. \tag{23}$$

(c) $H(x, y) \ge \sum_{i=1}^2 d(\lambda_i(x), \lambda_i(y)) \ge 0$ for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$, where $d(\cdot, \cdot)$ is defined by (12). Moreover, $H(x, y) = 0$ if and only if $x = y$.

(d) For every $\gamma \in \mathbb{R}$, the partial level sets $L_H(y, \gamma) = \{x \in \mathcal{K}^n : H(x, y) \le \gamma\}$ and $L_H(x, \gamma) = \{y \in {\rm int}(\mathcal{K}^n) : H(x, y) \le \gamma\}$ are bounded for any $y \in {\rm int}(\mathcal{K}^n)$ and $x \in \mathcal{K}^n$, respectively.

(e) If $\{y^k\} \subset {\rm int}(\mathcal{K}^n)$ is a sequence converging to $y^* \in {\rm int}(\mathcal{K}^n)$, then $H(y^*, y^k) \to 0$.

(f) If $\{x^k\} \subset {\rm int}(\mathcal{K}^n)$ and $\{y^k\} \subset {\rm int}(\mathcal{K}^n)$ are sequences such that $y^k \to y^* \in {\rm int}(\mathcal{K}^n)$, $\{x^k\}$ is bounded, and $H(x^k, y^k) \to 0$, then $x^k \to y^*$.

Proof. (a) Note that $\phi^{\rm soc}(x)$, $(\phi')^{\rm soc}(y)$ and $(\phi')^{\rm soc}(y) \circ (x - y)$ are continuous for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$, and the trace function ${\rm tr}(\cdot)$ is also continuous; hence $H(x, y)$ is continuous on $\mathcal{K}^n \times {\rm int}(\mathcal{K}^n)$. From Lemma 3.2 (d), ${\rm tr}[\phi^{\rm soc}(x)]$ is strictly convex over $\mathcal{K}^n$, whereas $-{\rm tr}[\phi^{\rm soc}(y)] - \langle \nabla {\rm tr}[\phi^{\rm soc}(y)], x - y \rangle$ is affine, and hence convex, in $x$ on $\mathcal{K}^n$ for fixed $y \in {\rm int}(\mathcal{K}^n)$. This means that $H(\cdot, y)$ is strictly convex for any $y \in {\rm int}(\mathcal{K}^n)$.

(b) By Lemma 3.2 (c), the function $H(\cdot, y)$ for any given $y \in {\rm int}(\mathcal{K}^n)$ is continuously differentiable on ${\rm int}(\mathcal{K}^n)$. The first equality in (23) is obvious and the second is due to (16).

(c) The result follows directly from the following equalities and inequalities:

$$\begin{aligned}
H(x, y) &= {\rm tr}[\phi^{\rm soc}(x)] - {\rm tr}[\phi^{\rm soc}(y)] - {\rm tr}[(\phi')^{\rm soc}(y) \circ (x - y)] \\
&= {\rm tr}[\phi^{\rm soc}(x)] - {\rm tr}[\phi^{\rm soc}(y)] - {\rm tr}[(\phi')^{\rm soc}(y) \circ x] + {\rm tr}[(\phi')^{\rm soc}(y) \circ y] \\
&\ge {\rm tr}[\phi^{\rm soc}(x)] - {\rm tr}[\phi^{\rm soc}(y)] - \sum_{i=1}^2 \phi'(\lambda_i(y))\lambda_i(x) + {\rm tr}[(\phi')^{\rm soc}(y) \circ y] \\
&= \sum_{i=1}^2 \left[\phi(\lambda_i(x)) - \phi(\lambda_i(y)) - \phi'(\lambda_i(y))\lambda_i(x) + \phi'(\lambda_i(y))\lambda_i(y)\right] \\
&= \sum_{i=1}^2 \left[\phi(\lambda_i(x)) - \phi(\lambda_i(y)) - \phi'(\lambda_i(y))(\lambda_i(x) - \lambda_i(y))\right] \\
&= \sum_{i=1}^2 d(\lambda_i(x), \lambda_i(y)) \ge 0,
\end{aligned}$$

where the first equality is due to (15), the second and fourth are obvious, the third follows from Lemma 3.2 (b) and (18), the last one is from (12), the first inequality follows from Lemma 3.1, and the last one is due to the strict convexity of $\phi$ on $\mathbb{R}_+$. Note that ${\rm tr}[\phi^{\rm soc}(x)]$ is strictly convex on $\mathcal{K}^n$ by Lemma 3.2 (d), and therefore $H(x, y) = 0$ if and only if $x = y$ by (22).

(d) From part (c), we have that $L_H(y, \gamma) \subseteq \{x \in \mathcal{K}^n \mid \sum_{i=1}^2 d(\lambda_i(x), \lambda_i(y)) \le \gamma\}$. By Property 3.1 (c), the set on the right-hand side is bounded. So, $L_H(y, \gamma)$ is bounded for $y \in {\rm int}(\mathcal{K}^n)$. Similarly, $L_H(x, \gamma)$ is bounded for $x \in \mathcal{K}^n$.

From parts (a)-(d), we immediately obtain the results in (e) and (f). $\Box$

*Remark 3.1* (i) From (22), it is not difficult to see that $H(x, y)$ is exactly a distance measure induced by ${\rm tr}[\phi^{\rm soc}(x)]$ via formula (4). Therefore, if $n = 1$ and $\phi$ is a Bregman function with zone $\mathbb{R}_{++}$, i.e., $\phi$ also satisfies the property:

(e) if $\{s^k\} \subseteq \mathbb{R}_+$ and $\{t^k\} \subset \mathbb{R}_{++}$ are sequences such that $t^k \to t^*$, $\{s^k\}$ is bounded, and $d(s^k, t^k) \to 0$, then $s^k \to t^*$;

then $H(x, y)$ reduces to the Bregman distance function $d(x, y)$ in (12).

(ii) When $n > 1$, $H(x, y)$ is generally not a Bregman distance even if $\phi$ is a Bregman function with zone $\mathbb{R}_{++}$, by noting that Proposition 3.1 (e) and (f) do not hold for $\{x^k\} \subseteq {\rm bd}(\mathcal{K}^n)$ and $y^* \in {\rm bd}(\mathcal{K}^n)$. By the proof of Proposition 3.1 (c), the main reason is that in order to guarantee that

$${\rm tr}[(\phi')^{\rm soc}(y) \circ x] = \sum_{i=1}^2 \phi'(\lambda_i(y))\lambda_i(x)$$

for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$, the relation $[(\phi')^{\rm soc}(y)]_2 = \alpha x_2$ with some $\alpha > 0$ is required, where $[(\phi')^{\rm soc}(y)]_2$ is the vector composed of the last $n - 1$ elements of $(\phi')^{\rm soc}(y)$. It is very stringent for $\phi$ to satisfy such a relation. For the same reason, ${\rm tr}[\phi^{\rm soc}(x)]$ is not a B-function [15] on $\mathbb{R}^n$, either, even if $\phi$ itself is a B-function.

(iii) We observe that $H(x, y)$ is inseparable, whereas the double-regularized distance function proposed by [24] belongs to the separable class of functions. In view of this, $H(x, y)$ cannot become a double-regularized distance function in $\mathcal{K}^n \times {\rm int}(\mathcal{K}^n)$, even when $\phi$ is such that $\tilde{d}(s, t) = d(s, t)/\phi''(t) + \frac{\mu}{2}(s - t)^2$ is a double-regularized component (see [24]).

In view of Proposition 3.1 and Remark 3.1, we call $H(x, y)$ a quasi D-function in this paper. In the following, we present several specific examples of quasi D-functions.

*Example 3.1.* Let $\phi(t) = t \ln t - t$ (with the convention $0 \ln 0 = 0$). It is easy to verify that $\phi$ satisfies Property 3.1. By [13, Proposition 3.2 (b)] and (13)-(14), we can compute that for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$,

$$\phi^{\rm soc}(x) = x \circ \ln x - x \quad \text{and} \quad (\phi')^{\rm soc}(y) = \ln y.$$

Therefore,

$$H(x, y) = \begin{cases}
{\rm tr}(x \circ \ln x - x \circ \ln y + y - x) & \forall x \in \mathcal{K}^n, \ y \in {\rm int}(\mathcal{K}^n), \\
+\infty & \text{otherwise}.
\end{cases}$$

Using this entropy-like distance, Chen [7] proposed a proximal-like algorithm for solving a special case of the CSOCP with $A = I$ and $b = 0$.
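As a numerical cross-check of Example 3.1, the sketch below evaluates the closed form ${\rm tr}(x \circ \ln x - x \circ \ln y + y - x)$ and compares it with the spectral-value form of (22) for $\phi(t) = t\ln t - t$. To keep the code short it assumes that both $x$ and $y$ lie in ${\rm int}(\mathcal{K}^n)$, so that $\ln x$ is defined without the $0 \ln 0$ convention. Function names and test points are hypothetical.

```python
import math


def spectral(x):
    """Spectral values and vectors of x as in (7)-(8); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    w = [v / norm for v in x2] if norm > 0.0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * v for v in w]
    u2 = [0.5] + [0.5 * v for v in w]
    return (x1 - norm, x1 + norm), (u1, u2)


def soc_log(x):
    """ln x in the sense of (9); requires x in int(K^n)."""
    (l1, l2), (u1, u2) = spectral(x)
    return [math.log(l1) * a + math.log(l2) * b for a, b in zip(u1, u2)]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def entropy_h_closed(x, y):
    """tr(x∘ln x - x∘ln y + y - x), using tr(u∘v) = 2<u, v> and tr(u) = 2*u1."""
    return 2.0 * (dot(x, soc_log(x)) - dot(x, soc_log(y)) + y[0] - x[0])


def entropy_h_spectral(x, y):
    """Form (22) with phi(t) = t*ln(t) - t and gradient 2*ln(y) from (16)."""
    phi = lambda t: t * math.log(t) - t
    (lx1, lx2), _ = spectral(x)
    (ly1, ly2), _ = spectral(y)
    diff = [a - b for a, b in zip(x, y)]
    return phi(lx1) + phi(lx2) - phi(ly1) - phi(ly2) - 2.0 * dot(soc_log(y), diff)


x, y = [2.0, 1.0, 0.0], [3.0, 0.0, 1.0]   # both in int(K^3)
h_closed = entropy_h_closed(x, y)
h_spectral = entropy_h_spectral(x, y)
print(h_closed, h_spectral)
```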

*Example 3.2.* Let $\phi(t) = t^2 - \sqrt{t}$. It is not hard to verify that $\phi$ satisfies Property 3.1. From [3, 13], we have that for any $x \in \mathcal{K}^n$,

$$x^2 = x \circ x = \lambda_1^2(x) u_x^{(1)} + \lambda_2^2(x) u_x^{(2)} \quad \text{and} \quad \sqrt{x} = \sqrt{\lambda_1(x)}\, u_x^{(1)} + \sqrt{\lambda_2(x)}\, u_x^{(2)}.$$

By a direct computation, we then obtain for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$,

$$\phi^{\rm soc}(x) = x \circ x - \sqrt{x} \quad \text{and} \quad (\phi')^{\rm soc}(y) = 2y - \frac{{\rm tr}(\sqrt{y})e - \sqrt{y}}{2\sqrt{\det(y)}}.$$

This implies that

$$H(x, y) = \begin{cases}
{\rm tr}\left[(x - y)^2 - (\sqrt{x} - \sqrt{y}) + \dfrac{({\rm tr}(\sqrt{y})e - \sqrt{y}) \circ (x - y)}{2\sqrt{\det(y)}}\right] & \forall x \in \mathcal{K}^n, \ y \in {\rm int}(\mathcal{K}^n), \\
+\infty & \text{otherwise}.
\end{cases}$$

*Example 3.3.* Take $\phi(t) = t \ln t - (1 + t)\ln(1 + t) + (1 + t)\ln 2$ (with the convention $0 \ln 0 = 0$). It is easily shown that $\phi$ satisfies Property 3.1. Using Property 2.1 (a)-(b), we can compute that for any $x \in \mathcal{K}^n$ and $y \in {\rm int}(\mathcal{K}^n)$,

$$\phi^{\rm soc}(x) = x \circ \ln x - (e + x) \circ \ln(e + x) + (e + x)\ln 2$$

and

$$(\phi')^{\rm soc}(y) = \ln y - \ln(e + y) + e \ln 2.$$

Consequently,

$$H(x, y) = \begin{cases}
{\rm tr}\left[x \circ (\ln x - \ln y) - (e + x) \circ (\ln(e + x) - \ln(e + y))\right] & \forall x \in \mathcal{K}^n, \ y \in {\rm int}(\mathcal{K}^n), \\
+\infty & \text{otherwise}.
\end{cases}$$

In addition, from [14, 23], it follows that $\sum_{i=1}^m \phi(\zeta_i)$ generated by $\phi$ in the above examples is a Bregman function with zone $S = \mathbb{R}^m_+$, and consequently $\sum_{i=1}^m d(\zeta_i, \xi_i)$ defined as in (12) is a D-function induced by $\sum_{i=1}^m \phi(\zeta_i)$.

To close this section, we present another important property of $H(x, y)$.

*Proposition 3.2* Let $H(x, y)$ be defined as in (15) or (22). Then, for all $x, y \in {\rm int}(\mathcal{K}^n)$ and $z \in \mathcal{K}^n$, the following three-points identity holds:

$$\begin{aligned}
H(z, x) + H(x, y) - H(z, y) &= \left\langle \nabla {\rm tr}[\phi^{\rm soc}(y)] - \nabla {\rm tr}[\phi^{\rm soc}(x)], z - x \right\rangle \\
&= {\rm tr}\left[\left((\phi')^{\rm soc}(y) - (\phi')^{\rm soc}(x)\right) \circ (z - x)\right].
\end{aligned}$$

Proof. Using the definition of $H$ given as in (22), we have that

$$\begin{aligned}
\left\langle \nabla {\rm tr}[\phi^{\rm soc}(x)], z - x \right\rangle &= {\rm tr}[\phi^{\rm soc}(z)] - {\rm tr}[\phi^{\rm soc}(x)] - H(z, x), \\
\left\langle \nabla {\rm tr}[\phi^{\rm soc}(y)], x - y \right\rangle &= {\rm tr}[\phi^{\rm soc}(x)] - {\rm tr}[\phi^{\rm soc}(y)] - H(x, y), \\
\left\langle \nabla {\rm tr}[\phi^{\rm soc}(y)], z - y \right\rangle &= {\rm tr}[\phi^{\rm soc}(z)] - {\rm tr}[\phi^{\rm soc}(y)] - H(z, y).
\end{aligned}$$

Subtracting the first two equations from the last one gives the first equality. By (16),

$$\left\langle \nabla {\rm tr}[\phi^{\rm soc}(y)] - \nabla {\rm tr}[\phi^{\rm soc}(x)], z - x \right\rangle = 2\left\langle (\phi')^{\rm soc}(y) - (\phi')^{\rm soc}(x), z - x \right\rangle.$$

This together with the fact that ${\rm tr}(x \circ y) = 2\langle x, y \rangle$ leads to the second equality. $\Box$
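The three-points identity of Proposition 3.2 can be checked numerically. The sketch below uses the entropy kernel $\phi(t) = t \ln t - t$ of Example 3.1, takes $x, y$ in the interior and $z$ on the boundary of $\mathcal{K}^3$, and compares both sides. The helper names and the three sample points are assumptions for illustration.

```python
import math


def spectral(x):
    """Spectral values and vectors of x as in (7)-(8); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    norm = math.sqrt(sum(v * v for v in x2))
    w = [v / norm for v in x2] if norm > 0.0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * v for v in w]
    u2 = [0.5] + [0.5 * v for v in w]
    return (x1 - norm, x1 + norm), (u1, u2)


def phi(t):
    # Entropy kernel with the convention 0*ln(0) = 0.
    return t * math.log(t) - t if t > 0.0 else 0.0


def trace_phi(x):
    """tr[phi^soc(x)] = phi(lam1) + phi(lam2), by Lemma 3.2 (b)."""
    (l1, l2), _ = spectral(x)
    return phi(l1) + phi(l2)


def grad_trace_phi(y):
    """2*(phi')^soc(y) as in (16); requires y in int(K^n)."""
    (l1, l2), (u1, u2) = spectral(y)
    return [2.0 * (math.log(l1) * a + math.log(l2) * b) for a, b in zip(u1, u2)]


def quasi_d(x, y):
    """H(x, y) from (22) for x in K^n and y in int(K^n)."""
    g = grad_trace_phi(y)
    return trace_phi(x) - trace_phi(y) - sum(gi * (p - q) for gi, p, q in zip(g, x, y))


x, y = [2.0, 1.0, 0.0], [3.0, 0.0, 1.0]   # interior points
z = [1.0, 1.0, 0.0]                       # boundary point: lam1(z) = 0
lhs = quasi_d(z, x) + quasi_d(x, y) - quasi_d(z, y)
gx, gy = grad_trace_phi(x), grad_trace_phi(y)
rhs = sum((b - a) * (p - q) for a, b, p, q in zip(gx, gy, z, x))
print(lhs, rhs)
```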

### 4 Proximal-like algorithm for the CSOCP

In this section, we propose a proximal-like algorithm for solving the CSOCP based on the quasi D-function $H(x, y)$. For ease of notation, we denote by $\mathcal{F}$ the set

$$\mathcal{F} = \left\{\zeta \in \mathbb{R}^m \mid A\zeta + b \succeq_{\mathcal{K}^n} 0\right\}. \tag{24}$$

It is easy to verify that $\mathcal{F}$ is convex and its interior ${\rm int}(\mathcal{F})$ is given by

$${\rm int}(\mathcal{F}) = \left\{\zeta \in \mathbb{R}^m \mid A\zeta + b \succ_{\mathcal{K}^n} 0\right\}. \tag{25}$$

Let $\psi : \mathbb{R}^m \to (-\infty, +\infty]$ be the function defined by

$$\psi(\zeta) = \begin{cases}
{\rm tr}[\phi^{\rm soc}(A\zeta + b)] & \text{if } \zeta \in \mathcal{F}, \\
+\infty & \text{otherwise}.
\end{cases} \tag{26}$$

By Lemma 3.2, it is easily shown that the following conclusions hold for $\psi(\zeta)$.

Lemma 4.1 Let $\psi(\zeta)$ be given as in (26). If the matrix $A$ has full rank $m$, then

(a) $\psi(\zeta)$ is continuously differentiable on $\mathrm{int}(F)$ with $\nabla\psi(\zeta) = 2A^T(\phi')^{soc}(A\zeta + b)$.

(b) $\psi(\zeta)$ is strictly convex and continuous on $F$.

(c) $\psi(\zeta)$ is boundary coercive, i.e., if $\{\xi^k\} \subset \mathrm{int}(F)$ is such that $\lim_{k\to+\infty} \xi^k = \xi \in \mathrm{bd}(F)$, then for all $\zeta \in \mathrm{int}(F)$, there holds that $\lim_{k\to+\infty} \nabla\psi(\xi^k)^T(\zeta - \xi^k) = -\infty$.
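The gradient formula in Lemma 4.1 (a) can be sanity-checked against central finite differences. The sketch below is illustrative only: the kernel $\phi(t) = t\ln t - t$, the data `A`, `b`, and the test point `zeta` are assumptions chosen so that $A\zeta + b$ lies in $\mathrm{int}(K^4)$, and the function names are hypothetical.

```python
import numpy as np

def soc_fun(g, x):
    """g^soc(x) = g(lam1) u1 + g(lam2) u2, from the spectral decomposition of x."""
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(x2.size)[0]  # any unit vector if x2 = 0
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return g(x1 - r) * u1 + g(x1 + r) * u2

def phi(t):
    # Assumed kernel (illustration only): phi(t) = t ln t - t, phi'(t) = ln t.
    return t * np.log(t) - t

def psi(zeta, A, b):
    """psi(zeta) = tr[phi^soc(A zeta + b)]; this sketch handles interior points only."""
    x = A @ zeta + b
    r = np.linalg.norm(x[1:])
    if x[0] - r <= 0:
        raise ValueError("zeta is not in int(F)")
    return phi(x[0] - r) + phi(x[0] + r)

def grad_psi(zeta, A, b):
    """Gradient formula of Lemma 4.1 (a): 2 A^T (phi')^soc(A zeta + b)."""
    return 2.0 * A.T @ soc_fun(np.log, A @ zeta + b)

def fd_grad(zeta, A, b, h=1e-6):
    """Central finite-difference approximation of the gradient of psi."""
    g = np.zeros_like(zeta)
    for i in range(zeta.size):
        e = np.zeros_like(zeta)
        e[i] = h
        g[i] = (psi(zeta + e, A, b) - psi(zeta - e, A, b)) / (2.0 * h)
    return g

# Hypothetical data: A is 4 x 2 with full column rank, and A zeta + b is in int(K^4).
A = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, 0.0], [0.5, -1.0]])
b = np.array([3.0, 0.0, 0.0, 0.0])
zeta = np.array([0.2, -0.1])

g_formula = grad_psi(zeta, A, b)
g_numeric = fd_grad(zeta, A, b)
print(g_formula, g_numeric)
```

Agreement of the two vectors to several digits is consistent with the formula but is of course not a proof.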

Let $D(\zeta, \xi)$ be the function induced by the above $\psi(\zeta)$ via formula (4), i.e.,

$$D(\zeta, \xi) = \psi(\zeta) - \psi(\xi) - \langle \nabla\psi(\xi),\ \zeta - \xi \rangle. \tag{27}$$

Then, from (26) and (22), it is not difficult to see that

$$D(\zeta, \xi) = H(A\zeta + b,\ A\xi + b). \tag{28}$$
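For completeness, the step from (27) to (28) can be written out as follows. This is a sketch under the assumption that (22) has the form $H(x,y) = \mathrm{tr}[\phi^{soc}(x)] - \mathrm{tr}[\phi^{soc}(y)] - \langle \nabla\mathrm{tr}[\phi^{soc}(y)], x - y\rangle$ with $\nabla\mathrm{tr}[\phi^{soc}(y)] = 2(\phi')^{soc}(y)$, as used in the proof of Proposition 3.2; it applies to $\zeta \in F$ and $\xi \in \mathrm{int}(F)$.

```latex
\begin{align*}
D(\zeta,\xi)
 &= \psi(\zeta) - \psi(\xi) - \langle \nabla\psi(\xi),\ \zeta - \xi \rangle \\
 &= \mathrm{tr}[\phi^{soc}(A\zeta+b)] - \mathrm{tr}[\phi^{soc}(A\xi+b)]
    - \big\langle 2A^{T}(\phi')^{soc}(A\xi+b),\ \zeta - \xi \big\rangle \\
 &= \mathrm{tr}[\phi^{soc}(A\zeta+b)] - \mathrm{tr}[\phi^{soc}(A\xi+b)]
    - \big\langle 2(\phi')^{soc}(A\xi+b),\ (A\zeta+b) - (A\xi+b) \big\rangle \\
 &= H(A\zeta+b,\ A\xi+b).
\end{align*}
```

The second line uses Lemma 4.1 (a), and the third moves $A^{T}$ to the other side of the inner product, noting that $A(\zeta - \xi) = (A\zeta + b) - (A\xi + b)$.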

So, by Proposition 3.1 and Lemma 4.1, we can prove the following conclusions.

Lemma 4.2 Let $D(\zeta, \xi)$ be given by (27) or (28). If the matrix $A$ has full rank $m$, then

(a) $D(\zeta, \xi)$ is continuous on $F \times \mathrm{int}(F)$, and for any given $\xi \in \mathrm{int}(F)$, the function $D(\cdot, \xi)$ is strictly convex on $F$.

(b) For any fixed $\xi \in \mathrm{int}(F)$, $D(\cdot, \xi)$ is continuously differentiable on $\mathrm{int}(F)$ with

$$\nabla_\zeta D(\zeta, \xi) = \nabla\psi(\zeta) - \nabla\psi(\xi) = 2A^T\big[(\phi')^{soc}(A\zeta + b) - (\phi')^{soc}(A\xi + b)\big].$$

(c) $D(\zeta, \xi) \ge \sum_{i=1}^{2} d\big(\lambda_i(A\zeta + b), \lambda_i(A\xi + b)\big) \ge 0$ for any $\zeta \in F$ and $\xi \in \mathrm{int}(F)$, where $d(\cdot, \cdot)$ is defined by (12). Moreover, $D(\zeta, \xi) = 0$ if and only if $\zeta = \xi$.

(d) For each $\gamma \in \mathbb{R}$, the partial level sets $L_D(\xi, \gamma) = \{\zeta \in F : D(\zeta, \xi) \le \gamma\}$ and $L_D(\zeta, \gamma) = \{\xi \in \mathrm{int}(F) : D(\zeta, \xi) \le \gamma\}$ are bounded for any $\xi \in \mathrm{int}(F)$ and $\zeta \in F$, respectively.

The proximal-like algorithm that we propose for the CSOCP is defined as follows:

$$\zeta^0 \in \mathrm{int}(F), \tag{29}$$

$$\zeta^k = \mathop{\mathrm{argmin}}_{\zeta \in F} \big\{ f(\zeta) + (1/\mu_k)\, D(\zeta, \zeta^{k-1}) \big\} \quad (k \ge 1), \tag{30}$$

where $\{\mu_k\}_{k \ge 1}$ is a sequence of positive numbers. To establish the convergence of the algorithm, we make the following assumptions for the CSOCP:

(A1) $\inf\{ f(\zeta) \mid \zeta \in F \} := f_* > -\infty$ and $\mathrm{dom}(f) \cap \mathrm{int}(F) \ne \emptyset$.

*(A2) The matrix A is of maximal rank m.*
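To make the iteration (29)-(30) concrete, the following sketch runs it on a toy instance for which the subproblem has a closed form. Everything specific here is an assumption made for illustration and is not taken from the paper: $A = I$, $b = 0$ (so $F = K^3$), a linear objective $f(\zeta) = c^T\zeta$ with $c \in \mathrm{int}(K^3)$ (so that $f_* = 0$ and (A1)-(A2) hold), the kernel $\phi(t) = t\ln t - t$, and a constant $\mu_k = 1$. Setting the gradient of the subproblem objective to zero gives $c + (2/\mu_k)[\ln^{soc}(\zeta) - \ln^{soc}(\zeta^{k-1})] = 0$, hence $\zeta^k = \exp^{soc}\big(\ln^{soc}(\zeta^{k-1}) - \tfrac{\mu_k}{2} c\big)$. For general $f$, $A$, and $b$ the subproblem would have to be solved by a convex optimization routine instead.

```python
import numpy as np

def soc_fun(g, x):
    """g^soc(x) = g(lam1) u1 + g(lam2) u2, from the spectral decomposition of x."""
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(x2.size)[0]  # any unit vector if x2 = 0
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return g(x1 - r) * u1 + g(x1 + r) * u2

def prox_step(zeta_prev, c, mu):
    """One step of (30) for the assumed toy instance (A = I, b = 0, f = c^T zeta).

    Solves c + (2/mu) * [ln^soc(zeta) - ln^soc(zeta_prev)] = 0 in closed form.
    """
    return soc_fun(np.exp, soc_fun(np.log, zeta_prev) - 0.5 * mu * c)

def run(c, zeta0, mu=1.0, iters=40):
    """Iterate (29)-(30) with constant mu_k = mu and return all iterates."""
    zetas = [zeta0]
    for _ in range(iters):
        zetas.append(prox_step(zetas[-1], c, mu))
    return np.array(zetas)

c = np.array([1.0, 0.2, 0.1])       # c in int(K^3): min of c^T zeta over K^3 is 0
zeta0 = np.array([3.0, 1.0, -1.0])  # starting point in int(K^3), as required by (29)
path = run(c, zeta0)
values = path @ c                   # objective values f(zeta^k)
print(values[0], values[-1])
```

In this instance the iterates stay in $\mathrm{int}(K^3)$ because $\exp^{soc}$ maps into the interior of the cone, and the objective values decrease toward $f_* = 0$.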

Remark 4.1 Assumption (A1) is elementary for the solution of the CSOCP. Assumption (A2) is common in the solution of SOCPs and it is obviously satisfied when $F = K^n$. Moreover, if we consider the standard SOCP

*min c*^{T}*x*

*s.t. Ax = b,* *x ∈ K*^{n}*,* (31)

*where A ∈ IR*^{m×n}*with m ≤ n, b ∈ IR*^{m}*, and c ∈ IR*^{n}*, the assumption that A has full row*
*rank m is standard. Consequently, its dual problem, given by*

$$\max\ b^T y \quad \text{s.t.} \quad c - A^T y \succeq_{K^n} 0, \tag{32}$$

*satisfies assumption (A2). This shows that we can solve the SOCP by applying the*
*proximal-like algorithm in (29)-(30) to the dual problem (32).*
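To see explicitly how the dual problem (32) fits the CSOCP format, one possible identification is sketched below. The symbols $\tilde f$, $\tilde A$, and $\tilde b$ are introduced here only to avoid a clash with the $A$ and $b$ of (31); they do not appear in the original text.

```latex
\begin{align*}
&\max_{y \in \mathbb{R}^{m}} \ b^{T}y \quad \text{s.t.} \quad c - A^{T}y \succeq_{K^{n}} 0 \\
&\qquad\Longleftrightarrow\qquad
\min_{\zeta \in \mathbb{R}^{m}} \ \tilde f(\zeta) := -b^{T}\zeta
\quad \text{s.t.} \quad \tilde A\zeta + \tilde b \succeq_{K^{n}} 0, \\
&\text{with} \quad \zeta := y, \qquad \tilde A := -A^{T} \in \mathbb{R}^{n \times m}, \qquad \tilde b := c \in \mathbb{R}^{n}.
\end{align*}
```

Since $\mathrm{rank}(\tilde A) = \mathrm{rank}(A) = m$, the matrix $\tilde A$ has full column rank, which is exactly assumption (A2); the two problems have the same solutions and optimal values of opposite sign.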

We are now ready to prove the convergence of the proximal-like algorithm in (29)-(30) under assumptions (A1) and (A2). We first show that the algorithm is well defined.

Proposition 4.1 Suppose that assumptions (A1)-(A2) hold. Then, the algorithm described by (29)-(30) generates a sequence $\{\zeta^k\} \subset \mathrm{int}(F)$ such that

$$-2\mu_k^{-1} A^T\big[(\phi')^{soc}(A\zeta^k + b) - (\phi')^{soc}(A\zeta^{k-1} + b)\big] \in \partial f(\zeta^k). \tag{33}$$

Proof. The proof proceeds by induction. For $k = 0$, it clearly holds. Assume that $\zeta^{k-1} \in \mathrm{int}(F)$. Let $f_k(\zeta) := f(\zeta) + \mu_k^{-1} D(\zeta, \zeta^{k-1})$. Then assumption (A1) and Lemma 4.2 (d) imply that $f_k$ has bounded level sets in $F$. By the lower semi-continuity of $f$ and Lemma 4.2 (a), the minimization problem $\min_{\zeta \in F} f_k(\zeta)$, i.e., the subproblem (30), has solutions. Moreover, the solution $\zeta^k$ is unique due to the convexity of $f$ and the strict convexity of $D(\cdot, \xi)$. In the following, we prove that $\zeta^k \in \mathrm{int}(F)$.

By [20, Theorem 23.8] and the definition of $D(\zeta, \xi)$ given by (27), we can verify that $\zeta^k$ is the only $\zeta \in \mathrm{dom}(f) \cap F$ such that

$$2\mu_k^{-1} A^T (\phi')^{soc}(A\zeta^{k-1} + b) \in \partial\big( f(\zeta) + \mu_k^{-1}\psi(\zeta) + \delta(\zeta|F) \big), \tag{34}$$

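where $\delta(\zeta|F) = 0$ if $\zeta \in F$ and $+\infty$ otherwise. We will show that

$$\partial\big( f(\zeta) + \mu_k^{-1}\psi(\zeta) + \delta(\zeta|F) \big) = \emptyset \quad \text{for all } \zeta \in \mathrm{bd}(F), \tag{35}$$

which by (34) implies that $\zeta^k \in \mathrm{int}(F)$. Take $\zeta \in \mathrm{bd}(F)$ and assume that there exists $w \in \partial\big( f(\zeta) + \mu_k^{-1}\psi(\zeta) \big)$. Take $\hat\zeta \in \mathrm{dom}(f) \cap \mathrm{int}(F)$ and let

$$\zeta^l = (1 - \varepsilon_l)\zeta + \varepsilon_l \hat\zeta \tag{36}$$

with $\lim_{l\to+\infty} \varepsilon_l = 0$. From the convexity of $\mathrm{int}(F)$ and $\mathrm{dom}(f)$, it then follows that $\zeta^l \in \mathrm{dom}(f) \cap \mathrm{int}(F)$, and moreover, $\lim_{l\to+\infty} \zeta^l = \zeta$. Consequently,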

$$\begin{aligned}
\varepsilon_l\, w^T(\hat\zeta - \zeta) &= w^T(\zeta^l - \zeta)\\
&\le f(\zeta^l) - f(\zeta) + \mu_k^{-1}\big[\psi(\zeta^l) - \psi(\zeta)\big]\\
&\le f(\zeta^l) - f(\zeta) + \mu_k^{-1}\big\langle 2A^T(\phi')^{soc}(A\zeta^l + b),\ \zeta^l - \zeta \big\rangle\\
&\le \varepsilon_l\big(f(\hat\zeta) - f(\zeta)\big) + \mu_k^{-1}\,\frac{\varepsilon_l}{1 - \varepsilon_l}\, \mathrm{tr}\big[(\phi')^{soc}(A\zeta^l + b) \circ (A\hat\zeta - A\zeta^l)\big],
\end{aligned}$$

where the first equality is due to (36), the first inequality follows from the definition of subdifferential and the convexity of $f(\zeta) + \mu_k^{-1}\psi(\zeta)$ in $F$, the second one is due to the convexity and differentiability of $\psi(\zeta)$ in $\mathrm{int}(F)$, and the last one is from (36) and the convexity of $f$. Using Lemma 3.1 and (18), we then have that

$$\begin{aligned}
\mu_k(1 - \varepsilon_l)\big[f(\zeta) - f(\hat\zeta) + w^T(\hat\zeta - \zeta)\big]
&\le \mathrm{tr}\big[(\phi')^{soc}(A\zeta^l + b) \circ (A\hat\zeta + b)\big] - \mathrm{tr}\big[(\phi')^{soc}(A\zeta^l + b) \circ (A\zeta^l + b)\big]\\
&\le \sum_{i=1}^{2} \big[\phi'(\lambda_i(A\zeta^l + b))\,\lambda_i(A\hat\zeta + b) - \phi'(\lambda_i(A\zeta^l + b))\,\lambda_i(A\zeta^l + b)\big]\\
&= \sum_{i=1}^{2} \phi'(\lambda_i(A\zeta^l + b))\big[\lambda_i(A\hat\zeta + b) - \lambda_i(A\zeta^l + b)\big].
\end{aligned}$$

Since $\zeta \in \mathrm{bd}(F)$, i.e., $A\zeta + b \in \mathrm{bd}(K^n)$, it follows that $\lim_{l\to+\infty} \lambda_1(A\zeta^l + b) = 0$. Thus, using Property 3.1 (d) and following the same line as the proof of Lemma 3.2 (d), we