Proximal-Like Algorithm Using the Quasi D-Function for Convex Second-Order Cone Programming


DOI 10.1007/s10957-008-9380-8


S.H. Pan · J.S. Chen

Published online: 12 April 2008

© Springer Science+Business Media, LLC 2008

Abstract In this paper, we present a measure of distance in a second-order cone based on a class of continuously differentiable strictly convex functions on $\mathbb{R}_{++}$. Since the distance function has some favorable properties similar to those of the D-function (Censor and Zenios in J. Optim. Theory Appl. 73:451–464, 1992), we refer to it as a quasi D-function. Then, a proximal-like algorithm using the quasi D-function is proposed and applied to the second-order cone programming problem, which is to minimize a closed proper convex function with general second-order cone constraints. Like the proximal point algorithm using the D-function (Censor and Zenios in J. Optim. Theory Appl. 73:451–464, 1992; Chen and Teboulle in SIAM J. Optim. 3:538–543, 1993), under some mild assumptions we establish the global convergence of the algorithm expressed in terms of function values; we show that the sequence generated by the proposed algorithm is bounded and that every accumulation point is a solution to the considered problem.

Keywords Bregman functions · Quasi D-functions · Proximal-like methods · Convex second-order cone programming

Communicated by M. Fukushima.

Research of Shaohua Pan was partially supported by the Doctoral Starting-up Foundation (B13B6050640) of GuangDong Province.

Jein-Shan Chen is a member of the Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author’s work was partially supported by National Science Council of Taiwan.

S.H. Pan

School of Mathematical Sciences, South China University of Technology, Guangzhou 510640, China

J.S. Chen (corresponding author)

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan


1 Introduction

We consider the following convex second-order cone programming (CSOCP) problem:

$$\min\ f(\zeta) \quad \text{s.t.} \quad A\zeta + b \succeq_{K^n} 0,$$

where $A$ is an $n \times m$ matrix with $n \ge m$, $b$ is a vector in $\mathbb{R}^n$, $f: \mathbb{R}^m \to (-\infty, +\infty]$ is a closed proper convex function, and $K^n$ is the second-order cone (SOC for short) in $\mathbb{R}^n$ given by

$$K^n := \{(x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \mid \|x_2\| \le x_1\}, \tag{1}$$

and $x \succeq_{K^n} 0$ means that $x \in K^n$. Note that a function is closed if and only if it is lower semicontinuous (l.s.c. for short), and a function is proper if $f(\zeta) < +\infty$ for at least one $\zeta \in \mathbb{R}^m$ and $f(\zeta) > -\infty$ for all $\zeta \in \mathbb{R}^m$. The CSOCP, as an extension of the standard second-order cone programming (SOCP) problem (see Sect. 4), has applications in a broad range of fields, from engineering, control and finance to robust optimization and combinatorial optimization; see [3–7] and the references therein.

Recently, the SOCP has received much attention in optimization, particularly in the context of solution methods. In this paper, we focus on the solution of the more general CSOCP. Note that the CSOCP is a special class of convex programs, and therefore it can be solved via general convex programming methods. One of these methods is the proximal point algorithm for minimizing a convex function $f(\zeta)$ defined on $\mathbb{R}^m$, which replaces the problem $\min_{\zeta\in\mathbb{R}^m} f(\zeta)$ by a sequence of minimization problems with strictly convex objectives and generates a sequence $\{\zeta^k\}$ by

$$\zeta^k = \operatorname*{argmin}_{\zeta\in\mathbb{R}^m} \Big\{ f(\zeta) + \frac{1}{2\mu_k}\|\zeta - \zeta^{k-1}\|^2 \Big\}, \tag{2}$$

where $\{\mu_k\}$ is a sequence of positive numbers and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^m$. The method is due to Martinet [8], who introduced the above proximal minimization problem based on the Moreau proximal approximation [9] of $f$. The proximal point algorithm was then further developed and studied by Rockafellar [10, 11].
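As a minimal illustration of update (2), consider the hypothetical choice $f(\zeta) = \|\zeta\|_1$ (our choice, not an example from the paper). For this $f$, the subproblem in (2) has the well-known closed-form soft-thresholding solution, so the whole iteration fits in a few lines of Python; the function names below are illustrative only.

```python
import numpy as np


def prox_l1(v, mu):
    """Solve argmin_z ||z||_1 + (1/(2*mu)) * ||z - v||^2 (soft-thresholding at level mu)."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)


def proximal_point_l1(zeta0, mus):
    """Iterate update (2) with f = ||.||_1, one step per entry of the step-size list mus."""
    zeta = np.asarray(zeta0, dtype=float)
    history = [zeta]
    for mu in mus:
        zeta = prox_l1(zeta, mu)  # zeta^k computed from zeta^{k-1}
        history.append(zeta)
    return history


hist = proximal_point_l1([3.0, -1.5, 0.2], mus=[1.0] * 5)
objective = [np.abs(z).sum() for z in hist]  # f(zeta^k) is nonincreasing along the iteration
```

Because (2) is unconstrained, every iterate simply lives in $\mathbb{R}^m$; the variants discussed next replace the squared Euclidean distance by a Bregman-type distance.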

Later, several researchers [1, 2, 12–14] proposed and investigated nonquadratic proximal point algorithms for convex programming with nonnegativity constraints, by replacing the quadratic distance in (2) with other distance-like functions. Among others, Censor and Zenios [1] replaced the method (2) by a method of the form

$$\zeta^k = \operatorname*{argmin}_{\zeta\in\mathbb{R}^m} \Big\{ f(\zeta) + \frac{1}{\mu_k} D(\zeta, \zeta^{k-1}) \Big\}, \tag{3}$$

where $D(\cdot,\cdot)$, called the D-function, is a measure of distance based on a Bregman function.

Recall that, given an open convex set $S$ of $\mathbb{R}^m$, a convex real function $\varphi$ defined on the closure of $S$ is called a Bregman function [15–17] if it satisfies the properties listed in Definition 1.1 below; the induced D-function is given by

$$D_\varphi(\zeta, \xi) := \varphi(\zeta) - \varphi(\xi) - \langle \nabla\varphi(\xi), \zeta - \xi \rangle, \tag{4}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product in $\mathbb{R}^m$ and $\nabla\varphi$ denotes the gradient of $\varphi$.
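The D-function (4) is easy to evaluate once $\varphi$ and $\nabla\varphi$ are available. The following Python sketch assumes, purely for illustration, the separable entropy $\varphi(\zeta) = \sum_i (\zeta_i\log\zeta_i - \zeta_i)$ on $S = \mathbb{R}^m_{++}$, and also performs one step of (3) for a hypothetical linear objective $f(\zeta) = c^T\zeta$, for which setting the gradient of the subproblem to zero gives the closed form $\zeta = \xi \odot \exp(-\mu c)$ (componentwise).

```python
import numpy as np


def bregman(phi, grad_phi, zeta, xi):
    """D_phi(zeta, xi) = phi(zeta) - phi(xi) - <grad phi(xi), zeta - xi>, as in (4)."""
    return phi(zeta) - phi(xi) - grad_phi(xi) @ (zeta - xi)


def entropy(z):
    """Illustrative Bregman function on R^m_{++}: sum_i z_i log z_i - z_i."""
    return float(np.sum(z * np.log(z) - z))


def entropy_grad(z):
    return np.log(z)


def bpm_step_linear(c, xi, mu):
    """One step of (3) for f(z) = c^T z with the entropy kernel:
    c + (log z - log xi) / mu = 0  =>  z = xi * exp(-mu * c)."""
    return xi * np.exp(-mu * c)


rng = np.random.default_rng(0)
zeta, xi = rng.uniform(0.1, 2.0, 4), rng.uniform(0.1, 2.0, 4)
D = bregman(entropy, entropy_grad, zeta, xi)
# For this kernel, D_phi is the generalized Kullback-Leibler divergence.
D_closed = np.sum(zeta * np.log(zeta / xi) - zeta + xi)

c, mu = np.array([1.0, -0.5, 2.0, 0.3]), 0.7
z_new = bpm_step_linear(c, xi, mu)
```

The same pattern, a distance built from a kernel plus a closed-form or numerical subproblem solve, is what the paper generalizes to the second-order cone below.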


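Definition 1.1 Let $S \subseteq \mathbb{R}^m$ be an open set and let $\bar{S}$ be its closure. Then $\varphi: \bar{S} \to \mathbb{R}$ is called a Bregman function with zone $S$ if the following properties hold:

(i) $\varphi$ is continuously differentiable on $S$.

(ii) $\varphi$ is strictly convex and continuous on $\bar{S}$.

(iii) For each $\gamma \in \mathbb{R}$, the partial level sets $L_1(\xi, \gamma) := \{\zeta \in \bar{S} \mid D_\varphi(\zeta, \xi) \le \gamma\}$ and $L_2(\zeta, \gamma) := \{\xi \in S \mid D_\varphi(\zeta, \xi) \le \gamma\}$ are bounded for any $\xi \in S$ and $\zeta \in \bar{S}$, respectively.

(iv) If $\{\xi^k\} \subset S$ converges to $\xi^*$, then $D_\varphi(\xi^*, \xi^k)$ converges to 0.

(v) If $\{\zeta^k\}$ and $\{\xi^k\}$ are sequences such that $\xi^k \to \xi^* \in \bar{S}$, $\{\zeta^k\}$ is bounded, and $D_\varphi(\zeta^k, \xi^k) \to 0$, then $\zeta^k \to \xi^*$.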

The Bregman proximal minimization (BPM) method described in (3) was further extended by Kiwiel [18] with generalized Bregman functions, called B-functions. Compared with Bregman functions, these functions are possibly nondifferentiable and infinite on the boundary of their domain. For the detailed definition of B-functions and the convergence of the BPM method using B-functions, please refer to [18].

The main purpose of this paper is to extend the BPM method (3) so that it can be used to deal with the CSOCP. Specifically, we define a measure of distance in the second-order cone $K^n$ by a class of continuously differentiable strictly convex functions on $\mathbb{R}_{++}$, which are in fact special B-functions in $\mathbb{R}$ (see Property 3.1). The distance measure, which includes the entropy-like distance in $K^n$ as a special case, is shown to have some favorable properties similar to those of a Bregman distance, and hence we refer to it as a quasi Bregman distance or quasi D-function. The specific definition is given in Sect. 3. Then, a proximal-like algorithm using the quasi D-function is proposed and applied to solve the CSOCP. As for the proximal point algorithm (3), we establish, under some mild assumptions, the global convergence of the algorithm expressed in terms of function values, and show that the generated sequence is bounded and that each accumulation point is a solution of the CSOCP.

The rest of this paper is organized as follows. In Sect. 2, we review some basic concepts and properties associated with the SOC. In Sect. 3, we define a quasi D-function in $K^n$ and explore the relations among the quasi D-function, the D-function, and the double-regularized distance function [19]. In Sect. 4, we present a proximal-like algorithm using the quasi D-function, apply it to the CSOCP, and analyze its convergence. Finally, we close this paper in Sect. 5.

Some words about our notation. $\mathbb{R}_+$ and $\mathbb{R}_{++}$ denote the sets of nonnegative and positive real numbers, respectively, and $I$ represents an identity matrix of suitable dimension. For a differentiable function $\phi$ on $\mathbb{R}$, $\phi'$ denotes its derivative. Given a set $S$, we use $\bar{S}$, $\operatorname{int}(S)$ and $\operatorname{bd}(S)$ to denote the closure, the interior and the boundary of $S$, respectively. For a closed proper convex function $f: \mathbb{R}^m \to (-\infty, +\infty]$, we denote the domain of $f$ by $\operatorname{dom}(f) := \{\zeta \in \mathbb{R}^m \mid f(\zeta) < \infty\}$ and the subdifferential of $f$ at $\bar\zeta$ by

$$\partial f(\bar\zeta) := \{w \in \mathbb{R}^m \mid f(\zeta) \ge f(\bar\zeta) + \langle w, \zeta - \bar\zeta\rangle,\ \forall \zeta \in \mathbb{R}^m\}.$$

If $f$ is differentiable at $\zeta$, we use $\nabla f(\zeta)$ to denote its gradient at $\zeta$. For any $x, y$ in $\mathbb{R}^n$, we write $x \succeq_{K^n} y$ if $x - y \in K^n$, and $x \succ_{K^n} y$ if $x - y \in \operatorname{int}(K^n)$. In other words, $x \succeq_{K^n} 0$ if and only if $x \in K^n$, and $x \succ_{K^n} 0$ if and only if $x \in \operatorname{int}(K^n)$.


2 Preliminaries

In this section, we review some basic concepts and properties related to the SOC $K^n$ that will be used in the subsequent analysis. For any $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we define their Jordan product as

$$x \circ y := (\langle x, y\rangle,\ y_1 x_2 + x_1 y_2). \tag{5}$$

We write $x + y$ to mean the usual componentwise addition of vectors and $x^2$ to mean $x \circ x$. Then $\circ$, $+$ and $e = (1, 0, \ldots, 0)^T \in \mathbb{R}^n$ have the following basic properties [20, 21]: (1) $e \circ x = x$ for all $x \in \mathbb{R}^n$; (2) $x \circ y = y \circ x$ for all $x, y \in \mathbb{R}^n$; (3) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$ for all $x, y \in \mathbb{R}^n$; (4) $(x + y) \circ z = x \circ z + y \circ z$ for all $x, y, z \in \mathbb{R}^n$. Note that the Jordan product is not associative, but it is power associative, i.e., $x \circ (x \circ x) = (x \circ x) \circ x$ for all $x \in \mathbb{R}^n$. Thus, we may, without fear of ambiguity, write $x^m$ for the product of $m$ copies of $x$, and $x^{m+n} = x^m \circ x^n$ for all positive integers $m$ and $n$.

We define $x^0 = e$. We should also point out that $K^n$ is not closed under the Jordan product.

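For each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, the determinant and the trace of $x$ are defined by

$$\det(x) = x_1^2 - \|x_2\|^2, \qquad \operatorname{tr}(x) = 2x_1. \tag{6}$$

In general, $\det(x \circ y) \ne \det(x)\det(y)$ unless $x$ and $y$ are collinear, i.e., $x = \alpha y$ for some $\alpha \in \mathbb{R}$. A vector $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ is said to be invertible if $\det(x) \ne 0$.

If $x$ is invertible, then there exists a unique $y \in \mathbb{R}^n$ satisfying $x \circ y = y \circ x = e$. We call this $y$ the inverse of $x$ and denote it by $x^{-1}$. In fact, we have

$$x^{-1} = \frac{1}{x_1^2 - \|x_2\|^2}(x_1, -x_2) = \frac{1}{\det(x)}\big(\operatorname{tr}(x)e - x\big).$$

Therefore, $x \in \operatorname{int}(K^n)$ if and only if $x^{-1} \in \operatorname{int}(K^n)$. For any $x \in K^n$, it is known that there exists a unique vector in $K^n$, denoted by $x^{1/2}$, such that $(x^{1/2})^2 = x^{1/2} \circ x^{1/2} = x$.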
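The Jordan-algebra operations (5)–(6) and the inverse formula above translate directly into array code. The following NumPy sketch (all function names are our own) checks $x \circ x^{-1} = e$, shows that $\det(x \circ y) \ne \det(x)\det(y)$ for a non-collinear pair, and exhibits two points of $K^3$ whose Jordan product leaves $K^3$.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5): x o y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def det(x):
    """det(x) = x1^2 - ||x2||^2, as in (6)."""
    return x[0] ** 2 - x[1:] @ x[1:]


def tr(x):
    """tr(x) = 2 x1, as in (6)."""
    return 2.0 * x[0]


def inverse(x):
    """x^{-1} = (tr(x) e - x) / det(x); only valid when det(x) != 0."""
    e = np.zeros_like(x)
    e[0] = 1.0
    return (tr(x) * e - x) / det(x)


def in_soc(x, interior=False):
    """Membership in K^n (or in int(K^n) if interior=True), following (1)."""
    gap = x[0] - np.linalg.norm(x[1:])
    return gap > 0 if interior else gap >= 0


e = np.array([1.0, 0.0, 0.0])
x = np.array([2.0, 0.5, -1.0])   # ||x2|| < x1, so x lies in int(K^3)
y = np.array([1.0, 0.3, 0.4])    # also in int(K^3), not collinear with x

# K^n is not closed under the Jordan product: p, q lie in K^3, but p o q = (1, 1, 1) does not.
p, q = np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])
```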

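Next, we introduce the definition of spectral factorization. Let $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$; then $x$ can be decomposed as

$$x = \lambda_1(x) u_x^{(1)} + \lambda_2(x) u_x^{(2)}, \tag{7}$$

where $\lambda_i(x)$ and $u_x^{(i)}$ are the spectral values and the associated spectral vectors given by

$$\lambda_i(x) := x_1 + (-1)^i \|x_2\|, \qquad u_x^{(i)} := \begin{cases} \tfrac{1}{2}\big(1, (-1)^i x_2/\|x_2\|\big), & \text{if } x_2 \ne 0, \\ \tfrac{1}{2}\big(1, (-1)^i \bar{w}_2\big), & \text{if } x_2 = 0, \end{cases} \tag{8}$$

for $i = 1, 2$, with $\bar{w}_2$ being any vector in $\mathbb{R}^{n-1}$ satisfying $\|\bar{w}_2\| = 1$. If $x_2 \ne 0$, the factorization is unique. In the sequel, for any $x \in \mathbb{R}^n$, we write $\lambda(x) := (\lambda_1(x), \lambda_2(x))$, where $\lambda_1(x), \lambda_2(x)$ are the spectral values of $x$.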

The spectral decomposition along with the Jordan algebra associated with SOC has some basic properties as below, whose proofs can be found in [20,21].


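Property 2.1 For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with the spectral values $\lambda_1(x), \lambda_2(x)$ and spectral vectors $u_x^{(1)}, u_x^{(2)}$ given as above, we have:

(a) $u_x^{(1)}$ and $u_x^{(2)}$ are orthogonal under the Jordan product and have length $1/\sqrt{2}$, i.e.,
$$u_x^{(1)} \circ u_x^{(2)} = 0, \qquad \|u_x^{(1)}\| = \|u_x^{(2)}\| = 1/\sqrt{2}.$$

(b) $u_x^{(1)}$ and $u_x^{(2)}$ are idempotent under the Jordan product, i.e., $u_x^{(i)} \circ u_x^{(i)} = u_x^{(i)}$ for $i = 1, 2$.

(c) The determinant, the trace and the norm of $x$ can be represented by $\lambda_1(x), \lambda_2(x)$:
$$\det(x) = \lambda_1(x)\lambda_2(x), \qquad \operatorname{tr}(x) = \lambda_1(x) + \lambda_2(x), \qquad \|x\|^2 = \big(\lambda_1^2(x) + \lambda_2^2(x)\big)/2.$$

(d) $\lambda_1(x), \lambda_2(x)$ are nonnegative (positive) if and only if $x \in K^n$ ($x \in \operatorname{int}(K^n)$).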
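The spectral decomposition (7)–(8) is equally short in code. The sketch below (our own helper, assuming $n \ge 2$) picks $\bar w_2 = (1, 0, \ldots, 0)$ when $x_2 = 0$, which is one admissible choice in (8), and then checks the identities of Property 2.1 on a sample point.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def spectral(x):
    """Spectral values (lambda_1, lambda_2) and vectors u1, u2 of x as in (8).
    When x2 = 0, any unit vector may serve as w2; we take the first basis vector."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return lam, u1, u2


x = np.array([0.5, 1.0, -2.0, 0.25])   # ||x2|| = 2.25, so lambda(x) = (-1.75, 2.75)
lam, u1, u2 = spectral(x)
```

Since $\lambda_1(x) < 0$ for this sample point, Property 2.1(d) confirms that it lies outside $K^4$.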

Finally, for any $g: \mathbb{R} \to \mathbb{R}$, one can define a corresponding function $g^{\mathrm{soc}}(x)$ on $\mathbb{R}^n$ by applying $g$ to the spectral values of the spectral decomposition of $x$ with respect to $K^n$, i.e.,

$$g^{\mathrm{soc}}(x) = g(\lambda_1(x)) u_x^{(1)} + g(\lambda_2(x)) u_x^{(2)}, \qquad \forall x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}. \tag{9}$$

If $g$ is defined only on a subset of $\mathbb{R}$, then $g^{\mathrm{soc}}$ is defined on the corresponding subset of $\mathbb{R}^n$. The definition in (9) is unambiguous whether $x_2 \ne 0$ or $x_2 = 0$. The following lemma states some relations between the vector-valued function $g^{\mathrm{soc}}$ and the scalar function $g$; its proof can be found in [6, 21].
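Definition (9) only requires the spectral decomposition. A minimal sketch, repeating the helpers so that the snippet runs on its own, computes $x^{1/2}$ as $\sqrt{\cdot}^{\,\mathrm{soc}}(x)$ and checks that $\exp^{\mathrm{soc}}$ inverts $\log^{\mathrm{soc}}$ on $\operatorname{int}(K^n)$; `soc_fun` is a hypothetical helper name.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) = g(lambda_1(x)) u1 + g(lambda_2(x)) u2, as in (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


x = np.array([2.0, 0.5, -1.0])   # a point of int(K^3)
root = soc_fun(np.sqrt, x)       # x^{1/2}
```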

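Lemma 2.1 Given a function $g: \mathbb{R} \to \mathbb{R}$, let $g^{\mathrm{soc}}(x)$ be the vector-valued function defined by (9). If $g$ is differentiable (respectively, continuously differentiable), then $g^{\mathrm{soc}}(x)$ is also differentiable (respectively, continuously differentiable), and its Jacobian at $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ is given by $\nabla g^{\mathrm{soc}}(x) = g'(x_1) I$ if $x_2 = 0$, and otherwise

$$\nabla g^{\mathrm{soc}}(x) = \begin{bmatrix} b & c\, x_2^T/\|x_2\| \\ c\, x_2/\|x_2\| & aI + (b - a)\, x_2 x_2^T/\|x_2\|^2 \end{bmatrix}, \tag{10}$$

where

$$a = \frac{g(\lambda_2(x)) - g(\lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \qquad b = \frac{g'(\lambda_2(x)) + g'(\lambda_1(x))}{2}, \qquad c = \frac{g'(\lambda_2(x)) - g'(\lambda_1(x))}{2}. \tag{11}$$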
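Formulas (10)–(11) can be checked numerically. The sketch below, with the illustrative choice $g = \exp$, assembles the Jacobian of Lemma 2.1 and compares it with a central finite-difference approximation; `soc_jacobian` and `fd_jacobian` are our own hypothetical helpers, and the tolerance is an assumption suited to double precision.

```python
import numpy as np


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def soc_jacobian(g, dg, x):
    """Jacobian of g^soc at x according to Lemma 2.1, formulas (10)-(11)."""
    n = x.size
    nrm = np.linalg.norm(x[1:])
    if nrm == 0:
        return dg(x[0]) * np.eye(n)
    lam1, lam2 = x[0] - nrm, x[0] + nrm
    a = (g(lam2) - g(lam1)) / (lam2 - lam1)
    b = (dg(lam2) + dg(lam1)) / 2
    c = (dg(lam2) - dg(lam1)) / 2
    w = x[1:] / nrm
    J = np.empty((n, n))
    J[0, 0] = b
    J[0, 1:] = c * w
    J[1:, 0] = c * w
    J[1:, 1:] = a * np.eye(n - 1) + (b - a) * np.outer(w, w)
    return J


def fd_jacobian(F, x, h=1e-6):
    """Central finite differences (column j = dF/dx_j), used only as an independent check."""
    cols = [(F(x + h * d) - F(x - h * d)) / (2 * h) for d in np.eye(x.size)]
    return np.column_stack(cols)


x = np.array([0.3, -0.4, 1.2])
J = soc_jacobian(np.exp, np.exp, x)
J_fd = fd_jacobian(lambda z: soc_fun(np.exp, z), x)
```

As (10) shows, the Jacobian is symmetric, which the check below also confirms.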

3 Quasi D-Functions in SOC and Their Properties

In this section, we present a class of distance measures on the SOC and discuss their relations with the D-function and the double-regularized Bregman distance [19]. For this purpose, we need a class of functions $\phi: \mathbb{R}_+ \to \mathbb{R}$ satisfying Property 3.1 below, in which the function $d: \mathbb{R}_+ \times \mathbb{R}_{++} \to \mathbb{R}$ is defined by

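$$d(s, t) = \phi(s) - \phi(t) - \phi'(t)(s - t), \qquad \forall s \in \mathbb{R}_+,\ t \in \mathbb{R}_{++}. \tag{12}$$

Property 3.1

(a) $\phi$ is continuously differentiable on $\mathbb{R}_{++}$.

(b) $\phi$ is strictly convex and continuous on $\mathbb{R}_+$.

(c) For each $\gamma \in \mathbb{R}$, the level sets $\{s \in \mathbb{R}_+ \mid d(s, t) \le \gamma\}$ and $\{t \in \mathbb{R}_{++} \mid d(s, t) \le \gamma\}$ are bounded for any $t \in \mathbb{R}_{++}$ and $s \in \mathbb{R}_+$, respectively.

(d) If $\{t^k\} \subset \mathbb{R}_{++}$ is a sequence such that $\lim_{k\to+\infty} t^k = 0$, then $\lim_{k\to+\infty} \phi'(t^k)(s - t^k) = -\infty$ for all $s \in \mathbb{R}_{++}$.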

A function $\phi$ satisfying Property 3.1(d) is said to be boundary coercive in [22].

If we set $\phi(t) = +\infty$ for $t \notin \mathbb{R}_+$, then $\phi$ becomes a closed proper strictly convex function on $\mathbb{R}$. Furthermore, by Lemma 2.4 of [18] and Property 3.1(c), it is not difficult to see that $\phi(t)$ and $\sum_{i=1}^n \phi(x_i)$ are B-functions on $\mathbb{R}$ and $\mathbb{R}^n$, respectively. Unless otherwise stated, in the rest of this paper we always assume that $\phi$ satisfies Property 3.1.

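From the discussion in Sect. 2, clearly, the vector-valued functions

$$\phi^{\mathrm{soc}}(x) = \phi(\lambda_1(x)) u_x^{(1)} + \phi(\lambda_2(x)) u_x^{(2)} \tag{13}$$

and

$$(\phi')^{\mathrm{soc}}(x) = \phi'(\lambda_1(x)) u_x^{(1)} + \phi'(\lambda_2(x)) u_x^{(2)} \tag{14}$$

are well defined over $K^n$ and $\operatorname{int}(K^n)$, respectively. In view of this, we define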

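$$H(x, y) := \begin{cases} \operatorname{tr}\big[\phi^{\mathrm{soc}}(x) - \phi^{\mathrm{soc}}(y) - (\phi')^{\mathrm{soc}}(y) \circ (x - y)\big], & \forall x \in K^n,\ y \in \operatorname{int}(K^n), \\ +\infty, & \text{otherwise}. \end{cases} \tag{15}$$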

In what follows, we show that the function $H: \mathbb{R}^n \times \mathbb{R}^n \to (-\infty, +\infty]$ enjoys some favorable properties similar to those of the D-function. In particular, we prove that $H(x, y) \ge 0$ for any $x \in K^n$, $y \in \operatorname{int}(K^n)$, and moreover, $H(x, y) = 0$ if and only if $x = y$. Consequently, it can be regarded as a distance measure on the SOC.

We start with two technical lemmas that will be used in the subsequent analysis.

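Lemma 3.1 For any $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we have $\operatorname{tr}(x \circ y) \le \langle \lambda(x), \lambda(y) \rangle$, where $\lambda(x) = (\lambda_1(x), \lambda_2(x))$ and $\lambda(y) = (\lambda_1(y), \lambda_2(y))$, and the inequality holds with equality if and only if $x_2 = \alpha y_2$ for some $\alpha > 0$.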

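Proof From (5)–(6) and the Cauchy-Schwarz inequality,

$$\operatorname{tr}(x \circ y) = 2\langle x, y\rangle = 2x_1 y_1 + 2x_2^T y_2 \le 2x_1 y_1 + 2\|x_2\| \cdot \|y_2\|.$$

On the other hand, from the definition of the spectral values given by (8),

$$\langle \lambda(x), \lambda(y)\rangle = (x_1 - \|x_2\|)(y_1 - \|y_2\|) + (x_1 + \|x_2\|)(y_1 + \|y_2\|) = 2x_1 y_1 + 2\|x_2\| \cdot \|y_2\|.$$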

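Combining the two relations above, we immediately obtain the inequality. In addition, the inequality becomes an equality if and only if $x_2^T y_2 = \|x_2\| \cdot \|y_2\|$, which is equivalent to saying that $x_2 = \alpha y_2$ for some $\alpha > 0$. □

Lemma 3.2 Let $\phi^{\mathrm{soc}}(x)$ and $(\phi')^{\mathrm{soc}}(x)$ be given as in (13) and (14), respectively. Then:

(a) $\phi^{\mathrm{soc}}(x)$ is continuously differentiable on $\operatorname{int}(K^n)$, with the gradient $\nabla\phi^{\mathrm{soc}}(x)$ satisfying $\nabla\phi^{\mathrm{soc}}(x) e = (\phi')^{\mathrm{soc}}(x)$.

(b) $\operatorname{tr}[\phi^{\mathrm{soc}}(x)] = \sum_{i=1}^2 \phi(\lambda_i(x))$ and $\operatorname{tr}[(\phi')^{\mathrm{soc}}(x)] = \sum_{i=1}^2 \phi'(\lambda_i(x))$.

(c) $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is continuously differentiable on $\operatorname{int}(K^n)$ with $\nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)] = 2\nabla\phi^{\mathrm{soc}}(x) e$.

(d) $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is strictly convex and continuous on $K^n$.

(e) If $\{y^k\} \subset \operatorname{int}(K^n)$ is a sequence such that $\lim_{k\to+\infty} y^k = \bar{y} \in \operatorname{bd}(K^n)$, then
$$\lim_{k\to+\infty} \langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y^k)], x - y^k \rangle = -\infty \quad \text{for all } x \in \operatorname{int}(K^n).$$
In other words, the function $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is boundary coercive.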

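Proof (a) The first part is due to Lemma 2.1; we next prove the second part. If $x_2 \ne 0$, then by formulas (10)–(11) it is easy to compute that

$$\nabla\phi^{\mathrm{soc}}(x) e = \begin{pmatrix} \tfrac{1}{2}\big[\phi'(\lambda_2(x)) + \phi'(\lambda_1(x))\big] \\ \tfrac{1}{2}\big[\phi'(\lambda_2(x)) - \phi'(\lambda_1(x))\big]\, x_2/\|x_2\| \end{pmatrix}.$$

In addition, using (8) and (14), we can verify that the vector on the right-hand side is exactly $(\phi')^{\mathrm{soc}}(x)$. Therefore, $\nabla\phi^{\mathrm{soc}}(x) e = (\phi')^{\mathrm{soc}}(x)$. If $x_2 = 0$, then from $\nabla\phi^{\mathrm{soc}}(x) = \phi'(x_1) I$ and formula (8), we readily obtain $\nabla\phi^{\mathrm{soc}}(x) e = (\phi')^{\mathrm{soc}}(x)$.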

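(b) The result follows directly from Property 2.1(c) and (13)–(14).

(c) From part (a) and the fact that $\operatorname{tr}[\phi^{\mathrm{soc}}(x)] = \operatorname{tr}[\phi^{\mathrm{soc}}(x) \circ e] = 2\langle \phi^{\mathrm{soc}}(x), e\rangle$, clearly $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is continuously differentiable on $\operatorname{int}(K^n)$. Applying the chain rule to this inner product immediately yields $\nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)] = 2\nabla\phi^{\mathrm{soc}}(x) e$.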

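(d) It is clear that $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is continuous on $K^n$. We next prove that it is strictly convex on $K^n$. For any $x, y \in K^n$ with $x \ne y$ and $\alpha, \beta \in (0, 1)$ with $\alpha + \beta = 1$, we have

$$\lambda_1(\alpha x + \beta y) = \alpha x_1 + \beta y_1 - \|\alpha x_2 + \beta y_2\| \ge \alpha\lambda_1(x) + \beta\lambda_1(y),$$
$$\lambda_2(\alpha x + \beta y) = \alpha x_1 + \beta y_1 + \|\alpha x_2 + \beta y_2\| \le \alpha\lambda_2(x) + \beta\lambda_2(y),$$

implying that

$$\alpha\lambda_1(x) + \beta\lambda_1(y) \le \lambda_1(\alpha x + \beta y) \le \lambda_2(\alpha x + \beta y) \le \alpha\lambda_2(x) + \beta\lambda_2(y).$$

On the other hand,

$$\lambda_1(\alpha x + \beta y) + \lambda_2(\alpha x + \beta y) = 2\alpha x_1 + 2\beta y_1 = [\alpha\lambda_1(x) + \beta\lambda_1(y)] + [\alpha\lambda_2(x) + \beta\lambda_2(y)].$$

The last two relations imply that there exists $\rho \in [0, 1]$ such that

$$\lambda_1(\alpha x + \beta y) = \rho[\alpha\lambda_1(x) + \beta\lambda_1(y)] + (1 - \rho)[\alpha\lambda_2(x) + \beta\lambda_2(y)],$$
$$\lambda_2(\alpha x + \beta y) = (1 - \rho)[\alpha\lambda_1(x) + \beta\lambda_1(y)] + \rho[\alpha\lambda_2(x) + \beta\lambda_2(y)].$$

Thus, from Property 2.1, it follows that

$$\begin{aligned}
\operatorname{tr}[\phi^{\mathrm{soc}}(\alpha x + \beta y)] &= \phi[\lambda_1(\alpha x + \beta y)] + \phi[\lambda_2(\alpha x + \beta y)] \\
&= \phi\big[\rho(\alpha\lambda_1(x) + \beta\lambda_1(y)) + (1 - \rho)(\alpha\lambda_2(x) + \beta\lambda_2(y))\big] \\
&\quad + \phi\big[(1 - \rho)(\alpha\lambda_1(x) + \beta\lambda_1(y)) + \rho(\alpha\lambda_2(x) + \beta\lambda_2(y))\big] \\
&\le \rho\phi(\alpha\lambda_1(x) + \beta\lambda_1(y)) + (1 - \rho)\phi(\alpha\lambda_2(x) + \beta\lambda_2(y)) \\
&\quad + (1 - \rho)\phi(\alpha\lambda_1(x) + \beta\lambda_1(y)) + \rho\phi(\alpha\lambda_2(x) + \beta\lambda_2(y)) \\
&= \phi(\alpha\lambda_1(x) + \beta\lambda_1(y)) + \phi(\alpha\lambda_2(x) + \beta\lambda_2(y)) \\
&< \alpha\phi(\lambda_1(x)) + \beta\phi(\lambda_1(y)) + \alpha\phi(\lambda_2(x)) + \beta\phi(\lambda_2(y)) \\
&= \alpha \operatorname{tr}[\phi^{\mathrm{soc}}(x)] + \beta \operatorname{tr}[\phi^{\mathrm{soc}}(y)],
\end{aligned}$$

where the first and the last equalities follow from part (b), and the two inequalities are due to the strict convexity of $\phi$ on $\mathbb{R}_+$, which contains all spectral values of points of $K^n$. By the definition of strict convexity, the conclusion holds.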

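(e) From parts (a) and (c), we readily obtain the equality

$$\nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)] = 2(\phi')^{\mathrm{soc}}(x), \qquad \forall x \in \operatorname{int}(K^n). \tag{16}$$

Using this relation and Lemma 3.1, we then have

$$\begin{aligned}
\langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y^k)], x - y^k \rangle &= 2\langle (\phi')^{\mathrm{soc}}(y^k), x - y^k \rangle \\
&= \operatorname{tr}[(\phi')^{\mathrm{soc}}(y^k) \circ (x - y^k)] \\
&= \operatorname{tr}[(\phi')^{\mathrm{soc}}(y^k) \circ x] - \operatorname{tr}[(\phi')^{\mathrm{soc}}(y^k) \circ y^k] \\
&\le \sum_{i=1}^2 \phi'(\lambda_i(y^k))\lambda_i(x) - \operatorname{tr}[(\phi')^{\mathrm{soc}}(y^k) \circ y^k].
\end{aligned} \tag{17}$$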

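In addition, by Property 2.1(a)–(b), for any $y \in \operatorname{int}(K^n)$ we can compute

$$(\phi')^{\mathrm{soc}}(y) \circ y = \phi'(\lambda_1(y))\lambda_1(y) u_y^{(1)} + \phi'(\lambda_2(y))\lambda_2(y) u_y^{(2)}, \tag{18}$$

which implies that

$$\operatorname{tr}[(\phi')^{\mathrm{soc}}(y^k) \circ y^k] = \sum_{i=1}^2 \phi'(\lambda_i(y^k))\lambda_i(y^k). \tag{19}$$

Combining (17) and (19) immediately yields

$$\langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y^k)], x - y^k \rangle \le \sum_{i=1}^2 \phi'(\lambda_i(y^k))\big[\lambda_i(x) - \lambda_i(y^k)\big]. \tag{20}$$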

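Note that $\lambda_2(\bar{y}) \ge \lambda_1(\bar{y}) = 0$ and $\lambda_2(x) \ge \lambda_1(x) > 0$, since $\bar{y} \in \operatorname{bd}(K^n)$ and $x \in \operatorname{int}(K^n)$. Hence, if $\lambda_2(\bar{y}) = 0$, then by Property 3.1(d) and the continuity of $\lambda_i(\cdot)$ for $i = 1, 2$, we have

$$\lim_{k\to+\infty} \phi'(\lambda_i(y^k))\big[\lambda_i(x) - \lambda_i(y^k)\big] = -\infty, \qquad i = 1, 2,$$

which means that

$$\lim_{k\to+\infty} \sum_{i=1}^2 \phi'(\lambda_i(y^k))\big[\lambda_i(x) - \lambda_i(y^k)\big] = -\infty. \tag{21}$$

If $\lambda_2(\bar{y}) > 0$, then $\lim_{k\to+\infty} \phi'(\lambda_2(y^k))[\lambda_2(x) - \lambda_2(y^k)]$ is finite and

$$\lim_{k\to+\infty} \phi'(\lambda_1(y^k))\big[\lambda_1(x) - \lambda_1(y^k)\big] = -\infty;$$

therefore, (21) also holds in this case. Combining (21) with (20), we conclude that the assertion holds. □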
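Lemma 3.2(e) can also be observed numerically. Assuming the entropy kernel $\phi(t) = t\log t - t$ of Example 3.1 below (so $\phi' = \log$), we take $\bar y = (1, 1, 0) \in \operatorname{bd}(K^3)$, $y^k = \bar y + t_k e$ with $t_k \downarrow 0$, and $x = e \in \operatorname{int}(K^3)$, and evaluate $\langle \nabla\operatorname{tr}[\phi^{\mathrm{soc}}(y^k)], x - y^k\rangle = 2\langle(\phi')^{\mathrm{soc}}(y^k), x - y^k\rangle$ via (16). For this particular sequence, a short hand computation gives $(1 - t_k)\log t_k - (1 + t_k)\log(2 + t_k)$, which diverges to $-\infty$; the helper names below are our own.

```python
import numpy as np


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def coercivity_term(x, y):
    """<grad tr[phi^soc(y)], x - y> = 2 <(phi')^soc(y), x - y> by (16), with phi' = log."""
    return 2.0 * soc_fun(np.log, y) @ (x - y)


e = np.array([1.0, 0.0, 0.0])        # x = e lies in int(K^3)
y_bar = np.array([1.0, 1.0, 0.0])    # ||y2|| = y1, so y_bar lies on bd(K^3)
ts = [10.0 ** (-k) for k in range(1, 9)]
vals = [coercivity_term(e, y_bar + t * e) for t in ts]
for t, v in zip(ts, vals):
    print(f"t = {t:.0e}:  inner product = {v:.3f}")
```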

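Using relation (16), we have, for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$,
$$\operatorname{tr}[(\phi')^{\mathrm{soc}}(y) \circ (x - y)] = 2\langle (\phi')^{\mathrm{soc}}(y), x - y\rangle = \langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)], x - y\rangle.$$
As a consequence, the function $H(x, y)$ in (15) can be rewritten as

$$H(x, y) = \begin{cases} \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)], x - y\rangle, & \forall x \in K^n,\ y \in \operatorname{int}(K^n), \\ +\infty, & \text{otherwise}. \end{cases} \tag{22}$$

Using this representation, we next investigate several important properties of $H(x, y)$.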

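Proposition 3.1 Let $H(x, y)$ be the function defined as in (15) or (22). Then:

(a) $H(x, y)$ is continuous on $K^n \times \operatorname{int}(K^n)$ and, for any $y \in \operatorname{int}(K^n)$, the function $H(\cdot, y)$ is strictly convex on $K^n$.

(b) For any given $y \in \operatorname{int}(K^n)$, $H(\cdot, y)$ is continuously differentiable on $\operatorname{int}(K^n)$ with
$$\nabla_x H(x, y) = \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)] = 2\big[(\phi')^{\mathrm{soc}}(x) - (\phi')^{\mathrm{soc}}(y)\big]. \tag{23}$$

(c) $H(x, y) \ge \sum_{i=1}^2 d(\lambda_i(x), \lambda_i(y)) \ge 0$ for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$, where $d(\cdot,\cdot)$ is defined by (12). Moreover, $H(x, y) = 0$ if and only if $x = y$.

(d) For each $\gamma \in \mathbb{R}$, the level sets $L_H(y, \gamma) := \{x \in K^n \mid H(x, y) \le \gamma\}$ and $L_H(x, \gamma) := \{y \in \operatorname{int}(K^n) \mid H(x, y) \le \gamma\}$ are bounded for any $y \in \operatorname{int}(K^n)$ and $x \in K^n$, respectively.

(e) If $\{y^k\} \subset \operatorname{int}(K^n)$ is a sequence converging to $y \in \operatorname{int}(K^n)$, then $H(y, y^k) \to 0$.

(f) If $\{x^k\} \subset \operatorname{int}(K^n)$ and $\{y^k\} \subset \operatorname{int}(K^n)$ are sequences such that $y^k \to y \in \operatorname{int}(K^n)$, $\{x^k\}$ is bounded, and $H(x^k, y^k) \to 0$, then $x^k \to y$.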

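Proof (a) Note that $\phi^{\mathrm{soc}}(x)$, $(\phi')^{\mathrm{soc}}(y)$ and $(\phi')^{\mathrm{soc}}(y) \circ (x - y)$ are continuous for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$, and the trace function $\operatorname{tr}(\cdot)$ is also continuous; hence $H(x, y)$ is continuous on $K^n \times \operatorname{int}(K^n)$. From Lemma 3.2(d), $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is strictly convex over $K^n$, whereas $-\operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)], x - y\rangle$ is affine, hence convex, in $x$ for fixed $y \in \operatorname{int}(K^n)$. This means that $H(\cdot, y)$ is strictly convex for any $y \in \operatorname{int}(K^n)$.

(b) By Lemma 3.2(c), the function $H(\cdot, y)$ for any given $y \in \operatorname{int}(K^n)$ is continuously differentiable on $\operatorname{int}(K^n)$. The first equality in (23) is obvious and the second is due to (16).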

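(c) The result follows directly from the following relations:

$$\begin{aligned}
H(x, y) &= \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \operatorname{tr}[(\phi')^{\mathrm{soc}}(y) \circ (x - y)] \\
&= \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \operatorname{tr}[(\phi')^{\mathrm{soc}}(y) \circ x] + \operatorname{tr}[(\phi')^{\mathrm{soc}}(y) \circ y] \\
&\ge \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \sum_{i=1}^2 \phi'(\lambda_i(y))\lambda_i(x) + \operatorname{tr}[(\phi')^{\mathrm{soc}}(y) \circ y] \\
&= \sum_{i=1}^2 \big[\phi(\lambda_i(x)) - \phi(\lambda_i(y)) - \phi'(\lambda_i(y))\lambda_i(x) + \phi'(\lambda_i(y))\lambda_i(y)\big] \\
&= \sum_{i=1}^2 \big[\phi(\lambda_i(x)) - \phi(\lambda_i(y)) - \phi'(\lambda_i(y))(\lambda_i(x) - \lambda_i(y))\big] \\
&= \sum_{i=1}^2 d(\lambda_i(x), \lambda_i(y)) \ge 0,
\end{aligned}$$

where the first equality is due to (15), the second and fourth are obvious, the third follows from Lemma 3.2(b) and (18), and the last one is from (12); the first inequality follows from Lemma 3.1 (since $\phi'$ is increasing, the spectral values of $(\phi')^{\mathrm{soc}}(y)$ are $\phi'(\lambda_1(y)) \le \phi'(\lambda_2(y))$), and the last one is due to the strict convexity of $\phi$ on $\mathbb{R}_+$. Note that $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is strictly convex on $K^n$ by Lemma 3.2(d), and so $H(x, y) = 0$ if and only if $x = y$ by (22).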

(d) From part (c), we have $L_H(y, \gamma) \subseteq \{x \in K^n \mid \sum_{i=1}^2 d(\lambda_i(x), \lambda_i(y)) \le \gamma\}$. By Property 3.1(c), the set on the right-hand side is bounded. So $L_H(y, \gamma)$ is bounded for $y \in \operatorname{int}(K^n)$. Similarly, $L_H(x, \gamma)$ is bounded for $x \in K^n$.

From parts (a)–(d), we immediately obtain the results in (e) and (f). □

Remark 3.1

(i) From (22), it is not difficult to see that $H(x, y)$ is exactly the distance measure induced by $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ via formula (4). Therefore, if $n = 1$ and $\phi$ is a Bregman function with zone $\mathbb{R}_{++}$, i.e., $\phi$ also satisfies the property

(e) if $\{s^k\} \subseteq \mathbb{R}_+$ and $\{t^k\} \subset \mathbb{R}_{++}$ are sequences such that $t^k \to t^*$, $\{s^k\}$ is bounded, and $d(s^k, t^k) \to 0$, then $s^k \to t^*$,

then $H(x, y)$ reduces to the Bregman distance function $d(x, y)$ in (12).

(ii) When $n > 1$, $H(x, y)$ is generally not a Bregman distance even if $\phi$ is a Bregman function with zone $\mathbb{R}_{++}$, since Proposition 3.1(e) and (f) do not hold for $\{y^k\} \subseteq \operatorname{bd}(K^n)$ and $y \in \operatorname{bd}(K^n)$. By the proof of Proposition 3.1(c), the main reason is that, to guarantee
$$\operatorname{tr}[(\phi')^{\mathrm{soc}}(y) \circ x] = \sum_{i=1}^2 \phi'(\lambda_i(y))\lambda_i(x)$$
for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$, the relation $[(\phi')^{\mathrm{soc}}(y)]_2 = \alpha x_2$ with some $\alpha > 0$ is required, where $[(\phi')^{\mathrm{soc}}(y)]_2$ is the vector composed of the last $n - 1$ elements of $(\phi')^{\mathrm{soc}}(y)$. It is very stringent for $\phi$ to satisfy such a relation. For this reason, $\operatorname{tr}[\phi^{\mathrm{soc}}(x)]$ is not a B-function [18] on $\mathbb{R}^n$ either, even if $\phi$ itself is a B-function.

(iii) We observe that $H(x, y)$ is inseparable, whereas the double-regularized distance function proposed in [19] belongs to the separable class of functions. In view of this, $H(x, y)$ cannot become a double-regularized distance function on $K^n \times \operatorname{int}(K^n)$, even when $\phi$ is such that $\tilde{d}(s, t) = d(s, t)/\phi''(t) + \frac{\mu}{2}(s - t)^2$ is a double-regularized component (see [19]).

In view of Proposition 3.1 and Remark 3.1, we call $H(x, y)$ a quasi D-function in this paper. In the following, we present several specific examples of quasi D-functions.

Example 3.1 Let $\phi(t) = t\log t - t$ (with the convention $0\log 0 = 0$). It is easy to verify that $\phi$ satisfies Property 3.1. By Proposition 3.2(b) of [21] and (13)–(14), we can compute $\phi^{\mathrm{soc}}(x) = x \circ \log x - x$ and $(\phi')^{\mathrm{soc}}(y) = \log y$ for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$. Therefore,

$$H(x, y) = \operatorname{tr}(x \circ \log x - x \circ \log y + y - x), \qquad \forall x \in K^n,\ y \in \operatorname{int}(K^n).$$
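The closed form in Example 3.1 can be cross-checked against definition (15), together with the lower bound of Proposition 3.1(c). In the sketch below, `quasi_D` is a hypothetical helper that evaluates (15) for a user-supplied $\phi$ and $\phi'$; the random sampler for $\operatorname{int}(K^n)$ is likewise an illustrative assumption.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def tr(x):
    return 2.0 * x[0]


def quasi_D(phi, dphi, x, y):
    """H(x, y) from (15), for x in K^n and y in int(K^n)."""
    return tr(soc_fun(phi, x) - soc_fun(phi, y) - jordan(soc_fun(dphi, y), x - y))


def xlogx(t):
    return t * np.log(t) if t > 0 else 0.0   # convention 0 log 0 = 0


def phi(t):
    return xlogx(t) - t                      # kernel of Example 3.1, phi' = log


def H_example31(x, y):
    """Closed form of Example 3.1: tr(x o log x - x o log y + y - x)."""
    return tr(soc_fun(xlogx, x) - jordan(x, soc_fun(np.log, y)) + y - x)


def d(s, t):
    """Scalar distance (12) for this kernel: s log(s/t) - s + t."""
    return xlogx(s) - s * np.log(t) - s + t


def random_interior(rng, n):
    """Draw a point of int(K^n) by choosing x1 > ||x2||."""
    x2 = rng.normal(size=n - 1)
    return np.concatenate(([np.linalg.norm(x2) + rng.uniform(0.1, 1.0)], x2))


rng = np.random.default_rng(1)
pairs = [(random_interior(rng, 4), random_interior(rng, 4)) for _ in range(50)]
```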

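Example 3.2 Let $\phi(t) = t^2 - \sqrt{t}$. It is not hard to verify that $\phi$ satisfies Property 3.1. Notice that, for any $x \in K^n$, $x^2 = x \circ x = \lambda_1^2(x) u_x^{(1)} + \lambda_2^2(x) u_x^{(2)}$ and $\sqrt{x} = \sqrt{\lambda_1(x)}\, u_x^{(1)} + \sqrt{\lambda_2(x)}\, u_x^{(2)}$. A direct computation then yields $\phi^{\mathrm{soc}}(x) = x \circ x - \sqrt{x}$ and
$$(\phi')^{\mathrm{soc}}(y) = 2y - \frac{\operatorname{tr}(\sqrt{y})e - \sqrt{y}}{2\sqrt{\det(y)}}.$$
This implies that, for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$,

$$H(x, y) = \operatorname{tr}\left[(x - y)^2 - (\sqrt{x} - \sqrt{y}) + \frac{(\operatorname{tr}(\sqrt{y})e - \sqrt{y}) \circ (x - y)}{2\sqrt{\det(y)}}\right].$$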

Example 3.3 Take $\phi(t) = t\log t - (1 + t)\log(1 + t) + (1 + t)\log 2$ (with $0\log 0 = 0$). It is easily shown that $\phi$ satisfies Property 3.1. Using Property 2.1(a)–(b), we can compute

$$\phi^{\mathrm{soc}}(x) = x \circ \log x - (e + x) \circ \log(e + x) + (e + x)\log 2, \qquad x \in K^n,$$

and

$$(\phi')^{\mathrm{soc}}(y) = \log y - \log(e + y) + e\log 2, \qquad y \in \operatorname{int}(K^n).$$

Consequently, for any $x \in K^n$ and $y \in \operatorname{int}(K^n)$,

$$H(x, y) = \operatorname{tr}\big[x \circ (\log x - \log y) - (e + x) \circ (\log(e + x) - \log(e + y))\big].$$

In addition, from [14, 22], it follows that $\sum_{i=1}^m \phi(\zeta_i)$ generated by $\phi$ in the above examples is a Bregman function with zone $S = \mathbb{R}^m_{++}$, and consequently $\sum_{i=1}^m d(\zeta_i, \xi_i)$, with $d$ defined as in (12), is a D-function induced by $\sum_{i=1}^m \phi(\zeta_i)$.
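The closed forms in Examples 3.2 and 3.3 can be checked in the same way, by comparing them with a direct evaluation of (15). The sketch below uses a few hand-picked points of $\operatorname{int}(K^3)$; all helper names are our own, and the kernels are written for strictly positive arguments only, which is all the interior test points require.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def tr(x):
    return 2.0 * x[0]


def det(x):
    return x[0] ** 2 - x[1:] @ x[1:]


def quasi_D(phi, dphi, x, y):
    """H(x, y) evaluated directly from (15)."""
    return tr(soc_fun(phi, x) - soc_fun(phi, y) - jordan(soc_fun(dphi, y), x - y))


# Example 3.2: phi(t) = t^2 - sqrt(t)
def phi2(t):
    return t ** 2 - np.sqrt(t)


def dphi2(t):
    return 2 * t - 0.5 / np.sqrt(t)


def H_example32(x, y):
    e = np.eye(x.size)[0]
    sx, sy = soc_fun(np.sqrt, x), soc_fun(np.sqrt, y)
    corr = jordan(tr(sy) * e - sy, x - y) / (2 * np.sqrt(det(y)))
    return tr(jordan(x - y, x - y) - (sx - sy) + corr)


# Example 3.3: phi(t) = t log t - (1 + t) log(1 + t) + (1 + t) log 2, for t > 0
def phi3(t):
    return t * np.log(t) - (1 + t) * np.log(1 + t) + (1 + t) * np.log(2)


def dphi3(t):
    return np.log(t) - np.log(1 + t) + np.log(2)


def H_example33(x, y):
    e = np.eye(x.size)[0]

    def slog(z):
        return soc_fun(np.log, z)   # log^soc

    return tr(jordan(x, slog(x) - slog(y)) - jordan(e + x, slog(e + x) - slog(e + y)))


pts = [np.array([2.0, 0.5, -1.0]), np.array([1.0, 0.3, 0.4]), np.array([3.0, -2.0, 1.5])]
```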

To close this section, we present another important property of H (x, y).

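Proposition 3.2 Let $H(x, y)$ be defined as in (15) or (22). Then, for all $x, y \in \operatorname{int}(K^n)$ and $z \in K^n$, the following three-point identity holds:

$$\begin{aligned} H(z, x) + H(x, y) - H(z, y) &= \langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)], z - x\rangle \\ &= \operatorname{tr}\big[\big((\phi')^{\mathrm{soc}}(y) - (\phi')^{\mathrm{soc}}(x)\big) \circ (z - x)\big]. \end{aligned}$$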

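Proof Using the definition of $H$ given in (22), we have

$$\begin{aligned}
\langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)], z - x\rangle &= \operatorname{tr}[\phi^{\mathrm{soc}}(z)] - \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - H(z, x), \\
\langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)], x - y\rangle &= \operatorname{tr}[\phi^{\mathrm{soc}}(x)] - \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - H(x, y), \\
\langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)], z - y\rangle &= \operatorname{tr}[\phi^{\mathrm{soc}}(z)] - \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - H(z, y).
\end{aligned}$$

Subtracting the first two equations from the last one gives the first equality. By (16),
$$\langle \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(y)] - \nabla \operatorname{tr}[\phi^{\mathrm{soc}}(x)], z - x\rangle = 2\langle (\phi')^{\mathrm{soc}}(y) - (\phi')^{\mathrm{soc}}(x), z - x\rangle.$$
This, together with the fact that $\operatorname{tr}(u \circ v) = 2\langle u, v\rangle$ for all $u, v \in \mathbb{R}^n$, leads to the second equality. □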
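The three-point identity of Proposition 3.2 is straightforward to verify numerically. The sketch below assumes the entropy kernel of Example 3.1 and deliberately takes $z$ on the boundary of $K^3$, which the proposition allows; the helper names are hypothetical.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def tr(x):
    return 2.0 * x[0]


def xlogx(t):
    return t * np.log(t) if t > 0 else 0.0   # convention 0 log 0 = 0


def H(x, y):
    """Quasi D-function (15) for phi(t) = t log t - t, so phi' = log."""
    def phi(t):
        return xlogx(t) - t
    return tr(soc_fun(phi, x) - soc_fun(phi, y) - jordan(soc_fun(np.log, y), x - y))


x = np.array([2.0, 0.5, -1.0])   # int(K^3)
y = np.array([1.0, 0.3, 0.4])    # int(K^3)
z = np.array([1.0, 0.0, 1.0])    # ||z2|| = z1, so z lies on bd(K^3)

lhs = H(z, x) + H(x, y) - H(z, y)
dlog = soc_fun(np.log, y) - soc_fun(np.log, x)
rhs_trace = tr(jordan(dlog, z - x))   # second form in Proposition 3.2
rhs_inner = 2.0 * dlog @ (z - x)      # first form, rewritten via (16)
```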

4 Proximal-Like Algorithm for the CSOCP

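In this section, we propose a proximal-like algorithm for solving the CSOCP based on the quasi D-function $H(x, y)$. For notational convenience, we denote by $\mathcal{F}$ the feasible set

$$\mathcal{F} := \{\zeta \in \mathbb{R}^m \mid A\zeta + b \succeq_{K^n} 0\}. \tag{24}$$

It is easy to verify that $\mathcal{F}$ is convex and that its interior $\operatorname{int}(\mathcal{F})$ is given by

$$\operatorname{int}(\mathcal{F}) = \{\zeta \in \mathbb{R}^m \mid A\zeta + b \succ_{K^n} 0\}. \tag{25}$$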

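Let $\psi: \mathbb{R}^m \to (-\infty, +\infty]$ be the function defined by

$$\psi(\zeta) := \begin{cases} \operatorname{tr}[\phi^{\mathrm{soc}}(A\zeta + b)], & \text{if } \zeta \in \mathcal{F}, \\ +\infty, & \text{otherwise}. \end{cases} \tag{26}$$

By Lemma 3.2, it is easily shown that the following conclusions hold for $\psi(\zeta)$.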

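Lemma 4.1 Let $\psi(\zeta)$ be given as in (26). If the matrix $A$ has full rank $m$, then:

(a) $\psi(\zeta)$ is continuously differentiable on $\operatorname{int}(\mathcal{F})$ with $\nabla\psi(\zeta) = 2A^T(\phi')^{\mathrm{soc}}(A\zeta + b)$.

(b) $\psi(\zeta)$ is strictly convex and continuous on $\mathcal{F}$.

(c) $\psi(\zeta)$ is boundary coercive, i.e., if $\{\xi^k\} \subset \operatorname{int}(\mathcal{F})$ is such that $\lim_{k\to+\infty} \xi^k = \xi \in \operatorname{bd}(\mathcal{F})$, then for all $\zeta \in \operatorname{int}(\mathcal{F})$ there holds $\lim_{k\to+\infty} \nabla\psi(\xi^k)^T(\zeta - \xi^k) = -\infty$.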

Let $\mathcal{D}(\zeta, \xi)$ be the function induced by the above $\psi(\zeta)$ via formula (4), i.e.,

$$\mathcal{D}(\zeta, \xi) := \psi(\zeta) - \psi(\xi) - \langle \nabla\psi(\xi), \zeta - \xi\rangle. \tag{27}$$

Then, from (26) and (22), it is not difficult to see that

$$\mathcal{D}(\zeta, \xi) = H(A\zeta + b, A\xi + b). \tag{28}$$

So, by Proposition 3.1 and Lemma 4.1, we can prove the following conclusions.

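Lemma 4.2 Let $\mathcal{D}(\zeta, \xi)$ be given by (27) or (28). If the matrix $A$ has full rank $m$, then:

(a) $\mathcal{D}(\zeta, \xi)$ is continuous on $\mathcal{F} \times \operatorname{int}(\mathcal{F})$ and, for any given $\xi \in \operatorname{int}(\mathcal{F})$, the function $\mathcal{D}(\cdot, \xi)$ is strictly convex on $\mathcal{F}$.

(b) For any fixed $\xi \in \operatorname{int}(\mathcal{F})$, $\mathcal{D}(\cdot, \xi)$ is continuously differentiable on $\operatorname{int}(\mathcal{F})$ with
$$\nabla_\zeta \mathcal{D}(\zeta, \xi) = \nabla\psi(\zeta) - \nabla\psi(\xi) = 2A^T\big[(\phi')^{\mathrm{soc}}(A\zeta + b) - (\phi')^{\mathrm{soc}}(A\xi + b)\big].$$

(c) $\mathcal{D}(\zeta, \xi) \ge \sum_{i=1}^2 d(\lambda_i(A\zeta + b), \lambda_i(A\xi + b)) \ge 0$ for any $\zeta \in \mathcal{F}$ and $\xi \in \operatorname{int}(\mathcal{F})$, where $d(\cdot,\cdot)$ is defined by (12). Moreover, $\mathcal{D}(\zeta, \xi) = 0$ if and only if $\zeta = \xi$.

(d) For each $\gamma \in \mathbb{R}$, the partial level sets $L_{\mathcal{D}}(\xi, \gamma) = \{\zeta \in \mathcal{F} \mid \mathcal{D}(\zeta, \xi) \le \gamma\}$ and $L_{\mathcal{D}}(\zeta, \gamma) = \{\xi \in \operatorname{int}(\mathcal{F}) \mid \mathcal{D}(\zeta, \xi) \le \gamma\}$ are bounded for any $\xi \in \operatorname{int}(\mathcal{F})$ and $\zeta \in \mathcal{F}$, respectively.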
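Relation (28) and the gradient formula in Lemma 4.2(b) can be sketched as follows. The data $A$, $b$ and the entropy kernel of Example 3.1 are illustrative assumptions, chosen so that $A$ has full column rank and the test points satisfy $A\zeta + b \in \operatorname{int}(K^3)$; `make_distance` is a hypothetical helper, and the finite-difference comparison is only a sanity check.

```python
import numpy as np


def jordan(x, y):
    """Jordan product (5)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def tr(x):
    return 2.0 * x[0]


def xlogx(t):
    return t * np.log(t) if t > 0 else 0.0   # convention 0 log 0 = 0


def make_distance(A, b):
    """Return D(zeta, xi) = H(A zeta + b, A xi + b) from (28) and its zeta-gradient
    from Lemma 4.2(b), for phi(t) = t log t - t (so phi' = log)."""
    def phi(t):
        return xlogx(t) - t

    def D(zeta, xi):
        x, y = A @ zeta + b, A @ xi + b
        return tr(soc_fun(phi, x) - soc_fun(phi, y) - jordan(soc_fun(np.log, y), x - y))

    def grad_D(zeta, xi):
        return 2.0 * A.T @ (soc_fun(np.log, A @ zeta + b) - soc_fun(np.log, A @ xi + b))

    return D, grad_D


A = np.array([[0.5, 0.2], [1.0, 0.0], [0.0, 1.0]])   # rank 2 = m
b = np.array([5.0, 0.0, 0.0])
zeta, xi = np.array([0.3, -0.4]), np.array([-0.5, 0.8])
D, grad_D = make_distance(A, b)

h = 1e-6
grad_fd = np.array([(D(zeta + h * d, xi) - D(zeta - h * d, xi)) / (2 * h) for d in np.eye(2)])
```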

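The proximal-like algorithm that we propose for the CSOCP is defined as follows:

$$\zeta^0 \in \operatorname{int}(\mathcal{F}), \tag{29}$$

$$\zeta^k = \operatorname*{argmin}_{\zeta \in \mathcal{F}} \Big\{ f(\zeta) + \frac{1}{\mu_k}\mathcal{D}(\zeta, \zeta^{k-1}) \Big\}, \qquad k \ge 1, \tag{30}$$

where $\{\mu_k\}_{k\ge1}$ is a sequence of positive numbers.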
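To illustrate (29)–(30), consider the simplest hypothetical instance $A = I$, $b = 0$ (so $\mathcal{F} = K^n$ and full rank holds trivially), a linear objective $f(\zeta) = c^T\zeta$ with $c \in \operatorname{int}(K^n)$ (so $\inf_{\mathcal{F}} f = 0$ by self-duality of $K^n$), and the entropy kernel of Example 3.1. By Lemma 4.2(b), the first-order condition of (30) reads $c + (2/\mu_k)[\log\zeta - \log\zeta^{k-1}] = 0$, which gives the explicit update $\zeta^k = \exp^{\mathrm{soc}}(\log^{\mathrm{soc}}\zeta^{k-1} - (\mu_k/2)c)$. This is only a sketch for this special case, with our own helper names; a general $f$ or $A$ would require an inner convex solver for each subproblem.

```python
import numpy as np


def spectral(x):
    """Spectral values and vectors (8); first basis vector used for w2 when x2 = 0."""
    x2 = x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.eye(x2.size)[0]
    lam = np.array([x[0] - nrm, x[0] + nrm])
    return lam, 0.5 * np.concatenate(([1.0], -w)), 0.5 * np.concatenate(([1.0], w))


def soc_fun(g, x):
    """g^soc(x) from (9)."""
    lam, u1, u2 = spectral(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2


def prox_like_linear(c, zeta0, mus):
    """Iterates of (29)-(30) for A = I, b = 0, f(zeta) = c^T zeta and phi(t) = t log t - t.
    Each step solves c + (2/mu) [log zeta - log zeta_prev] = 0 in closed form."""
    zetas = [np.asarray(zeta0, dtype=float)]
    for mu in mus:
        w = soc_fun(np.log, zetas[-1]) - 0.5 * mu * c
        zetas.append(soc_fun(np.exp, w))   # exp^soc maps R^n into int(K^n)
    return zetas


c = np.array([1.0, 0.2, 0.1])       # c in int(K^3), so inf of c^T zeta over K^3 is 0
zeta0 = np.array([1.0, 0.3, 0.0])   # zeta^0 in int(K^3), as required by (29)
zetas = prox_like_linear(c, zeta0, mus=[1.0] * 30)
values = [c @ z for z in zetas]
for k in (0, 10, 20, 30):
    print(f"k = {k:2d}:  f(zeta^k) = {values[k]:.3e}")
```

In this special case $\log^{\mathrm{soc}}\zeta^k = \log^{\mathrm{soc}}\zeta^0 - \tfrac{1}{2}\big(\sum_{j\le k}\mu_j\big)c$, so the iterates approach the minimizer $0 \in \operatorname{bd}(K^n)$ while every iterate stays in the interior.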

To establish the convergence of the algorithm, we make the following assumptions:

(A1) $\inf\{f(\zeta) \mid \zeta \in \mathcal{F}\} := f_* > -\infty$ and $\operatorname{dom}(f) \cap \operatorname{int}(\mathcal{F}) \ne \emptyset$.

(A2) The matrix $A$ is of maximal rank $m$.

Remark 4.1 Assumption (A1) is elementary for the solution of the CSOCP. Assumption (A2) is common in the solution of SOCPs, and it is obviously satisfied when $\mathcal{F} = K^n$. Moreover, if we consider the standard SOCP

$$\min\ c^T x \quad \text{s.t.} \quad Ax = b,\ x \in K^n, \tag{31}$$

where $A \in \mathbb{R}^{m\times n}$ with $m \le n$, $b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$, the assumption that $A$ has full row rank $m$ is standard. Consequently, its dual problem, given by

$$\max\ b^T y \quad \text{s.t.} \quad c - A^T y \succeq_{K^n} 0, \tag{32}$$

satisfies Assumption (A2). This shows that we can solve the SOCP by applying the proximal-like algorithm (29)–(30) to the dual problem (32).

In what follows, we prove the convergence of the proximal-like algorithm (29)–(30) under Assumptions (A1) and (A2). We first show that the algorithm is well defined.

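Proposition 4.1 Suppose that Assumptions (A1)–(A2) hold. Then the algorithm described in (29)–(30) generates a sequence $\{\zeta^k\} \subset \operatorname{int}(\mathcal{F})$ such that

$$-2\mu_k^{-1} A^T\big[(\phi')^{\mathrm{soc}}(A\zeta^k + b) - (\phi')^{\mathrm{soc}}(A\zeta^{k-1} + b)\big] \in \partial f(\zeta^k). \tag{33}$$

Proof The proof proceeds by induction. For $k = 0$, clearly $\zeta^0 \in \operatorname{int}(\mathcal{F})$. Assume that $\zeta^{k-1} \in \operatorname{int}(\mathcal{F})$. Let $f_k(\zeta) := f(\zeta) + \mu_k^{-1}\mathcal{D}(\zeta, \zeta^{k-1})$. Then Assumption (A1) and Lemma 4.2(d) imply that $f_k$ has bounded level sets in $\mathcal{F}$. By the lower semicontinuity of $f$ and Lemma 4.2(a), the minimization problem $\min_{\zeta\in\mathcal{F}} f_k(\zeta)$, i.e., the subproblem (30), has solutions. Moreover, the solution $\zeta^k$ is unique due to the convexity of $f$ and the strict convexity of $\mathcal{D}(\cdot, \xi)$. In the following, we prove that $\zeta^k \in \operatorname{int}(\mathcal{F})$.

By Theorem 23.8 of [23] and the optimality condition for (30), $\zeta^k$ is the only $\zeta \in \mathbb{R}^m$ such that

$$2\mu_k^{-1} A^T(\phi')^{\mathrm{soc}}(A\zeta^{k-1} + b) \in \partial\big(f(\zeta) + \mu_k^{-1}\psi(\zeta) + \delta(\zeta \mid \mathcal{F})\big), \tag{34}$$

where $\delta(\zeta \mid \mathcal{F}) = 0$ if $\zeta \in \mathcal{F}$ and $+\infty$ otherwise. We will show that

$$\partial\big(f(\zeta) + \mu_k^{-1}\psi(\zeta) + \delta(\zeta \mid \mathcal{F})\big) = \emptyset \quad \text{for all } \zeta \in \operatorname{bd}(\mathcal{F}), \tag{35}$$

which by (34) implies that $\zeta^k \in \operatorname{int}(\mathcal{F})$. Take $\zeta \in \operatorname{bd}(\mathcal{F})$ and assume that there exists $w \in \partial(f(\zeta) + \mu_k^{-1}\psi(\zeta) + \delta(\zeta \mid \mathcal{F}))$. Take $\hat\zeta \in \operatorname{dom}(f) \cap \operatorname{int}(\mathcal{F})$ and let

$$\zeta^l = (1 - \epsilon_l)\zeta + \epsilon_l\hat\zeta \tag{36}$$

with $\epsilon_l \in (0, 1)$ and $\lim_{l\to+\infty} \epsilon_l = 0$. From the convexity of $\operatorname{int}(\mathcal{F})$ and $\operatorname{dom}(f)$, it then follows that $\zeta^l \in \operatorname{dom}(f) \cap \operatorname{int}(\mathcal{F})$ and, moreover, $\lim_{l\to+\infty} \zeta^l = \zeta$. Consequently,

$$\begin{aligned}
\epsilon_l w^T(\hat\zeta - \zeta) &= w^T(\zeta^l - \zeta) \\
&\le f(\zeta^l) - f(\zeta) + \mu_k^{-1}[\psi(\zeta^l) - \psi(\zeta)] \\
&\le f(\zeta^l) - f(\zeta) + \mu_k^{-1}\langle 2A^T(\phi')^{\mathrm{soc}}(A\zeta^l + b), \zeta^l - \zeta\rangle \\
&\le \epsilon_l\big(f(\hat\zeta) - f(\zeta)\big) + \mu_k^{-1}\frac{\epsilon_l}{1 - \epsilon_l}\operatorname{tr}\big[(\phi')^{\mathrm{soc}}(A\zeta^l + b) \circ (A\hat\zeta - A\zeta^l)\big],
\end{aligned}$$

where the first equality is due to (36), the first inequality follows from the definition of the subdifferential and the convexity of $f(\zeta) + \mu_k^{-1}\psi(\zeta) + \delta(\zeta \mid \mathcal{F})$ on $\mathcal{F}$, the second one is due to the convexity and differentiability of $\psi(\zeta)$ on $\operatorname{int}(\mathcal{F})$, and the last one is from (36) and the convexity of $f$. Using Lemma 3.1 and (18), we then have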

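$$\begin{aligned}
\mu_k(1 - \epsilon_l)\big[f(\zeta) - f(\hat\zeta) + w^T(\hat\zeta - \zeta)\big]
&\le \operatorname{tr}\big[(\phi')^{\mathrm{soc}}(A\zeta^l + b) \circ (A\hat\zeta + b)\big] - \operatorname{tr}\big[(\phi')^{\mathrm{soc}}(A\zeta^l + b) \circ (A\zeta^l + b)\big] \\
&\le \sum_{i=1}^2 \big[\phi'(\lambda_i(A\zeta^l + b))\lambda_i(A\hat\zeta + b) - \phi'(\lambda_i(A\zeta^l + b))\lambda_i(A\zeta^l + b)\big] \\
&= \sum_{i=1}^2 \phi'(\lambda_i(A\zeta^l + b))\big[\lambda_i(A\hat\zeta + b) - \lambda_i(A\zeta^l + b)\big].
\end{aligned}$$

Since $\zeta \in \operatorname{bd}(\mathcal{F})$, i.e., $A\zeta + b \in \operatorname{bd}(K^n)$, it follows that $\lim_{l\to+\infty} \lambda_1(A\zeta^l + b) = 0$. Thus, using Property 3.1(d) and following the same line as the proof of Lemma 3.2(e), we can prove that the right-hand side of the last inequality goes to $-\infty$ as $l$ tends to $+\infty$, whereas the left-hand side has a finite limit. This gives a contradiction. Hence (35) follows, which means that $\zeta^k \in \operatorname{int}(\mathcal{F})$.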

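Finally, let us prove that $\partial\delta(\zeta^k \mid \mathcal{F}) = \{0\}$. From p. 226 of [23], it follows that

$$\partial\delta(z \mid K^n) = \{\upsilon \in \mathbb{R}^n \mid \upsilon \preceq_{K^n} 0,\ \operatorname{tr}(\upsilon \circ z) = 0\}.$$

Using Theorem 23.9 of [23] and the assumption $\operatorname{dom}(f) \cap \operatorname{int}(\mathcal{F}) \ne \emptyset$, we have

$$\partial\delta(\zeta \mid \mathcal{F}) = \{A^T\upsilon \in \mathbb{R}^m \mid \upsilon \preceq_{K^n} 0,\ \operatorname{tr}(\upsilon \circ (A\zeta + b)) = 0\}.$$

In addition, from the self-dual property of the symmetric cone $K^n$, we know that $\operatorname{tr}(x \circ y) = 0$ for any $x \preceq_{K^n} 0$ and $y \succ_{K^n} 0$ implies $x = 0$. Thus, we obtain $\partial\delta(\zeta^k \mid \mathcal{F}) = \{0\}$. This, together with (34) and Theorem 23.8 of [23], yields the desired result. □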

Proposition 4.1 implies that the second-order cone constrained subproblem in (30) is actually equivalent to an unconstrained one,

$$\zeta^k = \operatorname*{argmin}_{\zeta \in \mathbb{R}^m} \Big\{ f(\zeta) + \mu_k^{-1}\mathcal{D}(\zeta, \zeta^{k-1}) \Big\},$$
