Nonlinear Analysis
Interior proximal methods and central paths for convex second-order cone programming
Shaohua Pan a, Jein-Shan Chen b,∗,1
a Department of Mathematics, South China University of Technology, Guangzhou 510640, China
b Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
Article history: Received 18 November 2008; Accepted 28 June 2010

Keywords: Convex second-order cone optimization; Interior proximal methods; Proximal distances with respect to SOCs; Convergence; Central path

Abstract
We present a unified analysis of interior proximal methods for solving convex second-order cone programming problems. These methods use a proximal distance with respect to second-order cones, which can be produced from an appropriate closed proper univariate function in three ways. Under some mild conditions, the generated sequence is bounded with each limit point being a solution, and global convergence rate estimates are obtained in terms of objective values. A class of regularized proximal distances is also constructed, which guarantees the global convergence of the sequence to an optimal solution. These results are illustrated with some examples. In addition, we study the central paths associated with these distance-like functions, and for the linear SOCP we discuss their relations with the sequence generated by the interior proximal methods. From this, we obtain improved convergence results for the sequence generated by interior proximal methods using a proximal distance continuous at the boundary of second-order cones.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
We consider the following convex second-order cone programming problem (CSOCP):
\[
\inf\ f(x)\quad \text{s.t.}\quad Ax=b,\ \ x\succeq_{\mathcal K}0, \tag{1}
\]

where $f:\mathbb R^n\to\mathbb R\cup\{+\infty\}$ is a closed proper convex function, $A$ is an $m\times n$ matrix with full row rank $m$, $b$ is a vector in $\mathbb R^m$, $x\succeq_{\mathcal K}0$ means $x\in\mathcal K$, and $\mathcal K$ is the Cartesian product of some second-order cones (SOCs), also called Lorentz cones [1]. In other words,

\[
\mathcal K=\mathcal K^{n_1}\times\mathcal K^{n_2}\times\cdots\times\mathcal K^{n_r}, \tag{2}
\]

where $r,n_1,\ldots,n_r\ge 1$ with $n_1+\cdots+n_r=n$, and

\[
\mathcal K^{n_i}:=\bigl\{(x_1,x_2)\in\mathbb R\times\mathbb R^{n_i-1}\ \big|\ x_1\ge\|x_2\|\bigr\},
\]

with $\|\cdot\|$ being the Euclidean norm. When $f$ reduces to a linear function, i.e. $f(x)=c^Tx$ for some $c\in\mathbb R^n$, (1) becomes the standard SOCP. Throughout this paper, we denote by $X^*$ the optimal set of (1), and let $V:=\{x\in\mathbb R^n\mid Ax=b\}$.

∗ Corresponding author. Tel.: +886 2 29325417; fax: +886 2 29332342.
E-mail addresses: shhpan@scut.edu.cn (S. Pan), jschen@math.ntnu.edu.tw (J.-S. Chen).
1 Member of the Mathematics Division, National Center for Theoretical Sciences, Taipei Office.
0362-546X/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.na.2010.06.079

The CSOCP, as an extension of the standard SOCP, has a wide range of applications from engineering, control, and finance to robust optimization and combinatorial optimization; see [2,3] and the references therein.
Various methods have been proposed for solving the CSOCP, including interior point methods [4–6], smoothing Newton methods [7,8], the smoothing–regularization method [9], the semismooth Newton method [10], and the merit function method [11]. These methods are all developed by reformulating the KKT optimality conditions as a system of equations or an unconstrained minimization problem. This paper focuses on a proximal-based iterative scheme that handles the CSOCP directly. Specifically, the proximal-type algorithm generates a sequence $\{x^k\}$ via

\[
x^k:=\operatorname*{argmin}\bigl\{\lambda_k f(x)+H(x,x^{k-1})\ \big|\ x\in\mathcal K\cap V\bigr\},\qquad k=1,2,\ldots, \tag{3}
\]

where $\{\lambda_k\}$ is a sequence of positive parameters, and $H:\mathbb R^n\times\mathbb R^n\to\mathbb R\cup\{+\infty\}$ is a proximal distance with respect to $\operatorname{int}\mathcal K$ (see Definition 3.1). It plays the same role as the Euclidean distance $\|x-y\|^2$ in the classical proximal algorithms (see, e.g., [12,13]), but possesses certain more desirable properties that force the iterates to stay in $\mathcal K\cap V$, thus eliminating the constraints automatically. As will be shown in Section 4, such proximal distances can be produced from an appropriate closed proper univariate function.

In this paper, under mild assumptions like those used in interior proximal methods for convex programs over nonnegative orthant cones (see, e.g., [14–20]), we show that the sequence $\{x^k\}$ is bounded with all limit points being solutions of (1), and we obtain global rates of convergence in terms of objective values. But, unlike for interior proximal methods for convex programs over nonnegative orthant cones, the global convergence of $\{x^k\}$ to an optimal solution can be guaranteed for the classes of proximal distances $\mathcal F_1(\mathcal K)$ or $\mathcal F_2(\mathcal K)$ under a very restrictive assumption on $X^*$ (see Theorem 3.2(a)), for their subclasses $\widehat{\mathcal F}_1(\mathcal K^n)$ or $\widehat{\mathcal F}_2(\mathcal K^n)$ under mild assumptions on $X^*$ (see Theorem 3.2(b)), or for the smallest subclass $\bar{\mathcal F}_2(\mathcal K^n)$. These results are illustrated with some examples.

Just like proximal point methods with generalized distances, the central paths derived from barrier functions have been the object of intensive study. Recently, the central paths for semidefinite programming were under active study (see, e.g., [21–24]). For example, da Cruz Neto et al. [21] established relations among the central paths in semidefinite programming, generalized proximal point methods, and Cauchy trajectories in Riemannian manifolds, extending the results of Iusem et al. [25] for monotone variational inequality problems. Motivated by this, we also investigate the properties of the central paths of (1) with respect to (w.r.t.) the distance-like functions used by interior proximal methods (see Propositions 5.2 and 5.3). For the linear SOCP, we discuss the relations between the central paths and the sequences generated by the interior proximal methods, and show that the sequence generated by interior proximal methods converges under the usual assumptions if the proximal distance satisfies a certain continuity property at the boundary of second-order cones (see Theorem 5.2).
Auslender and Teboulle [15] provided a unified technique for analyzing and designing interior proximal methods for convex and conic optimization. However, for the CSOCP, it seems hard to find an example of a proximal distance in the class $\mathcal F_+(\mathcal K^n)$ for which global convergence results similar to those of [15, Theorem 2.2] apply. In this paper, we extend their unified analysis technique to interior proximal methods using a proximal distance that can be produced from an appropriate univariate function in three ways, and establish global convergence results for the smallest class $\bar{\mathcal F}_2(\mathcal K^n)$, and for the class $\widehat{\mathcal F}_2(\mathcal K^n)$ under some mild assumptions on $X^*$. Examples from these two classes of proximal distances are easy to find. In particular, for the linear SOCP, we obtain improved convergence results for these interior proximal methods by exploring the relations between the sequence generated by the interior proximal methods and the central path associated with the corresponding proximal distances. In this sense, this paper can be regarded as a refinement of [15] for second-order cone optimization.

Throughout this paper, $I$ denotes an identity matrix of suitable dimension and $\mathbb R^n$ denotes the space of $n$-dimensional real column vectors. For any $x,y\in\mathbb R^n$, we write $x\succeq_{\mathcal K^n}y$ if $x-y\in\mathcal K^n$, and $x\succ_{\mathcal K^n}y$ if $x-y\in\operatorname{int}\mathcal K^n$. Given a matrix $E$, $\operatorname{Im}(E)$ means the subspace generated by the columns of $E$. A function is closed if and only if it is lower semicontinuous (lsc), and a function $f$ is proper if $f(x)<\infty$ for at least one $x\in\mathbb R^n$ and $f(x)>-\infty$ for all $x\in\mathbb R^n$. For an lsc proper convex function $f:\mathbb R^n\to\mathbb R\cup\{+\infty\}$, we denote its domain by $\operatorname{dom}f:=\{x\in\mathbb R^n\mid f(x)<\infty\}$ and the $\varepsilon$-subdifferential of $f$ at $\bar x$ by

\[
\partial_\varepsilon f(\bar x):=\bigl\{w\in\mathbb R^n\ \big|\ f(x)\ge f(\bar x)+\langle w,x-\bar x\rangle-\varepsilon,\ \forall x\in\mathbb R^n\bigr\}.
\]

If $f$ is differentiable at $x$, $\nabla f(x)$ means the gradient of $f$ at $x$. For a differentiable $h$ on $\mathbb R$, $h'$ and $h''$ denote its first and second derivatives. For any closed set $S$, $\operatorname{int}S$ denotes the interior of $S$.

In the rest of this paper, we focus on the case where $\mathcal K=\mathcal K^n$; all the analysis can be carried over to the case where $\mathcal K$ has the direct product structure as in (2). Unless otherwise stated, we make the following minimal assumption for the CSOCP (1):

(A1) $\operatorname{dom}f\cap(V\cap\operatorname{int}\mathcal K^n)\neq\emptyset$ and $f^*:=\inf\{f(x)\mid x\in V\cap\mathcal K^n\}>-\infty$.

2. Preliminaries

This section recalls some preliminary results that will be used in the subsequent sections. For any $x=(x_1,x_2),\ y=(y_1,y_2)\in\mathbb R\times\mathbb R^{n-1}$, their Jordan product [1] is defined as

\[
x\circ y:=\bigl(\langle x,y\rangle,\ y_1x_2+x_1y_2\bigr). \tag{4}
\]
It is easy to verify that the identity element under the Jordan product is $e\equiv(1,0,\ldots,0)^T\in\mathbb R^n$, i.e., $e\circ x=x$ for all $x\in\mathbb R^n$. Note that the Jordan product is not associative, but it is power associative, i.e., $x\circ(x\circ x)=(x\circ x)\circ x$ for all $x\in\mathbb R^n$. Thus, we may without fear of ambiguity write $x^m$ for the product of $m$ copies of $x$, and $x^{m+n}=x^m\circ x^n$ for all positive integers $m$ and $n$. We stipulate $x^0=e$. For each $x=(x_1,x_2)\in\mathbb R\times\mathbb R^{n-1}$, let

\[
\det(x):=x_1^2-\|x_2\|^2 \quad\text{and}\quad \operatorname{tr}(x):=2x_1. \tag{5}
\]

These are called the determinant and the trace of $x$, respectively. A vector $x$ is said to be invertible if $\det(x)\neq 0$. If $x\in\mathbb R^n$ is invertible, there is a unique $y\in\mathbb R^n$ satisfying $x\circ y=y\circ x=e$. We call this $y$ the inverse of $x$ and denote it by $x^{-1}$. We recall from [1] that each $x$ admits a spectral factorization associated with $\mathcal K^n$:
\[
x=\lambda_1(x)u^{(1)}+\lambda_2(x)u^{(2)}, \tag{6}
\]

where $\lambda_i(x)$ and $u^{(i)}$ for $i=1,2$ are the spectral values of $x=(x_1,x_2)\in\mathbb R\times\mathbb R^{n-1}$ and the associated spectral vectors, defined by

\[
\lambda_i(x)=x_1+(-1)^i\|x_2\|,\qquad u^{(i)}=\tfrac12\bigl(1,\ (-1)^i\bar x_2\bigr), \tag{7}
\]

with $\bar x_2=x_2/\|x_2\|$ if $x_2\neq 0$, and otherwise $\bar x_2$ being any vector in $\mathbb R^{n-1}$ such that $\|\bar x_2\|=1$. If $x_2\neq 0$, then the factorization is unique.

The following lemma is direct by formula (6).
Lemma 2.1. For any $x=(x_1,x_2),\ y=(y_1,y_2)\in\mathbb R\times\mathbb R^{n-1}$, the following results hold:

(a) $\det(x)=\lambda_1(x)\lambda_2(x)$, $\operatorname{tr}(x)=\lambda_1(x)+\lambda_2(x)$, and $\|x\|^2=\frac12\bigl[(\lambda_1(x))^2+(\lambda_2(x))^2\bigr]$.
(b) $x\in\mathcal K^n\iff\lambda_1(x)\ge 0$, and $x\in\operatorname{int}\mathcal K^n\iff\lambda_1(x)>0$.
(c) $\lambda_1(x)\lambda_2(y)+\lambda_2(x)\lambda_1(y)\le\operatorname{tr}(x\circ y)\le\lambda_1(x)\lambda_1(y)+\lambda_2(x)\lambda_2(y)$.
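The factorization (6)–(7) and the identities of Lemma 2.1(a) are easy to check numerically. The following short Python sketch (illustrative only; the helper name `spectral` is ours, not from the paper) computes the spectral values and vectors of a vector in $\mathbb R^3$, reconstructs the vector, and verifies the determinant and trace identities.

```python
import math

def spectral(x):
    # spectral values and vectors of x = (x1, x2) in R x R^{n-1}, formulas (6)-(7)
    x1, x2 = x[0], x[1:]
    nrm = math.sqrt(sum(t * t for t in x2))
    lam = (x1 - nrm, x1 + nrm)                 # lambda_1(x) <= lambda_2(x)
    # xbar_2 = x2/||x2|| if x2 != 0; otherwise any unit vector works
    xbar = [t / nrm for t in x2] if nrm > 0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * t for t in xbar]      # u^(1) = (1, -xbar_2)/2
    u2 = [0.5] + [0.5 * t for t in xbar]       # u^(2) = (1, +xbar_2)/2
    return lam, u1, u2

x = [3.0, 1.0, 2.0]
lam, u1, u2 = spectral(x)
# reconstruction via (6): x = lambda_1 u^(1) + lambda_2 u^(2)
rec = [lam[0] * a + lam[1] * b for a, b in zip(u1, u2)]
det_x = x[0] ** 2 - (x[1] ** 2 + x[2] ** 2)   # det(x) from (5)
# Lemma 2.1(a): det(x) = lambda_1 * lambda_2 and tr(x) = lambda_1 + lambda_2
```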
With the spectral factorization above, one may define a vector-valued function using a univariate function. For any given $h:I_{\mathbb R}\to\mathbb R$ with $I_{\mathbb R}\subseteq\mathbb R$, define $h^{\rm soc}:S\to\mathbb R^n$ by

\[
h^{\rm soc}(x):=h(\lambda_1(x))\cdot u^{(1)}+h(\lambda_2(x))\cdot u^{(2)},\qquad \forall x\in S. \tag{8}
\]

The definition is unambiguous whether $x_2\neq 0$ or $x_2=0$. For example, let $h(t)=t^{-1}$ for any $t>0$; then using formulas (6) and (8) we can compute that

\[
x^{-1}:=h^{\rm soc}(x)=\frac{1}{x_1^2-\|x_2\|^2}\,(x_1,-x_2)=\frac{\operatorname{tr}(x)e-x}{\det(x)}\qquad\text{for } x\in\operatorname{int}\mathcal K^n. \tag{9}
\]

Moreover, by Lemma 2.2 of [26], $S$ is open whenever $I_{\mathbb R}$ is open, and $S$ is closed whenever $I_{\mathbb R}$ is closed. The following lemma shows that some favorable properties of $h$ are transmitted to $h^{\rm soc}$; the proofs were given in Proposition 5.1 of [8] and Lemma 2.2 of [27].
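To illustrate (8) and (9) concretely, the minimal sketch below (our own helper names, for illustration only) builds $h^{\rm soc}$ from a univariate $h$ and checks that $h(t)=t^{-1}$ indeed yields the Jordan inverse, i.e. $x\circ x^{-1}=e$:

```python
import math

def jordan(x, y):
    # Jordan product (4): x o y = (<x, y>, y1*x2 + x1*y2)
    return [sum(a * b for a, b in zip(x, y))] + \
           [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

def hsoc(h, x):
    # h^soc(x) = h(lambda_1(x)) u^(1) + h(lambda_2(x)) u^(2), formulas (6)-(8)
    nrm = math.sqrt(sum(t * t for t in x[1:]))
    l1, l2 = x[0] - nrm, x[0] + nrm
    xbar = [t / nrm for t in x[1:]] if nrm > 0 else [1.0] + [0.0] * (len(x) - 2)
    return [0.5 * (h(l1) + h(l2))] + [0.5 * (h(l2) - h(l1)) * t for t in xbar]

x = [2.0, 1.0, 0.5]                   # x in int K^3, since 2 > ||(1, 0.5)||
xinv = hsoc(lambda t: 1.0 / t, x)     # x^{-1} as in formula (9)
e = jordan(x, xinv)                   # should recover the identity e = (1, 0, 0)
```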
Lemma 2.2. Given $h:I_{\mathbb R}\to\mathbb R$ with $I_{\mathbb R}\subseteq\mathbb R$, let $h^{\rm soc}:S\to\mathbb R^n$ be the vector-valued function induced by $h$ via (8), where $S\subseteq\mathbb R^n$. Then, the following results hold:

(a) If $h$ is continuously differentiable on $\operatorname{int}I_{\mathbb R}$, then $h^{\rm soc}$ is continuously differentiable on $\operatorname{int}S$, and for any $x\in\operatorname{int}S$ with $x=(x_1,x_2)\in\mathbb R\times\mathbb R^{n-1}$,

\[
\nabla h^{\rm soc}(x)=
\begin{cases}
h'(x_1)\,I & \text{if } x_2=0,\\[6pt]
\begin{pmatrix} b & c\,\dfrac{x_2^T}{\|x_2\|}\\[8pt] c\,\dfrac{x_2}{\|x_2\|} & aI+(b-a)\,\dfrac{x_2x_2^T}{\|x_2\|^2}\end{pmatrix} & \text{otherwise},
\end{cases}
\]

where

\[
a=\frac{h(\lambda_2(x))-h(\lambda_1(x))}{\lambda_2(x)-\lambda_1(x)},\qquad
b=\frac{h'(\lambda_2(x))+h'(\lambda_1(x))}{2},\qquad
c=\frac{h'(\lambda_2(x))-h'(\lambda_1(x))}{2}.
\]

(b) If $h$ is continuously differentiable on $\operatorname{int}I_{\mathbb R}$, then $\operatorname{tr}(h^{\rm soc}(x))$ is continuously differentiable on $\operatorname{int}S$ with $\nabla\operatorname{tr}(h^{\rm soc}(x))=2\nabla h^{\rm soc}(x)e=2(h')^{\rm soc}(x)$.
(c) If $h$ is (strictly) convex on $I_{\mathbb R}$, then $\operatorname{tr}(h^{\rm soc}(x))$ is (strictly) convex on $S$.
Lemma 2.3. (a) The real-valued function $\ln(\det(x))$ is strictly concave on $\operatorname{int}\mathcal K^n$.
(b) For any $x,y\in\operatorname{int}\mathcal K^n$ with $x\neq y$, it holds that

\[
\det(\alpha x+(1-\alpha)y)>(\det(x))^{\alpha}(\det(y))^{1-\alpha},\qquad\forall\alpha\in(0,1).
\]

Proof. Clearly, part (b) is a direct consequence of part (a). The proof of part (a) was given in [28, Prop. 2.4(a)] by computing the Hessian matrix of $\ln(\det(x))$. Here, we give a simpler proof. Let $\ln x$ be the vector-valued function induced by $\ln t$ via (8). From Lemma 2.1(a), $\ln(\det(x))=\ln(\lambda_1(x))+\ln(\lambda_2(x))=\operatorname{tr}(\ln x)$ for any $x\in\operatorname{int}\mathcal K^n$. The result then follows from Lemma 2.2(c) and the strict concavity of $\ln t$ $(t>0)$.

To close this section, we review the definitions of SOC-convexity and SOC-monotonicity. These two concepts, like matrix-convexity and matrix-monotonicity in semidefinite programming, play an important role in the solution methods of SOCPs.
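Lemma 2.3(b) can also be probed numerically. The sketch below (entirely our own, for illustration) samples random pairs in $\operatorname{int}\mathcal K^4$ and checks the log-concavity inequality for $\det$; the helper `interior_point` is a hypothetical construction that forces the first entry to dominate the tail norm.

```python
import math
import random

def det(x):
    # det(x) = x1^2 - ||x2||^2, formula (5)
    return x[0] ** 2 - sum(t * t for t in x[1:])

def interior_point(n):
    # a random point of int K^n: first entry strictly dominates the tail norm
    tail = [random.uniform(-1.0, 1.0) for _ in range(n - 1)]
    return [math.sqrt(sum(t * t for t in tail)) + random.uniform(0.1, 2.0)] + tail

random.seed(1)
violations = 0
for _ in range(200):
    x, y = interior_point(4), interior_point(4)
    a = random.uniform(0.01, 0.99)
    z = [a * s + (1 - a) * t for s, t in zip(x, y)]
    # Lemma 2.3(b): det(a*x + (1-a)*y) > det(x)^a * det(y)^(1-a)
    if det(z) < det(x) ** a * det(y) ** (1 - a) - 1e-9:
        violations += 1
```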
Definition 2.1 ([28]). Given $h:I_{\mathbb R}\to\mathbb R$ with $I_{\mathbb R}\subseteq\mathbb R$, let $h^{\rm soc}:S\to\mathbb R^n$ with $S\subseteq\mathbb R^n$ be the vector-valued function induced by $h$ via formula (8). Then,

(a) $h$ is said to be SOC-convex of order $n$ on $I_{\mathbb R}$ if for any $x,y\in S$ and $0\le\beta\le 1$,

\[
h^{\rm soc}(\beta x+(1-\beta)y)\ \preceq_{\mathcal K^n}\ \beta h^{\rm soc}(x)+(1-\beta)h^{\rm soc}(y). \tag{10}
\]

(b) $h$ is said to be SOC-monotone of order $n$ on $I_{\mathbb R}$ if for any $x,y\in S$,

\[
x\succeq_{\mathcal K^n}y\ \Longrightarrow\ h^{\rm soc}(x)\succeq_{\mathcal K^n}h^{\rm soc}(y).
\]

We say that $h$ is SOC-convex (respectively, SOC-monotone) on $I_{\mathbb R}$ if $h$ is SOC-convex of all orders $n$ (respectively, SOC-monotone of all orders $n$) on $I_{\mathbb R}$. A function $h$ is said to be SOC-concave on $I_{\mathbb R}$ whenever $-h$ is SOC-convex on $I_{\mathbb R}$. When $h$ is continuous on $I_{\mathbb R}$, the condition in (10) can be replaced by the more special midpoint condition:

\[
h^{\rm soc}\Bigl(\frac{x+y}{2}\Bigr)\ \preceq_{\mathcal K^n}\ \frac12\bigl(h^{\rm soc}(x)+h^{\rm soc}(y)\bigr). \tag{11}
\]

Obviously, the set of SOC-monotone functions and the set of SOC-convex functions are both closed under positive linear combinations and under pointwise limits.
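As a concrete instance of Definition 2.1(a): $h(t)=t^2$ is a standard example of an SOC-convex function, and for it formula (8) gives $h^{\rm soc}(x)=x\circ x$. The sketch below (our own, illustrative) verifies inequality (10) numerically by checking, via Lemma 2.1(b), that the smallest spectral value of the difference is nonnegative.

```python
import math
import random

def jordan_square(x):
    # h(t) = t^2 gives h^soc(x) = x o x = (||x||^2, 2*x1*x2)
    return [sum(t * t for t in x)] + [2.0 * x[0] * t for t in x[1:]]

def lam1(z):
    # smallest spectral value; z in K^n iff lam1(z) >= 0 (Lemma 2.1(b))
    return z[0] - math.sqrt(sum(t * t for t in z[1:]))

random.seed(0)
beta, bad = 0.3, 0
for _ in range(100):
    x = [random.uniform(-1.0, 1.0) for _ in range(4)]
    y = [random.uniform(-1.0, 1.0) for _ in range(4)]
    m = [beta * s + (1 - beta) * t for s, t in zip(x, y)]
    # inequality (10): beta*x^2 + (1-beta)*y^2 - (beta*x + (1-beta)*y)^2 in K^4
    diff = [beta * s + (1 - beta) * t - u
            for s, t, u in zip(jordan_square(x), jordan_square(y), jordan_square(m))]
    if lam1(diff) < -1e-12:
        bad += 1
```

In fact the difference equals $\beta(1-\beta)(x-y)^2$, a Jordan square, which always lies in $\mathcal K^n$, so no violations occur.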
For the characterizations of SOC-convexity and SOC-monotonicity, the interested reader may refer to [28,29]. The following lemma collects some common SOC-concave functions whose proofs can be found in [27] or are direct by Lemma 3.2 of [27].
Lemma 2.4. (a) For any fixed $u\in\mathbb R$, the function $h(t)=(t+u)^r$ with $r\in[0,1]$ is SOC-concave and SOC-monotone on $[-u,+\infty)$.
(b) For any fixed $u\in\mathbb R$, the function $h(t)=-(t+u)^{-r}$ with $r\in[0,1]$ is SOC-concave and SOC-monotone on $(-u,+\infty)$.
(c) For any fixed $\alpha\ge 0$, $\ln(\alpha+t)$ is SOC-concave and SOC-monotone on $(-\alpha,+\infty)$.
(d) For any fixed $u\ge 0$, $\frac{t}{u+t}$ is SOC-concave and SOC-monotone on $(-u,+\infty)$.

3. Interior proximal methods

First of all, we present the definition of a proximal distance w.r.t. the open cone $\operatorname{int}\mathcal K^n$.
Definition 3.1. An extended-valued function $H:\mathbb R^n\times\mathbb R^n\to\mathbb R\cup\{+\infty\}$ is called a proximal distance with respect to $\operatorname{int}\mathcal K^n$ if it satisfies the following properties:

(P1) $\operatorname{dom}H(\cdot,\cdot)=C_1\times C_2$ with $\operatorname{int}\mathcal K^n\times\operatorname{int}\mathcal K^n\subset C_1\times C_2\subseteq\mathcal K^n\times\mathcal K^n$.
(P2) For each given $y\in\operatorname{int}\mathcal K^n$, $H(\cdot,y)$ is continuous and strictly convex on $C_1$, and it is continuously differentiable on $\operatorname{int}\mathcal K^n$ with $\operatorname{dom}\nabla_1H(\cdot,y)=\operatorname{int}\mathcal K^n$.
(P3) $H(x,y)\ge 0$ for all $x,y\in\mathbb R^n$, and $H(y,y)=0$ for all $y\in\operatorname{int}\mathcal K^n$.
(P4) For each fixed $y\in C_2$, the sets $\{x\in C_1:H(x,y)\le\gamma\}$ are bounded for all $\gamma\in\mathbb R$.

Definition 3.1 differs slightly from Definition 2.1 of [15] for a proximal distance w.r.t. $\operatorname{int}\mathcal K^n$, since here $H(\cdot,y)$ is required to be strictly convex over $C_1$ for any fixed $y\in\operatorname{int}\mathcal K^n$. We denote by $\mathcal D(\operatorname{int}\mathcal K^n)$ the family of functions $H$ satisfying Definition 3.1. With a given $H\in\mathcal D(\operatorname{int}\mathcal K^n)$, we have the following basic iterative algorithm for (1).
Interior Proximal Algorithm (IPA). Given $H\in\mathcal D(\operatorname{int}\mathcal K^n)$ and $x^0\in V\cap\operatorname{int}\mathcal K^n$, for $k=1,2,\ldots$, with $\lambda_k>0$ and $\varepsilon_k\ge 0$, generate a sequence $\{x^k\}\subset V\cap\operatorname{int}\mathcal K^n$ with $g^k\in\partial_{\varepsilon_k}f(x^k)$ via the following iterative scheme:

\[
x^k:=\operatorname*{argmin}\bigl\{\lambda_k f(x)+H(x,x^{k-1})\ \big|\ x\in V\bigr\} \tag{12}
\]

such that

\[
\lambda_k g^k+\nabla_1H(x^k,x^{k-1})=A^Tu^k\quad\text{for some } u^k\in\mathbb R^m. \tag{13}
\]

The following proposition implies that the IPA is well-defined; moreover, from its proof we see that the iterative formula (12) is equivalent to the iterative scheme (3). When $\varepsilon_k>0$ for some $k\in\mathbb N$ (the set of natural numbers), the IPA can be viewed as an approximate interior proximal method, and it becomes exact if $\varepsilon_k=0$ for all $k\in\mathbb N$.
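To make the scheme (12)–(13) concrete, here is a small self-contained sketch (entirely our own construction, not from the paper): it runs the exact IPA ($\varepsilon_k=0$) on a toy linear SOCP over $\mathcal K^2$ with $V=\{x:x_1+x_2=2\}$, taking $H$ to be the Bregman distance of the barrier $-\ln\det(x)$, whose gradient is $-2x^{-1}$ by Lemma 2.2(b). Whether this particular $H$ belongs to the classes studied below is not addressed here; the point is only that the iterates stay in $V\cap\operatorname{int}\mathcal K^2$ while the objective values approach the optimum.

```python
import math

def det2(x):
    # det(x) = x1^2 - x2^2 for n = 2, written in factored form for stability
    return (x[0] - x[1]) * (x[0] + x[1])

def H(x, y):
    # Bregman distance of phi(x) = -ln det(x), with grad phi(y) = -2 y^{-1}:
    # H(x, y) = -ln det(x) + ln det(y) + 2 <y^{-1}, x - y>
    d = det2(y)
    yi = (y[0] / d, -y[1] / d)            # y^{-1}, formula (9)
    return (-math.log(det2(x)) + math.log(d)
            + 2.0 * (yi[0] * (x[0] - y[0]) + yi[1] * (x[1] - y[1])))

c = (1.0, 0.3)                            # f(x) = <c, x>
f = lambda x: c[0] * x[0] + c[1] * x[1]
point = lambda t: (t, 2.0 - t)            # V = {x : x1 + x2 = 2}; int K^2 is t > 1

def prox_step(lam, y):
    # x^k = argmin { lam*f(x) + H(x, y) : x in V }, by ternary search in t
    g = lambda t: lam * f(point(t)) + H(point(t), y)
    lo, hi = 1.0 + 1e-12, 10.0
    for _ in range(300):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return point(0.5 * (lo + hi))

x = point(2.0)                            # x^0 = (2, 0) in V, interior of K^2
for k in range(1, 31):
    x = prox_step(2.0 ** k, x)            # lambda_k = 2^k, eps_k = 0
    assert det2(x) > 0                    # iterates stay in int K^2

# the solution of this toy SOCP is x* = (1, 1), on bd K^2, with value f* = 1.3
```

With $\sigma_\nu=\sum_k\lambda_k$ growing geometrically, the objective values converge quickly toward the optimal value even though the solution lies on the boundary of the cone.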
Proposition 3.1. For any given $H\in\mathcal D(\operatorname{int}\mathcal K^n)$ and $y\in\operatorname{int}\mathcal K^n$, consider the problem

\[
f^*(y,\tau)=\inf\bigl\{\tau f(x)+H(x,y)\ \big|\ x\in V\bigr\}\quad\text{with }\tau>0. \tag{14}
\]

Then, for each $\varepsilon\ge 0$, there exist $x(y,\tau)\in V\cap\operatorname{int}\mathcal K^n$ and $g\in\partial_\varepsilon f(x(y,\tau))$ such that

\[
\tau g+\nabla_1H(x(y,\tau),y)=A^Tu \tag{15}
\]

for some $u\in\mathbb R^m$. Moreover, for such $x(y,\tau)$, we have $\tau f(x(y,\tau))+H(x(y,\tau),y)\le f^*(y,\tau)+\varepsilon$.
Proof. Set $F(x,\tau):=\tau f(x)+H(x,y)+\delta_{V\cap\mathcal K^n}(x)$, where $\delta_{V\cap\mathcal K^n}(x)$ is the indicator function of the set $V\cap\mathcal K^n$. Since $\operatorname{dom}H(\cdot,y)=C_1\subset\mathcal K^n$, it is clear that

\[
f^*(y,\tau)=\inf\bigl\{F(x,\tau)\ \big|\ x\in\mathbb R^n\bigr\}. \tag{16}
\]

Since $f^*>-\infty$, it is easy to verify that for any $\gamma\in\mathbb R$ the following relation holds:

\[
\{x\in\mathbb R^n\mid F(x,\tau)\le\gamma\}\subset\{x\in V\cap\mathcal K^n\mid H(x,y)\le\gamma-\tau f^*\}\subset\{x\in C_1\mid H(x,y)\le\gamma-\tau f^*\},
\]

which together with (P4) implies that $F(\cdot,\tau)$ has bounded level sets. In addition, by (P1)–(P3), $F(\cdot,\tau)$ is a closed proper and strictly convex function. Hence, the problem (16) has a unique solution, say $x(y,\tau)$. From the optimality conditions of (16), we get

\[
0\in\partial F(x(y,\tau))=\tau\partial f(x(y,\tau))+\nabla_1H(x(y,\tau),y)+\partial\delta_{V\cap\mathcal K^n}(x(y,\tau)),
\]

where the equality is due to Theorem 23.8 of [30] and $\operatorname{dom}f\cap(V\cap\operatorname{int}\mathcal K^n)\neq\emptyset$. Notice that $\operatorname{dom}\nabla_1H(\cdot,y)=\operatorname{int}\mathcal K^n$ and $\operatorname{dom}\partial\delta_{V\cap\mathcal K^n}(\cdot)=V\cap\mathcal K^n$. Therefore, the last equation implies $x(y,\tau)\in V\cap\operatorname{int}\mathcal K^n$, and there exists $g\in\partial f(x(y,\tau))$ such that

\[
-\tau g-\nabla_1H(x(y,\tau),y)\in\partial\delta_{V\cap\mathcal K^n}(x(y,\tau)).
\]

On the other hand, by the definition of $\delta_{V\cap\mathcal K^n}(\cdot)$, it is not hard to derive that

\[
\partial\delta_{V\cap\mathcal K^n}(x)=\operatorname{Im}(A^T)\qquad\forall x\in V\cap\operatorname{int}\mathcal K^n.
\]

The last two relations imply that (15) holds for $\varepsilon=0$. When $\varepsilon>0$, (15) also holds for such $x(y,\tau)$ and $g$ since $\partial f(x(y,\tau))\subset\partial_\varepsilon f(x(y,\tau))$. Finally, since for each $y\in\operatorname{int}\mathcal K^n$ the function $H(\cdot,y)$ is strictly convex, and since $g\in\partial_\varepsilon f(x(y,\tau))$, we have for all $x\in V$:

\[
\begin{aligned}
\tau f(x)+H(x,y)&\ge\tau f(x(y,\tau))+H(x(y,\tau),y)+\langle\tau g+\nabla_1H(x(y,\tau),y),\ x-x(y,\tau)\rangle-\varepsilon\\
&=\tau f(x(y,\tau))+H(x(y,\tau),y)+\langle A^Tu,\ x-x(y,\tau)\rangle-\varepsilon\\
&=\tau f(x(y,\tau))+H(x(y,\tau),y)-\varepsilon,
\end{aligned}
\]

where the first equality is from (15) and the last one holds because $x,x(y,\tau)\in V$. Thus,

\[
f^*(y,\tau)=\inf\{\tau f(x)+H(x,y)\mid x\in V\}\ \ge\ \tau f(x(y,\tau))+H(x(y,\tau),y)-\varepsilon.
\]
In the rest of this section, we focus on the convergence behavior of the IPA with $H$ from several subclasses of $\mathcal D(\operatorname{int}\mathcal K^n)$ that also satisfy one of the following properties:

(P5) For any $x,y\in\operatorname{int}\mathcal K^n$ and $z\in C_1$, $H(z,y)-H(z,x)\ge\langle\nabla_1H(x,y),\ z-x\rangle$.
(P5′) For any $x,y\in\operatorname{int}\mathcal K^n$ and $z\in C_2$, $H(y,z)-H(x,z)\ge\langle\nabla_1H(x,y),\ z-x\rangle$.
(P6) For each $x\in C_1$, the level sets $\{y\in C_2:H(x,y)\le\gamma\}$ are bounded for all $\gamma\in\mathbb R$.

Specifically, we denote by $\mathcal F_1(\operatorname{int}\mathcal K^n)$ and $\mathcal F_2(\operatorname{int}\mathcal K^n)$ the families of functions $H\in\mathcal D(\operatorname{int}\mathcal K^n)$ satisfying (P5) and (P5′), respectively. If $C_1=\mathcal K^n$, we denote by $\mathcal F_1(\mathcal K^n)$ the family of functions $H\in\mathcal D(\operatorname{int}\mathcal K^n)$ satisfying (P5) and (P6). If $C_2=\mathcal K^n$, we write $\mathcal F_2(\operatorname{int}\mathcal K^n)$ as $\mathcal F_2(\mathcal K^n)$. It is easy to see that the class of proximal distances $\mathcal F(\operatorname{int}\mathcal K^n)$ (respectively, $\mathcal F(\mathcal K^n)$) in [15] subsumes the pairs $(H,H)$ with $H\in\mathcal F_1(\operatorname{int}\mathcal K^n)$ (respectively, $\mathcal F_1(\mathcal K^n)$), but it does not include any $(H,H)$ with $H\in\mathcal F_2(\operatorname{int}\mathcal K^n)$ (respectively, $\mathcal F_2(\mathcal K^n)$).
Theorem 3.1. Let $\{x^k\}$ be the sequence generated by the IPA with $H\in\mathcal F_1(\operatorname{int}\mathcal K^n)$ or $H\in\mathcal F_2(\operatorname{int}\mathcal K^n)$. Set $\sigma_\nu=\sum_{k=1}^{\nu}\lambda_k$. Then, the following results hold:

(a) $f(x^\nu)-f(x)\le\sigma_\nu^{-1}H(x,x^0)+\sigma_\nu^{-1}\sum_{k=1}^{\nu}\sigma_k\varepsilon_k$ for any $x\in V\cap C_1$ if $H\in\mathcal F_1(\operatorname{int}\mathcal K^n)$; $f(x^\nu)-f(x)\le\sigma_\nu^{-1}H(x^0,x)+\sigma_\nu^{-1}\sum_{k=1}^{\nu}\sigma_k\varepsilon_k$ for any $x\in V\cap C_2$ if $H\in\mathcal F_2(\operatorname{int}\mathcal K^n)$.
(b) If $\sigma_\nu\to+\infty$ and $\varepsilon_k\to 0$, then $\liminf_{\nu\to\infty}f(x^\nu)=f^*$.
(c) The sequence $\{f(x^k)\}$ converges to $f^*$ whenever $\sum_{k=1}^{\infty}\varepsilon_k<\infty$.
(d) If $X^*\neq\emptyset$, then $\{x^k\}$ is bounded with all limit points in $X^*$ under (d1) or (d2): (d1) $X^*$ is bounded and $\sum_{k=1}^{\infty}\varepsilon_k<\infty$; (d2) $\sum_{k=1}^{\infty}\lambda_k\varepsilon_k<\infty$ and $H\in\mathcal F_1(\mathcal K^n)$ (or $H\in\mathcal F_2(\mathcal K^n)$)
.

Proof. The proofs are similar to those of [15, Theorem 4.1]. For completeness, we take $H\in\mathcal F_2(\operatorname{int}\mathcal K^n)$ as an example to prove the results.

(a) Since $g^k\in\partial_{\varepsilon_k}f(x^k)$, from the definition of the $\varepsilon$-subdifferential it follows that

\[
f(x)\ge f(x^k)+\langle g^k,\ x-x^k\rangle-\varepsilon_k\qquad\forall x\in\mathbb R^n.
\]

This, together with Eq. (13), implies that

\[
\lambda_k(f(x^k)-f(x))\le\langle\nabla_1H(x^k,x^{k-1}),\ x-x^k\rangle+\lambda_k\varepsilon_k\qquad\forall x\in V\cap C_2.
\]

Using (P5′) with $x=x^k$, $y=x^{k-1}$ and $z=x\in V\cap C_2$, it then follows that

\[
\lambda_k(f(x^k)-f(x))\le H(x^{k-1},x)-H(x^k,x)+\lambda_k\varepsilon_k\qquad\forall x\in V\cap C_2. \tag{17}
\]

Summing over $k=1,2,\ldots,\nu$ in this inequality yields

\[
-\sigma_\nu f(x)+\sum_{k=1}^{\nu}\lambda_kf(x^k)\le H(x^0,x)-H(x^\nu,x)+\sum_{k=1}^{\nu}\lambda_k\varepsilon_k. \tag{18}
\]

On the other hand, setting $x=x^{k-1}$ in (17), we obtain

\[
f(x^k)-f(x^{k-1})\le\lambda_k^{-1}\bigl[H(x^{k-1},x^{k-1})-H(x^k,x^{k-1})\bigr]+\varepsilon_k\le\varepsilon_k. \tag{19}
\]

Multiplying this inequality by $\sigma_{k-1}$ (with $\sigma_0\equiv 0$) and summing over $k=1,\ldots,\nu$, we get

\[
\sum_{k=1}^{\nu}\sigma_{k-1}f(x^k)-\sum_{k=1}^{\nu}\sigma_{k-1}f(x^{k-1})\le\sum_{k=1}^{\nu}\sigma_{k-1}\varepsilon_k.
\]

Noting that $\sigma_k=\lambda_k+\sigma_{k-1}$ with $\sigma_0\equiv 0$, the above inequality reduces to

\[
\sigma_\nu f(x^\nu)-\sum_{k=1}^{\nu}\lambda_kf(x^k)\le\sum_{k=1}^{\nu}\sigma_{k-1}\varepsilon_k. \tag{20}
\]

Adding the inequalities (18) and (20) and recalling that $\sigma_k=\lambda_k+\sigma_{k-1}$, it follows that

\[
f(x^\nu)-f(x)\le\sigma_\nu^{-1}\bigl[H(x^0,x)-H(x^\nu,x)\bigr]+\sigma_\nu^{-1}\sum_{k=1}^{\nu}\sigma_k\varepsilon_k\qquad\forall x\in V\cap C_2,
\]

which immediately implies the desired result due to the nonnegativity of $H(x^\nu,x)$.

(b) If $\sigma_\nu\to+\infty$ and $\varepsilon_k\to 0$, then applying Lemma 2.2(ii) of [15] with $a_k=\varepsilon_k$ and $b_\nu:=\sigma_\nu^{-1}\sum_{k=1}^{\nu}\lambda_k\varepsilon_k$ yields $\sigma_\nu^{-1}\sum_{k=1}^{\nu}\lambda_k\varepsilon_k\to 0$. From part (a), it then follows that

\[
\liminf_{\nu\to\infty}f(x^\nu)\le\inf\bigl\{f(x)\ \big|\ x\in V\cap\operatorname{int}\mathcal K^n\bigr\}.
\]

This together with $f(x^\nu)\ge\inf\{f(x)\mid x\in V\cap\mathcal K^n\}$ implies that

\[
\liminf_{\nu\to\infty}f(x^\nu)=\inf\bigl\{f(x)\ \big|\ x\in V\cap\operatorname{int}\mathcal K^n\bigr\}=f^*.
\]

(c) From (19), $0\le f(x^k)-f^*\le f(x^{k-1})-f^*+\varepsilon_k$. Using Lemma 2.1 of [15] with $\gamma_k\equiv 0$ and $v_k=f(x^k)-f^*$, we have that $\{f(x^k)\}$ converges to $f^*$ whenever $\sum_{k=1}^{\infty}\varepsilon_k<\infty$.

(d) If condition (d1) holds, then the sets $\{x\in V\cap\mathcal K^n\mid f(x)\le\gamma\}$ are bounded for all $\gamma\in\mathbb R$, since $f$ is closed proper convex and $X^*=\{x\in V\cap\mathcal K^n\mid f(x)\le f^*\}$. Note that (19) implies $\{x^k\}\subset\{x\in V\cap\mathcal K^n\mid f(x)\le f(x^0)+\sum_{j=1}^{k}\varepsilon_j\}$. Combining this with $\sum_{k=1}^{\infty}\varepsilon_k<\infty$, clearly $\{x^k\}$ is bounded. Since $\{f(x^k)\}$ converges to $f^*$ and $f$ is lsc, passing to the limit and recalling that $\{x^k\}\subset V\cap\mathcal K^n$ yields that each limit point of $\{x^k\}$ is a solution of (1).

Suppose that condition (d2) holds. If $H\in\mathcal F_2(\mathcal K^n)$, then inequality (17) holds for each $x\in V\cap\mathcal K^n$, and in particular for $x^*\in X^*$. Consequently,

\[
H(x^k,x^*)\le H(x^{k-1},x^*)+\lambda_k\varepsilon_k\qquad\forall x^*\in X^*. \tag{21}
\]

Summing over $k=1,2,\ldots,\nu$ in the last inequality, we obtain $H(x^\nu,x^*)\le H(x^0,x^*)+\sum_{k=1}^{\nu}\lambda_k\varepsilon_k$.