On the existence of saddle points for nonlinear second-order cone programming problems
Jinchuan Zhou · Jein-Shan Chen
Received: 8 October 2013 / Accepted: 27 October 2014 / Published online: 6 November 2014
© Springer Science+Business Media New York 2014
Abstract In this paper, we study the existence of local and global saddle points for nonlinear second-order cone programming problems. The existence of local saddle points is established by means of second-order sufficient conditions, in which a sigma-term is added to reflect the curvature of the second-order cone. Furthermore, by dealing with a perturbation of the primal problem, we establish the existence of global saddle points, which is applicable even in the case of multiple optimal solutions. The close relationship between global saddle points and exact penalty representations is discussed as well.
Keywords Local and global saddle points · Second-order sufficient conditions · Augmented Lagrangian · Exact penalty representations
Mathematics Subject Classification 90C26 · 90C46

1 Introduction
Recall that the second-order cone (SOC), also called the Lorentz cone or ice-cream cone, in $\mathbb{R}^{m+1}$ is defined as
$$\mathcal{K}^{m+1} := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^m \,\middle|\, \|x_2\| \le x_1 \right\},$$
Jinchuan Zhou's work is supported by the National Natural Science Foundation of China (11101248, 11171247, 11271233), the Shandong Province Natural Science Foundation (ZR2010AQ026, ZR2012AM016), and the Young Teacher Support Program of Shandong University of Technology. Jein-Shan Chen's work is supported by the Ministry of Science and Technology, Taiwan.
J. Zhou
Department of Mathematics, School of Science, Shandong University of Technology, Zibo 255049, People's Republic of China
e-mail: jinchuanzhou@163.com
J.-S. Chen (corresponding author)
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
e-mail: jschen@math.ntnu.edu.tw
where $\|\cdot\|$ denotes the Euclidean norm. The order relation induced by this pointed closed convex cone $\mathcal{K}^{m+1}$ is given by
$$x \succeq_{\mathcal{K}^{m+1}} 0 \iff x \in \mathcal{K}^{m+1}, \ \text{i.e.,} \ x_1 \ge \|x_2\|.$$
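Distances to and projections onto the SOC are used throughout the paper (for instance in the augmented Lagrangian (2) below). As a quick numerical companion, the following sketch computes the projection onto $\mathcal{K}^{m+1}$ via the classical closed-form formula; this formula is standard background material and is not derived in the paper itself.

```python
import numpy as np

def proj_soc(x):
    """Project x = (x1, x2) in R^{m+1} onto the second-order cone K^{m+1}.

    Uses the classical closed-form projection (standard background,
    not taken from this paper).
    """
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    if nx2 <= x1:              # already inside the cone
        return x.copy()
    if nx2 <= -x1:             # inside the polar cone -K: projects to the origin
        return np.zeros_like(x)
    alpha = (x1 + nx2) / 2.0   # otherwise project onto the cone's boundary
    return np.concatenate(([alpha], (alpha / nx2) * x2))

def dist_soc(x):
    """Euclidean distance from x to the second-order cone."""
    return np.linalg.norm(x - proj_soc(x))
```

For example, the point $(0, 2, 0)$ lies outside $\mathcal{K}^3$ and projects onto the boundary point $(1, 1, 0)$.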
In this paper, we consider the following nonlinear second-order cone programming (NSOCP) problem:
$$\begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & g_j(x) \succeq_{\mathcal{K}^{m_j+1}} 0, \quad j = 1, 2, \ldots, J, \\ & h(x) = 0, \end{aligned} \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^l$, and $g_j : \mathbb{R}^n \to \mathbb{R}^{m_j+1}$ are twice continuously differentiable functions, and $\mathcal{K}^{m_j+1}$ is the second-order cone in $\mathbb{R}^{m_j+1}$ for $j = 1, 2, \ldots, J$.
For a given nonlinear programming problem, we can define another programming problem associated with it by using the traditional Lagrangian function. The original problem is called the primal problem, and the latter is called the dual problem. Since the weak duality property always holds, our concern is how to obtain the strong duality property (or zero duality gap property). In other words, we want to know when the primal and dual problems have the same optimal value, which provides the theoretical foundation for many primal-dual type methods. However, if we employ the traditional Lagrangian function, then some convexity is necessary for achieving the strong duality property. To overcome this drawback, we resort to augmented Lagrangian functions, whose main advantage is ensuring the strong duality property without requiring convexity. In addition, the zero duality gap property coincides with the existence of global saddle points, provided that the optimal solution sets of the primal and dual problems are both nonempty. Many researchers have studied the properties of augmented Lagrangians and the existence of saddle points. For example, Rockafellar and Wets [13] proposed a class of augmented Lagrangians in which the augmenting function is required to be convex. This was extended by Huang and Yang [6], where the convexity condition is replaced by level-boundedness, and it was further generalized by Zhou and Yang [21], where the level-boundedness condition is replaced by the so-called valley-at-zero property; see also [14] for more details. These important works give a unified framework for the augmented Lagrangian function and its duality theory. Meanwhile, Floudas and Jongen [5] pointed out the crucial role of saddle points for the minimization of smooth functions with a finite number of stationary points. The necessary and/or sufficient conditions ensuring the existence of local and/or global saddle points have been investigated by many researchers. For example, the existence of local and global saddle points of Rockafellar's augmented Lagrangian function was studied in [12]. Local saddle points of the generalized Mangasarian augmented Lagrangian were analyzed in [19]. The existence of local and global saddle points of the pth power nonlinear Lagrangian was discussed in [7,8,18]. For more references, please see [9,10,14,16,17,20,22].
All the results mentioned above focus on either standard nonlinear programming or the generalized minimization problems of [13]. The main purpose of this paper is to establish the existence of local and global saddle points for NSOCP (1) by fully exploiting the special structure of the SOC. As is known from nonlinear programming, the positive definiteness of $\nabla^2_{xx} L$ over the critical cone is a sufficient condition for the existence of local saddle points. However, this classical result cannot be extended trivially to NSOCP (1), and the analysis is more complicated because $\mathbb{R}^n_+$ is polyhedral, whereas $\mathcal{K}^{m+1}$ is non-polyhedral. Hence, we particularly study the sigma-term [4], which to some extent stands for the curvature of the second-order cone. Our result shows that a local saddle point exists provided that the sum of $\nabla^2_{xx} L$ and $\mathcal{H}$ is positive definite, even if $\nabla^2_{xx} L$ itself is indefinite (see Theorem 2.3). This clarifies the essential role played by the sigma-term. Moreover, by studying a perturbation of the primal problem, we establish the existence of global saddle points without requiring the optimal solution to be unique, as is assumed in [12,16]. Furthermore, we study another important concept, exact penalty representation, and develop new necessary and sufficient conditions for it. The close relationship between global saddle points and exact penalty representations is established as well.
To end this section, we introduce some basic concepts which will be needed in our subsequent analysis. Let $\mathbb{R}^n$ be the $n$-dimensional real vector space. For $x, y \in \mathbb{R}^n$, the inner product is denoted by $x^T y$ or $\langle x, y \rangle$. Given a convex subset $A \subseteq \mathbb{R}^n$ and a point $x \in A$, the normal cone of $A$ at $x$, denoted by $N_A(x)$, is defined as
$$N_A(x) := \{ v \in \mathbb{R}^n \mid \langle v, z - x \rangle \le 0, \ \forall z \in A \},$$
and the tangent cone, denoted by $T_A(x)$, is defined as
$$T_A(x) := N_A(x)^\circ,$$
where $N_A(x)^\circ$ means the polar cone of $N_A(x)$. Given $d \in T_A(x)$, the outer second-order tangent set is defined as
$$T_A^2(x, d) := \left\{ w \in \mathbb{R}^n \,\middle|\, \exists\, t_n \downarrow 0 \text{ such that } \operatorname{dist}\left( x + t_n d + \tfrac{1}{2} t_n^2 w,\, A \right) = o(t_n^2) \right\}.$$
The support function of $A$ is
$$\sigma(x \mid A) := \sup\{ \langle x, z \rangle \mid z \in A \}.$$
We also write $\operatorname{cl}(A)$, $\operatorname{int}(A)$, and $\partial(A)$ to stand for the closure, interior, and boundary of $A$, respectively. For simplicity of notation, let us write $\mathcal{K}_j$ for $\mathcal{K}^{m_j+1}$ and let $\mathcal{K}$ be the Cartesian product of these second-order cones, i.e., $\mathcal{K} := \mathcal{K}_1 \times \mathcal{K}_2 \times \cdots \times \mathcal{K}_J$. In addition, we denote $g(x) := (g_1(x), g_2(x), \ldots, g_J(x))$, $p := \sum_{j=1}^{J} (m_j + 1)$, and let $S^*$ denote the solution set of NSOCP (1). According to [13, Exercise 11.57], the augmented Lagrangian function for NSOCP (1) is written as
$$\mathcal{L}_c(x, \lambda, \mu) := f(x) + \langle \mu, h(x) \rangle + \frac{c}{2}\|h(x)\|^2 + \frac{c}{2} \sum_{j=1}^{J} \left[ \operatorname{dist}^2\left( g_j(x) - \frac{\lambda_j}{c},\, \mathcal{K}_j \right) - \left\| \frac{\lambda_j}{c} \right\|^2 \right]. \tag{2}$$
Here $c \in \mathbb{R}_{++} := \{\zeta \in \mathbb{R} \mid \zeta > 0\}$ and $(x, \lambda, \mu) \in \mathbb{R}^n \times \mathbb{R}^p \times \mathbb{R}^l$ with $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_J) \in \mathbb{R}^{m_1+1} \times \mathbb{R}^{m_2+1} \times \cdots \times \mathbb{R}^{m_J+1}$.
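To make definition (2) concrete, the sketch below evaluates $\mathcal{L}_c$ on a small hypothetical instance with one SOC constraint in $\mathbb{R}^2$ and one equality constraint; the instance and all names are our own illustration, not from the paper, and the SOC projection uses the classical closed-form formula.

```python
import numpy as np

# Toy instance (purely illustrative):
#   min f(x) = x1^2 + x2^2   s.t.  g(x) = (x1, x2) in K^2,  h(x) = x1 - 1 = 0.
f = lambda x: x[0]**2 + x[1]**2
h = lambda x: np.array([x[0] - 1.0])
g = lambda x: x.copy()

def proj_soc(z):
    # classical closed-form projection onto the second-order cone
    z1, z2 = z[0], z[1:]
    n2 = np.linalg.norm(z2)
    if n2 <= z1:
        return z.copy()
    if n2 <= -z1:
        return np.zeros_like(z)
    a = (z1 + n2) / 2.0
    return np.concatenate(([a], (a / n2) * z2))

def aug_lagrangian(x, lam, mu, c):
    """Evaluate L_c(x, lam, mu) of Eq. (2) for this single-SOC toy instance."""
    hx, gx = h(x), g(x)
    shifted = gx - lam / c
    dist2 = np.sum((shifted - proj_soc(shifted))**2)   # dist^2(g(x) - lam/c, K)
    return (f(x) + mu @ hx + 0.5 * c * (hx @ hx)
            + 0.5 * c * (dist2 - np.sum((lam / c)**2)))
```

At the feasible point $x = (1, 0)$ with $\lambda = 0$ (so that $-\lambda \in N_{\mathcal{K}}(g(x))$), the value equals $f(x)$ for every $c > 0$, consistent with Theorem 2.1(b) below; at an infeasible point the value grows with $c$, consistent with Lemma 2.1.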
Definition 1.1 Let $\mathcal{L}_c$ be given as in (2) and $(x^*, \lambda^*, \mu^*) \in \mathbb{R}^n \times \mathbb{R}^p \times \mathbb{R}^l$.
(a) The triple $(x^*, \lambda^*, \mu^*)$ is said to be a local saddle point of $\mathcal{L}_c$ for some $c > 0$ if there exists $\delta > 0$ such that
$$\mathcal{L}_c(x^*, \lambda, \mu) \le \mathcal{L}_c(x^*, \lambda^*, \mu^*) \le \mathcal{L}_c(x, \lambda^*, \mu^*), \quad \forall x \in \mathbb{B}(x^*, \delta), \ (\lambda, \mu) \in \mathbb{R}^p \times \mathbb{R}^l, \tag{3}$$
where $\mathbb{B}(x^*, \delta)$ denotes the $\delta$-neighborhood of $x^*$, i.e., $\mathbb{B}(x^*, \delta) := \{x \in \mathbb{R}^n \mid \|x - x^*\| \le \delta\}$.
(b) The triple $(x^*, \lambda^*, \mu^*)$ is said to be a global saddle point of $\mathcal{L}_c$ for some $c > 0$ if
$$\mathcal{L}_c(x^*, \lambda, \mu) \le \mathcal{L}_c(x^*, \lambda^*, \mu^*) \le \mathcal{L}_c(x, \lambda^*, \mu^*), \quad \forall x \in \mathbb{R}^n, \ (\lambda, \mu) \in \mathbb{R}^p \times \mathbb{R}^l. \tag{4}$$
2 On local saddle points
In this section, we focus on necessary and sufficient conditions for the existence of local saddle points. For simplicity, we let $Q$ stand for a second-order cone without emphasizing its dimension, while the notation $Q \subset \mathbb{R}^{m+1}$ indicates that $Q$ is regarded as a second-order cone in $\mathbb{R}^{m+1}$. In other words, a result holding for $Q$ is also applicable to $\mathcal{K}_i$ for $i = 1, \ldots, J$ in the subsequent analysis. According to [13, Example 6.16], we know that for $a \in Q$,
$$-b \in N_Q(a) \iff \Pi_Q(a - b) = a \iff \operatorname{dist}(a - b, Q) = \|b\| \iff a \in Q, \ b \in Q, \ a^T b = 0, \tag{5}$$
where $\Pi_Q$ denotes the metric projection onto $Q$ and the last equivalence comes from the fact that $Q$ is a self-dual cone, i.e., $Q^\circ = -Q$.
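The three equivalences in (5) can be checked numerically on a small instance. In the sketch below (our own illustration; the vectors are chosen by hand), $a$ lies on the boundary of $\mathcal{K}^3$ and $b \in \mathcal{K}^3$ is orthogonal to it, so $-b \in N_Q(a)$, and both the projection identity and the distance identity of (5) hold.

```python
import numpy as np

def proj_soc(z):
    # classical closed-form projection onto the second-order cone
    z1, z2 = z[0], z[1:]
    n2 = np.linalg.norm(z2)
    if n2 <= z1:
        return z.copy()
    if n2 <= -z1:
        return np.zeros_like(z)
    a = (z1 + n2) / 2.0
    return np.concatenate(([a], (a / n2) * z2))

# Hand-picked vectors: a on the boundary of K^3, b in K^3, a^T b = 0,
# so -b lies in N_Q(a) and all three conditions in (5) should hold.
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, -1.0, 0.0])

in_cone = lambda v: np.linalg.norm(v[1:]) <= v[0]
assert in_cone(a) and in_cone(b) and abs(a @ b) < 1e-12   # third condition
assert np.allclose(proj_soc(a - b), a)                    # Pi_Q(a - b) = a
assert np.isclose(np.linalg.norm((a - b) - proj_soc(a - b)),
                  np.linalg.norm(b))                      # dist(a - b, Q) = ||b||
```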
Lemma 2.1 Let $\mathcal{L}_c$ be given as in (2). Then the augmented Lagrangian function $\mathcal{L}_c(x, \lambda, \mu)$ is nondecreasing with respect to $c > 0$.
Proof See [13, Exercise 11.56].
We now discuss the necessary conditions for local saddle points.
Theorem 2.1 Suppose $(x^*, \lambda^*, \mu^*)$ is a local saddle point of $\mathcal{L}_{c^*}$. Then,
(a) $-\lambda^* \in N_{\mathcal{K}}(g(x^*))$;
(b) $\mathcal{L}_c(x^*, \lambda^*, \mu^*) = f(x^*)$ for all $c > 0$;
(c) $x^*$ is a local optimal solution to NSOCP (1).
Proof We first show that $x^*$ is a feasible point of NSOCP (1), for which we need to verify two things: (i) $h(x^*) = 0$, (ii) $g_j(x^*) \succeq_{\mathcal{K}_j} 0$ for all $j = 1, 2, \ldots, J$.
(i) Suppose $h(x^*) \ne 0$. Taking $\mu = \gamma h(x^*)$ with $\gamma \to \infty$ and applying the first inequality in (3) yields $\mathcal{L}_{c^*}(x^*, \lambda^*, \mu^*) = \infty$, which is a contradiction. Thus, $h(x^*) = 0$.
(ii) Suppose $g_j(x^*) \notin \mathcal{K}_j$ for some $j \in \{1, \ldots, J\}$. Then there exists $\tilde{\lambda}_j \in \mathcal{K}_j$ such that $\eta := \langle \tilde{\lambda}_j, g_j(x^*) \rangle < 0$. Therefore, for $\beta \in \mathbb{R}$,
$$\begin{aligned}
& \operatorname{dist}^2\left( g_j(x^*) - \frac{\beta \tilde{\lambda}_j}{c^*},\, \mathcal{K}_j \right) - \left\| \frac{\beta \tilde{\lambda}_j}{c^*} \right\|^2 \\
&\quad = \left\| g_j(x^*) - \frac{\beta \tilde{\lambda}_j}{c^*} - \Pi_{\mathcal{K}_j}\!\left( g_j(x^*) - \frac{\beta \tilde{\lambda}_j}{c^*} \right) \right\|^2 - \left\| \frac{\beta \tilde{\lambda}_j}{c^*} \right\|^2 \\
&\quad = \left\| g_j(x^*) - \Pi_{\mathcal{K}_j}\!\left( g_j(x^*) - \frac{\beta \tilde{\lambda}_j}{c^*} \right) \right\|^2 - 2\left\langle \frac{\beta \tilde{\lambda}_j}{c^*},\ g_j(x^*) - \Pi_{\mathcal{K}_j}\!\left( g_j(x^*) - \frac{\beta \tilde{\lambda}_j}{c^*} \right) \right\rangle \\
&\quad \ge \operatorname{dist}^2\left( g_j(x^*),\, \mathcal{K}_j \right) - 2\beta \left\langle \frac{\tilde{\lambda}_j}{c^*},\ g_j(x^*) \right\rangle \\
&\quad = \operatorname{dist}^2\left( g_j(x^*),\, \mathcal{K}_j \right) - \frac{2\beta \eta}{c^*}.
\end{aligned} \tag{6}$$
Here the inequality comes from the facts that
$$\left\| g_j(x^*) - \Pi_{\mathcal{K}_j}\!\left( g_j(x^*) - \frac{\beta \tilde{\lambda}_j}{c^*} \right) \right\| \ge \left\| g_j(x^*) - \Pi_{\mathcal{K}_j}(g_j(x^*)) \right\| = \operatorname{dist}(g_j(x^*), \mathcal{K}_j)$$
and
$$\left\langle \tilde{\lambda}_j,\ \Pi_{\mathcal{K}_j}\!\left( g_j(x^*) - \beta \tilde{\lambda}_j / c^* \right) \right\rangle \ge 0,$$
because $\tilde{\lambda}_j \in \mathcal{K}_j$ and $\Pi_{\mathcal{K}_j}\left( g_j(x^*) - \beta \tilde{\lambda}_j / c^* \right) \in \mathcal{K}_j$. Taking $\beta \to \infty$, it follows from (3) and (6) that $\mathcal{L}_{c^*}(x^*, \lambda^*, \mu^*)$ is unbounded above, which is a contradiction.
Plugging $\lambda = 0$ into the first inequality of (3) (i.e., $\mathcal{L}_{c^*}(x^*, 0, \mu^*) \le \mathcal{L}_{c^*}(x^*, \lambda^*, \mu^*)$), we obtain
$$\sum_{j=1}^{J} \left[ \operatorname{dist}^2\left( g_j(x^*) - \frac{\lambda_j^*}{c^*},\, \mathcal{K}_j \right) - \left\| \frac{\lambda_j^*}{c^*} \right\|^2 \right] \ge 0, \tag{7}$$
where we have used the feasibility of $x^*$ as shown above.
On the other hand, we have
$$\operatorname{dist}\left( g_j(x^*) - \frac{\lambda_j^*}{c^*},\, \mathcal{K}_j \right) \le \left\| g_j(x^*) - \frac{\lambda_j^*}{c^*} - g_j(x^*) \right\| = \left\| \frac{\lambda_j^*}{c^*} \right\|,$$
where the inequality is due to the fact that $g_j(x^*) \in \mathcal{K}_j$, as shown above. This together with (7) ensures that
$$\operatorname{dist}\left( g_j(x^*) - \frac{\lambda_j^*}{c^*},\, \mathcal{K}_j \right) = \left\| \frac{\lambda_j^*}{c^*} \right\|. \tag{8}$$
Combining (5) and (8) yields $-\lambda_j^* \in N_{\mathcal{K}_j}(g_j(x^*))$ for all $j = 1, \ldots, J$, i.e., $-\lambda^* \in N_{\mathcal{K}}(g(x^*))$ by [13, Proposition 6.41]. This establishes part (a). Furthermore, it implies
$$\operatorname{dist}\left( g_j(x^*) - \frac{\lambda_j^*}{c},\, \mathcal{K}_j \right) = \left\| \frac{\lambda_j^*}{c} \right\|, \quad \forall c > 0, \tag{9}$$
because $-\lambda_j^*/c \in N_{\mathcal{K}_j}(g_j(x^*))$ for all $c > 0$ (since $N_{\mathcal{K}_j}(g_j(x^*))$ is a cone). Hence $\mathcal{L}_c(x^*, \lambda^*, \mu^*) = f(x^*)$ for all $c > 0$. This establishes part (b).
Now, we turn our attention to part (c). Suppose $x \in \mathbb{B}(x^*, \delta)$ is any feasible point of NSOCP (1). Then, from (3), we know
$$f(x) \ge \mathcal{L}_{c^*}(x, \lambda^*, \mu^*) \ge \mathcal{L}_{c^*}(x^*, \lambda^*, \mu^*) = f(x^*),$$
where the first inequality comes from the fact that $x$ is feasible. This means $x^*$ is a local optimal solution to NSOCP (1). The proof is complete.
For NSOCP (1), we say that Robinson's constraint qualification holds at $x^*$ if $\nabla h_i(x^*)$ for $i = 1, \ldots, l$ are linearly independent and there exists $d \in \mathbb{R}^n$ such that
$$\nabla h(x^*) d = 0 \quad \text{and} \quad g(x^*) + \nabla g(x^*) d \in \operatorname{int}(\mathcal{K}).$$
It is known that if $x^*$ is a local solution to NSOCP (1) and Robinson's constraint qualification holds at $x^*$, then there exists $(\lambda^*, \mu^*) \in \mathbb{R}^p \times \mathbb{R}^l$ satisfying the following Karush-Kuhn-Tucker (KKT) conditions:
$$\nabla_x L(x^*, \lambda^*, \mu^*) = 0, \quad h(x^*) = 0, \quad -\lambda^* \in N_{\mathcal{K}}(g(x^*)), \tag{10}$$
or equivalently,
$$\nabla_x L(x^*, \lambda^*, \mu^*) = 0, \quad h(x^*) = 0, \quad \lambda^* \in \mathcal{K}, \quad g(x^*) \in \mathcal{K}, \quad (\lambda^*)^T g(x^*) = 0,$$
where $L(x, \lambda, \mu)$ is the standard Lagrangian function of NSOCP (1), i.e.,
$$L(x, \lambda, \mu) := f(x) + \langle \mu, h(x) \rangle - \langle \lambda, g(x) \rangle. \tag{11}$$
For convenience of subsequent analysis, we denote by $\Lambda(x^*)$ the set of all Lagrange multipliers $(\lambda^*, \mu^*)$ satisfying (10).
It is well known that second-order sufficient conditions are utilized to ensure the existence of local saddle points. In nonlinear programming, one requires the positive definiteness of $\nabla^2_{xx} L$ over the critical cone. However, due to the non-polyhedrality of the second-order cone, an additional well-known sigma-term (or $\sigma$-term), which stands for the curvature of the second-order cone, is required. In particular, it was noted in [4, page 177] that the $\sigma$-term vanishes when the cone is polyhedral. Due to the important role played by the $\sigma$-term in the analysis of the second-order cone, before developing the sufficient conditions for the existence of local saddle points we shall study some basic properties of the $\sigma$-term that will be used in the subsequent analysis. First, based on the arguments given in [1, Theorem 29], we obtain the following result.
Theorem 2.2 Let $x \in Q$ and $d \in T_Q(x)$. Then, the support function of the outer second-order tangent set $T_Q^2(x, d)$ is
$$\sigma\left( y \mid T_Q^2(x, d) \right) = \begin{cases} -\dfrac{y_1}{x_1}\, d^T \begin{pmatrix} 1 & 0 \\ 0 & -I_m \end{pmatrix} d, & \text{for } y \in N_Q(x) \cap \{d\}^\perp,\ x \in \partial Q \setminus \{0\}, \\[1ex] 0, & \text{for } y \in N_Q(x) \cap \{d\}^\perp,\ x \notin \partial Q \setminus \{0\}, \\[1ex] +\infty, & \text{for } y \notin N_Q(x) \cap \{d\}^\perp. \end{cases}$$
Proof We know from [4, Proposition 3.34] that
$$T_Q^2(x, d) + T_{T_Q(x)}(d) \subset T_Q^2(x, d) \subset T_{T_Q(x)}(d).$$
This implies
$$\sigma\left( y \mid T_Q^2(x, d) \right) + \sigma\left( y \mid T_{T_Q(x)}(d) \right) = \sigma\left( y \mid T_Q^2(x, d) + T_{T_Q(x)}(d) \right) \le \sigma\left( y \mid T_Q^2(x, d) \right) \le \sigma\left( y \mid T_{T_Q(x)}(d) \right). \tag{12}$$
Note that
$$\sigma\left( y \mid T_{T_Q(x)}(d) \right) < +\infty \iff \sigma\left( y \mid T_{T_Q(x)}(d) \right) = 0 \tag{13}$$
$$\iff y \in N_{T_Q(x)}(d) \tag{14}$$
$$\iff y \in T_Q(x)^\circ = N_Q(x), \quad y^T d = 0, \tag{15}$$
where the first and third equivalences come from the facts that $T_{T_Q(x)}(d)$ and $T_Q(x)$ are cones, respectively. Thus, we only need to establish the exact formula of $\sigma\left( y \mid T_Q^2(x, d) \right)$ provided that (15) holds. In addition, (12) also indicates that $\sigma\left( y \mid T_Q^2(x, d) \right) = +\infty$ whenever $y \notin N_Q(x) \cap \{d\}^\perp$, since $T_Q^2(x, d)$ is nonempty for $x \in Q$ and $d \in T_Q(x)$ by [1, Lemma 27].
In fact, under condition (15), it follows from (12) and (13) that
$$\sigma\left( y \mid T_Q^2(x, d) \right) \le \sigma\left( y \mid T_{T_Q(x)}(d) \right) = 0. \tag{16}$$
Furthermore, in light of condition (15), we discuss the following four cases.
(i) If $x = 0$, then $0 \in T_Q^2(x, d) = T_Q(d)$, where the equality is due to [1, Lemma 27]. Thus,
$$\sigma\left( y \mid T_Q^2(x, d) \right) = \sigma\left( y \mid T_Q(d) \right) \ge 0.$$
This together with (16) implies $\sigma\left( y \mid T_Q^2(x, d) \right) = 0$.
(ii) If $x \in \operatorname{int}(Q)$, then it follows from (15) that $y = 0$. Hence, $\sigma\left( y \mid T_Q^2(x, d) \right) = 0$.
(iii) If $x \in \partial(Q) \setminus \{0\}$ and $d \in \operatorname{int}(T_Q(x))$, then it follows from (14) that $y = 0$, since $d \in \operatorname{int}(T_Q(x))$. Hence
$$\sigma\left( y \mid T_Q^2(x, d) \right) = 0 = -(y_1/x_1)\left( d_1^2 - \|d_2\|^2 \right).$$
(iv) If $x \in \partial(Q) \setminus \{0\}$ and $d \in \partial(T_Q(x))$, then the desired result can be obtained by following the arguments given in [1, p. 222]. We provide the proof for the sake of completeness. Note that $\sigma\left( y \mid T_Q^2(x, d) \right)$ is the maximum of $y_1 w_1 + y_2^T w_2$ over all $w$ satisfying $w_2^T x_2 - w_1 x_1 \le d_1^2 - \|d_2\|^2$ (see [1, Lemma 27]). From $y \in N_Q(x)$, i.e., $-y \in Q$, $x \in Q$, and $x^T y = 0$, we know $y_1 = -\alpha x_1$ and $y_2 = \alpha x_2$ with $\alpha = -y_1/x_1 \ge 0$; see [1, page 208]. Thus,
$$\langle y, w \rangle = y_1 w_1 + y_2^T w_2 = \alpha\left( w_2^T x_2 - w_1 x_1 \right) \le \alpha\left( d_1^2 - \|d_2\|^2 \right) = -\frac{y_1}{x_1}\left( d_1^2 - \|d_2\|^2 \right).$$
The maximum is attained at $(w_1, w_2) = \left( -\dfrac{d_1^2}{x_1},\ -\dfrac{\|d_2\|^2}{\|x_2\|^2}\, x_2 \right)$. This establishes the desired expression.
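The closed-form expression in Theorem 2.2 can be sanity-checked numerically. The following sketch (our own illustration; the specific vectors are chosen by hand to satisfy case (iv)) evaluates the candidate maximizer from the proof and verifies by random sampling that no feasible $w$ exceeds the closed-form value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data matching case (iv) of Theorem 2.2 in R^3 (an illustrative instance):
x = np.array([1.0, 1.0, 0.0])   # x in bd(Q)\{0}
d = np.array([1.0, 1.0, 1.0])   # d in bd(T_Q(x)):  x1*d1 = x2^T d2
y = np.array([-2.0, 2.0, 0.0])  # y in N_Q(x) with y^T d = 0

# Closed form from Theorem 2.2: -(y1/x1) * (d1^2 - ||d2||^2)
sigma = -(y[0] / x[0]) * (d[0]**2 - np.linalg.norm(d[1:])**2)

# The maximizer from the proof: w* = (-d1^2/x1, -(||d2||^2/||x2||^2) x2)
w_star = np.concatenate(([-d[0]**2 / x[0]],
                         -(np.linalg.norm(d[1:])**2 / np.linalg.norm(x[1:])**2) * x[1:]))
assert np.isclose(y @ w_star, sigma)

# Monte-Carlo check: y^T w <= sigma over random points of T^2_Q(x, d),
# described (per case (iv) of the proof) by w2^T x2 - w1 x1 <= d1^2 - ||d2||^2.
rhs = d[0]**2 - np.linalg.norm(d[1:])**2
for _ in range(1000):
    w = rng.normal(size=3) * 5.0
    if w[1:] @ x[1:] - w[0] * x[0] <= rhs:
        assert y @ w <= sigma + 1e-9
```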
Remark 2.1 Let $A$ be a convex subset of $\mathbb{R}^{m+1}$. In the proof of Theorem 2.2, we use the inclusion $T_A^2(x, d) \subset T_{T_A(x)}(d)$. It is known from [4, page 168] that these two sets coincide if $A$ is polyhedral. But for the non-polyhedral cone $Q$, the following example shows that this inclusion may be strict.
Example 2.1 For $Q \subset \mathbb{R}^3$, let $\bar{x} = (1, 1, 0)$ and $\bar{d} = (1, 1, 1)$. Then,
$$T_Q(\bar{x}) = \{ d = (d_1, d_2, d_3) \in \mathbb{R}^3 \mid (d_2, d_3)^T (\bar{x}_2, \bar{x}_3) - d_1 \bar{x}_1 \le 0 \} = \{ d = (d_1, d_2, d_3) \mid d_2 - d_1 \le 0 \},$$
which implies $\bar{d} \in \partial T_Q(\bar{x})$. Hence,
$$T_Q^2(\bar{x}, \bar{d}) = \{ w = (w_1, w_2, w_3) \mid (w_2, w_3)^T (\bar{x}_2, \bar{x}_3) - w_1 \bar{x}_1 \le \bar{d}_1^2 - \|(\bar{d}_2, \bar{d}_3)\|^2 \} = \{ w = (w_1, w_2, w_3) \mid w_2 - w_1 \le -1 \}.$$
On the other hand, $T_{T_Q(\bar{x})}(\bar{d}) = \operatorname{cl}(\mathcal{R}_{T_Q(\bar{x})}(\bar{d}))$, where $\mathcal{R}_{T_Q(\bar{x})}(\bar{d})$ denotes the radial (or feasible) cone of $T_Q(\bar{x})$ at $\bar{d}$. Hence, for each $w \in T_{T_Q(\bar{x})}(\bar{d})$, there exists a sequence $w' \in \mathcal{R}_{T_Q(\bar{x})}(\bar{d})$ with $w' \to w$ such that $\bar{d} + t w' \in T_Q(\bar{x})$ for some $t > 0$, i.e.,
$$\left( (\bar{d}_2, \bar{d}_3) + t (w_2', w_3') \right)^T (\bar{x}_2, \bar{x}_3) - (\bar{d}_1 + t w_1') \bar{x}_1 \le 0,$$
which ensures that $(w_2', w_3')^T (\bar{x}_2, \bar{x}_3) - w_1' \bar{x}_1 \le 0$. Taking the limit yields $w_2 - w_1 \le 0$. Thus, we obtain
$$T_{T_Q(\bar{x})}(\bar{d}) = \{ w = (w_1, w_2, w_3) \mid w_2 - w_1 \le 0 \},$$
which says $T_Q^2(\bar{x}, \bar{d}) \subsetneq T_{T_Q(\bar{x})}(\bar{d})$. In fact, $0 \in T_{T_Q(\bar{x})}(\bar{d})$, but $0 \notin T_Q^2(\bar{x}, \bar{d})$.
Corollary 2.1 For $x \in Q$ and $y \in N_Q(x)$, we define
$$\Gamma(x, y) := T_Q(x) \cap \{y\}^\perp = \{ d \mid d \in T_Q(x) \text{ and } y^T d = 0 \}.$$
Then, $\sigma\left( y \mid T_Q^2(x, d) \right)$ is nonpositive and continuous with respect to $d$ over $\Gamma(x, y)$.
Proof We first show that $\sigma\left( y \mid T_Q^2(x, d) \right)$ is nonpositive for $d \in \Gamma(x, y)$. In fact, we know from Theorem 2.2 that $\sigma\left( y \mid T_Q^2(x, d) \right) = 0$ when $x = 0$, or $x \in \operatorname{int}(Q)$, or $x \in \partial(Q) \setminus \{0\}$ and $d \in \operatorname{int}(T_Q(x))$. If $x \in \partial(Q) \setminus \{0\}$ and $d \in \partial(T_Q(x))$, then we have $x_1 d_1 = x_2^T d_2$ by the formula of $T_Q(x)$; see [1, Lemma 25]. Hence $x_1 |d_1| = |x_2^T d_2| \le \|x_2\| \|d_2\|$, which implies $|d_1| \le \|d_2\|$ because $x_1 = \|x_2\| > 0$. Note that $-y_1$ is nonnegative since $-y \in Q$. Then, applying Theorem 2.2 yields $\sigma\left( y \mid T_Q^2(x, d) \right) = -(y_1/x_1)\left( d_1^2 - \|d_2\|^2 \right) \le 0$. Thus, in any case, we have verified the nonpositivity of $\sigma\left( y \mid T_Q^2(x, d) \right)$ over $\Gamma(x, y)$.
Next, we show the continuity of $\sigma\left( y \mid T_Q^2(x, d) \right)$ with respect to $d$ over $\Gamma(x, y)$. Indeed, if $x = 0$ or $x \in \operatorname{int}(Q)$, then $\sigma\left( y \mid T_Q^2(x, d) \right) = 0$ for all $d \in \Gamma(x, y)$, which, of course, is continuous. If $x \in \partial Q \setminus \{0\}$, then $\sigma\left( y \mid T_Q^2(x, d) \right) = -(y_1/x_1)\left( d_1^2 - \|d_2\|^2 \right)$ for $d \in \Gamma(x, y)$, which is continuous with respect to $d$ as well.
Remark 2.2 For a general closed convex cone, $\sigma(y \mid T^2(x, d))$ can be a discontinuous function of $d$; see [4, Page 178] or [15, Page 489]. But when the cone is the second-order cone $Q$, our result shows that this function is continuous.
For a convex subset $A$ of $\mathbb{R}^{m+1}$, it is well known that the function $\operatorname{dist}^2(x, A)$ is continuously differentiable with $\nabla \operatorname{dist}^2(x, A) = 2\left( x - \Pi_A(x) \right)$. However, there are very limited results on second-order differentiability unless some additional structure is imposed on $A$, for example, second-order regularity; see [2,3,15].
Let $\phi(x) := \operatorname{dist}^2(x, Q)$ for $Q \subset \mathbb{R}^{m+1}$. Since $Q$ is second-order regular, according to [15], $\phi$ possesses the following nice property: for any $x, d \in \mathbb{R}^{m+1}$, there holds
$$\lim_{\substack{d' \to d \\ t \downarrow 0}} \frac{\phi(x + t d') - \phi(x) - t \phi'(x; d')}{\frac{1}{2} t^2} = V(x, d), \tag{17}$$
where $V(x, d)$ is the optimal value of the problem
$$\begin{aligned} \min_z\ & 2\|d - z\|^2 - 2\sigma\left( x - \Pi_Q(x) \,\middle|\, T_Q^2(\Pi_Q(x), z) \right) \\ \text{s.t.}\ & z \in \Gamma\left( \Pi_Q(x),\, x - \Pi_Q(x) \right). \end{aligned} \tag{18}$$
With these preparations, the sufficient conditions for the existence of local saddle points are given below.
Theorem 2.3 Suppose $x^*$ is a feasible point of NSOCP (1) satisfying the following:
(i) $x^*$ is a KKT point and $(\lambda^*, \mu^*) \in \Lambda(x^*)$, i.e.,
$$\nabla_x L(x^*, \lambda^*, \mu^*) = 0 \quad \text{and} \quad -\lambda^* \in N_{\mathcal{K}}(g(x^*)).$$
(ii) The following second-order condition holds:
$$\nabla^2_{xx} L(x^*, \lambda^*, \mu^*)(d, d) + d^T \mathcal{H}(x^*, \lambda^*) d > 0, \quad \forall d \in \mathcal{C}(x^*, \lambda^*) \setminus \{0\}, \tag{19}$$
where
$$\mathcal{C}(x^*, \lambda^*) := \left\{ d \in \mathbb{R}^n \,\middle|\, \nabla h(x^*) d = 0,\ \nabla g(x^*) d \in T_{\mathcal{K}}(g(x^*)),\ \left( \nabla g(x^*) d \right)^T \lambda^* = 0 \right\},$$
and $\mathcal{H}(x^*, \lambda^*) := \sum_{j=1}^{J} \mathcal{H}_j(x^*, \lambda_j^*)$ with
$$\mathcal{H}_j(x^*, \lambda_j^*) := \begin{cases} -\dfrac{(\lambda_j^*)_1}{(g_j(x^*))_1}\, \nabla g_j(x^*)^T \begin{pmatrix} 1 & 0 \\ 0 & -I_{m_j} \end{pmatrix} \nabla g_j(x^*), & g_j(x^*) \in \partial(\mathcal{K}_j) \setminus \{0\}, \\[1ex] 0, & \text{otherwise}. \end{cases}$$
Then, $(x^*, \lambda^*, \mu^*)$ is a local saddle point of $\mathcal{L}_c$ for some $c > 0$.
Proof The first inequality in (3) follows from the facts that $\mathcal{L}_c(x^*, \lambda^*, \mu^*) = f(x^*)$ by (5), since $-\lambda^* \in N_{\mathcal{K}}(g(x^*))$, and that $\mathcal{L}_c(x^*, \lambda, \mu) \le f(x^*)$ for all $(\lambda, \mu) \in \mathbb{R}^p \times \mathbb{R}^l$, due to $x^*$ being feasible.
We prove the second inequality in (3) by contradiction; that is, suppose we cannot find $c > 0$ and $\delta > 0$ such that $f(x^*) = \mathcal{L}_c(x^*, \lambda^*, \mu^*) \le \mathcal{L}_c(x, \lambda^*, \mu^*)$ for all $x \in \mathbb{B}(x^*, \delta)$. In other words, there exists a sequence $c_n \to \infty$ as $n \to \infty$, and for each fixed $c_n$ we can find a sequence $\{x_k^n\}$ (note that this sequence depends on $c_n$) such that $x_k^n \to x^*$ as $k \to \infty$ and
$$f(x^*) > \mathcal{L}_{c_n}(x_k^n, \lambda^*, \mu^*). \tag{20}$$
To proceed, we denote $t_k^n := \|x_k^n - x^*\|$ and $d_k^n := (x_k^n - x^*)/\|x_k^n - x^*\|$. Since $\|d_k^n\| = 1$, we may assume, without loss of generality, that $d_k^n \to \tilde{d}^n$ as $k \to \infty$. First, we observe that
$$\begin{aligned}
\phi\left( g_j(x_k^n) - \frac{\lambda_j^*}{c_n} \right)
&= \phi\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} + t_k^n \nabla g_j(x^*) d_k^n + \frac{1}{2}(t_k^n)^2 \nabla^2 g_j(x^*)(d_k^n, d_k^n) + o\left( (t_k^n)^2 \right) \right) \\
&= \phi\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} + t_k^n \left[ \nabla g_j(x^*) d_k^n + \frac{1}{2} t_k^n \nabla^2 g_j(x^*)(d_k^n, d_k^n) \right] \right) + o\left( (t_k^n)^2 \right) \\
&= \phi\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} \right) + t_k^n \left\langle \nabla \phi\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} \right),\ \nabla g_j(x^*) d_k^n + \frac{1}{2} t_k^n \nabla^2 g_j(x^*)(d_k^n, d_k^n) \right\rangle \\
&\qquad + \frac{1}{2}(t_k^n)^2\, V\!\left( g_j(x^*) - \frac{\lambda_j^*}{c_n},\ \nabla g_j(x^*) \tilde{d}^n \right) + o\left( (t_k^n)^2 \right),
\end{aligned} \tag{21}$$
where the second equality follows from the fact that $\phi$ is Lipschitz continuous (in fact, $\phi$ is continuously differentiable) and the last step is due to (17). From (18), $V\!\left( g_j(x^*) - \lambda_j^*/c_n,\ \nabla g_j(x^*) \tilde{d}^n \right)$ is the optimal value of the following problem:
$$\begin{aligned} \min_z\ & 2\left\| \nabla g_j(x^*) \tilde{d}^n - z \right\|^2 - 2\sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z) \right) \\ \text{s.t.}\ & z \in \Gamma\left( g_j(x^*),\, -\lambda_j^* \right), \end{aligned} \tag{22}$$
where we have used the facts that $\Gamma\left( g_j(x^*),\, -\lambda_j^*/c_n \right) = \Gamma\left( g_j(x^*),\, -\lambda_j^* \right)$ by definition, since $c_n \ne 0$, and that $\Pi_{\mathcal{K}_j}\left( g_j(x^*) - \lambda_j^*/c_n \right) = g_j(x^*)$ because $-\lambda_j^* \in N_{\mathcal{K}_j}(g_j(x^*))$ by (5).
Note that the optimal value of problem (22) is finite since $\sigma$ is nonpositive by Corollary 2.1, and the objective function is strongly convex (because $\|\cdot\|^2$ is strongly convex and $-\sigma$ is convex [4, Proposition 3.48]). Hence, the optimal solution of problem (22) exists and is unique, say $z_j^n$, i.e.,
$$V\!\left( g_j(x^*) - \frac{\lambda_j^*}{c_n},\ \nabla g_j(x^*) \tilde{d}^n \right) = 2\left\| \nabla g_j(x^*) \tilde{d}^n - z_j^n \right\|^2 - 2\sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z_j^n) \right), \tag{23}$$
where $z_j^n \in \Gamma\left( g_j(x^*),\, -\lambda_j^* \right)$. Then, combining (21) and (23) yields
$$\begin{aligned}
& \operatorname{dist}^2\left( g_j(x_k^n) - \frac{\lambda_j^*}{c_n},\, \mathcal{K}_j \right) - \left\| \frac{\lambda_j^*}{c_n} \right\|^2 \\
&\quad = -2 t_k^n \left\langle \frac{\lambda_j^*}{c_n},\ \nabla g_j(x^*) d_k^n + \frac{1}{2} t_k^n \nabla^2 g_j(x^*)(d_k^n, d_k^n) \right\rangle \\
&\qquad + (t_k^n)^2 \left[ \left\| \nabla g_j(x^*) \tilde{d}^n - z_j^n \right\|^2 - \sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z_j^n) \right) \right] + o\left( (t_k^n)^2 \right),
\end{aligned} \tag{24}$$
where we use the facts that $\operatorname{dist}\left( g_j(x^*) - \lambda_j^*/c_n,\, \mathcal{K}_j \right) = \left\| \lambda_j^*/c_n \right\|$ and
$$\nabla \phi\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} \right) = 2\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} - \Pi_{\mathcal{K}_j}\!\left( g_j(x^*) - \frac{\lambda_j^*}{c_n} \right) \right) = -\frac{2\lambda_j^*}{c_n}.$$
Since $f(x^*) > \mathcal{L}_{c_n}(x_k^n, \lambda^*, \mu^*)$ by (20), applying the Taylor expansion, we obtain from (24) that
$$\begin{aligned}
0 &> f(x_k^n) - f(x^*) + \langle \mu^*, h(x_k^n) \rangle + \frac{c_n}{2}\|h(x_k^n)\|^2 + \frac{c_n}{2} \sum_{j=1}^{J} \left[ \operatorname{dist}^2\left( g_j(x_k^n) - \frac{\lambda_j^*}{c_n},\, \mathcal{K}_j \right) - \left\| \frac{\lambda_j^*}{c_n} \right\|^2 \right] \\
&= t_k^n \nabla f(x^*)^T d_k^n + \frac{1}{2}(t_k^n)^2 (d_k^n)^T \nabla^2 f(x^*) d_k^n + o\left( (t_k^n)^2 \right) \\
&\quad + \left\langle \mu^*,\ t_k^n \nabla h(x^*) d_k^n + \frac{1}{2}(t_k^n)^2 \nabla^2 h(x^*)(d_k^n, d_k^n) + o\left( (t_k^n)^2 \right) \right\rangle + \frac{c_n}{2}\left\| t_k^n \nabla h(x^*) d_k^n + o(t_k^n) \right\|^2 \\
&\quad + \frac{c_n}{2} \sum_{j=1}^{J} \left[ -2 t_k^n \left\langle \frac{\lambda_j^*}{c_n},\ \nabla g_j(x^*) d_k^n + \frac{1}{2} t_k^n \nabla^2 g_j(x^*)(d_k^n, d_k^n) \right\rangle \right. \\
&\qquad\qquad \left. + (t_k^n)^2 \left( \left\| \nabla g_j(x^*) \tilde{d}^n - z_j^n \right\|^2 - \sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z_j^n) \right) \right) + o\left( (t_k^n)^2 \right) \right].
\end{aligned}$$
Dividing both sides by $(t_k^n)^2/2$ and taking limits as $k \to \infty$ gives
$$0 \ge \nabla^2_{xx} L(x^*, \lambda^*, \mu^*)(\tilde{d}^n, \tilde{d}^n) + c_n \left\| \nabla h(x^*) \tilde{d}^n \right\|^2 + c_n \sum_{j=1}^{J} \left[ \left\| \nabla g_j(x^*) \tilde{d}^n - z_j^n \right\|^2 - \sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z_j^n) \right) \right], \tag{25}$$
where we use the fact that $\nabla_x L(x^*, \lambda^*, \mu^*) = 0$, the first equality in the KKT conditions (10).
Since $-\lambda_j^* \in N_{\mathcal{K}_j}(g_j(x^*))$ from (10) and $z_j^n \in \Gamma\left( g_j(x^*),\, -\lambda_j^* \right)$, applying Corollary 2.1 yields
$$\sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z_j^n) \right) = \frac{1}{c_n}\, \sigma\left( -\lambda_j^* \,\middle|\, T_{\mathcal{K}_j}^2(g_j(x^*), z_j^n) \right) \le 0,$$
where the equality is due to the positive homogeneity of the support function; see [11].
where the equality is due to the positive homogeneity of the support function, see [11]. Thus, it follows from (25) that
0≥ ∇x x2 L(x∗, λ∗, μ∗)( ˜dn, ˜dn) + cn∇h(x∗) ˜dn2+ cn
J j=1
∇gj(x∗) ˜dn− znj2.
Since $\|\tilde{d}^n\| = 1$ for all $n$, we may assume, taking a subsequence if necessary, that $\tilde{d}^n \to \tilde{d}$. Because $c_n$ can be made sufficiently large as $n \to \infty$, we obtain from the above inequality that $\nabla h(x^*) \tilde{d}^n \to 0$ and $\nabla g_j(x^*) \tilde{d}^n - z_j^n \to 0$. Therefore, $\nabla h(x^*) \tilde{d} = \lim_{n \to \infty} \nabla h(x^*) \tilde{d}^n = 0$ and
$$\operatorname{dist}\left( \nabla g_j(x^*) \tilde{d},\ \Gamma\left( g_j(x^*),\, -\lambda_j^* \right) \right) = \lim_{n \to \infty} \operatorname{dist}\left( \nabla g_j(x^*) \tilde{d}^n,\ \Gamma\left( g_j(x^*),\, -\lambda_j^* \right) \right) \le \lim_{n \to \infty} \left\| \nabla g_j(x^*) \tilde{d}^n - z_j^n \right\| = 0,$$
which implies $\nabla g_j(x^*) \tilde{d} \in \Gamma\left( g_j(x^*),\, -\lambda_j^* \right)$ for all $j = 1, 2, \ldots, J$. Thus, we have $\tilde{d} \in \mathcal{C}(x^*, \lambda^*)$. In addition, it follows from (25) again that
$$\begin{aligned}
0 &\ge \nabla^2_{xx} L(x^*, \lambda^*, \mu^*)(\tilde{d}^n, \tilde{d}^n) - c_n \sum_{j=1}^{J} \sigma\left( -\frac{\lambda_j^*}{c_n} \,\middle|\, T_{\mathcal{K}_j}^2\left( g_j(x^*), z_j^n \right) \right) \\
&= \nabla^2_{xx} L(x^*, \lambda^*, \mu^*)(\tilde{d}^n, \tilde{d}^n) - \sum_{j=1}^{J} \sigma\left( -\lambda_j^* \,\middle|\, T_{\mathcal{K}_j}^2\left( g_j(x^*), z_j^n \right) \right).
\end{aligned}$$