
to appear in Journal of Global Optimization, 2015

On the existence of saddle points for nonlinear second-order cone programming problems

Jinchuan Zhou 1
Department of Mathematics, School of Science
Shandong University of Technology
Zibo 255049, P.R. China
E-mail: jinchuanzhou@163.com

Jein-Shan Chen 2
Department of Mathematics
National Taiwan Normal University
Taipei 11677, Taiwan
E-mail: jschen@math.ntnu.edu.tw

October 8, 2013
(1st revision on May 17, 2014; 2nd revision on August 31, 2014)

Abstract. In this paper, we study the existence of local and global saddle points for nonlinear second-order cone programming problems. The existence of local saddle points is developed by using the second-order sufficient conditions, in which a sigma-term is added to reflect the curvature of the second-order cone. Furthermore, by dealing with a perturbation of the primal problem, we establish the existence of global saddle points, which is applicable to the case of multiple optimal solutions. The close relationship between global saddle points and exact penalty representations is discussed as well.

Keywords. Local and global saddle points, second-order sufficient conditions, augmented Lagrangian, exact penalty representations.

AMS subject classifications. 90C26, 90C46.

1The author’s work is supported by National Natural Science Foundation of China (11101248, 11171247, 11271233), Shandong Province Natural Science Foundation (ZR2010AQ026, ZR2012AM016), and Young Teacher Support Program of Shandong University of Technology.

2Corresponding author. Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author’s work is supported by Ministry of Science and Technology, Taiwan.


1 Introduction

Recall that the second-order cone (SOC), also called the Lorentz cone or ice-cream cone, in IR^{m+1} is defined as

K^{m+1} := {(x₁, x₂) ∈ IR × IR^m | ‖x₂‖ ≤ x₁},

where ‖·‖ denotes the Euclidean norm. The order relation induced by this pointed closed convex cone K^{m+1} is given by

x ⪰_{K^{m+1}} 0 ⇐⇒ x ∈ K^{m+1}, i.e., x₁ ≥ ‖x₂‖.

In this paper, we consider the following nonlinear second-order cone programming problem (NSOCP):

min f(x)
s.t. g_j(x) ⪰_{K^{m_j+1}} 0, j = 1, 2, …, J,   (1)
     h(x) = 0,

where f : IR^n → IR, h : IR^n → IR^l, and g_j : IR^n → IR^{m_j+1} are twice continuously differentiable functions, and K^{m_j+1} is the second-order cone in IR^{m_j+1} for j = 1, 2, ⋯, J.
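Membership in K^{m+1} and the Euclidean projection onto it (used throughout this paper via Π and dist) have simple closed forms. The three-case projection formula below is the standard one for the SOC; it is not derived in this paper, so the snippet is an illustrative sketch rather than part of the text's development:

```python
import numpy as np

def in_soc(x, tol=1e-12):
    """Membership test for K^{m+1} = {(x1, x2) : ||x2|| <= x1}."""
    return np.linalg.norm(x[1:]) <= x[0] + tol

def proj_soc(v):
    """Euclidean projection onto K^{m+1} (standard three-case formula)."""
    v1, v2 = v[0], v[1:]
    n2 = np.linalg.norm(v2)
    if n2 <= v1:                      # v already in the cone
        return v.copy()
    if n2 <= -v1:                     # v in the polar cone -K: project to 0
        return np.zeros_like(v)
    a = (v1 + n2) / 2.0               # otherwise: project to the boundary
    return np.concatenate(([a], (a / n2) * v2))

x = np.array([2.0, 1.0, 1.0])   # ||(1,1)|| = sqrt(2) <= 2, so x is in K^3
y = np.array([1.0, 2.0, 0.0])   # ||(2,0)|| = 2  > 1,     so y is not
print(in_soc(x), in_soc(y))     # True False
print(proj_soc(y))              # [1.5 1.5 0. ]
```

Note that the residual y − Π(y) lies in the polar cone and is orthogonal to the projection, as the projection theorem for closed convex cones requires.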

For a given nonlinear programming problem, we can define another programming problem associated with it by using the traditional Lagrangian function. The original problem is called the primal problem, and the latter is called the dual problem. Since the weak duality property always holds, our concern is how to obtain the strong duality property (or zero duality gap property). In other words, we want to know when the primal and dual problems have the same optimal value, which provides the theoretical foundation for many primal-dual type methods. However, if we employ the traditional Lagrangian function, then some convexity is necessary for achieving the strong duality property. To overcome this drawback, we need to resort to augmented Lagrangian functions, whose main advantage is ensuring the strong duality property without requiring convexity. In addition, the zero duality gap property coincides with the existence of global saddle points, provided that the optimal solution sets of the primal and dual problems are both nonempty. Many researchers have studied the properties of augmented Lagrangians and the existence of saddle points. For example, Rockafellar and Wets [13] proposed a class of augmented Lagrangians in which the augmenting function is required to be convex. This was extended by Huang and Yang [6], where the convexity condition is replaced by level-boundedness, and it was further generalized by Zhou and Yang [21], where the level-boundedness condition is replaced by the so-called valley-at-zero property; see also [14] for more details. These important works give a unified framework for the augmented Lagrangian function and its duality theory. Meanwhile, Floudas and Jongen [5] pointed out the crucial role of saddle points for the minimization of smooth functions with a finite number of stationary points. The necessary and/or sufficient conditions


to ensure the existence of local and/or global saddle points were investigated by many researchers. For example, the existence of local and global saddle points of Rockafellar's augmented Lagrangian function was studied in [12]. Local saddle points of the generalized Mangasarian augmented Lagrangian were analyzed in [19]. The existence of local and global saddle points of the p-th power nonlinear Lagrangian was discussed in [7, 8, 18]. For more references, please see [9, 10, 14, 16, 17, 20, 22].

All the results mentioned above focus on either standard nonlinear programming or the generalized minimizing problems of [13]. The main purpose of this paper is to establish the existence of local and global saddle points for NSOCP (1) by fully exploiting the special structure of the SOC. As is known from nonlinear programming, the positive definiteness of ∇²xxL over the critical cone is a sufficient condition for the existence of local saddle points. However, this classical result cannot be extended trivially to NSOCP (1), and the analysis is more complicated because IR^n_+ is polyhedral whereas K^{m+1} is non-polyhedral. Hence, we particularly study the sigma-term [4], which to some extent stands for the curvature of the second-order cone. Our result shows that a local saddle point exists provided that the sum of ∇²xxL and H is positive definite, even if ∇²xxL itself is indefinite (see Theorem 2.3). This undoubtedly clarifies the essential role played by the sigma-term. Moreover, by developing the perturbation of the primal problem, we establish the existence of global saddle points without restricting the optimal solution to be unique, as required in [12, 16]. Furthermore, we study another important concept, exact penalty representation, and develop new necessary and sufficient conditions for it. The close relationship between global saddle points and exact penalty representations is established as well.

To end this section, we introduce some basic concepts which will be needed for our subsequent analysis. Let IR^n be the n-dimensional real vector space. For x, y ∈ IR^n, the inner product is denoted by xᵀy or ⟨x, y⟩. Given a convex subset A ⊆ IR^n and a point x ∈ A, the normal cone of A at x, denoted by N_A(x), is defined as

N_A(x) := {v ∈ IR^n | ⟨v, z − x⟩ ≤ 0, ∀ z ∈ A},

and the tangent cone, denoted by T_A(x), is defined as

T_A(x) := N_A(x)°,

where N_A(x)° means the polar cone of N_A(x). Given d ∈ T_A(x), the outer second-order tangent set is defined as

T²_A(x, d) := {w ∈ IR^n | ∃ t_n ↓ 0 such that dist(x + t_n d + (1/2)t_n² w, A) = o(t_n²)}.

The support function of A is

σ(x | A) := sup{⟨x, z⟩ | z ∈ A}.


We also write cl(A), int(A), and ∂(A) to stand for the closure, interior, and boundary of A, respectively. For simplicity of notation, we write K_j to stand for K^{m_j+1} and let K be the Cartesian product of these second-order cones, i.e., K := K_1 × K_2 × ⋯ × K_J. In addition, we denote g(x) := (g_1(x), g_2(x), ⋯, g_J(x)), p := Σ_{j=1}^J (m_j + 1), and let S be the solution set of NSOCP (1). According to [13, Exercise 11.57], the augmented Lagrangian function for NSOCP (1) is written as

L_c(x, λ, μ) := f(x) + ⟨μ, h(x)⟩ + (c/2)‖h(x)‖² + (c/2) Σ_{j=1}^J [dist²(g_j(x) − λ_j/c, K_j) − ‖λ_j/c‖²].   (2)

Here c ∈ IR_{++} := {ζ ∈ IR | ζ > 0} and (x, λ, μ) ∈ IR^n × IR^p × IR^l with λ = (λ_1, λ_2, ⋯, λ_J) ∈ IR^{m_1+1} × IR^{m_2+1} × ⋯ × IR^{m_J+1}.

Definition 1.1. Let L_c be given as in (2) and (x̄, λ̄, μ̄) ∈ IR^n × IR^p × IR^l.

(a) The triple (x̄, λ̄, μ̄) is said to be a local saddle point of L_c for some c > 0 if there exists δ > 0 such that

L_c(x̄, λ, μ) ≤ L_c(x̄, λ̄, μ̄) ≤ L_c(x, λ̄, μ̄), ∀ x ∈ B(x̄, δ), (λ, μ) ∈ IR^p × IR^l,   (3)

where B(x̄, δ) denotes the δ-neighborhood of x̄, i.e., B(x̄, δ) := {x ∈ IR^n | ‖x − x̄‖ ≤ δ}.

(b) The triple (x̄, λ̄, μ̄) is said to be a global saddle point of L_c for some c > 0 if

L_c(x̄, λ, μ) ≤ L_c(x̄, λ̄, μ̄) ≤ L_c(x, λ̄, μ̄), ∀ x ∈ IR^n, (λ, μ) ∈ IR^p × IR^l.   (4)

2 On local saddle points

In this section, we focus on the necessary and sufficient conditions for the existence of local saddle points. For simplicity, we let Q stand for a second-order cone without emphasizing its dimension, while using the notation Q ⊂ IR^{m+1} to indicate that Q is regarded as a second-order cone in IR^{m+1}. In other words, a result holding for Q is also applicable to K_j for j = 1, …, J in the subsequent analysis. According to [13, Example 6.16], we know that for a ∈ Q,

−b ∈ N_Q(a) ⇐⇒ Π_Q(a − b) = a
⇐⇒ dist(a − b, Q) = ‖b‖   (5)
⇐⇒ a ∈ Q, b ∈ Q, aᵀb = 0,

where the last equivalence comes from the fact that Q is a self-dual cone, i.e., Q° = −Q.
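The three-way equivalence (5) can be sanity-checked numerically. The points a and b below are our own illustrative choices (a on the boundary of Q = K³, b ∈ Q with aᵀb = 0), and proj_soc is the standard closed-form SOC projection (an assumption of this sketch, not stated in the paper):

```python
import numpy as np

def proj_soc(v):
    """Standard closed-form projection onto the SOC."""
    v1, v2 = v[0], v[1:]
    n2 = np.linalg.norm(v2)
    if n2 <= v1:
        return v.copy()
    if n2 <= -v1:
        return np.zeros_like(v)
    a = (v1 + n2) / 2.0
    return np.concatenate(([a], (a / n2) * v2))

def dist_soc(v):
    return np.linalg.norm(v - proj_soc(v))

# a on the boundary of Q = K^3, b in Q, with a^T b = 0, so -b in N_Q(a)
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, -1.0, 0.0])
print(np.dot(a, b))                          # 0.0
print(proj_soc(a - b))                       # equals a: [1. 1. 0.]
print(dist_soc(a - b), np.linalg.norm(b))    # both sqrt(2)
```

All three characterizations agree on this instance: the projection of a − b returns a, and dist(a − b, Q) equals ‖b‖.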


Lemma 2.1. Let L_c be given as in (2). Then the augmented Lagrangian function L_c(x, λ, μ) is nondecreasing with respect to c > 0.

Proof. See [13, Exercise 11.56]. □
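Lemma 2.1 is easy to probe numerically. The sketch below evaluates the augmented Lagrangian (2) on a small hypothetical instance (one SOC block in IR², one affine equality; all data are our own choices) at a fixed (x, λ, μ) and increasing penalty values c:

```python
import numpy as np

def proj_soc(v):
    v1, v2 = v[0], v[1:]
    n2 = np.linalg.norm(v2)
    if n2 <= v1:
        return v.copy()
    if n2 <= -v1:
        return np.zeros_like(v)
    a = (v1 + n2) / 2.0
    return np.concatenate(([a], (a / n2) * v2))

def dist2_soc(v):
    return float(np.sum((v - proj_soc(v)) ** 2))

# Hypothetical toy instance (J = 1): min f(x) s.t. g(x) in K^2, h(x) = 0
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: np.array([x[0], x[1]])
h = lambda x: np.array([x[0] + x[1] - 1.0])

def L_aug(x, lam, mu, c):
    """Augmented Lagrangian (2) specialized to this toy instance."""
    return (f(x) + mu @ h(x) + (c / 2) * (h(x) @ h(x))
            + (c / 2) * (dist2_soc(g(x) - lam / c) - (lam / c) @ (lam / c)))

x   = np.array([0.3, -0.7])
lam = np.array([0.5, 1.2])
mu  = np.array([0.4])
vals = [L_aug(x, lam, mu, c) for c in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0)]
print(all(v1 <= v2 + 1e-12 for v1, v2 in zip(vals, vals[1:])))   # True
```

As the lemma asserts, the evaluated values are nondecreasing in c for this (arbitrarily chosen) triple (x, λ, μ).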

We now discuss the necessary conditions for local saddle points.

Theorem 2.1. Suppose (x̄, λ̄, μ̄) is a local saddle point of L_c. Then,

(a) −λ̄ ∈ N_K(g(x̄));

(b) L_c(x̄, λ̄, μ̄) = f(x̄) for all c > 0;

(c) x̄ is a local optimal solution to NSOCP (1).

Proof. We first show that x̄ is a feasible point of NSOCP (1), for which we need to verify two things: (i) h(x̄) = 0, and (ii) g_j(x̄) ⪰_{K_j} 0 for all j = 1, 2, …, J.

(i) Suppose h(x̄) ≠ 0. Taking μ = γh(x̄) with γ → ∞ and applying the first inequality in (3) yields L_c(x̄, λ̄, μ̄) = ∞, which is a contradiction. Thus, h(x̄) = 0.

(ii) Suppose g_j(x̄) ∉ K_j for some j ∈ {1, ⋯, J}. Then there exists λ̃_j ∈ K_j such that η := ⟨λ̃_j, g_j(x̄)⟩ < 0. Therefore, for β > 0,

dist²(g_j(x̄) − βλ̃_j/c, K_j) − ‖βλ̃_j/c‖²
= ‖g_j(x̄) − βλ̃_j/c − Π_{K_j}(g_j(x̄) − βλ̃_j/c)‖² − ‖βλ̃_j/c‖²
= ‖g_j(x̄) − Π_{K_j}(g_j(x̄) − βλ̃_j/c)‖² − 2⟨βλ̃_j/c, g_j(x̄) − Π_{K_j}(g_j(x̄) − βλ̃_j/c)⟩
≥ dist²(g_j(x̄), K_j) − 2β⟨λ̃_j/c, g_j(x̄)⟩
= dist²(g_j(x̄), K_j) − 2βη/c.   (6)

Here the inequality comes from the facts that

‖g_j(x̄) − Π_{K_j}(g_j(x̄) − βλ̃_j/c)‖ ≥ ‖g_j(x̄) − Π_{K_j}(g_j(x̄))‖ = dist(g_j(x̄), K_j)

and

⟨λ̃_j, Π_{K_j}(g_j(x̄) − βλ̃_j/c)⟩ ≥ 0,

because λ̃_j ∈ K_j and Π_{K_j}(g_j(x̄) − βλ̃_j/c) ∈ K_j. Since η < 0, taking β → ∞, it follows from (3) and (6) that L_c(x̄, λ̄, μ̄) is unbounded above, which is a contradiction.

Plugging λ = 0 into the first inequality of (3) (i.e., L_c(x̄, 0, μ̄) ≤ L_c(x̄, λ̄, μ̄)), we obtain

Σ_{j=1}^J [dist²(g_j(x̄) − λ̄_j/c, K_j) − ‖λ̄_j/c‖²] ≥ 0,   (7)

where we have used the feasibility of x̄ as shown above.

On the other hand, we have

dist(g_j(x̄) − λ̄_j/c, K_j) ≤ ‖g_j(x̄) − λ̄_j/c − g_j(x̄)‖ = ‖λ̄_j/c‖,

where the inequality is due to the fact that g_j(x̄) ∈ K_j as shown above. This together with (7) ensures that

dist(g_j(x̄) − λ̄_j/c, K_j) = ‖λ̄_j/c‖.   (8)

Combining (5) and (8) yields −λ̄_j ∈ N_{K_j}(g_j(x̄)) for all j = 1, ⋯, J, i.e., −λ̄ ∈ N_K(g(x̄)) by [13, Proposition 6.41]. This establishes part (a). Furthermore, it implies

dist(g_j(x̄) − λ̄_j/c, K_j) = ‖λ̄_j/c‖, ∀ c > 0,   (9)

because −λ̄_j/c ∈ N_{K_j}(g_j(x̄)) for all c > 0 (since N_{K_j}(g_j(x̄)) is a cone). Hence L_c(x̄, λ̄, μ̄) = f(x̄) for all c > 0. This establishes part (b).

Now, we turn our attention to part (c). Suppose x ∈ B(x̄, δ) is any feasible point of NSOCP (1). Then, from (3), we know

f(x) ≥ L_c(x, λ̄, μ̄) ≥ L_c(x̄, λ̄, μ̄) = f(x̄),

where the first inequality comes from the fact that x is feasible. This means x̄ is a local optimal solution to NSOCP (1). The proof is complete. □

For NSOCP (1), we say that Robinson's constraint qualification holds at x̄ if ∇h_i(x̄), i = 1, …, l, are linearly independent and there exists d ∈ IR^n such that

∇h(x̄)d = 0 and g(x̄) + ∇g(x̄)d ∈ int(K).

It is known that if x̄ is a local solution to NSOCP (1) and Robinson's constraint qualification holds at x̄, then there exists (λ̄, μ̄) ∈ IR^p × IR^l satisfying the following Karush-Kuhn-Tucker (KKT) conditions

∇xL(x̄, λ̄, μ̄) = 0, h(x̄) = 0, −λ̄ ∈ N_K(g(x̄)),   (10)


or equivalently,

∇xL(x̄, λ̄, μ̄) = 0, h(x̄) = 0, λ̄ ∈ K, g(x̄) ∈ K, λ̄ᵀg(x̄) = 0,

where L(x, λ, μ) is the standard Lagrangian function of NSOCP (1), i.e.,

L(x, λ, μ) := f(x) + ⟨μ, h(x)⟩ − ⟨λ, g(x)⟩.   (11)

For convenience of subsequent analysis, we denote by Λ(x̄) the set of all Lagrange multipliers (λ̄, μ̄) satisfying (10).

It is well known that second-order sufficient conditions are utilized to ensure the existence of local saddle points. In nonlinear programming, this requires the positive definiteness of ∇²xxL over the critical cone. However, due to the non-polyhedrality of the second-order cone, an additional widely known sigma-term (or σ-term), which stands for the curvature of the second-order cone, is required. In particular, it was noted in [4, page 177] that the σ-term vanishes when the cone is polyhedral. Due to the important role played by the σ-term in the analysis of the second-order cone, before developing the sufficient conditions for the existence of local saddle points, we shall study some basic properties of the σ-term which will be used in the subsequent analysis. First, based on the arguments given in [1, Theorem 29], we obtain the following result.

Theorem 2.2. Let x ∈ Q and d ∈ T_Q(x). Then, the support function of the outer second-order tangent set T²_Q(x, d) is

σ(y | T²_Q(x, d)) =
  −(y₁/x₁) dᵀ diag(1, −I_m) d = −(y₁/x₁)(d₁² − ‖d₂‖²), for y ∈ N_Q(x) ∩ {d}^⊥ and x ∈ ∂Q∖{0};
  0, for y ∈ N_Q(x) ∩ {d}^⊥ and x ∉ ∂Q∖{0};
  +∞, for y ∉ N_Q(x) ∩ {d}^⊥.

Proof. We know from [4, Proposition 3.34] that

T²_Q(x, d) + T_{T_Q(x)}(d) ⊂ T²_Q(x, d) ⊂ T_{T_Q(x)}(d).

This implies

σ(y | T²_Q(x, d)) + σ(y | T_{T_Q(x)}(d)) = σ(y | T²_Q(x, d) + T_{T_Q(x)}(d)) ≤ σ(y | T²_Q(x, d)) ≤ σ(y | T_{T_Q(x)}(d)).   (12)

Note that

σ(y | T_{T_Q(x)}(d)) < +∞ ⇐⇒ σ(y | T_{T_Q(x)}(d)) = 0   (13)
⇐⇒ y ∈ N_{T_Q(x)}(d)   (14)
⇐⇒ y ∈ (T_Q(x))° = N_Q(x), yᵀd = 0,   (15)

where the first and third equivalences come from the facts that T_{T_Q(x)}(d) and T_Q(x) are cones, respectively. Thus, we only need to establish the exact formula of σ(y | T²_Q(x, d)) provided that (15) holds. In addition, (12) also indicates that σ(y | T²_Q(x, d)) = +∞ whenever y ∉ N_Q(x) ∩ {d}^⊥, since T²_Q(x, d) is nonempty for x ∈ Q and d ∈ T_Q(x) by [1, Lemma 27].

In fact, under condition (15), it follows from (12) and (13) that

σ(y | T²_Q(x, d)) ≤ σ(y | T_{T_Q(x)}(d)) = 0.   (16)

Furthermore, in light of condition (15), we discuss the following four cases.

(i) If x = 0, then 0 ∈ T²_Q(x, d) = T_Q(d), where the equality is due to [1, Lemma 27]. Thus σ(y | T²_Q(x, d)) = σ(y | T_Q(d)) ≥ 0. This together with (16) implies σ(y | T²_Q(x, d)) = 0.

(ii) If x ∈ int(Q), then it follows from (15) that y = 0. Hence σ(y | T²_Q(x, d)) = 0.

(iii) If x ∈ ∂(Q)∖{0} and d ∈ int(T_Q(x)), then it follows from (14) that y = 0 since d ∈ int(T_Q(x)). Hence σ(y | T²_Q(x, d)) = 0 = −(y₁/x₁)(d₁² − ‖d₂‖²).

(iv) If x ∈ ∂(Q)∖{0} and d ∈ ∂(T_Q(x)), then the desired result can be obtained by following the arguments given in [1, page 222]. We provide the proof for the sake of completeness. Note that computing σ(y | T²_Q(x, d)) amounts to maximizing y₁w₁ + y₂ᵀw₂ over all w satisfying w₂ᵀx₂ − w₁x₁ ≤ d₁² − ‖d₂‖² (see [1, Lemma 27]). From y ∈ N_Q(x), i.e., −y ∈ Q, x ∈ Q, and xᵀy = 0, we know y₁ = −αx₁ and y₂ = αx₂ with α = −y₁/x₁ ≥ 0; see [1, page 208]. Thus,

⟨y, w⟩ = y₁w₁ + y₂ᵀw₂ = α(w₂ᵀx₂ − w₁x₁) ≤ α(d₁² − ‖d₂‖²) = −(y₁/x₁)(d₁² − ‖d₂‖²).

The maximum is attained at (w₁, w₂) = (−d₁²/x₁, −(‖d₂‖²/‖x₂‖²)x₂). This establishes the desired expression. □
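Case (iv) can be checked numerically on a concrete boundary instance (the points below are our own choice: x ∈ ∂Q, d ∈ ∂T_Q(x), and y = (−x₁, x₂) ∈ N_Q(x) with yᵀd = 0); the inequality description of T²_Q(x, d) used here is the one quoted from [1, Lemma 27] in the proof:

```python
import numpy as np

# x on bd(Q), d on bd(T_Q(x)), y = alpha(-x1, x2) in N_Q(x) with alpha = 1
x = np.array([1.0, 1.0, 0.0])
d = np.array([1.0, 1.0, 1.0])
y = np.array([-1.0, 1.0, 0.0])
assert np.dot(x, y) == 0.0 and np.dot(y, d) == 0.0

rhs = d[0] ** 2 - np.dot(d[1:], d[1:])   # d1^2 - ||d2||^2 = -1
sigma = -(y[0] / x[0]) * rhs              # value given by Theorem 2.2: -1

# Maximizer from the proof of case (iv): w* = (-d1^2/x1, -(||d2||^2/||x2||^2) x2)
w_star = np.concatenate(([-d[0] ** 2 / x[0]],
                         -(np.dot(d[1:], d[1:]) / np.dot(x[1:], x[1:])) * x[1:]))
print(sigma, float(np.dot(y, w_star)))    # -1.0 -1.0

# <y, w> never exceeds sigma over {w : w2^T x2 - w1 x1 <= rhs}
rng = np.random.default_rng(0)
for _ in range(1000):
    w = rng.normal(size=3) * 10.0
    if np.dot(w[1:], x[1:]) - w[0] * x[0] <= rhs:
        assert np.dot(y, w) <= sigma + 1e-9
```

The sampled feasible points never beat the claimed maximizer, and the constraint is active at w*, consistent with the formula of Theorem 2.2.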

Remark 2.1. Let A be a convex subset of IR^{m+1}. In the proof of Theorem 2.2, we use the inclusion T²_A(x, d) ⊂ T_{T_A(x)}(d). It is known from [4, page 168] that these two sets coincide if A is polyhedral. But, for the non-polyhedral cone Q, the following example shows that this inclusion may be strict.


Example 2.1. For Q ⊂ IR³, let x̄ = (1, 1, 0) and d̄ = (1, 1, 1). Then

T_Q(x̄) = {d = (d₁, d₂, d₃) ∈ IR³ | (d₂, d₃)ᵀ(x̄₂, x̄₃) − d₁x̄₁ ≤ 0} = {d = (d₁, d₂, d₃) | d₂ − d₁ ≤ 0},

which implies d̄ ∈ ∂T_Q(x̄). Hence,

T²_Q(x̄, d̄) = {w = (w₁, w₂, w₃) | (w₂, w₃)ᵀ(x̄₂, x̄₃) − w₁x̄₁ ≤ d̄₁² − ‖(d̄₂, d̄₃)‖²} = {w = (w₁, w₂, w₃) | w₂ − w₁ ≤ −1}.

On the other hand, since T_{T_Q(x̄)}(d̄) = cl(R_{T_Q(x̄)}(d̄)), where R_{T_Q(x̄)}(d̄) denotes the radial (or feasible) cone of T_Q(x̄) at d̄, for each w ∈ T_{T_Q(x̄)}(d̄) there exist w′ ∈ R_{T_Q(x̄)}(d̄) with w′ → w such that d̄ + tw′ ∈ T_Q(x̄) for some t > 0, i.e.,

((d̄₂, d̄₃) + t(w₂′, w₃′))ᵀ(x̄₂, x̄₃) − (d̄₁ + tw₁′)x̄₁ ≤ 0,

which ensures that (w₂′, w₃′)ᵀ(x̄₂, x̄₃) − w₁′x̄₁ ≤ 0. Taking the limit yields w₂ − w₁ ≤ 0. Thus, we obtain

T_{T_Q(x̄)}(d̄) = {w = (w₁, w₂, w₃) | w₂ − w₁ ≤ 0},

which says T²_Q(x̄, d̄) ⊊ T_{T_Q(x̄)}(d̄). In fact, 0 ∈ T_{T_Q(x̄)}(d̄), but 0 ∉ T²_Q(x̄, d̄).

Corollary 2.1. For x ∈ Q and y ∈ N_Q(x), we define

Θ(x, y) := T_Q(x) ∩ {y}^⊥ = {d | d ∈ T_Q(x) and yᵀd = 0}.

Then σ(y | T²_Q(x, d)) is nonpositive and continuous with respect to d over Θ(x, y).

Proof. We first show that σ(y | T²_Q(x, d)) is nonpositive for d ∈ Θ(x, y). In fact, we know from Theorem 2.2 that σ(y | T²_Q(x, d)) = 0 when x = 0, or x ∈ int(Q), or x ∈ ∂(Q)∖{0} and d ∈ int(T_Q(x)). If x ∈ ∂(Q)∖{0} and d ∈ ∂(T_Q(x)), then we have x₁d₁ = x₂ᵀd₂ by the formula of T_Q(x); see [1, Lemma 25]. Hence x₁|d₁| = |x₂ᵀd₂| ≤ ‖x₂‖‖d₂‖, which implies |d₁| ≤ ‖d₂‖ because x₁ = ‖x₂‖ > 0. Note that −y₁ is nonnegative since −y ∈ Q. Then, applying Theorem 2.2 yields σ(y | T²_Q(x, d)) = −(y₁/x₁)(d₁² − ‖d₂‖²) ≤ 0. Thus, in any case, we have verified the nonpositivity of σ(y | T²_Q(x, d)) over Θ(x, y).

Next, we show the continuity of σ(y | T²_Q(x, d)) with respect to d over Θ(x, y). Indeed, if x = 0 or x ∈ int(Q), then σ(y | T²_Q(x, d)) = 0 for all d ∈ Θ(x, y), which, of course, is continuous. If x ∈ ∂Q∖{0}, then σ(y | T²_Q(x, d)) = −(y₁/x₁)(d₁² − ‖d₂‖²) for d ∈ Θ(x, y), which is continuous with respect to d as well. □

Remark 2.2. For a general closed convex cone Ω, σ(y | T²_Ω(x, d)) can be a discontinuous function of d; see [4, page 178] or [15, page 489]. But, when Ω is the second-order cone Q, our result shows that this function is continuous.


For a convex subset A of IR^{m+1}, it is well known that the function dist²(x, A) is continuously differentiable with ∇dist²(x, A) = 2(x − Π_A(x)). But there are very limited results on second-order differentiability unless some additional structure is imposed on A, for example, second-order regularity; see [2, 3, 15].

Let φ(x) := dist²(x, Q) for Q ⊂ IR^{m+1}. Since Q is second-order regular, according to [15], φ possesses the following nice property: for any x, d ∈ IR^{m+1},

lim_{d′→d, t↓0} [φ(x + td′) − φ(x) − tφ′(x; d′)] / ((1/2)t²) = V(x, d),   (17)

where V(x, d) is the optimal value of the problem

min 2‖d − z‖² − 2σ(x − Π_Q(x) | T²_Q(Π_Q(x), z))
s.t. z ∈ Θ(Π_Q(x), x − Π_Q(x)).   (18)

With these preparations, the sufficient conditions for the existence of local saddle points are given below.

Theorem 2.3. Suppose x̄ is a feasible point of NSOCP (1) satisfying the following:

(i) x̄ is a KKT point and (λ̄, μ̄) ∈ Λ(x̄), i.e.,

∇xL(x̄, λ̄, μ̄) = 0 and −λ̄ ∈ N_K(g(x̄)).

(ii) the following second-order condition holds:

∇²xxL(x̄, λ̄, μ̄)(d, d) + dᵀH(x̄, λ̄)d > 0, ∀ d ∈ C(x̄, λ̄)∖{0},   (19)

where

C(x̄, λ̄) := {d ∈ IR^n | ∇h(x̄)d = 0, ∇g(x̄)d ∈ T_K(g(x̄)), (∇g(x̄)d)ᵀλ̄ = 0},

and H(x̄, λ̄) := Σ_{j=1}^J H_j(x̄, λ̄_j) with

H_j(x̄, λ̄_j) := −((λ̄_j)₁/(g_j(x̄))₁) ∇g_j(x̄)ᵀ diag(1, −I_{m_j}) ∇g_j(x̄), if g_j(x̄) ∈ ∂(K_j)∖{0},
H_j(x̄, λ̄_j) := 0, otherwise.

Then (x̄, λ̄, μ̄) is a local saddle point of L_c for some c > 0.


Proof. The first inequality in (3) follows from the facts that L_c(x̄, λ̄, μ̄) = f(x̄) by (5), since −λ̄ ∈ N_K(g(x̄)), and that L_c(x̄, λ, μ) ≤ f(x̄) for all (λ, μ) ∈ IR^p × IR^l, due to x̄ being feasible.

We prove the second inequality in (3) by contradiction, i.e., suppose we cannot find c > 0 and δ > 0 such that f(x̄) = L_c(x̄, λ̄, μ̄) ≤ L_c(x, λ̄, μ̄) for all x ∈ B(x̄, δ). In other words, there exists a sequence c_n → ∞ as n → ∞, and for each fixed c_n we can find a sequence {x_k^n} (noting that this sequence depends on c_n) such that x_k^n → x̄ as k → ∞ and

f(x̄) > L_{c_n}(x_k^n, λ̄, μ̄).   (20)

To proceed, we denote t_k^n := ‖x_k^n − x̄‖ and d_k^n := (x_k^n − x̄)/‖x_k^n − x̄‖. Assume, without loss of generality, that d_k^n → d̃^n as k → ∞. First, we observe that

φ(g_j(x_k^n) − λ̄_j/c_n)
= φ(g_j(x̄) − λ̄_j/c_n + t_k^n ∇g_j(x̄)d_k^n + (1/2)(t_k^n)² ∇²g_j(x̄)(d_k^n, d_k^n) + o((t_k^n)²))
= φ(g_j(x̄) − λ̄_j/c_n + t_k^n [∇g_j(x̄)d_k^n + (1/2)t_k^n ∇²g_j(x̄)(d_k^n, d_k^n)]) + o((t_k^n)²)
= φ(g_j(x̄) − λ̄_j/c_n) + t_k^n φ′(g_j(x̄) − λ̄_j/c_n; ∇g_j(x̄)d_k^n + (1/2)t_k^n ∇²g_j(x̄)(d_k^n, d_k^n))
  + (1/2)(t_k^n)² V(g_j(x̄) − λ̄_j/c_n, ∇g_j(x̄)d̃^n) + o((t_k^n)²),   (21)

where the second equality follows from the fact of φ being Lipschitz continuous (in fact, φ is continuously differentiable) and the last step is due to (17). From (18), V(g_j(x̄) − λ̄_j/c_n, ∇g_j(x̄)d̃^n) is the optimal value of the following problem:

min 2‖∇g_j(x̄)d̃^n − z‖² − 2σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z))
s.t. z ∈ Θ(g_j(x̄), −λ̄_j),   (22)

where we have used the facts that Θ(g_j(x̄), −λ̄_j/c_n) = Θ(g_j(x̄), −λ̄_j) by definition since c_n ≠ 0, and Π_{K_j}(g_j(x̄) − λ̄_j/c_n) = g_j(x̄) because −λ̄_j ∈ N_{K_j}(g_j(x̄)) by (5).

Note that the optimal value of problem (22) is finite since σ is nonpositive by Corollary 2.1, and that the objective function is strongly convex (because ‖·‖² is strongly convex and −σ is convex [4, Proposition 3.48]). Hence, the optimal solution of problem (22) exists and is unique, say z_j^n, i.e.,

V(g_j(x̄) − λ̄_j/c_n, ∇g_j(x̄)d̃^n) = 2‖∇g_j(x̄)d̃^n − z_j^n‖² − 2σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z_j^n)),   (23)

where z_j^n ∈ Θ(g_j(x̄), −λ̄_j). Then, combining (21) and (23) yields

dist²(g_j(x_k^n) − λ̄_j/c_n, K_j) − ‖λ̄_j/c_n‖²
= −2t_k^n ⟨λ̄_j/c_n, ∇g_j(x̄)d_k^n + (1/2)t_k^n ∇²g_j(x̄)(d_k^n, d_k^n)⟩
  + (t_k^n)² [‖∇g_j(x̄)d̃^n − z_j^n‖² − σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z_j^n))] + o((t_k^n)²),   (24)

where we use the facts that dist(g_j(x̄) − λ̄_j/c_n, K_j) = ‖λ̄_j/c_n‖ and

∇φ(g_j(x̄) − λ̄_j/c_n) = 2(g_j(x̄) − λ̄_j/c_n − Π_{K_j}(g_j(x̄) − λ̄_j/c_n)) = −2λ̄_j/c_n.

Since f(x̄) > L_{c_n}(x_k^n, λ̄, μ̄) by (20), applying the Taylor expansion, we obtain from (24) that

0 > f(x_k^n) − f(x̄) + ⟨μ̄, h(x_k^n)⟩ + (c_n/2)‖h(x_k^n)‖²
    + (c_n/2) Σ_{j=1}^J [dist²(g_j(x_k^n) − λ̄_j/c_n, K_j) − ‖λ̄_j/c_n‖²]
= t_k^n ∇f(x̄)ᵀd_k^n + (1/2)(t_k^n)² (d_k^n)ᵀ∇²f(x̄)d_k^n + o((t_k^n)²)
  + ⟨μ̄, t_k^n ∇h(x̄)d_k^n + (1/2)(t_k^n)² ∇²h(x̄)(d_k^n, d_k^n) + o((t_k^n)²)⟩
  + (c_n/2)‖t_k^n ∇h(x̄)d_k^n + o(t_k^n)‖²
  + (c_n/2) Σ_{j=1}^J [ −2t_k^n ⟨λ̄_j/c_n, ∇g_j(x̄)d_k^n + (1/2)t_k^n ∇²g_j(x̄)(d_k^n, d_k^n)⟩
      + (t_k^n)² (‖∇g_j(x̄)d̃^n − z_j^n‖² − σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z_j^n))) + o((t_k^n)²) ].

Dividing both sides by (t_k^n)²/2 and taking limits as k → ∞ gives

0 ≥ ∇²xxL(x̄, λ̄, μ̄)(d̃^n, d̃^n) + c_n‖∇h(x̄)d̃^n‖²
    + c_n Σ_{j=1}^J [‖∇g_j(x̄)d̃^n − z_j^n‖² − σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z_j^n))],   (25)

where we use the fact that ∇xL(x̄, λ̄, μ̄) = 0, the first equality in the KKT conditions (10).

Since −λ̄_j ∈ N_{K_j}(g_j(x̄)) from (10) and z_j^n ∈ Θ(g_j(x̄), −λ̄_j), applying Corollary 2.1 yields

σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z_j^n)) = (1/c_n) σ(−λ̄_j | T²_{K_j}(g_j(x̄), z_j^n)) ≤ 0,

where the equality is due to the positive homogeneity of the support function; see [11]. Thus, it follows from (25) that

0 ≥ ∇²xxL(x̄, λ̄, μ̄)(d̃^n, d̃^n) + c_n‖∇h(x̄)d̃^n‖² + c_n Σ_{j=1}^J ‖∇g_j(x̄)d̃^n − z_j^n‖².

Since ‖d̃^n‖ = 1 for all n, we may assume, taking a subsequence if necessary, that d̃^n → d̃. Because c_n can be made sufficiently large as n → ∞, we obtain from the above inequality that ∇h(x̄)d̃^n → 0 and ∇g_j(x̄)d̃^n − z_j^n → 0. Therefore, ∇h(x̄)d̃ = lim_{n→∞} ∇h(x̄)d̃^n = 0 and

dist(∇g_j(x̄)d̃, Θ(g_j(x̄), −λ̄_j)) = lim_{n→∞} dist(∇g_j(x̄)d̃^n, Θ(g_j(x̄), −λ̄_j)) ≤ lim_{n→∞} ‖∇g_j(x̄)d̃^n − z_j^n‖ = 0,

which implies ∇g_j(x̄)d̃ ∈ Θ(g_j(x̄), −λ̄_j) for all j = 1, 2, ⋯, J. Thus, we have d̃ ∈ C(x̄, λ̄). In addition, it follows from (25) again that

0 ≥ ∇²xxL(x̄, λ̄, μ̄)(d̃^n, d̃^n) − c_n Σ_{j=1}^J σ(−λ̄_j/c_n | T²_{K_j}(g_j(x̄), z_j^n))
  = ∇²xxL(x̄, λ̄, μ̄)(d̃^n, d̃^n) − Σ_{j=1}^J σ(−λ̄_j | T²_{K_j}(g_j(x̄), z_j^n)).

Note that σ(−λ̄_j | T²_{K_j}(g_j(x̄), ∇g_j(x̄)d̃)) = −d̃ᵀH_j(x̄, λ̄_j)d̃ by Theorem 2.2. Taking limits on both sides as n → ∞, using the continuity of σ by Corollary 2.1 and z_j^n → ∇g_j(x̄)d̃ (since ∇g_j(x̄)d̃^n − z_j^n → 0), we obtain

0 ≥ ∇²xxL(x̄, λ̄, μ̄)(d̃, d̃) − Σ_{j=1}^J σ(−λ̄_j | T²_{K_j}(g_j(x̄), ∇g_j(x̄)d̃))
  = ∇²xxL(x̄, λ̄, μ̄)(d̃, d̃) + Σ_{j=1}^J d̃ᵀH_j(x̄, λ̄_j)d̃
  = ∇²xxL(x̄, λ̄, μ̄)(d̃, d̃) + d̃ᵀH(x̄, λ̄)d̃,

which contradicts (19) since d̃ ∈ C(x̄, λ̄) and d̃ ≠ 0. Thus, the proof is complete. □

For convex nonlinear programming, saddle points have a close relation to KKT points. Their relationship was established in [4] by using the traditional Lagrangian function (11). Here we further discuss their relationship for NSOCP via the augmented Lagrangian function (2).


Definition 2.1. The problem NSOCP (1) is said to be convex if the objective function f is a convex function, h is an affine mapping, and g is a convex mapping with respect to the set −K, i.e., for any x, y ∈ IR^n and t ∈ [0, 1],

g(tx + (1 − t)y) ⪯_{−K} t g(x) + (1 − t) g(y),   (26)

that is, g(tx + (1 − t)y) − [t g(x) + (1 − t) g(y)] ∈ K.

It is easy to see that g is convex with respect to −K if and only if g_j is convex with respect to −K_j for all j = 1, 2, ⋯, J. In general, the square of a convex function need not be convex; for example, (x² − 1)² is not convex although x² − 1 is convex. Nonetheless, the square of the distance function is still convex, i.e., dist²(x, Q) is convex. In fact, dist²(x, Q) = inf{‖x − y‖² + δ_Q(y) | y ∈ IR^{m+1}} = (‖·‖² □ δ_Q)(x), where □ is the infimal convolution and δ_Q is the indicator function [11]. This conclusion can also be obtained by noting that a differentiable function is convex if and only if its gradient is monotone; see [13]. Hence, it only needs to be shown that ∇dist²(x, Q) = 2(x − Π_Q(x)) is monotone, which is ensured by

⟨∇dist²(x, Q) − ∇dist²(y, Q), x − y⟩
= 2⟨x − y − (Π_Q(x) − Π_Q(y)), x − y⟩
≥ 2‖x − y‖² − 2‖Π_Q(x) − Π_Q(y)‖ · ‖x − y‖
= 2‖x − y‖ · [‖x − y‖ − ‖Π_Q(x) − Π_Q(y)‖]
≥ 0,

where in the last step we use the fact that the metric projection is non-expansive, i.e., ‖Π_Q(x) − Π_Q(y)‖ ≤ ‖x − y‖. □
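The convexity of dist²(·, Q) argued above can be spot-checked by sampling: for random u, v and t ∈ [0, 1], the convex-combination inequality should hold. The snippet below does this for Q = K⁴, using the standard closed-form SOC projection (an assumption of this sketch, not stated in the paper):

```python
import numpy as np

def proj_soc(v):
    v1, v2 = v[0], v[1:]
    n2 = np.linalg.norm(v2)
    if n2 <= v1:
        return v.copy()
    if n2 <= -v1:
        return np.zeros_like(v)
    a = (v1 + n2) / 2.0
    return np.concatenate(([a], (a / n2) * v2))

def dist2(v):
    return float(np.sum((v - proj_soc(v)) ** 2))

rng = np.random.default_rng(1)
ok = True
for _ in range(2000):
    u, v = rng.normal(size=4) * 5.0, rng.normal(size=4) * 5.0
    t = rng.uniform()
    # convexity: dist^2(tu + (1-t)v) <= t dist^2(u) + (1-t) dist^2(v)
    ok &= dist2(t * u + (1 - t) * v) <= t * dist2(u) + (1 - t) * dist2(v) + 1e-9
print(ok)   # True
```

Random sampling of course does not prove convexity; it merely fails to refute the claim that the monotone-gradient argument above establishes.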

The following lemma shows that the function −dist(·, Q) behaves like a monotone function.

Lemma 2.2. If x ⪰_Q y, then dist(x, Q) ≤ dist(y, Q).

Proof. Given x, y with x ⪰_Q y, i.e., x − y ∈ Q. Note that Q + Q = Q because Q is a convex cone; see [11]. Hence, we know Q + (x − y) ⊂ Q since x − y ∈ Q. Then, the desired result follows from

dist(x, Q) = inf_{z∈Q} ‖x − z‖ ≤ inf_{z∈Q+(x−y)} ‖x − z‖ = inf_{u∈Q} ‖y − u‖ (substituting u := z − x + y) = dist(y, Q). □

The converse of Lemma 2.2 fails, which is illustrated by the following example.


Example 2.2. Consider K² = {(x₁, x₂) | x₁ ≥ |x₂|}. Then, for x = (1, 2) and y = (−1, −1), we have

dist(x, K²) = √2/2 < √2 = dist(y, K²).

But x ⋡_{K²} y, since x − y = (2, 3) ∉ K².
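The two distances in Example 2.2 can be verified numerically with the standard closed-form SOC projection (our own helper, not part of the text):

```python
import numpy as np

def proj_soc(v):
    v1, v2 = v[0], v[1:]
    n2 = np.linalg.norm(v2)
    if n2 <= v1:
        return v.copy()
    if n2 <= -v1:
        return np.zeros_like(v)
    a = (v1 + n2) / 2.0
    return np.concatenate(([a], (a / n2) * v2))

def dist(v):
    return float(np.linalg.norm(v - proj_soc(v)))

x = np.array([1.0, 2.0])
y = np.array([-1.0, -1.0])
print(dist(x), np.sqrt(2) / 2)         # ~0.7071 each
print(dist(y), np.sqrt(2))             # ~1.4142 each
print(abs((x - y)[1]) <= (x - y)[0])   # False: x - y = (2, 3) is not in K^2
```

Here y = (−1, −1) lies in −K², so its projection is 0 and dist(y, K²) = ‖y‖ = √2, exactly as claimed.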

We next show that if the problem NSOCP (1) is convex, then the augmented Lagrangian is also convex.

Theorem 2.4. If NSOCP (1) is convex, then L_c(x, λ, μ) is convex with respect to x for all (c, λ, μ) ∈ IR_{++} × IR^p × IR^l.

Proof. Since h : IR^n → IR^l is an affine mapping, there exist a matrix M ∈ IR^{l×n} and q ∈ IR^l such that h(x) = Mx + q. Thus, we know that

⟨μ, h(x)⟩ + (c/2)‖h(x)‖²
= ⟨μ, Mx + q⟩ + (c/2)⟨Mx + q, Mx + q⟩
= (c/2)⟨x, MᵀMx⟩ + ⟨Mᵀμ + cMᵀq, x⟩ + ⟨μ + (c/2)q, q⟩

is convex, due to MᵀM being positive semidefinite. In view of the expression of L_c(x, λ, μ) given in (2), it remains to show the convexity of dist²(g_j(x) − λ_j/c, K_j). In fact, since g_j is convex with respect to −K_j, it follows from (26) that

g_j(tx + (1 − t)y) − λ_j/c ⪰_{K_j} t(g_j(x) − λ_j/c) + (1 − t)(g_j(y) − λ_j/c).

This together with Lemma 2.2 implies

dist(g_j(tx + (1 − t)y) − λ_j/c, K_j) ≤ dist(t(g_j(x) − λ_j/c) + (1 − t)(g_j(y) − λ_j/c), K_j),

and hence

dist²(g_j(tx + (1 − t)y) − λ_j/c, K_j)
≤ dist²(t(g_j(x) − λ_j/c) + (1 − t)(g_j(y) − λ_j/c), K_j)
≤ t dist²(g_j(x) − λ_j/c, K_j) + (1 − t) dist²(g_j(y) − λ_j/c, K_j),

where the last step is due to the convexity of dist²(·, K_j), as argued following (26). □

For convex NSOCP (1), the following result states the relationship between global saddle points and KKT points.
