Elastic-Mode Algorithms for Mathematical Programs with Equilibrium Constraints: Global Convergence and Stationarity Properties


Mathematical Programming manuscript No.

(will be inserted by the editor)

Mihai Anitescu · Paul Tseng · Stephen J. Wright


April 14, 2005

Abstract. The elastic-mode formulation of the problem of minimizing a nonlinear function subject to equilibrium constraints has appealing local properties in that, for a finite value of the penalty parameter, local solutions satisfying first- and second-order necessary optimality conditions for the original problem are also first- and second-order points of the elastic-mode formulation. Here we study global convergence properties of methods based on this formulation, which involve generating an (exact or inexact) first- or second-order point of the formulation, for nondecreasing values of the penalty parameter. Under certain regularity conditions on the active constraints, we establish finite or asymptotic convergence to points having a certain stationarity property (such as strong stationarity, M-stationarity, or C-stationarity). Numerical experience with these approaches is discussed. In particular, our analysis and the numerical evidence show that exact complementarity can be achieved finitely even when the elastic-mode formulation is solved inexactly.

Key words. Nonlinear programming, equilibrium constraints, complementarity constraints, elastic-mode formulation, strong stationarity, C-stationarity, M-stationarity.

AMS subject classifications 49M30, 49M37, 65K05, 90C30, 90C33

1. Introduction

We consider a mathematical program with equilibrium constraints (MPEC), defined as follows:

\[
\min_{x}\; f(x) \quad \text{subject to} \quad g(x) \ge 0, \quad h(x) = 0, \quad 0 \le G^T x \perp H^T x \ge 0, \tag{1}
\]

where f : IRⁿ → IR, g : IRⁿ → IR^p, and h : IRⁿ → IR^q are all twice continuously differentiable functions (at least in a neighborhood of all points generated by our methods), and G and H are n × m column submatrices of the n × n identity matrix (with no columns in common). Hence, the constraints G^Tx ≥ 0 and H^Tx ≥ 0 represent nonnegativity bound constraints on certain components of x, and the notation G^Tx ⊥ H^Tx signifies that (G^Tx)^T(H^Tx) = 0. This special form of the complementarity constraints does not sacrifice generality; it can always be attained by introducing artificial variables as needed. We use this form because some of our results require the nonnegativity constraints G^Tx ≥ 0 and H^Tx ≥ 0 to be satisfied exactly even when x is only an inexact solution of the subproblem in question. Such conditions are readily satisfied by most interior-point and active-set methods.

M. Anitescu: Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, e-mail: anitescu@mcs.anl.gov

P. Tseng: Department of Mathematics, University of Washington, Seattle, WA 98195, e-mail: tseng@math.washington.edu

S. J. Wright: Computer Sciences Department, University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706, e-mail: swright@cs.wisc.edu

MPEC has been well studied in recent years, with many solution methods proposed; see [2, 3, 5, 13, 15–17, 19, 21, 23] and references therein. Although an MPEC can be formulated as a nonlinear program by rewriting the complementarity constraint as an equality constraint (G^Tx)^T(H^Tx) = 0 or as an inequality constraint (G^Tx)^T(H^Tx) ≤ 0, the resulting nonlinear program is highly degenerate; that is, it satisfies neither the linear independence constraint qualification (LICQ) nor the Mangasarian-Fromovitz constraint qualification (MFCQ). Thus, in order to achieve global convergence, specialized methods have been proposed that exploit the special structure of the complementarity constraint. These methods generate a sequence of points in IRⁿ whose accumulation points satisfy, under suitable assumptions, certain necessary optimality conditions for the MPEC (1). Different types of necessary optimality conditions have been developed, the strongest and most desirable of which is strong stationarity [23]; see Definition 1 below. Under MPEC-LICQ (see Definition 2), strong stationarity is equivalent to the notion of B-stationarity [6]. Two weaker conditions, M-stationarity and C-stationarity [18, 23], will also be of interest (see Definition 3).

A regularization method of Scholtes [24] achieves M-stationarity under MPEC-LICQ and achieves strong stationarity under an additional upper-level strict complementarity (ULSC) condition. A relaxation method of Lin and Fukushima [15] and a penalty method of Hu and Ralph [10], penalizing the complementarity constraint, have similar global convergence properties. A smoothing method of Fukushima and Pang [6] achieves strong stationarity under MPEC-LICQ and an additional asymptotically weak nondegeneracy condition. All these methods are conceptual, in that they assume the generation of a sequence of points satisfying exactly certain second-order necessary optimality conditions. Only in the case of linear constraints has a practical method been developed (Fukushima and Tseng [7]). We are led to ask: Can global convergence (to C-, M-, or strongly stationary points) be achieved under weaker assumptions or for more practical methods?

In this paper, we study this question for a nonlinear programming formulation of (1) that uses an explicit penalization of the complementarity constraint, also known as the “elastic mode.” For a given penalty parameter c ≥ 0 and fixed upper bound ζ̄ ∈ [0, ∞), this formulation can be written as follows:

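\[
\begin{aligned}
PF(c): \quad \min_{x,\zeta}\;\; & f(x) + c\zeta + c\,(G^T x)^T (H^T x) \quad \text{subject to} \\
& g(x) \ge -\zeta e_p, \quad \zeta e_q \ge h(x) \ge -\zeta e_q, \quad 0 \le \zeta \le \bar\zeta, \quad G^T x \ge 0, \quad H^T x \ge 0,
\end{aligned} \tag{2}
\]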

where e_l is the vector (1, 1, . . . , 1)^T with l components. A similar formulation was studied by Anitescu [1, 2], while a variant with ζ fixed at zero was investigated by Ralph and Wright [21]. The penalty method in [10] is based on this variant. Our analysis may also be extended to this variant, as well as to a mixed variant whereby ζ is fixed at zero for a subset of the constraints (see Section 5). For ζ̄ sufficiently large, a feasible point of (2) is easily found, and there are appealing correspondences between points x^∗ that satisfy first-order optimality conditions for (1) and points (x^∗, 0) that satisfy first-order optimality conditions for (2) (see Theorem 2).
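The formulation (2) can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that the column submatrices G and H are stored as lists of coordinate indices (G_idx, H_idx), so that G^Tx and H^Tx are just selected components of x; the function names and this storage convention are hypothetical and not part of the paper.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def complementarity(x: Vector, G_idx: Sequence[int], H_idx: Sequence[int]) -> float:
    """Return (G^T x)^T (H^T x), with G and H given as coordinate index lists."""
    return sum(x[i] * x[j] for i, j in zip(G_idx, H_idx))


def elastic_objective(f: Callable[[Vector], float], x: Vector, zeta: float,
                      c: float, G_idx: Sequence[int], H_idx: Sequence[int]) -> float:
    """Objective of PF(c): f(x) + c*zeta + c*(G^T x)^T (H^T x)."""
    return f(x) + c * zeta + c * complementarity(x, G_idx, H_idx)


def elastic_feasible(g: Callable[[Vector], Sequence[float]],
                     h: Callable[[Vector], Sequence[float]],
                     x: Vector, zeta: float, zeta_bar: float,
                     G_idx: Sequence[int], H_idx: Sequence[int]) -> bool:
    """Check the constraints of PF(c) exactly (no tolerance)."""
    if not 0.0 <= zeta <= zeta_bar:
        return False
    if any(gi < -zeta for gi in g(x)):        # g(x) >= -zeta * e_p
        return False
    if any(abs(hi) > zeta for hi in h(x)):    # -zeta * e_q <= h(x) <= zeta * e_q
        return False
    # Bound constraints G^T x >= 0 and H^T x >= 0.
    return all(x[i] >= 0.0 for i in G_idx) and all(x[j] >= 0.0 for j in H_idx)
```

Note that a point violating g(x) ≥ 0 can still be feasible for PF(c) once ζ is large enough, which is why a feasible point of (2) is easy to find when ζ̄ is large.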

The algorithms we consider in this paper generate a sequence of (exact or inexact) first- or second-order points (x^k, ζ_k) of PF(c_k), where {c_k} is a positive nondecreasing sequence. We study the stationarity properties of the accumulation points of {x^k}. The upper bound constraint ζ_k ≤ ζ̄ helps to ensure the existence and boundedness of (x^k, ζ_k).

Our analyses draw on global convergence analyses of Scholtes [24] and Anitescu [2]; the latter studied a variant of (1) known as parametric mixed-P variational inequalities. In Section 3, we study stationarity properties of termination points and accumulation points of {(x^k, ζ_k)}. In Subsection 3.1, (x^k, ζ_k) is an inexact first-order point of PF(c_k), and we show that each feasible accumulation point satisfying MPEC-LICQ is C-stationary for (1). In Subsection 3.2, (x^k, ζ_k) is an exact second-order point of PF(c_k), and we show (somewhat surprisingly) termination at a strongly stationary point for c_k sufficiently large; otherwise accumulation points either are infeasible or fail to satisfy MPEC-LICQ. In Subsection 3.3, (x^k, ζ_k) is an inexact second-order point of PF(c_k), and we show that each feasible accumulation point satisfying MPEC-LICQ is either M-stationary or strongly stationary (depending on boundedness of {c_k}). Moreover, if exact complementarity holds between bound constraints and their multipliers, then x^k satisfies exactly the complementarity condition (G^Tx^k)^T(H^Tx^k) = 0 for all c_k sufficiently large. In Subsection 3.4, we introduce a strengthened version of MPEC-LICQ and prove another result concerning exact satisfaction of the complementarity condition for sufficiently large c_k, even when the subproblems PF(c_k) are solved inexactly. In Subsection 3.5, we present a practical algorithm for generating (x^k, ζ_k) as an inexact second-order point of PF(c_k).

Section 4 discusses a “regularized” nonlinear programming formulation of (1) [24] and presents examples to illustrate and compare the behavior of methods based on elastic-mode and regularized formulations. Section 5 presents some numerical experience, corroborating the aforementioned result of exact complementarity under finite penalty.

In what follows, we use ‖·‖ to denote the Euclidean norm ‖·‖_2. The notations O(·) and o(·) are used in the usual sense. We denote by e_q a vector of length q whose entries are all 1, that is, e_q = (1, 1, . . . , 1)^T.

2. Assumptions and Background

In this section, we summarize some known results concerning constraint qualifications and necessary optimality conditions for MPEC and its elastic-mode formulation. We discuss first-order stationarity conditions and constraint qualifications for MPEC (1) in Subsection 2.1 and first- and second-order stationarity conditions for PF(c) (2) in Subsection 2.2. Subsection 2.3 describes the correspondence between certain first-order points of the elastic form (2) and first-order points of the MPEC (1).

2.1. Stationarity Conditions and Constraint Qualifications for MPEC

We start by defining the following active sets at a feasible point x^∗ of MPEC (1):

\[
\begin{aligned}
I_g &\overset{\mathrm{def}}{=} \{ i \in \{1, 2, \dots, p\} \mid g_i(x^*) = 0 \}, && \text{(3a)} \\
I_G &\overset{\mathrm{def}}{=} \{ i \in \{1, 2, \dots, m\} \mid G_i^T x^* = 0 \}, && \text{(3b)} \\
I_H &\overset{\mathrm{def}}{=} \{ i \in \{1, 2, \dots, m\} \mid H_i^T x^* = 0 \}, && \text{(3c)}
\end{aligned}
\]

where G_i and H_i denote the ith column of G and H, respectively (in each case, a column from the identity matrix). Because x^∗ is feasible for (1), we have I_G ∪ I_H = {1, 2, . . . , m}.
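As a small illustration of the active sets (3), the following Python sketch computes them from the values g(x^∗) and the components of x^∗. It assumes (hypothetically) that G and H are stored as coordinate index lists, uses 0-based indices rather than the paper's 1-based ones, and exposes an optional tolerance tol that is not part of the definition (tol = 0 reproduces (3) exactly).

```python
from typing import Sequence, Set, Tuple


def active_sets(g_values: Sequence[float], x: Sequence[float],
                G_idx: Sequence[int], H_idx: Sequence[int],
                tol: float = 0.0) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (I_g, I_G, I_H) as sets of 0-based indices."""
    I_g = {i for i, gi in enumerate(g_values) if abs(gi) <= tol}
    I_G = {i for i, j in enumerate(G_idx) if abs(x[j]) <= tol}  # G_i^T x = 0
    I_H = {i for i, j in enumerate(H_idx) if abs(x[j]) <= tol}  # H_i^T x = 0
    return I_g, I_G, I_H
```

For a feasible point, the union of the last two sets covers every complementarity pair, and their intersection is the set of biactive indices that distinguishes the stationarity notions used later.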

Using the active sets, we define our first notion of first-order stationarity for (1) as follows.

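Definition 1. A feasible point x^∗ of (1) is strongly stationary if d = 0 solves the following linear program:

\[
\begin{aligned}
\min_{d}\;\; & \nabla f(x^*)^T d \quad \text{subject to} \\
& g(x^*) + \nabla g(x^*)^T d \ge 0, \quad h(x^*) + \nabla h(x^*)^T d = 0, \\
& G_i^T d = 0, \; i \in I_G \setminus I_H, \qquad H_i^T d = 0, \; i \in I_H \setminus I_G, \\
& G_i^T d \ge 0, \quad H_i^T d \ge 0, \; i \in I_G \cap I_H.
\end{aligned} \tag{4}
\]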

Let us introduce Lagrange multipliers and define the MPEC Lagrangian as in Scholtes [24, Sec. 4]:

\[
L(x, \lambda, \mu, \tau, \nu) = f(x) - \lambda^T g(x) - \mu^T h(x) - \tau^T G^T x - \nu^T H^T x. \tag{5}
\]

By combining the (necessary and sufficient) conditions for d = 0 to solve (4) with the feasibility conditions for x^∗, we see that x^∗ is strongly stationary if and only if x^∗ satisfies, together with some multipliers (λ^∗, µ^∗, τ^∗, ν^∗), the following conditions:

\[
\begin{aligned}
\nabla_x L(x^*, \lambda^*, \mu^*, \tau^*, \nu^*) &= 0, && \text{(6a)} \\
0 \le \lambda^* \perp g(x^*) &\ge 0, && \text{(6b)} \\
h(x^*) &= 0, && \text{(6c)} \\
\tau^* \perp G^T x^* &\ge 0, && \text{(6d)} \\
\nu^* \perp H^T x^* &\ge 0, && \text{(6e)} \\
\tau_i^* &\ge 0, \quad i \in I_G \cap I_H, && \text{(6f)} \\
\nu_i^* &\ge 0, \quad i \in I_G \cap I_H. && \text{(6g)}
\end{aligned}
\]


Under the following constraint qualification at x^∗, the multipliers (λ^∗, µ^∗, τ^∗, ν^∗) are in fact unique.

Definition 2. The MPEC-LICQ holds at a feasible point x^∗ of (1) if the following set of vectors is linearly independent:

\[
K \overset{\mathrm{def}}{=} \{\nabla g_i(x^*)\}_{i \in I_g} \cup \{\nabla h_i(x^*)\}_{i = 1, 2, \dots, q} \cup \{G_i\}_{i \in I_G} \cup \{H_i\}_{i \in I_H}. \tag{7}
\]

The following result, dating back to Luo, Pang, and Ralph [17] but stated here in the form of Scheel and Scholtes [23, Theorem 2], shows that, under MPEC-LICQ, strong stationarity is a set of (first-order) necessary optimality conditions for the MPEC.

Theorem 1. Suppose that x^∗ is a local minimizer of (1). If the MPEC-LICQ holds at x^∗, then x^∗ is strongly stationary, and the multiplier vector (λ^∗, µ^∗, τ^∗, ν^∗) that satisfies the conditions (6) is unique.

Our analysis also uses two weaker notions of first-order stationarity for (1) that have been studied in previous works; see, for example, Outrata [18] and Scheel and Scholtes [23].

Definition 3. (a) A point x^∗ is C-stationary if there exist multipliers (λ^∗, µ^∗, τ^∗, ν^∗) satisfying (6) except that the conditions (6f), (6g) are replaced by τ_i^∗ν_i^∗ ≥ 0, for each i ∈ I_G ∩ I_H.

(b) A point x^∗ is M-stationary if it is C-stationary and if either τ_i^∗ ≥ 0 or ν_i^∗ ≥ 0 for each i ∈ I_G ∩ I_H.

Notice that M-stationarity allows such situations as τ_i^∗ < 0 and ν_i^∗ = 0 for some i ∈ I_G ∩ I_H but does not allow the situation τ_i^∗ < 0 and ν_i^∗ < 0, which is allowed by C-stationarity. In particular, strongly stationary ⇒ M-stationary ⇒ C-stationary.
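The sign conditions that separate the three notions can be summarized in a short Python sketch. It assumes the caller has already verified the conditions (6a)-(6e) for the given multipliers, so only the signs of τ_i and ν_i on the biactive set I_G ∩ I_H are inspected; the function name and the string labels are hypothetical.

```python
from typing import Iterable, Sequence


def classify_stationarity(tau: Sequence[float], nu: Sequence[float],
                          biactive: Iterable[int]) -> str:
    """Classify by the signs of tau_i, nu_i for i in the biactive set.

    Assumes (6a)-(6e) have been checked elsewhere; returns the strongest
    notion whose sign conditions hold, or "none".
    """
    idx = list(biactive)
    # Strong stationarity: (6f) and (6g).
    if all(tau[i] >= 0 and nu[i] >= 0 for i in idx):
        return "strong"
    # C-stationarity: tau_i * nu_i >= 0 on the biactive set.
    c_stationary = all(tau[i] * nu[i] >= 0 for i in idx)
    # M-stationarity: C-stationary and, for each i, tau_i >= 0 or nu_i >= 0.
    if c_stationary and all(tau[i] >= 0 or nu[i] >= 0 for i in idx):
        return "M"
    return "C" if c_stationary else "none"
```

The two cases singled out in the text correspond to `classify_stationarity([-1.0], [0.0], [0])` (M-stationary) and `classify_stationarity([-1.0], [-1.0], [0])` (C-stationary only).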

2.2. Necessary Optimality Conditions for PF(c)

In this subsection, we discuss the exact and inexact first- and second-order necessary optimality conditions for PF(c) defined in (2). We start by defining the Lagrangian for this problem as follows:

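\[
\begin{aligned}
L_c(x, \zeta, \lambda, \mu^-, \mu^+, \tau, \nu) ={} & f(x) + c\zeta + c\,(G^T x)^T H^T x - \lambda^T (g(x) + \zeta e_p) \\
& - (\mu^+)^T (\zeta e_q - h(x)) - (\mu^-)^T (\zeta e_q + h(x)) - \tau^T G^T x - \nu^T H^T x.
\end{aligned} \tag{8}
\]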

The Karush-Kuhn-Tucker first-order necessary optimality conditions for this problem are as follows:

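\[
\begin{aligned}
\nabla_x L_c(x, \zeta, \lambda, \mu^-, \mu^+, \tau, \nu) &= 0, && \text{(9a)} \\
c - e_p^T \lambda - e_q^T \mu^- - e_q^T \mu^+ &= \pi^- - \pi^+, && \text{(9b)} \\
0 \le (\pi^-, \pi^+) \perp (\zeta, \bar\zeta - \zeta) &\ge 0, && \text{(9c)} \\
0 \le \lambda \perp g(x) + \zeta e_p &\ge 0, && \text{(9d)} \\
0 \le \mu^+ \perp \zeta e_q - h(x) &\ge 0, && \text{(9e)} \\
0 \le \mu^- \perp \zeta e_q + h(x) &\ge 0, && \text{(9f)} \\
0 \le \tau \perp G^T x &\ge 0, && \text{(9g)} \\
0 \le \nu \perp H^T x &\ge 0. && \text{(9h)}
\end{aligned}
\]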

We call (x, ζ) satisfying these conditions a first-order point of PF(c). Since these conditions cannot be satisfied exactly in practice, we consider the following inexact first-order conditions.

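Definition 4. We say that (x, ζ) is an ε-first-order point of PF(c) (ε ≥ 0) if there exist multipliers (λ, µ⁻, µ⁺, τ, ν, π⁻, π⁺) satisfying

\[
\begin{aligned}
& \| \nabla_x L_c(x, \zeta, \lambda, \mu^-, \mu^+, \tau, \nu) \|_\infty \le \epsilon, \\
& | c - e_p^T \lambda - e_q^T \mu^- - e_q^T \mu^+ - \pi^- + \pi^+ | \le \epsilon, \\
& 0 \le (\pi^-, \pi^+), \quad (\zeta, \bar\zeta - \zeta) \ge 0, \quad \zeta \pi^- + (\bar\zeta - \zeta) \pi^+ \le \epsilon, \\
& 0 \le \lambda, \quad g(x) + \zeta e_p \ge -\epsilon e_p, \quad | (g(x) + \zeta e_p)^T \lambda | \le \epsilon, \\
& 0 \le \mu^+, \quad \zeta e_q - h(x) \ge -\epsilon e_q, \quad | (\zeta e_q - h(x))^T \mu^+ | \le \epsilon, \\
& 0 \le \mu^-, \quad \zeta e_q + h(x) \ge -\epsilon e_q, \quad | (\zeta e_q + h(x))^T \mu^- | \le \epsilon, \\
& 0 \le \tau, \quad G^T x \ge 0, \quad \tau^T G^T x \le \epsilon, \\
& 0 \le \nu, \quad H^T x \ge 0, \quad \nu^T H^T x \le \epsilon.
\end{aligned} \tag{10}
\]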

The conditions (10) are well suited to situations in which PF(c) is solved by interior-point methods or active-set methods, since such methods can enforce the bound constraints G^Tx ≥ 0 and H^Tx ≥ 0 explicitly (also the nonnegativity constraints on the multipliers, in the case of interior-point methods), while allowing the constraints involving nonlinear functions to be satisfied inexactly.

We now introduce the notions of approximately active constraints and of exact and inexact second-order (stationary) points of PF(c).

Definition 5. Given a function r : IRⁿ → IR, a constraint r(x) ≥ 0 or r(x) = 0 of a nonlinear program is δ-active (δ ≥ 0) at a point x̂ if |r(x̂)| ≤ δ. The constraint is active at x̂ if r(x̂) = 0.
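Definition 5 translates directly into a one-line test on constraint residuals. The Python sketch below is only illustrative: it assumes the residual values r_i(x̂) have already been evaluated and are passed in as a list, and the function name is hypothetical.

```python
from typing import List, Sequence


def delta_active(residuals: Sequence[float], delta: float = 0.0) -> List[int]:
    """Return the 0-based indices i with |r_i(x_hat)| <= delta.

    With delta = 0 this is the usual (exactly) active set.
    """
    return [i for i, r in enumerate(residuals) if abs(r) <= delta]
```

Enlarging δ can only enlarge the returned set, which is why a condition stated over the null space of δ-active gradients becomes weaker as δ grows.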

Definition 6. We say that (x, ζ) is a second-order point of PF(c) if there exist multipliers (λ, µ⁻, µ⁺, τ, ν, π⁻, π⁺) satisfying (9) (so (x, ζ) is a first-order point of PF(c)) and

\[
\tilde u^T \nabla^2_{(x,\zeta)(x,\zeta)} L_c(x, \zeta, \lambda, \mu^-, \mu^+, \tau, \nu)\, \tilde u \ge 0,
\]

for all ũ ∈ IRⁿ⁺¹ in the null space of the gradients of all active constraints of (2) at (x, ζ).

Definition 7. We say that (x, ζ) is an (ε, δ)-second-order point of PF(c) (ε, δ ≥ 0) if there exist multipliers (λ, µ⁻, µ⁺, τ, ν, π⁻, π⁺) satisfying (10) (so (x, ζ) is an ε-first-order point of PF(c)) and

\[
\tilde u^T \nabla^2_{(x,\zeta)(x,\zeta)} L_c(x, \zeta, \lambda, \mu^-, \mu^+, \tau, \nu)\, \tilde u \ge -C \| \tilde u \|^2,
\]

for all ũ ∈ IRⁿ⁺¹ that are simultaneously in the null space of the gradients of all active bound constraints (G^Tx ≥ 0, H^Tx ≥ 0, 0 ≤ ζ ≤ ζ̄) of (2) at (x, ζ) and in the null space of the gradients of δ-active nonbound constraints (g(x) ≥ −ζe_p, ζe_q ≥ h(x) ≥ −ζe_q) at (x, ζ). Here C ≥ 0 is an arbitrary constant independent of (x, ζ).

We shall see in Subsection 5.3 that the bounded indefiniteness condition given in Definition 7 is numerically easier to verify than the more standard positive semidefiniteness condition (corresponding to C = 0). In particular, when we use an off-the-shelf code to solve PF(c), we generally have no knowledge and no control of how the active constraints are computed, if they are explicitly computed at all. Hence, it is difficult to check numerically whether the final point output by the code satisfies the positive semidefiniteness condition because this condition is sensitive to the value of the (unknown) tolerance δ. On the other hand, as our numerical experience in Subsection 5.3 suggests, the bounded indefiniteness condition seems fairly insensitive to δ.

2.3. Relating First-Order Points of the MPEC and the Elastic Form

The following result identifies certain first-order points of PF(c) (2) with the strongly stationary points of the MPEC (1).

Theorem 2. If (x, ζ) is a first-order point of PF(c) with c ≥ 0 and x is feasible for (1), then (x, 0) is also a first-order point of PF(c), and x is strongly stationary for (1).

Proof. To prove the first claim, we show that when x is feasible for (1), ζ can be replaced by 0 in the conditions (9) and they will still be satisfied, without changes to the other variables. It is easy to see that the conditions (9d), (9e), and (9f) continue to hold after this substitution, while (9b) is not affected. Also, we have from (9c) that

\[
0 \le \pi^+ \perp \bar\zeta - \zeta \ge 0. \tag{11}
\]

If ζ̄ = 0, we must have that ζ = 0 already, so that the substitution of 0 for ζ is inconsequential. If ζ̄ − ζ > 0, we must have π⁺ = 0, so (11) still holds after ζ is replaced by 0. The final case is ζ = ζ̄ > 0 with π⁺ > 0. By the conditions 0 ≤ π⁻ ⊥ ζ ≥ 0, we have π⁻ = 0, so the right-hand side of (9b) is negative. On the other hand, since x is feasible in (1) and ζ > 0, we have g(x) + ζe_p > 0, ζe_q − h(x) > 0, and ζe_q + h(x) > 0, so it follows by complementarity in (9d), (9e), and (9f) that λ = 0 and µ⁻ = µ⁺ = 0. Hence, the left-hand side of (9b) equals c ≥ 0 and is therefore nonnegative, a contradiction. Thus, we must have π⁺ = 0, so the conditions (11) will continue to hold after we replace ζ by 0. The first statement of the theorem is proved.

For the second statement, we can identify (9a) with (6a) by setting x^∗ = x, ζ = 0, and

\[
\tau^* = \tau - c H^T x, \quad \nu^* = \nu - c G^T x, \quad \lambda^* = \lambda, \quad \mu^* = \mu^- - \mu^+. \tag{12}
\]
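The identification (12) is a componentwise shift of the bound multipliers by the penalty term. A minimal Python sketch, assuming the vectors G^Tx and H^Tx are passed in as lists Gx and Hx (the function name and argument layout are hypothetical), is:

```python
from typing import List, Sequence, Tuple


def mpec_multipliers(lam: Sequence[float], mu_minus: Sequence[float],
                     mu_plus: Sequence[float], tau: Sequence[float],
                     nu: Sequence[float], c: float,
                     Gx: Sequence[float], Hx: Sequence[float]
                     ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Map multipliers of PF(c) to MPEC multipliers following (12)."""
    lam_star = list(lam)                                  # lambda* = lambda
    mu_star = [a - b for a, b in zip(mu_minus, mu_plus)]  # mu* = mu^- - mu^+
    tau_star = [t - c * hx for t, hx in zip(tau, Hx)]     # tau* = tau - c H^T x
    nu_star = [n - c * gx for n, gx in zip(nu, Gx)]       # nu* = nu - c G^T x
    return lam_star, mu_star, tau_star, nu_star
```

Although τ and ν are nonnegative in (9g) and (9h), the shifted values τ^∗ and ν^∗ may be negative at indices where the complementary component is positive, which is consistent with (6d) and (6e) placing no sign restriction there.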


3. Global Convergence Results

In this section we state and prove results for methods in which PF(c_k) is solved for a nondecreasing sequence of positive scalars {c_k}. By “solved” we mean that either an exact or inexact first- or second-order point x^k of PF(c_k) is computed; we analyze various cases in the subsections below. We are interested particularly in techniques that achieve exact complementarity finitely; that is, (G^Tx^k)^T(H^Tx^k) = 0 for all iterates k with c_k exceeding some threshold c^∗.

3.1. A Sequence of Inexact First-Order Points

Here we consider the situation in which an inexact first-order point (x^k, ζ_k) of PF(c_k) is generated, for k = 0, 1, . . ., and give conditions under which accumulation points of {x^k} are C-stationary. The proof is long and somewhat technical. It borrows some ideas from the proofs of Scholtes [24, Theorem 3.1] and Anitescu [2, Theorem 2.5].

Theorem 3. Let {c_k} be a positive sequence, nondecreasing with k, and {ε_k} be a nonnegative sequence with {c_kε_k} → 0. Suppose that (x^k, ζ_k) is an ε_k-first-order point of PF(c_k), k = 0, 1, . . .. Let x^∗ be any accumulation point of {x^k} that is feasible for (1) and satisfies MPEC-LICQ. Then x^∗ is C-stationary for (1), and for any S ⊂ {0, 1, . . .} with {x^k}_{k∈S} → x^∗, we have {ζ_k}_{k∈S} → 0.

Proof. Suppose without loss of generality that {x^k} → x^∗. Since c_k ≥ c_0 > 0 and {c_kε_k} → 0, we have {ε_k} → 0. Let (λ^k, µ^−k, µ^+k, τ^k, ν^k, π^−k, π^+k) be multipliers associated with (x^k, ζ_k) (from (10)).

From the final row of (10), we have that, for all k,

\[
\nu_i^k (H_i^T x^k) \le (\nu^k)^T (H^T x^k) \le \epsilon_k, \quad i = 1, 2, \dots, m, \tag{13}
\]

so for i ∉ I_H, since H_i^Tx^k is bounded away from zero, we have that ν_i^k = O(ε_k). By similar reasoning, we have that τ_i^k = O(ε_k) for i ∉ I_G. Using these two facts, we can write the first row of (10) as follows:

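\[
\begin{aligned}
0 ={} & \nabla f(x^k) - \sum_{i=1}^{p} \lambda_i^k \nabla g_i(x^k) - \sum_{i=1}^{q} (\mu_i^{-k} - \mu_i^{+k}) \nabla h_i(x^k) \\
& - \sum_{i \in I_G} (\tau_i^k - c_k H_i^T x^k) G_i - \sum_{i \in I_H} (\nu_i^k - c_k G_i^T x^k) H_i \\
& + c_k \sum_{i \notin I_G} (H_i^T x^k) G_i + c_k \sum_{i \notin I_H} (G_i^T x^k) H_i + O(\epsilon_k).
\end{aligned}
\]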

Since x^∗ is feasible for (1), we have I_G ∪ I_H = {1, 2, . . . , m}, and the set of indices i ∉ I_G is simply I_H\I_G. Similarly, i ∉ I_H ⇔ i ∈ I_G\I_H. Hence, we can restate the relation above as follows:

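\[
\begin{aligned}
0 ={} & \nabla f(x^k) - \sum_{i=1}^{p} \lambda_i^k \nabla g_i(x^k) - \sum_{i=1}^{q} (\mu_i^{-k} - \mu_i^{+k}) \nabla h_i(x^k) \\
& - \sum_{i \in I_G \cap I_H} (\tau_i^k - c_k H_i^T x^k) G_i - \sum_{i \in I_G \cap I_H} (\nu_i^k - c_k G_i^T x^k) H_i \\
& - \sum_{i \in I_G \setminus I_H} \left[ (\tau_i^k - c_k H_i^T x^k) G_i - c_k (G_i^T x^k) H_i \right] \\
& - \sum_{i \in I_H \setminus I_G} \left[ (\nu_i^k - c_k G_i^T x^k) H_i - c_k (H_i^T x^k) G_i \right] + O(\epsilon_k).
\end{aligned} \tag{14}
\]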

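We examine the final summation in (14) more closely. This term can be written as follows:

\[
\begin{aligned}
& \sum_{i \in I_H \setminus I_G} \left[ (\nu_i^k - c_k G_i^T x^k) H_i - c_k (H_i^T x^k) G_i \right] \\
& \quad = \sum_{i \in I_H \setminus I_G} \left[ (\nu_i^k - c_k G_i^T x^k) \left( H_i + \frac{H_i^T x^k}{G_i^T x^k} G_i \right) - \frac{\nu_i^k H_i^T x^k}{G_i^T x^k} G_i \right] \\
& \quad = \sum_{i \in I_H \setminus I_G} (\nu_i^k - c_k G_i^T x^k) \left( H_i + \frac{H_i^T x^k}{G_i^T x^k} G_i \right) + O(\epsilon_k),
\end{aligned} \tag{15}
\]

where the final equality is a consequence of {G_i^Tx^k} → G_i^Tx^∗ > 0 for i ∈ I_H\I_G and 0 ≤ ν_i^kH_i^Tx^k ≤ ε_k (see (13)). Hence, by defining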

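\[
\tilde H_i^k \overset{\mathrm{def}}{=}
\begin{cases}
H_i + \dfrac{H_i^T x^k}{G_i^T x^k}\, G_i, & \text{for } i \in I_H \setminus I_G, \\[1ex]
H_i, & \text{for } i \in I_G \cap I_H,
\end{cases} \tag{16}
\]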

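we deduce from (15) that

\[
\sum_{i \in I_H \setminus I_G} \left[ (\nu_i^k - c_k G_i^T x^k) H_i - c_k (H_i^T x^k) G_i \right] = \sum_{i \in I_H \setminus I_G} (\nu_i^k - c_k G_i^T x^k) \tilde H_i^k + O(\epsilon_k). \tag{17}
\]

Since {H_i^Tx^k/G_i^Tx^k} → 0 for i ∈ I_H\I_G, we have from (16) that

\[
\{ \tilde H_i^k \} \to H_i, \quad \text{for } i \in I_H.
\]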

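A similar definition of G̃_i^k for i ∈ I_G yields for the second-to-last summation in (14) that

\[
\sum_{i \in I_G \setminus I_H} \left[ (\tau_i^k - c_k H_i^T x^k) G_i - c_k (G_i^T x^k) H_i \right] = \sum_{i \in I_G \setminus I_H} (\tau_i^k - c_k H_i^T x^k) \tilde G_i^k + O(\epsilon_k). \tag{18}
\]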


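By substituting (17) and (18) into (14) and using the definitions of H̃_i^k and G̃_i^k, we have

\[
\begin{aligned}
0 ={} & \nabla f(x^k) - \sum_{i=1}^{p} \lambda_i^k \nabla g_i(x^k) - \sum_{i=1}^{q} (\mu_i^{-k} - \mu_i^{+k}) \nabla h_i(x^k) \\
& - \sum_{i \in I_G} (\tau_i^k - c_k H_i^T x^k) \tilde G_i^k - \sum_{i \in I_H} (\nu_i^k - c_k G_i^T x^k) \tilde H_i^k + O(\epsilon_k).
\end{aligned} \tag{19}
\]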

We turn now to the term in (19) involving λ^k. By taking a further subsequence if necessary, we assume that there is a constant ρ > 0 such that g_i(x^k) ≥ ρ for all i ∉ I_g and all k. From the fourth row of (10) we have |(g(x^k) + ζ_ke_p)^Tλ^k| ≤ ε_k and therefore

\[
\sum_{i \notin I_g} (g_i(x^k) + \zeta_k) \lambda_i^k \le \epsilon_k - \sum_{i \in I_g} (g_i(x^k) + \zeta_k) \lambda_i^k \le \epsilon_k + \epsilon_k \sum_{i \in I_g} \lambda_i^k,
\]

where the second inequality follows from the fact that λ^k_i ≥ 0 and g_i(x^k) + ζ_k ≥ −ε_k for all i (due to the fourth row of (10)). Since i ∉ I_g ⇒ g_i(x^k) + ζ_k ≥ g_i(x^k) ≥ ρ > 0, it follows that

\[
\rho \sum_{i \notin I_g} \lambda_i^k \le \epsilon_k + \epsilon_k \sum_{i \in I_g} \lambda_i^k, \quad \text{for all } k. \tag{20}
\]

When Σ_{i∈I_g} λ^k_i ≥ 1, we have immediately from (20) that

\[
\frac{\sum_{i \notin I_g} \lambda_i^k}{\sum_{i \in I_g} \lambda_i^k} \le \frac{2 \epsilon_k}{\rho}. \tag{21}
\]

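Then

\[
\sum_{i=1}^{p} \lambda_i^k \nabla g_i(x^k) = \sum_{i \in I_g} \lambda_i^k \left[ \nabla g_i(x^k) + \frac{\sum_{j \notin I_g} \lambda_j^k \nabla g_j(x^k)}{\sum_{j \in I_g} \lambda_j^k} \right] = \sum_{i \in I_g} \lambda_i^k \tilde g_i^k,
\]

where the vector g̃^k_i is defined in the obvious way. Because of (21) and {x^k} → x^∗, we have {g̃^k_i} → ∇g_i(x^∗). Otherwise, when Σ_{i∈I_g} λ^k_i < 1, we have from (20) that

\[
\sum_{i \notin I_g} \lambda_i^k \le \frac{2 \epsilon_k}{\rho} = O(\epsilon_k), \tag{22}
\]

so that

\[
\sum_{i=1}^{p} \lambda_i^k \nabla g_i(x^k) = \sum_{i \in I_g} \lambda_i^k \tilde g_i^k + O(\epsilon_k),
\]

where we set g̃^k_i = ∇g_i(x^k) by definition. Thus, in both cases, we have that

\[
\sum_{i=1}^{p} \lambda_i^k \nabla g_i(x^k) = \sum_{i \in I_g} \lambda_i^k \tilde g_i^k + O(\epsilon_k) \quad \text{and} \quad \{ \tilde g_i^k \} \to \nabla g_i(x^*), \; i \in I_g.
\]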


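Using the first relation, we can write (19) as follows:

\[
\begin{aligned}
0 ={} & \nabla f(x^k) - \sum_{i \in I_g} \lambda_i^k \tilde g_i^k - \sum_{i=1}^{q} (\mu_i^{-k} - \mu_i^{+k}) \nabla h_i(x^k) \\
& - \sum_{i \in I_G} (\tau_i^k - c_k H_i^T x^k) \tilde G_i^k - \sum_{i \in I_H} (\nu_i^k - c_k G_i^T x^k) \tilde H_i^k + O(\epsilon_k).
\end{aligned} \tag{23}
\]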

Since x^∗ satisfies MPEC-LICQ, we can invoke Lemma 2 to deduce from (23) the existence of λ^∗_i for i ∈ I_g, τ_i^∗ for i ∈ I_G, and ν_i^∗ for i ∈ I_H such that

\[
0 = \nabla f(x^*) - \sum_{i \in I_g} \lambda_i^* \nabla g_i(x^*) - \sum_{i=1}^{q} \mu_i^* \nabla h_i(x^*) - \sum_{i \in I_G} \tau_i^* G_i - \sum_{i \in I_H} \nu_i^* H_i,
\]

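and, moreover,

\[
\begin{aligned}
\{ \lambda_i^k \} &\to \lambda_i^*, \quad \text{for } i \in I_g, && \text{(24a)} \\
\{ \mu_i^{-k} - \mu_i^{+k} \} &\to \mu_i^*, \quad \text{for } i = 1, 2, \dots, q, && \text{(24b)} \\
\{ \tau_i^k - c_k H_i^T x^k \} &\to \tau_i^*, \quad \text{for } i \in I_G, && \text{(24c)} \\
\{ \nu_i^k - c_k G_i^T x^k \} &\to \nu_i^*, \quad \text{for } i \in I_H. && \text{(24d)}
\end{aligned}
\]

We now analyze (24c) and (24d) for i ∈ I_G ∩ I_H. Since τ_i^k, ν_i^k, G_i^Tx^k, and H_i^Tx^k are all nonnegative, we have

\[
\begin{aligned}
(\tau_i^k - c_k H_i^T x^k)(\nu_i^k - c_k G_i^T x^k)
&= \tau_i^k \nu_i^k + c_k^2 (H_i^T x^k)(G_i^T x^k) - c_k (\tau_i^k G_i^T x^k + \nu_i^k H_i^T x^k) \\
&\ge -c_k (\tau_i^k G_i^T x^k + \nu_i^k H_i^T x^k) \\
&\ge -2 c_k \epsilon_k,
\end{aligned}
\]

where the final inequality follows from (10). Taking limits as k → ∞ and using {c_kε_k} → 0, we conclude that τ_i^∗ν_i^∗ ≥ 0 for i ∈ I_G ∩ I_H, implying C-stationarity.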

To complete the proof, we show by contradiction that {ζ_k} → 0. If this limit did not hold, we could assume by taking a subsequence if necessary that ζ_k ≥ ζ > 0 for all k and some constant ζ. Since x^∗ is feasible, we have that g_i(x^∗) ≥ 0 for all i, so for all k sufficiently large we have

\[
g_i(x^k) + \zeta_k \ge \zeta/2, \quad \text{for } i = 1, 2, \dots, p.
\]

Hence, we have from the fourth row of (10) that

\[
e_p^T \lambda^k \le 2 \epsilon_k / \zeta,
\]

for all k sufficiently large. Similarly, since h(x^∗) = 0, we have that

\[
\zeta_k - h_i(x^k) \ge \zeta/2, \quad \zeta_k + h_i(x^k) \ge \zeta/2, \quad \text{for } i = 1, 2, \dots, q.
\]

Hence

\[
e_q^T \mu^{-k} \le 2 \epsilon_k / \zeta, \quad e_q^T \mu^{+k} \le 2 \epsilon_k / \zeta,
\]

for all k sufficiently large. This together with the second row of (10) yields

\[
\pi^{-k} - \pi^{+k} = c_k + O(\epsilon_k).
\]

Since π^+k ≥ 0, this implies π^−k ≥ c_k + O(ε_k). Also, from the third row of (10), we have ζ_kπ^−k ≤ ε_k. Thus

\[
\zeta_k \le \frac{\epsilon_k}{\pi^{-k}} \le \frac{\epsilon_k}{c_k + O(\epsilon_k)} \to 0 \quad \text{as } k \to \infty,
\]

contradicting our positive lower bound on ζ_k.
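The product bound used in the C-stationarity argument, namely (τ_i − cH_i^Tx)(ν_i − cG_i^Tx) ≥ −2cε whenever τ_i, ν_i, G_i^Tx, H_i^Tx are nonnegative with τ_iG_i^Tx ≤ ε and ν_iH_i^Tx ≤ ε, can be spot-checked numerically. The Python sketch below is only a sanity check on randomly sampled scalars, not a proof; the sampling ranges, the seed, and the small floating-point slack are arbitrary assumptions.

```python
import random


def product_bound_holds(tau: float, nu: float, Gx: float, Hx: float,
                        c: float, eps: float) -> bool:
    """Check (tau - c*Hx) * (nu - c*Gx) >= -2*c*eps up to rounding slack."""
    lhs = (tau - c * Hx) * (nu - c * Gx)
    bound = -2.0 * c * eps
    slack = 1e-12 * (1.0 + abs(lhs) + abs(bound))  # guards against rounding only
    return lhs >= bound - slack


def sample_check(trials: int = 1000, seed: int = 0) -> bool:
    """Sample scalars satisfying the hypotheses and test the bound each time."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.uniform(1.0, 100.0)
        eps = rng.uniform(0.0, 1.0)
        Gx = rng.uniform(0.01, 10.0)
        Hx = rng.uniform(0.01, 10.0)
        tau = rng.uniform(0.0, eps / Gx)   # enforces tau * Gx <= eps
        nu = rng.uniform(0.0, eps / Hx)    # enforces nu * Hx <= eps
        if not product_bound_holds(tau, nu, Gx, Hx, c, eps):
            return False
    return True
```

Dropping the hypothesis τ_iG_i^Tx ≤ ε breaks the bound; for instance, τ = 10, ν = 0, G^Tx = 1, H^Tx = 0, c = 1, ε = 0.1 gives a product of −10, well below −2cε = −0.2.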

Without loss of generality, we could assume in Theorem 3 that {c_k} is increasing (rather than nondecreasing). However, allowing {c_k} to be nondecreasing is convenient when, for example, (x^k, ζ_k) is the point generated at the kth iteration of an iterative method that allows c_k to remain unchanged from one iteration to the next; see Algorithm Elastic-Inexact in Section 3.5.

The following corollary gives additional global convergence properties of the sequence {(x^k, ζ_k)}.

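Corollary 1. Suppose that the assumptions of Theorem 3 hold, where x^∗ is an accumulation point of {x^k} that is C-stationary for (1) and satisfies MPEC-LICQ. Then for any S ⊂ {0, 1, . . .} such that {x^k}_{k∈S} → x^∗, we have that

\[
\begin{aligned}
\{ c_k G_i^T x^k \}_{k \in S} &\to 0, \quad \text{for } i \in I_G \setminus I_H, && \text{(25a)} \\
\{ c_k H_i^T x^k \}_{k \in S} &\to 0, \quad \text{for } i \in I_H \setminus I_G, && \text{(25b)} \\
\{ c_k (G_i^T x^k)(H_i^T x^k) \}_{k \in S} &\to 0, \quad \text{for } i \in I_G \cap I_H, && \text{(25c)} \\
\{ c_k \zeta_k \}_{k \in S} &\to 0. && \text{(25d)}
\end{aligned}
\]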

Proof. We first prove (25b); the proof of (25a) is analogous. If {c_k} is bounded (from above by c̄, say), then the result follows from

\[
0 \le c_k H_i^T x^k \le \bar c\, H_i^T x^k \to \bar c\, H_i^T x^* = 0, \quad \text{as } k \in S, \; k \to \infty, \quad i \in I_H \setminus I_G.
\]

Suppose instead that {c_k} ↑ ∞. Assume for contradiction that there is some S̄ ⊂ S, some i ∈ I_H\I_G, and some constant ρ > 0 such that c_kH_i^Tx^k ≥ ρ for all k ∈ S̄. From the final row of (10), we have that ν_i^kH_i^Tx^k ≤ (ν^k)^TH^Tx^k ≤ ε_k, implying

\[
\nu_i^k c_k H_i^T x^k \le c_k \epsilon_k \to 0, \quad \text{as } k \in \bar S, \; k \to \infty.
\]

It follows from c_kH_i^Tx^k ≥ ρ that {ν_i^k}_{k∈S̄} → 0. From the limit (24d), we then have that

\[
\{ c_k G_i^T x^k \}_{k \in \bar S} \to -\nu_i^*.
\]

Since {c_k} ↑ ∞, this limit implies that {G_i^Tx^k}_{k∈S̄} → 0. Since {x^k} → x^∗, it follows that G_i^Tx^∗ = 0, implying that i ∈ I_G. This contradicts our choice of i ∈ I_H\I_G, so (25b) must hold in this case too.

If {c_k} is bounded, then (25c) follows from the feasibility of x^∗ for (1). Suppose instead that {c_k} ↑ ∞. Assume for contradiction that there is some S̄ ⊂ S, some i ∈ I_G ∩ I_H, and some constant ρ > 0 such that c_k(G_i^Tx^k)(H_i^Tx^k) ≥ ρ for all k ∈ S̄. Thus by (24c), we have

\[
\tau_i^k = c_k H_i^T x^k + O(1) \ge \frac{\rho}{G_i^T x^k} + O(1) \ge \frac{\rho}{2 G_i^T x^k},
\]

for all k ∈ S̄ sufficiently large. However, from the second-to-last row of (10), we have τ_i^kG_i^Tx^k ≤ (τ^k)^T(G^Tx^k) ≤ ε_k, so that τ_i^k ≤ ε_k/G_i^Tx^k, yielding the desired contradiction since {ε_k} → 0.

To prove (25d), we see from the second row of (10) that, for all k,

\[
c_k \zeta_k \le \zeta_k (e_p^T \lambda^k + e_q^T \mu^{-k} + e_q^T \mu^{+k} + \pi^{-k} - \pi^{+k}) + \epsilon_k \zeta_k. \tag{26}
\]

Because of (24a) and {ζ_k}_{k∈S} → 0 (Theorem 3), we have that {ζ_ke_p^Tλ^k}_{k∈S} → 0. Similarly, it is immediate that {ε_kζ_k}_{k∈S} → 0. From the fifth and sixth rows of (10) and (24b), we also have

\[
\begin{aligned}
\zeta_k (e_q^T \mu^{+k} + e_q^T \mu^{-k})
&\le h(x^k)^T (\mu^{+k} - \mu^{-k}) + 2 \epsilon_k \\
&\le \| h(x^k) \|_\infty \| \mu^{+k} - \mu^{-k} \|_1 + 2 \epsilon_k \\
&\le (\zeta_k + \epsilon_k) \| \mu^{+k} - \mu^{-k} \|_1 + 2 \epsilon_k \to 0, \quad \text{as } k \in S, \; k \to \infty.
\end{aligned}
\]

Lastly, from the third row of (10), we have ζ_k(π^−k − π^+k) ≤ ε_k − ζ̄π^+k ≤ ε_k. Hence, by taking limits in (26), we have the desired result (25d).

In Theorem 3, we assumed that the accumulation point x^∗ is feasible for (1). This assumption is fairly mild and, as we show below, is satisfied under the following assumptions on {(x^k, ζ_k)} and {c_k}.

Assumption 1 (a) {f(x^k)} is bounded from below.

(b) {f(x^k) + c_kζ_k + c_k(G^Tx^k)^T(H^Tx^k)} is bounded from above.

(c) There exist positive sequences {ω_k} → 0, {η_k} → ∞ such that c_{k+1} ≥ η_{k+1} whenever ζ_k + (G^Tx^k)^T(H^Tx^k) ≥ ω_k.

Assumption 1(a) holds if f is bounded from below over the feasible set of PF(c_k). Assumption 1(b) holds if (i) the method for solving PF(c_k) has the property that the final point (x^k, ζ_k) it generates has objective value no greater than that of the starting point whenever the starting point is feasible for PF(c_k); and (ii) this method is started at (x̄, 0), with x̄ a feasible point of (1). Then (x̄, 0) is feasible for PF(c_k), with objective value f(x̄), so that

\[
f(x^k) + c_k \zeta_k + c_k (G^T x^k)^T (H^T x^k) \le f(\bar x), \quad \text{for all } k.
\]

Assumption 1(c) holds if we choose c_{k+1} ≥ max{c_k, η_{k+1}} whenever (G^Tx^k)^T(H^Tx^k) + ζ_k ≥ ω_k. Assumption 1 contrasts with the infeasible-point MPEC-LICQ assumption used in [10, Lemma 3.2].
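One admissible penalty update consistent with Assumption 1(c) can be sketched in Python as follows. The function name is hypothetical, and two choices here are assumptions rather than requirements of the text: the update takes c_{k+1} equal to max{c_k, η_{k+1}} (the text only asks for c_{k+1} to be at least this large), and it leaves c_k unchanged when the infeasibility measure is below ω_k.

```python
def next_penalty(c_k: float, zeta_k: float, comp_k: float,
                 omega_k: float, eta_next: float) -> float:
    """Return c_{k+1} given the infeasibility measure zeta_k + comp_k.

    comp_k stands for (G^T x^k)^T (H^T x^k); omega_k and eta_next are the
    current terms of the sequences {omega_k} -> 0 and {eta_k} -> infinity.
    """
    if zeta_k + comp_k >= omega_k:
        # Infeasibility is not yet small: push the penalty up to eta_{k+1}.
        return max(c_k, eta_next)
    # Otherwise keep the penalty; the sequence stays nondecreasing.
    return c_k
```

Because the returned value is never smaller than c_k, the resulting sequence {c_k} is nondecreasing, as required in Theorem 3 and Lemma 1.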

Lemma 1. Let {c_k} be a positive sequence, nondecreasing with k, and {ε_k} be a nonnegative sequence with {ε_k} → 0. Suppose that (x^k, ζ_k) is an ε_k-first-order point of PF(c_k), k = 0, 1, . . ., and that Assumption 1 is satisfied. Then every accumulation point of {x^k} is feasible for (1).


Proof. It suffices to show that

\[
\{ \zeta_k \} \to 0, \quad \{ (G^T x^k)^T (H^T x^k) \} \to 0. \tag{27}
\]

Then any accumulation point (x^∗, ζ_∗) of {(x^k, ζ_k)} satisfies (G^Tx^∗)^T(H^Tx^∗) = 0 and ζ_∗ = 0, implying that x^∗ is feasible for (1). (The other constraints of (1) are satisfied by x^∗, from rows 4 to 8 of (10) and {ε_k} → 0.)

We divide our argument into two cases. First, suppose that

\[
\zeta_k + (G^T x^k)^T (H^T x^k) < \omega_k, \tag{28}
\]

for all k sufficiently large. Since ζ_k ≥ 0, G^Tx^k ≥ 0, H^Tx^k ≥ 0 for all k and {ω_k} → 0, the bound (28) implies (27). Second, suppose that (28) fails to hold for all k in some infinite subsequence. Then, by Assumption 1(c), c_{k+1} ≥ η_{k+1} for all k in this subsequence. Since {c_k} is nondecreasing and {η_k} → ∞, we have that {c_k} ↑ ∞. Assumptions 1(a) and 1(b) imply that {c_kζ_k + c_k(G^Tx^k)^T(H^Tx^k)} is bounded from above. Since ζ_k ≥ 0, G^Tx^k ≥ 0, H^Tx^k ≥ 0 for all k and {c_k} ↑ ∞, (27) follows.

3.2. A Sequence of Exact Second-Order Points

In this subsection, we consider the situation in which an exact second-order point (x^k, ζ_k) of PF(c_k) is generated (Definition 6), for k = 0, 1, . . ., with {c_k} ↑ ∞.

Algorithm Elastic-Exact

Choose c_k > 0, k = 0, 1, . . ., with {c_k} ↑ ∞;

for k = 0, 1, 2, . . .

    Find a second-order point (x^k, ζ_k) of PF(c_k) with Lagrange multipliers (λ^k, µ^−k, µ^+k, τ^k, ν^k, π^−k, π^+k);

    if ζ_k = 0 and (G^Tx^k)^T(H^Tx^k) = 0, STOP.

    end (if)

end (for)
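The outer loop of Algorithm Elastic-Exact can be sketched in Python as below. This is only an illustration under stated assumptions: solve_pf is a hypothetical callback standing in for whatever method returns a second-order point of PF(c), G and H are stored as coordinate index lists, and the toy solver is a hand-derived closed form for the small problem min (x_0 − 1)² + (x_1 − 1)² subject to 0 ≤ x_0 ⊥ x_1 ≥ 0 (no g or h constraints), not a general-purpose routine.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[List[float], float]  # (x, zeta)


def elastic_exact(solve_pf: Callable[[float], Point],
                  penalties: Iterable[float],
                  G_idx: Sequence[int], H_idx: Sequence[int]
                  ) -> Optional[Tuple[List[float], float, int]]:
    """Run the outer loop; return (x, c_k, k) at termination, else None."""
    for k, c_k in enumerate(penalties):
        x, zeta = solve_pf(c_k)
        comp = sum(x[i] * x[j] for i, j in zip(G_idx, H_idx))
        if zeta == 0 and comp == 0:   # exact feasibility for (1): STOP
            return x, c_k, k
    return None                        # penalties exhausted without stopping


def toy_second_order_point(c: float) -> Point:
    """Closed-form second-order point of PF(c) for the toy problem above."""
    if c < 2.0:
        # Hessian [[2, c], [c, 2]] is positive definite; interior minimizer.
        t = 2.0 / (2.0 + c)
        return [t, t], 0.0
    # For c >= 2 the interior stationary point has negative curvature along
    # (1, -1), so a second-order point sits on a bound, e.g. x = (1, 0).
    return [1.0, 0.0], 0.0
```

With penalties 1, 10, 100 the toy run stops at the second subproblem, once c exceeds the threshold 2 at which the interior first-order point stops being a second-order point; this mirrors the finite termination discussed in this subsection.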

We show below that either the algorithm terminates finitely—in which case, the final iterate x^k is strongly stationary by Theorem 2—or each accumulation point of {x^k} either is infeasible or fails to satisfy MPEC-LICQ.

Theorem 4. If Algorithm Elastic-Exact does not terminate finitely, then every accumulation point x^∗ of {x^k} either is infeasible for (1) or fails to satisfy MPEC-LICQ.

Proof. Assume for contradiction that the algorithm does not terminate finitely and that there is an accumulation point x^∗ that is feasible for (1) and satisfies MPEC-LICQ. Let S ⊂ {0, 1, . . .} index the subsequence for which {x^k}_{k∈S} → x^∗. Since (x^k, ζ_k) is a first-order point of PF(c_k), with multipliers λ^k, µ^−k, µ^+k, τ^k, ν^k, π^−k, π^+k, Theorem 3 (with ε_k ≡ 0) shows that x^∗ is C-stationary. Our aim is to show that in fact x^k is feasible for (1) for sufficiently large k ∈ S, and hence the algorithm terminates finitely.

For any k, if ζ_k > 0, then (9c) would imply that π^−k = 0, and hence

\[
c_k - e_p^T \lambda^k - e_q^T (\mu^{-k} + \mu^{+k}) = -\pi^{+k} \le 0. \tag{29}
\]

Moreover, (9e) and (9f) would imply that, for each i, either µ^−k_i = 0 or µ^+k_i = 0, and hence ‖µ^−k + µ^+k‖ = ‖µ^−k − µ^+k‖. Since (24a) and (24b) hold when restricted to k ∈ S, and since {c_k} → ∞, the relation (29) fails for all k ∈ S sufficiently large. Thus, we have

\[
\zeta_k = 0, \quad \text{for all } k \in S \text{ sufficiently large}.
\]

We next show that because (x^k, ζ_k) satisfies the second-order necessary optimality condition for PF(c_k), we must have (G_j^Tx^k)(H_j^Tx^k) = 0 for all j ∈ I_G ∩ I_H and all k ∈ S sufficiently large. Suppose not. By passing to a further subsequence if necessary, there must be an index j ∈ I_G ∩ I_H such that (G_j^Tx^k)(H_j^Tx^k) ≠ 0 for all k ∈ S. Define a direction d^k satisfying the following conditions:

\[
\begin{aligned}
\nabla g_i(x^k)^T d^k &= 0, \quad \text{for } i \in I_g, \\
\nabla h(x^k)^T d^k &= 0, \\
G_i^T d^k &= 0, \quad \text{for } i \in I_G \text{ with } i \ne j, \\
H_i^T d^k &= 0, \quad \text{for } i \in I_H \text{ with } i \ne j, \\
G_j^T d^k &= 1, \quad H_j^T d^k = -1.
\end{aligned} \tag{30}
\]

Since {x^k}_{k∈S} → x^∗ and MPEC-LICQ holds at x^∗, the gradients in this definition are linearly independent for all k ∈ S sufficiently large, in which case d^k satisfying these equations is well defined. In fact, we can choose d^k so that

\[
\| d^k \| = O(1).
\]

Since {(x^k, ζ_k)}_{k∈S} → (x^∗, 0), the set of active constraints of (2) at (x^k, ζ_k) is a subset of the active constraints at (x^∗, 0) for all k ∈ S sufficiently large, in which case the direction (d^k, 0) lies in the direction set described in Definition 6 corresponding to c_k, (x^k, ζ_k, λ^k, µ^−k, µ^+k, τ^k, ν^k). (Notice that the constraints G_j^Tx ≥ 0 and H_j^Tx ≥ 0 are not active at (x^k, ζ_k) because (G_j^Tx^k)(H_j^Tx^k) ≠ 0.) Also, (9d) implies λ^k_i = 0, i ∉ I_g, for all k ∈ S sufficiently large.

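From the definition (8) and using λ^k_i = 0 for i ∉ I_g, we have

\[
\begin{aligned}
\nabla^2_{xx} L_{c_k}(x^k, \zeta_k, \lambda^k, \mu^{-k}, \mu^{+k}, \tau^k, \nu^k) ={} & \nabla^2 f(x^k) - \sum_{i \in I_g} \lambda_i^k \nabla^2 g_i(x^k) \\
& - \sum_{i=1}^{q} (\mu_i^{-k} - \mu_i^{+k}) \nabla^2 h_i(x^k) \\
& + c_k \sum_{i=1}^{m} \left( G_i H_i^T + H_i G_i^T \right),
\end{aligned}
\]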