Journal of Global Optimization, vol. 36, pp. 565-580, 2006

The semismooth-related properties of a merit function and a descent method for the nonlinear complementarity problem

Jein-Shan Chen¹
Department of Mathematics
National Taiwan Normal University

Taipei 11677, Taiwan

February 1, 2005 (revised March 11, 2006)

Abstract. This paper is a follow-up to the work [1], where an NCP-function and a descent method were proposed for the nonlinear complementarity problem. An unconstrained reformulation of the problem was obtained there via a merit function based on the proposed NCP-function.

We continue to explore properties of the merit function in this paper. In particular, we show that the gradient of the merit function is globally Lipschitz continuous, which is important from a computational point of view. Moreover, we show that the merit function is an SC1 function, which means that it is continuously differentiable and its gradient is semismooth. In addition, we provide an alternative proof, which uses the new properties of the merit function, for the convergence result of the descent method considered in [1].

Key words. Complementarity, SC1 function, merit function, semismooth function, descent method.

1 Introduction

In the past decades, the well-known nonlinear complementarity problem (NCP) has attracted much attention due to its various applications in operations research, economics, and engineering [7, 12, 18]. The NCP is to find a point $x \in \mathbb{R}^n$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad \langle x, F(x)\rangle = 0, \qquad (1)$$

where $\langle\cdot,\cdot\rangle$ is the Euclidean inner product and $F = (F_1, F_2, \dots, F_n)^T$ maps from $\mathbb{R}^n$ to $\mathbb{R}^n$. We assume that $F$ is continuously differentiable throughout this paper.

¹ E-mail: jschen@math.ntnu.edu.tw, TEL: 886-2-29320206, FAX: 886-2-29332342.


There have been many methods proposed for solving the NCP [10, 12, 18]. Among them, one of the most popular approaches, which has been studied intensively in recent years, is to reformulate the NCP as an unconstrained minimization problem [6, 8, 11, 14, 15, 30].

A function that yields such an equivalent unconstrained minimization problem for the NCP is called a merit function. In other words, a merit function is a function whose global minima coincide with the solutions of the original NCP. In constructing a merit function, the class of so-called NCP-functions, defined below, plays an important role.

Definition 1.1 A function φ : IR2 → IR is called an NCP-function if it satisfies

φ(a, b) = 0 ⇐⇒ a ≥ 0, b ≥ 0, ab = 0. (2)

A popular NCP-function intensively studied recently is the well-known Fischer-Burmeister NCP-function [8, 9, 26] defined as

$$\phi(a,b) = \sqrt{a^2 + b^2} - (a + b). \qquad (3)$$
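For instance, as a quick check of property (2): the pair $(0,3)$ is complementary while the pair $(1,1)$ is not, and indeed
$$\phi(0,3) = \sqrt{0^2 + 3^2} - (0 + 3) = 0, \qquad \phi(1,1) = \sqrt{1^2 + 1^2} - (1 + 1) = \sqrt{2} - 2 \approx -0.586 \ne 0.$$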

Let $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ be
$$\Phi(x) = \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix}. \qquad (4)$$

Then the function $\Psi : \mathbb{R}^n \to \mathbb{R}_+$ defined by
$$\Psi(x) := \frac{1}{2}\|\Phi(x)\|^2 = \frac{1}{2}\sum_{i=1}^{n}\phi(x_i, F_i(x))^2 \qquad (5)$$
is a merit function for the NCP, i.e., the NCP can be recast as the unconstrained minimization
$$\min_{x\in\mathbb{R}^n}\Psi(x). \qquad (6)$$

In the paper [1], an NCP-function that extends the Fischer-Burmeister function (3) was studied. More specifically, $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ is defined by
$$\phi_p(a,b) := \|(a,b)\|_p - (a + b), \qquad (7)$$
where $\|(a,b)\|_p$ denotes the p-norm of $(a,b)$, i.e., $\|(a,b)\|_p = \sqrt[p]{|a|^p + |b|^p}$. In other words, in the function $\phi_p$, the 2-norm of $(a,b)$ in the Fischer-Burmeister function (3) is replaced by the more general p-norm of $(a,b)$ with $p \ge 2$. This function $\phi_p$ is still an NCP-function


as was noted in Tseng's paper [28]. Nonetheless, there was no further study of this NCP-function, even for $p = 3$, until the recent paper [1] by the author. Following the function $\phi_p$, we can further define $\psi_p : \mathbb{R}^2 \to \mathbb{R}_+$ by
$$\psi_p(a,b) := \frac{1}{2}\,|\phi_p(a,b)|^2. \qquad (8)$$

The function $\psi_p$ is a nonnegative NCP-function, smooth on $\mathbb{R}^2$, and has some favorable properties; see [1]. In this paper, we continue to explore properties of $\psi_p$, as will be seen in Sec. 3. Analogous to $\Phi$, the function $\Phi_p : \mathbb{R}^n \to \mathbb{R}^n$ given by
$$\Phi_p(x) = \begin{pmatrix} \phi_p(x_1, F_1(x)) \\ \vdots \\ \phi_p(x_n, F_n(x)) \end{pmatrix} \qquad (9)$$

yields a merit function $\Psi_p : \mathbb{R}^n \to \mathbb{R}_+$ for the NCP, where
$$\Psi_p(x) := \frac{1}{2}\|\Phi_p(x)\|^2 = \frac{1}{2}\sum_{i=1}^{n}\phi_p(x_i, F_i(x))^2 = \sum_{i=1}^{n}\psi_p(x_i, F_i(x)). \qquad (10)$$
As shown in [1], $\Psi_p$ is a continuously differentiable merit function for the NCP. Therefore, classical iterative methods such as the Newton method can be applied to the unconstrained smooth minimization reformulation of the NCP, i.e.,
$$\min_{x\in\mathbb{R}^n}\Psi_p(x). \qquad (11)$$
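To make the reformulation (10)-(11) concrete, the following minimal Python sketch (an illustration only, not from the paper; NumPy is assumed, and the linear map $F(x) = Mx + q$ with the particular numbers in $M$ and $q$ is purely hypothetical test data) evaluates the merit function $\Psi_p$ for a given $F$.

```python
import numpy as np

def phi_p(a, b, p=3):
    """NCP-function (7): phi_p(a, b) = ||(a, b)||_p - (a + b)."""
    return (abs(a)**p + abs(b)**p)**(1.0 / p) - (a + b)

def Psi_p(x, F, p=3):
    """Merit function (10): Psi_p(x) = 0.5 * sum_i phi_p(x_i, F_i(x))^2."""
    Fx = F(x)
    return 0.5 * sum(phi_p(xi, fi, p)**2 for xi, fi in zip(x, Fx))

# Hypothetical test problem (not from the paper): F(x) = M x + q.
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
q = np.array([-1.0, 0.5])
F = lambda x: M @ x + q

x = np.array([0.2, 0.1])
print(Psi_p(x, F, p=3))   # Psi_p(x) >= 0, and Psi_p(x) = 0 exactly at NCP solutions
```

Solving (11) then amounts to driving this value to zero.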

On the other hand, derivative-free methods, which do not require the computation of derivatives of $F$, have also attracted much attention [11, 14, 29]. Derivative-free methods, which take advantage of particular properties of a merit function, are suitable for problems where the derivatives of $F$ are not available or are expensive to compute. In Sec. 4, we also study a derivative-free descent algorithm for solving the NCP based on the merit function $\Psi_p$. Indeed, this descent method was already considered in [1]; here we apply the new properties of $\psi_p$ explored in this paper to provide an alternative proof of the convergence result.

Throughout this paper, $\mathbb{R}^n$ denotes the space of n-dimensional real column vectors and $T$ denotes transpose. For any differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x)$ denotes the gradient of $f$ at $x$. For any differentiable mapping $F = (F_1, \dots, F_m)^T : \mathbb{R}^n \to \mathbb{R}^m$, $\nabla F(x) = [\nabla F_1(x) \cdots \nabla F_m(x)]$ denotes the transposed Jacobian of $F$ at $x$. We write $z = o(\alpha)$, with $\alpha \in \mathbb{R}$ and $z \in \mathbb{R}^n$, to mean that $\|z\|/|\alpha|$ tends to zero as $\alpha \to 0$. Also, we denote by $\|x\|_p$ the p-norm of $x$ and by $\|x\|$ the Euclidean norm of $x$. Finally, we assume throughout that $p$ is an integer greater than or equal to 2.


2 Preliminaries

In this section, we recall some background concepts and review some known material that is crucial to the subsequent analysis. We begin with the monotonicity of a mapping. Let $F : \mathbb{R}^n \to \mathbb{R}^n$. Then $F$ is monotone if $\langle x - y, F(x) - F(y)\rangle \ge 0$ for all $x, y \in \mathbb{R}^n$; $F$ is strictly monotone if $\langle x - y, F(x) - F(y)\rangle > 0$ for all $x, y \in \mathbb{R}^n$ with $x \ne y$; and $F$ is strongly monotone with modulus $\mu > 0$ if $\langle x - y, F(x) - F(y)\rangle \ge \mu\|x - y\|^2$ for all $x, y \in \mathbb{R}^n$. Next, we recall the so-called semismooth functions. First, we say that $F$ is strictly continuous (also called locally Lipschitz continuous) at $x \in \mathbb{R}^n$ [25, Chap. 9] if there exist scalars $\kappa > 0$ and $\delta > 0$ such that

$$\|F(y) - F(z)\| \le \kappa\|y - z\| \quad \forall\, y, z \in \mathbb{R}^n \text{ with } \|y - x\| \le \delta,\ \|z - x\| \le \delta;$$

and $F$ is strictly continuous if $F$ is strictly continuous at every $x \in \mathbb{R}^n$. If $\delta$ can be taken to be $\infty$, then $F$ is Lipschitz continuous with Lipschitz constant $\kappa$. Define the function $\operatorname{lip} F : \mathbb{R}^n \to [0, \infty]$ by
$$\operatorname{lip} F(x) := \limsup_{\substack{y, z \to x \\ y \ne z}} \frac{\|F(y) - F(z)\|}{\|y - z\|}.$$

Then $F$ is strictly continuous at $x$ if and only if $\operatorname{lip} F(x)$ is finite. We say $F$ is directionally differentiable at $x \in \mathbb{R}^n$ if
$$F'(x; h) := \lim_{t \to 0^+} \frac{F(x + th) - F(x)}{t} \quad \text{exists for all } h \in \mathbb{R}^n;$$

and $F$ is directionally differentiable if $F$ is directionally differentiable at every $x \in \mathbb{R}^n$. $F$ is differentiable (in the Fréchet sense) at $x \in \mathbb{R}^n$ if there exists a linear mapping $\nabla F(x) : \mathbb{R}^n \to \mathbb{R}^n$ such that
$$F(x + h) - F(x) - \nabla F(x)h = o(\|h\|).$$

We say that F is continuously differentiable if F is differentiable at every x ∈ IRn and ∇F is continuous.

If $F$ is strictly continuous, then $F$ is almost everywhere differentiable by Rademacher's Theorem; see [4] and [25, Sec. 9J]. In this case, the generalized Jacobian $\partial F(x)$ of $F$ at $x$ (in the Clarke sense) can be defined as the convex hull of the set $\partial_B F(x)$, where
$$\partial_B F(x) := \left\{\lim_{x^j \to x} \nabla F(x^j) \;\Big|\; F \text{ is differentiable at } x^j \in \mathbb{R}^n\right\}.$$

The notation $\partial_B$ is adopted from [20]. In [25, Chap. 9], the case $n = 1$ is considered and the notations $\bar\nabla$ and $\bar\partial$ are used instead of $\partial_B$ and $\partial$, respectively.

Assume $F : \mathbb{R}^n \to \mathbb{R}^n$ is strictly continuous. We say $F$ is semismooth at $x$ if $F$ is directionally differentiable at $x$ and, for any $V \in \partial F(x + h)$, we have
$$F(x + h) - F(x) - Vh = o(\|h\|).$$


We say $F$ is $\rho$-order semismooth at $x$ ($0 < \rho < \infty$) if $F$ is semismooth at $x$ and, for any $V \in \partial F(x + h)$, we have
$$F(x + h) - F(x) - Vh = O(\|h\|^{1+\rho}).$$
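As a simple illustration of these definitions, the absolute value function $F(x) = |x|$ on $\mathbb{R}$ is strongly semismooth at its kink $x = 0$: it is directionally differentiable there with $F'(0; h) = |h|$, and for $h \ne 0$ we have $\partial F(h) = \{\operatorname{sgn}(h)\}$, so that for $V = \operatorname{sgn}(h)$
$$F(0 + h) - F(0) - Vh = |h| - \operatorname{sgn}(h)\,h = 0 = O(|h|^{2}),$$
while the expression is trivially zero for $h = 0$; hence $F$ is 1-order semismooth at $0$ even though it is not differentiable there.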

The following lemma, proven by Sun and Sun [27, Thm. 3.6] using the definition of the generalized Jacobian (Sun and Sun did not consider the case of $o(\|h\|)$, but their argument readily applies to this case), enables one to study the semismoothness of $F$ by examining only those points $x \in \mathbb{R}^n$ where $F$ is differentiable, and thus to work only with the Jacobian of $F$ rather than the generalized Jacobian.

Lemma 2.1 Suppose F : IRn → IRn is strictly continuous and directionally differentiable in a neighborhood of x ∈ IRn. Then, for any 0 < ρ < ∞, the following two statements (where O(·) depends on F and x only) are equivalent:

(a) For any $h \in \mathbb{R}^n$ and any $V \in \partial F(x + h)$,
$$F(x + h) - F(x) - Vh = o(\|h\|) \quad (\text{respectively, } O(\|h\|^{1+\rho})).$$

(b) For any $h \in \mathbb{R}^n$ such that $F$ is differentiable at $x + h$,
$$F(x + h) - F(x) - \nabla F(x + h)h = o(\|h\|) \quad (\text{respectively, } O(\|h\|^{1+\rho})).$$

We say $F$ is semismooth (respectively, $\rho$-order semismooth) if $F$ is semismooth (respectively, $\rho$-order semismooth) at every $x \in \mathbb{R}^n$. We say $F$ is strongly semismooth if it is 1-order semismooth. Convex functions and piecewise continuously differentiable functions are examples of semismooth functions. The composition of two (respectively, $\rho$-order) semismooth functions is also a (respectively, $\rho$-order) semismooth function. The property of semismoothness plays an important role in nonsmooth Newton methods [20, 22] as well as in some smoothing methods. For extensive discussions of semismooth functions, see [9, 16, 22].

Now we review some useful properties of $\phi_p$ and $\psi_p$, defined as in (7) and (8), respectively, which will be used in the analysis of the subsequent sections. We notice that the function $\phi_p$ reduces to the Fischer-Burmeister function (3) when $p = 2$. Thus, most of these properties are extensions of properties of the Fischer-Burmeister function. For detailed proofs, please refer to [1].

Lemma 2.2 ([1, Prop. 3.1]) Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be defined as in (7), where $p \ge 2$. Then

(a) $\phi_p$ is an NCP-function, i.e., it satisfies (2).


(b) $\phi_p$ is sub-additive, i.e., $\phi_p(w + w') \le \phi_p(w) + \phi_p(w')$ for all $w, w' \in \mathbb{R}^2$.

(c) $\phi_p$ is positively homogeneous, i.e., $\phi_p(\alpha w) = \alpha\phi_p(w)$ for all $w \in \mathbb{R}^2$ and $\alpha \ge 0$.

(d) $\phi_p$ is convex, i.e., $\phi_p(\alpha w + (1 - \alpha)w') \le \alpha\phi_p(w) + (1 - \alpha)\phi_p(w')$ for all $w, w' \in \mathbb{R}^2$ and $\alpha \in (0, 1)$.

(e) $\phi_p$ is Lipschitz continuous with constant $L_1 = 1 + \sqrt{2}$, i.e., $|\phi_p(w) - \phi_p(w')| \le L_1\|w - w'\|$; or with constant $L_2 = 1 + 2^{(1-1/p)}$, i.e., $|\phi_p(w) - \phi_p(w')| \le L_2\|w - w'\|_p$, for all $w, w' \in \mathbb{R}^2$.

Lemma 2.2(b) and (c) imply that $\phi_p$ is sublinear, i.e., it satisfies
$$\phi_p(\alpha w + \beta w') \le \alpha\phi_p(w) + \beta\phi_p(w')$$
for all $w, w' \in \mathbb{R}^2$ and $\alpha, \beta \ge 0$. This follows from the fact [3, Prop. 3.11] that a function from $\mathbb{R}^n$ to $\mathbb{R}$ is sublinear if and only if it is positively homogeneous and sub-additive.

Note that sublinearity is stronger than convexity. In fact, under Lemma 2.2(c), Lemma 2.2(b) is equivalent to Lemma 2.2(d). This follows from [24, Thm. 4.7]: a positively homogeneous function is convex if and only if it is sub-additive.

Lemma 2.3 ([1, Prop. 3.2]) Let φp, ψp be defined as (7) and (8), respectively, where p ≥ 2.

Then

(a) ψp is an NCP-function, i.e., it satisfies (2).

(b) ψp(a, b) ≥ 0 for all (a, b) ∈ IR2.

(c) $\psi_p$ is continuously differentiable everywhere. Moreover, $\nabla_a\psi_p(0,0) = \nabla_b\psi_p(0,0) = 0$ and
$$\nabla_a\psi_p(a,b) = \left(\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b), \qquad
\nabla_b\psi_p(a,b) = \left(\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b), \qquad (12)$$
for $(a,b) \ne (0,0)$ when $p$ is even, whereas
$$\nabla_a\psi_p(a,b) = \left(\frac{\operatorname{sgn}(a)\cdot a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b), \qquad
\nabla_b\psi_p(a,b) = \left(\frac{\operatorname{sgn}(b)\cdot b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b), \qquad (13)$$
for $(a,b) \ne (0,0)$ when $p$ is odd.

(d) ∇aψp(a, b) · ∇bψp(a, b) ≥ 0 for all (a, b) ∈ IR2. The equality holds if and only if φp(a, b) = 0.


(e) ∇aψp(a, b) = 0 ⇐⇒ ∇bψp(a, b) = 0 ⇐⇒ φp(a, b) = 0.
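To make the formulas (12) and (13) concrete, the following short Python sketch (an illustration only; NumPy assumed) evaluates $\nabla\psi_p$ through the single expression $\operatorname{sgn}(a)|a|^{p-1}/\|(a,b)\|_p^{p-1} - 1$, which reduces to (12) for even $p$ and to (13) for odd $p$, and compares it with a central finite difference of $\psi_p$ at a sample point.

```python
import numpy as np

def phi_p(a, b, p):
    """NCP-function (7)."""
    return (abs(a)**p + abs(b)**p)**(1.0 / p) - (a + b)

def psi_p(a, b, p):
    """psi_p(a, b) = 0.5 * phi_p(a, b)^2, cf. (8)."""
    return 0.5 * phi_p(a, b, p)**2

def grad_psi_p(a, b, p):
    """Gradient of psi_p via (12)/(13); valid for (a, b) != (0, 0)."""
    r = (abs(a)**p + abs(b)**p)**(1.0 / p)
    phi = r - (a + b)
    ga = (np.sign(a) * abs(a)**(p - 1) / r**(p - 1) - 1.0) * phi
    gb = (np.sign(b) * abs(b)**(p - 1) / r**(p - 1) - 1.0) * phi
    return np.array([ga, gb])

# Compare with a central finite difference at a sample point (p = 3, odd case).
a, b, p, eps = 0.7, -1.3, 3, 1e-6
fd = np.array([(psi_p(a + eps, b, p) - psi_p(a - eps, b, p)) / (2 * eps),
               (psi_p(a, b + eps, p) - psi_p(a, b - eps, p)) / (2 * eps)])
print(grad_psi_p(a, b, p), fd)   # the two vectors should agree to about 1e-6
```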

Lemma 2.4 ([1, Prop. 3.5]) Let $\Psi_p : \mathbb{R}^n \to \mathbb{R}$ be defined as in (10), where $p \ge 2$. Assume that $F$ is either strongly monotone or a uniform P-function; then the level sets $L(\Psi_p, \gamma)$ are bounded for all $\gamma \in \mathbb{R}$.

In addition to the above properties of $\phi_p$ and $\psi_p$, we also need the following two lemmas for the analysis in the subsequent sections.

Lemma 2.5 ([13, (1.3)]) Let $x \in \mathbb{R}^n$ and $1 < p_1 < p_2$. Then $\|x\|_{p_2} \le \|x\|_{p_1} \le n^{(1/p_1 - 1/p_2)}\|x\|_{p_2}$.

Lemma 2.6 If $F : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ has a second derivative at each point of a convex set $D_0 \subseteq D$, then, for all $x, y \in D_0$,
$$\|\nabla F(y) - \nabla F(x)\| \le \sup_{0 \le t \le 1}\|\nabla^2 F(x + t(y - x))\| \cdot \|y - x\|.$$

Proof. This is Theorem 3.3.5 of [17, p. 78]. □

3 The semismooth-related properties of the NCP and merit functions

In this section, we study some semismooth-related properties of $\phi_p$, including semismoothness and almost smoothness, as well as the SC1 and LC1 properties of $\psi_p$. The semismooth property is very important from the computational point of view. In particular, it plays a fundamental role in the superlinear convergence analysis of generalized Newton methods; see [20, 22, 31]. The classes of SC1 and LC1 functions have been of interest in relation to the development of minimization algorithms. We introduce their definitions later. We begin this section by showing that the functions $\phi_p$ and $\Phi_p$ are semismooth (in fact, they are strongly semismooth, as shown in Corollary 3.1). The proof is easy and routine.

Proposition 3.1 The function Φp : IRn → IRn defined as (9) is semismooth.


Proof. We notice that φp is convex by Lemma 2.2(d), and hence is a semismooth function.

We also observe that each component of $\Phi_p(x)$ is the composition of the convex function $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ with the differentiable mapping $(x_i, F_i(x))^T : \mathbb{R}^n \to \mathbb{R}^2$. Since convex functions and differentiable functions are semismooth and the composition of semismooth functions is semismooth, it follows that $\Phi_p$ is semismooth. □

An important concept related to semismooth functions is that of an SC1 function, whose definition we introduce next.

Definition 3.1 A function f : IRn→ IR is said to be an SC1 function if f is continuously differentiable and its gradient is semismooth.

We can view SC1 functions as functions lying between C1 and C2 functions. Through the notion of SC1 functions, many results regarding the minimization of C2 functions can be extended to the minimization of SC1 functions; see [19] and references therein. For applications and more details on SC1 functions, please refer to the excellent book [5]. Prop. 3.2 shows that $\psi_p$ is an SC1 function; hence, if every $F_i$ is an SC1 function, then so is $\Psi_p$. Before presenting its proof, we need an important and crucial technical lemma, which states that $\nabla\psi_p$ is globally Lipschitz continuous. The lemma will be used not only in the proof of Prop. 3.2 but also in the convergence analysis of the descent algorithm in Sec. 4.

Lemma 3.1 The gradient of the function $\psi_p$ defined as in (8) is Lipschitz continuous; that is, there exists $L > 0$ such that
$$\|\nabla\psi_p(a,b) - \nabla\psi_p(c,d)\| \le L\,\|(a,b) - (c,d)\|, \qquad (14)$$
for all $(a,b), (c,d) \in \mathbb{R}^2$.

Proof. Starting from the gradient of $\psi_p$ given in (12) and (13) and applying the chain rule and the quotient rule (the computation is routine though tedious, so we omit the details), we have the following two cases.

If $p$ is even and $(a,b) \ne (0,0)$, then
$$\begin{aligned}
\nabla^2_{aa}\psi_p(a,b) &= \left(\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{(p-1)a^{p-2}b^{p}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p - (a+b)\Big),\\[4pt]
\nabla^2_{ab}\psi_p(a,b) = \nabla^2_{ba}\psi_p(a,b) &= \left(\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\left(\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right) - \frac{(p-1)a^{p-1}b^{p-1}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p - (a+b)\Big),\\[4pt]
\nabla^2_{bb}\psi_p(a,b) &= \left(\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{(p-1)a^{p}b^{p-2}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p - (a+b)\Big).
\end{aligned}$$


It is clear that $\dfrac{|a|^{p-1}}{\|(a,b)\|_p^{p-1}} \le 1$, and it also follows from
$$|a|^{p-2}\cdot|b|^{p} \le \Big(\max\{|a|,|b|\}\Big)^{2p-2} \le \Big(\sqrt[p]{|a|^p + |b|^p}\Big)^{2p-2} = \|(a,b)\|_p^{2p-2}$$
that
$$\frac{|a|^{p-2}|b|^{p}}{\|(a,b)\|_p^{2p-2}} \le 1. \qquad \text{Similarly,} \qquad \frac{|a|^{p}|b|^{p-2}}{\|(a,b)\|_p^{2p-2}} \le 1. \qquad (15)$$

On the other hand, by Lemma 2.5, we have
$$|a| + |b| \le \sqrt{2}\,\sqrt{a^2 + b^2} = \sqrt{2}\,\|(a,b)\|_2 \le \sqrt{2}\cdot 2^{(1/2 - 1/p)}\|(a,b)\|_p = 2^{(1-1/p)}\|(a,b)\|_p.$$
Applying all of the above, we can give an upper bound for $\nabla^2_{aa}\psi_p(a,b)$ as follows:

$$\begin{aligned}
\left|\nabla^2_{aa}\psi_p(a,b)\right|
&\le \left(\frac{|a|^{p-1}}{\|(a,b)\|_p^{p-1}} + 1\right)^2 + \frac{(p-1)|a|^{p-2}|b|^{p}}{\|(a,b)\|_p^{2p-2}} + \frac{(p-1)|a|^{p-2}|b|^{p}\,(|a|+|b|)}{\|(a,b)\|_p^{2p-1}}\\
&\le 4 + (p-1) + \frac{(p-1)|a|^{p-2}|b|^{p}\cdot 2^{(1-1/p)}\|(a,b)\|_p}{\|(a,b)\|_p^{2p-1}}\\
&\le 4 + (p-1) + (p-1)2^{(1-1/p)}\\
&= 4 + (p-1)\left[1 + 2^{(1-1/p)}\right],
\end{aligned}$$

where the last inequality holds due to (15). By the same arguments, we also have

$$\left|\nabla^2_{bb}\psi_p(a,b)\right| \le 4 + (p-1)\left[1 + 2^{(1-1/p)}\right].$$

Now, we estimate the upper bound for $\nabla^2_{ab}\psi_p(a,b) = \nabla^2_{ba}\psi_p(a,b)$ as follows:
$$\begin{aligned}
\left|\nabla^2_{ab}\psi_p(a,b)\right| = \left|\nabla^2_{ba}\psi_p(a,b)\right|
&\le \left|\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right|\cdot\left|\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right| + \frac{(p-1)|a|^{p-1}|b|^{p-1}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p + (|a|+|b|)\Big)\\
&\le \left(\frac{|a|^{p-1}}{\|(a,b)\|_p^{p-1}} + 1\right)\left(\frac{|b|^{p-1}}{\|(a,b)\|_p^{p-1}} + 1\right) + \frac{(p-1)|a|^{p-1}|b|^{p-1}}{\|(a,b)\|_p^{2p-2}} + \frac{(p-1)|a|^{p-1}|b|^{p-1}\,(|a|+|b|)}{\|(a,b)\|_p^{2p-1}}\\
&\le 4 + (p-1) + \frac{(p-1)|a|^{p-1}|b|^{p-1}\cdot 2^{(1-1/p)}\|(a,b)\|_p}{\|(a,b)\|_p^{2p-1}}\\
&\le 4 + (p-1) + (p-1)2^{(1-1/p)}\\
&= 4 + (p-1)\left[1 + 2^{(1-1/p)}\right],
\end{aligned}$$


where the third and fourth inequalities hold by a result similar to (15), namely
$$\frac{|a|^{p-1}|b|^{p-1}}{\|(a,b)\|_p^{2p-2}} \le 1.$$

If $p$ is odd and $(a,b) \ne (0,0)$, then we obtain
$$\begin{aligned}
\nabla^2_{aa}\psi_p(a,b) &= \left(\frac{\operatorname{sgn}(a)\cdot a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{\operatorname{sgn}(a)\operatorname{sgn}(b)\,(p-1)a^{p-2}b^{p}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p - (a+b)\Big),\\[4pt]
\nabla^2_{ab}\psi_p(a,b) = \nabla^2_{ba}\psi_p(a,b) &= \left(\frac{\operatorname{sgn}(a)\cdot a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\left(\frac{\operatorname{sgn}(b)\cdot b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right) - \frac{\operatorname{sgn}(a)\operatorname{sgn}(b)\,(p-1)a^{p-1}b^{p-1}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p - (a+b)\Big),\\[4pt]
\nabla^2_{bb}\psi_p(a,b) &= \left(\frac{\operatorname{sgn}(b)\cdot b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{\operatorname{sgn}(a)\operatorname{sgn}(b)\,(p-1)a^{p}b^{p-2}}{\|(a,b)\|_p^{2p-1}}\Big(\|(a,b)\|_p - (a+b)\Big).
\end{aligned}$$

In fact, the upper bounds for $\nabla^2_{aa}\psi_p(a,b)$, $\nabla^2_{ab}\psi_p(a,b)$, $\nabla^2_{bb}\psi_p(a,b)$ remain the same, by following exactly the same steps as in the case where $p$ is even. Thus, there exists a constant $L > 0$, independent of $(a,b)$, such that
$$\|\nabla^2\psi_p(a,b)\| \le L, \qquad \forall\, (a,b) \ne (0,0).$$
Then, by Lemma 2.6, we have
$$\|\nabla\psi_p(a,b) - \nabla\psi_p(c,d)\| \le L\,\|(a,b) - (c,d)\|, \qquad (16)$$
for all $(a,b), (c,d) \in \mathbb{R}^2$ with $(0,0) \notin [(a,b),(c,d)]$. Moreover, (16) also holds in the case $(a,b) = (c,d) = (0,0)$ since $\nabla_a\psi_p(0,0) = \nabla_b\psi_p(0,0) = 0$. Therefore, we can assume $(a,b) \ne (0,0)$. From Lemma 2.3(c), $\psi_p$ is continuously differentiable for all $(a,b) \in \mathbb{R}^2$ with $\nabla\psi_p(0,0) = (0,0)$; then, using a continuity argument, we obtain that (16) remains true for all $(c,d) \in \mathbb{R}^2$. Thus, (16) holds for all $(a,b), (c,d) \in \mathbb{R}^2$, which says that $\nabla\psi_p$ is globally Lipschitz continuous. □

Proposition 3.2 The function ψp defined as in (8) is an SC1 function. Hence, if every Fi is an SC1 function, then the function Ψp given as (10) is also an SC1 function.

Proof. It is known from Lemma 2.3(c) that $\psi_p$ is continuously differentiable, so it remains to show that the gradient of $\psi_p$ is semismooth. From Lemma 3.1, $\nabla\psi_p$ is Lipschitz continuous, and hence strictly continuous (locally Lipschitz continuous). Therefore, to check the semismoothness of $\nabla\psi_p$, we only need to show that $\nabla\psi_p$ satisfies Lemma 2.1(b). More specifically, we only need to check semismoothness at $(0,0)$, because at other points $\nabla\psi_p$ is continuously differentiable (see the proof of Lemma 3.1), hence semismooth. For this purpose, we will


have to verify that the equation in Lemma 2.1(b) is satisfied, i.e., for any $h = (h_1, h_2) \in \mathbb{R}^2$ such that $\nabla\psi_p$ is differentiable at $(h_1, h_2)$, we have
$$\nabla\psi_p(h_1, h_2) - \nabla\psi_p(0, 0) - \nabla^2\psi_p(h_1, h_2)\cdot h = o(\|(h_1, h_2)\|). \qquad (17)$$
To prove (17), we consider two cases: $p$ even and $p$ odd.

When $p$ is even, we denote by $(\Xi_1, \Xi_2)$ the left-hand side of (17). Then we have
$$\begin{bmatrix} \Xi_1 \\ \Xi_2 \end{bmatrix}
:= \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}\phi_p(h_1,h_2) - \begin{bmatrix} 0 \\ 0 \end{bmatrix}
- \begin{bmatrix}
k_1^2 + \dfrac{(p-1)h_1^{p-2}h_2^{p}}{\|(h_1,h_2)\|_p^{2p-1}}\,\phi_p(h_1,h_2) & k_1 k_2 - k_3\,\phi_p(h_1,h_2)\\[8pt]
k_1 k_2 - k_3\,\phi_p(h_1,h_2) & k_2^2 + \dfrac{(p-1)h_1^{p}h_2^{p-2}}{\|(h_1,h_2)\|_p^{2p-1}}\,\phi_p(h_1,h_2)
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}, \qquad (18)$$
where
$$k_1 = \frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1, \qquad
k_2 = \frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1, \qquad
k_3 = \frac{(p-1)h_1^{p-1}h_2^{p-1}}{\|(h_1,h_2)\|_p^{2p-1}}. \qquad (19)$$

By plugging (19) into (18) and writing out Ξ1 and Ξ2, we obtain that Ξ1 = 0 and Ξ2 = 0.

To see this, we compute Ξ1 as below:

$$\begin{aligned}
\Xi_1 &= \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\phi_p(h_1,h_2)
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)^2 h_1
- \frac{(p-1)h_1^{p-1}h_2^{p}}{\|(h_1,h_2)\|_p^{2p-1}}\,\phi_p(h_1,h_2)\\
&\qquad - \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_2
+ \frac{(p-1)h_1^{p-1}h_2^{p}}{\|(h_1,h_2)\|_p^{2p-1}}\,\phi_p(h_1,h_2)\\[4pt]
&= \phi_p(h_1,h_2)\left[\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1
- \frac{(p-1)h_1^{p-1}h_2^{p}}{\|(h_1,h_2)\|_p^{2p-1}}
+ \frac{(p-1)h_1^{p-1}h_2^{p}}{\|(h_1,h_2)\|_p^{2p-1}}\right]\\
&\qquad - \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)^2 h_1
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_2\\[4pt]
&= \phi_p(h_1,h_2)\left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)^2 h_1
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_2\\[4pt]
&= \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left[\phi_p(h_1,h_2)
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_1
- \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_2\right]\\[4pt]
&= \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left[\|(h_1,h_2)\|_p - \frac{h_1^{p} + h_2^{p}}{\|(h_1,h_2)\|_p^{p-1}}\right]
= \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\cdot 0 = 0,
\end{aligned}$$

where the second-to-last equality holds since $h_1^p + h_2^p = \|(h_1,h_2)\|_p^p$ when $p$ is even. Similarly,

$$\begin{aligned}
\Xi_2 &= \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\phi_p(h_1,h_2)
- \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)^2 h_2
- \frac{(p-1)h_1^{p}h_2^{p-1}}{\|(h_1,h_2)\|_p^{2p-1}}\,\phi_p(h_1,h_2)\\
&\qquad - \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_1
+ \frac{(p-1)h_1^{p}h_2^{p-1}}{\|(h_1,h_2)\|_p^{2p-1}}\,\phi_p(h_1,h_2)\\[4pt]
&= \phi_p(h_1,h_2)\left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)
- \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)^2 h_2
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_1\\[4pt]
&= \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left[\phi_p(h_1,h_2)
- \left(\frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_1
- \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right) h_2\right]\\[4pt]
&= \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\left[\|(h_1,h_2)\|_p - \frac{h_1^{p} + h_2^{p}}{\|(h_1,h_2)\|_p^{p-1}}\right]
= \left(\frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1\right)\cdot 0 = 0,
\end{aligned}$$

where the second-to-last equality holds since $h_1^p + h_2^p = \|(h_1,h_2)\|_p^p$ when $p$ is even. The above two expressions for $\Xi_1$ and $\Xi_2$ imply that (17) is satisfied. Thus, $\nabla\psi_p$ is semismooth at $(0,0)$ in the case where $p$ is even.

For $p$ odd, following the same arguments leads to the same verification. Therefore, we have shown that $\nabla\psi_p$ is semismooth, and hence $\psi_p$ is an SC1 function. The second statement follows immediately from this result. □

We point out that, for $p = 2$, $\psi_p$ was already proved to be an SC1 function in [5, 6] (indeed, it was first formally shown in [6]). Prop. 3.2 is a general extension for any


p ≥ 2 and its proof is much more complicated than the case of p = 2. In addition to SC1 functions, we also introduce LC1 functions here.

Definition 3.2 A function f : IRn → IR is called an LC1 function if f is continuously differentiable and its gradient is locally Lipschitz continuous.

The class of LC1 minimization problems was studied in [21], where the local superlinear convergence of an approximate Newton method was established under a semismoothness assumption on the gradient function at a solution point. It is obvious that any SC1 function is an LC1 function. With the results of Lemma 3.1 and Prop. 3.2, we therefore have the following corollaries.

Corollary 3.1 If every $F_i$ is an LC1 function, then the function $\Phi_p$ given as in (9) is strongly semismooth.

Proof. We know that $\phi_p$ is semismooth; indeed, it is strongly semismooth. This can be seen from Lemma 2.2(c), Lemma 3.1, and Theorem 7 of [23]. Also, every LC1 function is strongly semismooth. Thus, the result follows. □

Corollary 3.2 The function ψp defined as in (8) is an LC1 function. Hence, if every Fi is an LC1 function, then the function Ψp given as (10) is also an LC1 function.

Some other notions related to semismooth functions are piecewise smooth and almost smooth functions. It is well known that piecewise smooth functions are examples of semismooth functions, and other examples of semismooth functions that are not piecewise smooth have emerged recently; see [23] and references therein. In particular, these examples include the p-norm function with $1 < p < \infty$ defined on $\mathbb{R}^n$ with $n \ge 2$, the Euclidean norm function, pseudo-smooth NCP-functions, smoothing functions, etc. To close this section, we point out that the NCP-function studied in this paper and in [1] is indeed strongly almost smooth, since it is based on the p-norm function. We briefly state the definition of almost smooth functions and the result below.

Definition 3.3 The almost smooth (respectively, strongly almost smooth) functions are functions that are semismooth (respectively, strongly semismooth) on the whole space $\mathbb{R}^n$ and smooth everywhere except on sets with "dimension" less than $n - 1$, in the sense that the sets do not locally partition $\mathbb{R}^n$ into multiple connected components.

By applying Lemma 2.2(c), Lemma 3.1, and a result in [23], we immediately obtain an interesting property related to the strong almost smoothness of $\Phi_p$. For more details regarding almost smooth and strongly almost smooth functions, please refer to the recent paper [23].


Proposition 3.3 If every $F_i$ is an LC1 function, then the function $\Phi_p$ defined as in (9) is a strongly almost smooth function.

Proof. This result follows from Lemma 2.2(c), Prop. 3.1, and Theorem 7 of [23]. □

4 A descent method

In this section, we study a descent method that is almost the same as the one in Sec. 4 of [1] for solving the unconstrained minimization (11); it does not require the derivatives of the mapping $F$ involved in the NCP. In fact, we consider the same search direction for the algorithm as in [1]:

$$d^k := -\nabla_b\psi_p(x^k, F(x^k)), \qquad (20)$$
except that the way of obtaining the step-size is slightly different (see Step 3). Such a way of finding the step-size can also be found in the literature, for instance in [11]. Using the property that $\nabla\psi_p$ is globally Lipschitz continuous (see Lemma 3.1), we obtain an alternative proof of the convergence result for the same descent method considered in [1]. We state the detailed steps below.

Algorithm 4.1

(Step 0) Choose x0 ∈ IRn, ε ≥ 0, σ ∈ (0, 1), β ∈ (0, 1) and set k := 0.

(Step 1) If Ψp(xk) ≤ ε, then Stop.

(Step 2) Let
$$d^k := -\nabla_b\psi_p(x^k, F(x^k)).$$

(Step 3) Compute the step-size $t_k := \beta^{m_k}$, where $m_k$ is the smallest nonnegative integer $m$ satisfying the Armijo-type condition
$$\Psi_p(x^k + \beta^m d^k) \le \Psi_p(x^k) - \sigma\,\beta^{2m}\|d^k\|^2. \qquad (21)$$

(Step 4) Set $x^{k+1} := x^k + t_k d^k$, $k := k + 1$, and go to Step 1.
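A minimal Python sketch of Algorithm 4.1 is given below, purely as an illustration of how Steps 0-4 fit together; NumPy is assumed, and the parameter values as well as the linear test problem $F(x) = Mx + q$ (with a strongly monotone $M$) are hypothetical choices, not from the paper.

```python
import numpy as np

def phi_p(a, b, p):
    return (abs(a)**p + abs(b)**p)**(1.0 / p) - (a + b)

def grad_b_psi_p(a, b, p):
    """Partial derivative of psi_p with respect to b, cf. (12)/(13); (a, b) != (0, 0)."""
    r = (abs(a)**p + abs(b)**p)**(1.0 / p)
    return (np.sign(b) * abs(b)**(p - 1) / r**(p - 1) - 1.0) * phi_p(a, b, p)

def Psi_p(x, F, p):
    Fx = F(x)
    return 0.5 * sum(phi_p(xi, fi, p)**2 for xi, fi in zip(x, Fx))

def descent_method(F, x0, p=3, eps=1e-8, sigma=1e-4, beta=0.5, max_iter=5000):
    """Derivative-free descent sketch following Steps 0-4 of Algorithm 4.1."""
    x = np.asarray(x0, dtype=float)                            # Step 0
    for _ in range(max_iter):
        if Psi_p(x, F, p) <= eps:                              # Step 1
            break
        Fx = F(x)
        d = -np.array([grad_b_psi_p(xi, fi, p)                 # Step 2, direction (20)
                       for xi, fi in zip(x, Fx)])
        m = 0                                                  # Step 3, Armijo-type rule (21)
        while Psi_p(x + beta**m * d, F, p) > Psi_p(x, F, p) - sigma * beta**(2 * m) * (d @ d):
            m += 1
        x = x + beta**m * d                                    # Step 4
    return x

# Hypothetical strongly monotone test problem (not from the paper): F(x) = M x + q.
M = np.array([[4.0, 1.0],
              [1.0, 3.0]])
q = np.array([-2.0, 1.0])
F = lambda x: M @ x + q
x_star = descent_method(F, x0=np.ones(2))
print(x_star, Psi_p(x_star, F, 3))   # x_star should approach the NCP solution (0.5, 0)
```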

We wish to show a global convergence result for Algorithm 4.1 under the assumption that $F$ is strongly monotone. The following lemmas, together with Lemma 3.1, will enable the convergence result for the algorithm. In what follows, we assume that the parameter $\varepsilon$ used in Algorithm 4.1 is set to zero and that Algorithm 4.1 generates an infinite sequence $\{x^k\}$.

Lemma 4.1 ([1, Lem. 4.1]) Let $x^k \in \mathbb{R}^n$ and let $F$ be a monotone function. Then the search direction defined in (20) satisfies the descent condition $\nabla\Psi_p(x^k)^T d^k < 0$ as long as $x^k$ is not a solution of the NCP. Moreover, if $F$ is strongly monotone with modulus $\mu > 0$, then
$$\nabla\Psi_p(x^k)^T d^k \le -\mu\|d^k\|^2.$$
