
DOI 10.1007/s10898-006-9027-y · ORIGINAL ARTICLE

The semismooth-related properties of a merit function and a descent method for the nonlinear complementarity problem

Jein-Shan Chen

Received: 13 March 2006 / Accepted: 20 March 2006 / Published online: 14 June 2006

Abstract This paper is a follow-up of the work [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted for publication (2004)], where an NCP-function and a descent method were proposed for the nonlinear complementarity problem. An unconstrained reformulation was obtained via a merit function based on the proposed NCP-function.

We continue to explore properties of the merit function in this paper. In particular, we show that the gradient of the merit function is globally Lipschitz continuous, which is important from a computational point of view. Moreover, we show that the merit function is an SC¹ function, which means that it is continuously differentiable and its gradient is semismooth. Finally, using these new properties of the merit function, we provide an alternative proof of the convergence result for the descent method considered in [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted for publication (2004)].

Keywords Complementarity · SC¹ function · Merit function · Semismooth function · Descent method

1 Introduction

In the past decades, the well-known nonlinear complementarity problem (NCP) has attracted much attention due to its various applications in operations research, economics, and engineering [6, 11, 17]. The NCP is to find a point $x \in \mathbb{R}^n$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad \langle x, F(x) \rangle = 0, \tag{1}$$

where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product and $F = (F_1, F_2, \ldots, F_n)^T$ maps $\mathbb{R}^n$ to $\mathbb{R}^n$. We assume throughout this paper that F is continuously differentiable.

Many methods have been proposed for solving the NCP [9, 11, 17]. One of the most popular approaches, which has been studied intensively in recent years, is to reformulate the NCP as an unconstrained minimization problem [5, 7, 10, 13, 14, 28]. A function that yields an equivalent unconstrained minimization problem for the NCP is called a merit function. In other words, a merit function is a function whose global minima coincide with the solutions of the original NCP.

J.-S. Chen (✉) Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan. e-mail: jschen@math.ntnu.edu.tw

The class of so-called NCP-functions, defined below, plays an important role in constructing merit functions.

Definition 1.1 A function $\phi : \mathbb{R}^2 \to \mathbb{R}$ is called an NCP-function if it satisfies

$$\phi(a,b) = 0 \iff a \ge 0,\ b \ge 0,\ ab = 0. \tag{2}$$

A popular NCP-function that has been studied intensively is the well-known Fischer–Burmeister NCP-function [7, 8, 24], defined as

$$\phi(a,b) = \sqrt{a^2 + b^2} - (a + b). \tag{3}$$
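As a quick sanity check of Definition 1.1, the following minimal Python sketch evaluates the Fischer–Burmeister function (3) at a few sample pairs. The helper name `phi_fb` and the sample points are our own illustrative choices. Pairs with $a \ge 0$, $b \ge 0$, $ab = 0$ should give exactly zero, while pairs violating any of these conditions should not.

```python
import math


def phi_fb(a: float, b: float) -> float:
    """Fischer-Burmeister NCP-function (3): sqrt(a^2 + b^2) - (a + b)."""
    return math.hypot(a, b) - (a + b)


# Pairs with a >= 0, b >= 0 and ab = 0: phi_fb should vanish.
complementary = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.5)]
# Pairs violating at least one condition in (2): phi_fb should be nonzero.
non_complementary = [(1.0, 1.0), (-1.0, 0.0), (0.0, -2.0)]

for a, b in complementary + non_complementary:
    print(f"phi({a:+.1f}, {b:+.1f}) = {phi_fb(a, b):+.4f}")
```

`math.hypot` is used instead of `sqrt(a*a + b*b)` only because it avoids intermediate overflow for large arguments.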

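Let $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ be

$$\Phi(x) = \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix}. \tag{4}$$

Then the function $\Psi : \mathbb{R}^n \to \mathbb{R}_+$ defined by

$$\Psi(x) := \frac{1}{2}\|\Phi(x)\|^2 = \frac{1}{2}\sum_{i=1}^n \phi(x_i, F_i(x))^2 \tag{5}$$

is a merit function for the NCP, i.e., the NCP can be recast as the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} \Psi(x). \tag{6}$$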

In the paper [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)], an NCP-function that extends the Fischer–Burmeister function (3) was studied. More specifically, $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ is defined by

$$\phi_p(a,b) := \|(a,b)\|_p - (a+b), \tag{7}$$

where $\|(a,b)\|_p$ denotes the p-norm of $(a,b)$, i.e., $\|(a,b)\|_p = \sqrt[p]{|a|^p + |b|^p}$. In other words, the 2-norm of $(a,b)$ in the Fischer–Burmeister function (3) is replaced by the more general p-norm of $(a,b)$ with $p \ge 2$. The function $\phi_p$ is still an NCP-function, as was noted in Tseng's paper [26]. Nonetheless, there was no further study of this NCP-function, even for $p = 3$, until the recent paper [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)] by the author. Based on $\phi_p$, we further define $\psi_p : \mathbb{R}^2 \to \mathbb{R}_+$ by

$$\psi_p(a,b) := \frac{1}{2}|\phi_p(a,b)|^2. \tag{8}$$

The function $\psi_p$ is a nonnegative NCP-function and is smooth on $\mathbb{R}^2$, with some favorable properties; see [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)]. In this paper, we continue to explore properties of $\psi_p$, as will be seen in Sect. 3. Analogous to $\Phi$, the function $\Phi_p : \mathbb{R}^n \to \mathbb{R}^n$ given by


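$$\Phi_p(x) = \begin{pmatrix} \phi_p(x_1, F_1(x)) \\ \vdots \\ \phi_p(x_n, F_n(x)) \end{pmatrix} \tag{9}$$

yields a merit function $\Psi_p : \mathbb{R}^n \to \mathbb{R}_+$ for the NCP, where

$$\Psi_p(x) := \frac{1}{2}\|\Phi_p(x)\|^2 = \frac{1}{2}\sum_{i=1}^n \phi_p(x_i, F_i(x))^2 = \sum_{i=1}^n \psi_p(x_i, F_i(x)). \tag{10}$$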
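The definitions (7), (8) and (10) translate directly into code. In the minimal Python sketch below, the names `norm_p`, `phi_p`, `psi_p`, `merit_p` and the affine mapping `F_affine`, with hypothetical data $M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $q = (-1, 1)^T$, are illustrative assumptions and not part of the paper. For this F, the point $x^* = (0.5, 0)$ solves the NCP, since $x^* \ge 0$, $F(x^*) = (0, 1.5) \ge 0$ and $\langle x^*, F(x^*) \rangle = 0$; hence $\Psi_p(x^*)$ should vanish up to rounding for every p, while $\Psi_p$ stays positive at a non-solution such as $(1, 1)$. With p = 2 the sketch reduces to the Fischer–Burmeister merit function (5).

```python
def norm_p(a: float, b: float, p: int) -> float:
    """p-norm of the pair (a, b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p)


def phi_p(a: float, b: float, p: int) -> float:
    """Generalized Fischer-Burmeister function (7)."""
    return norm_p(a, b, p) - (a + b)


def psi_p(a: float, b: float, p: int) -> float:
    """psi_p(a, b) = |phi_p(a, b)|^2 / 2, see (8)."""
    return 0.5 * phi_p(a, b, p) ** 2


def merit_p(x, F, p: int) -> float:
    """Psi_p(x) = sum_i psi_p(x_i, F_i(x)), see (10)."""
    return sum(psi_p(xi, fi, p) for xi, fi in zip(x, F(x)))


def F_affine(x):
    """Hypothetical test mapping F(x) = Mx + q, M = [[2, 1], [1, 2]], q = (-1, 1)."""
    return [2 * x[0] + x[1] - 1, x[0] + 2 * x[1] + 1]


for p in (2, 3, 4):
    at_solution = merit_p([0.5, 0.0], F_affine, p)
    elsewhere = merit_p([1.0, 1.0], F_affine, p)
    print(f"p = {p}: Psi_p(x*) = {at_solution:.2e}, Psi_p(1, 1) = {elsewhere:.4f}")
```

For odd p, the fractional power `** (1.0 / p)` is not exact in floating point, so $\Psi_p(x^*)$ may come out as a tiny positive number rather than exactly zero.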

As shown in [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)], $\Psi_p$ is a continuously differentiable merit function for the NCP. Therefore, classical iterative methods such as Newton's method can be applied to the unconstrained smooth minimization reformulation of the NCP, i.e.,

$$\min_{x \in \mathbb{R}^n} \Psi_p(x). \tag{11}$$

On the other hand, derivative-free methods, which do not require computing the derivatives of F, have also attracted much attention [10, 13, 27]. Taking advantage of particular properties of a merit function, derivative-free methods are suitable for problems where the derivatives of F are unavailable or expensive to compute. In this paper, we also study a derivative-free descent algorithm for solving the NCP based on the merit function $\Psi_p$ in Sect. 4. This descent method was already considered in [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)]; here we apply the new properties of $\psi_p$ explored in this paper to provide an alternative proof of its convergence result.

Throughout this paper, $\mathbb{R}^n$ denotes the space of n-dimensional real column vectors and $^T$ denotes transpose. For any differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x)$ denotes the gradient of f at x. For any differentiable mapping $F = (F_1, \ldots, F_m)^T : \mathbb{R}^n \to \mathbb{R}^m$, $\nabla F(x) = [\nabla F_1(x) \cdots \nabla F_m(x)]$ denotes the transposed Jacobian of F at x. We write $z = o(\alpha)$ with $\alpha \in \mathbb{R}$ and $z \in \mathbb{R}^n$ to mean that $\|z\|/|\alpha|$ tends to zero as $\alpha \to 0$. Also, we denote by $\|x\|_p$ the p-norm of x and by $\|x\|$ the Euclidean norm of x. Throughout the paper, we assume that p is an integer greater than or equal to 2.

2 Preliminaries

In this section, we recall some background concepts and review some known material that is crucial to the subsequent analysis. We begin with the monotonicity of a mapping. Let $F : \mathbb{R}^n \to \mathbb{R}^n$. Then F is monotone if $\langle x - y, F(x) - F(y) \rangle \ge 0$ for all $x, y \in \mathbb{R}^n$; F is strictly monotone if $\langle x - y, F(x) - F(y) \rangle > 0$ for all $x, y \in \mathbb{R}^n$ with $x \ne y$; and F is strongly monotone with modulus $\mu > 0$ if $\langle x - y, F(x) - F(y) \rangle \ge \mu\|x - y\|^2$ for all $x, y \in \mathbb{R}^n$.

Next, we recall the so-called semismooth functions. First, we say that F is strictly continuous (also called "locally Lipschitz continuous") at $x \in \mathbb{R}^n$ [23, Chap. 9] if there exist scalars $\kappa > 0$ and $\delta > 0$ such that

$$\|F(y) - F(z)\| \le \kappa\|y - z\| \quad \forall\, y, z \in \mathbb{R}^n \text{ with } \|y - x\| \le \delta,\ \|z - x\| \le \delta;$$

and F is strictly continuous if F is strictly continuous at every $x \in \mathbb{R}^n$. If $\delta$ can be taken to be $\infty$, then F is Lipschitz continuous with Lipschitz constant $\kappa$. Define the function $\mathrm{lip}F : \mathbb{R}^n \to [0, \infty]$ by

$$\mathrm{lip}F(x) := \limsup_{\substack{y, z \to x \\ y \ne z}} \frac{\|F(y) - F(z)\|}{\|y - z\|}.$$

Then F is strictly continuous at x if and only if $\mathrm{lip}F(x)$ is finite. We say that F is directionally differentiable at $x \in \mathbb{R}^n$ if

$$F'(x; h) := \lim_{t \to 0^+} \frac{F(x + th) - F(x)}{t} \quad \text{exists } \forall\, h \in \mathbb{R}^n;$$

and F is directionally differentiable if F is directionally differentiable at every $x \in \mathbb{R}^n$. F is differentiable (in the Fréchet sense) at $x \in \mathbb{R}^n$ if there exists a linear mapping $\nabla F(x) : \mathbb{R}^n \to \mathbb{R}^n$ such that

$$F(x + h) - F(x) - \nabla F(x)h = o(\|h\|).$$

We say that F is continuously differentiable if F is differentiable at every $x \in \mathbb{R}^n$ and $\nabla F$ is continuous.

If F is strictly continuous, then F is almost everywhere differentiable by Rademacher's Theorem; see [3] and [23, Sect. 9J]. In this case, the generalized Jacobian $\partial F(x)$ of F at x (in the Clarke sense) can be defined as the convex hull of the set $\partial_B F(x)$, where

$$\partial_B F(x) := \left\{ \lim_{x^j \to x} \nabla F(x^j) \;\middle|\; F \text{ is differentiable at } x^j \in \mathbb{R}^n \right\}.$$

The notation $\partial_B$ is adopted from [19]. In [23, Chap. 9], the case of n = 1 is considered and the notations "$\bar\nabla$" and "$\bar\partial$" are used instead of "$\partial_B$" and "$\partial$", respectively.

Assume that $F : \mathbb{R}^n \to \mathbb{R}^n$ is strictly continuous. We say that F is semismooth at x if F is directionally differentiable at x and, for any $V \in \partial F(x + h)$, we have

$$F(x + h) - F(x) - Vh = o(\|h\|).$$

We say that F is $\rho$-order semismooth at x ($0 < \rho < \infty$) if F is semismooth at x and, for any $V \in \partial F(x + h)$, we have

$$F(x + h) - F(x) - Vh = O(\|h\|^{1+\rho}).$$

The following lemma, proven by Sun and Sun [25, Thm. 3.6] using the definition of the generalized Jacobian, enables one to study the semismooth property of F by examining only those points $x \in \mathbb{R}^n$ where F is differentiable, and thus to work only with the Jacobian of F rather than the generalized Jacobian. (Sun and Sun did not consider the case of $o(\|h\|)$, but their argument readily applies to this case.)

Lemma 2.1 Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is strictly continuous and directionally differentiable in a neighborhood of $x \in \mathbb{R}^n$. Then, for any $0 < \rho < \infty$, the following two statements (where $O(\cdot)$ depends on F and x only) are equivalent:

(a) For any $h \in \mathbb{R}^n$ and any $V \in \partial F(x + h)$,

$$F(x + h) - F(x) - Vh = o(\|h\|) \quad (\text{respectively, } O(\|h\|^{1+\rho})).$$

(b) For any $h \in \mathbb{R}^n$ such that F is differentiable at $x + h$,

$$F(x + h) - F(x) - \nabla F(x + h)h = o(\|h\|) \quad (\text{respectively, } O(\|h\|^{1+\rho})).$$

We say that F is semismooth (respectively, $\rho$-order semismooth) if F is semismooth (respectively, $\rho$-order semismooth) at every $x \in \mathbb{R}^n$. We say that F is strongly semismooth if it is 1-order semismooth. Convex functions and piecewise continuously differentiable functions are examples of semismooth functions. The composition of two semismooth (respectively, $\rho$-order semismooth) functions is also a semismooth (respectively, $\rho$-order semismooth) function. The property of semismoothness plays an important role in nonsmooth Newton methods [19, 21] as well as in some smoothing methods. For extensive discussions of semismooth functions, see [8, 15, 21].

Now, we review some useful properties of $\phi_p$ and $\psi_p$, defined in (7) and (8), respectively, which will be used in the analysis in the subsequent sections. We note that $\phi_p$ reduces to the Fischer–Burmeister function (3) when p = 2. Thus, most of these properties are extensions of properties of the Fischer–Burmeister function. For detailed proofs, please refer to [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)].

Lemma 2.2 ([Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004), Prop. 3.1]) Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be defined as in (7), where $p \ge 2$. Then

(a) $\phi_p$ is an NCP-function, i.e., it satisfies (2).

(b) $\phi_p$ is subadditive, i.e., $\phi_p(w + w') \le \phi_p(w) + \phi_p(w')$ for all $w, w' \in \mathbb{R}^2$.

(c) $\phi_p$ is positively homogeneous, i.e., $\phi_p(\alpha w) = \alpha\phi_p(w)$ for all $w \in \mathbb{R}^2$ and $\alpha \ge 0$.

(d) $\phi_p$ is convex, i.e., $\phi_p(\alpha w + (1 - \alpha)w') \le \alpha\phi_p(w) + (1 - \alpha)\phi_p(w')$ for all $w, w' \in \mathbb{R}^2$ and $\alpha \in [0, 1]$.

(e) $\phi_p$ is Lipschitz continuous with $L_1 = 1 + \sqrt{2}$, i.e., $|\phi_p(w) - \phi_p(w')| \le L_1\|w - w'\|$; or with $L_2 = 1 + 2^{1 - 1/p}$, i.e., $|\phi_p(w) - \phi_p(w')| \le L_2\|w - w'\|_p$, for all $w, w' \in \mathbb{R}^2$.

Lemma 2.2(b) and (c) imply that $\phi_p$ is sublinear, i.e., it satisfies

$$\phi_p(\alpha w + \beta w') \le \alpha\phi_p(w) + \beta\phi_p(w')$$

for all $w, w' \in \mathbb{R}^2$ and $\alpha, \beta \ge 0$. This follows from the fact [1, Prop. 3.11] that a function from $\mathbb{R}^n$ to $\mathbb{R}$ is sublinear if and only if it is positively homogeneous and subadditive. Note that sublinearity is stronger than convexity. In fact, under Lemma 2.2(c), Lemma 2.2(b) is equivalent to Lemma 2.2(d); this follows from [22, Thm. 4.7], which states that a positively homogeneous function is convex if and only if it is subadditive.
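Properties (b), (c) and (e) of Lemma 2.2 are easy to probe numerically. The Python sketch below is a randomized spot-check under our own assumptions (the function names, the sampling box $[-5, 5]^4$, the seed and the tolerance are all illustrative choices); it counts sampled violations of subadditivity, positive homogeneity, and the p-norm Lipschitz bound with $L_2 = 1 + 2^{1-1/p}$. A test of this kind can only fail to find counterexamples; it does not replace the proofs.

```python
import random


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def spot_check_lemma_2_2(p, trials=10_000, seed=0, tol=1e-9):
    """Count sampled violations of Lemma 2.2(b), (c) and the p-norm bound in (e)."""
    rng = random.Random(seed)
    L2 = 1.0 + 2.0 ** (1.0 - 1.0 / p)
    violations = {"subadditive": 0, "homogeneous": 0, "lipschitz": 0}
    for _ in range(trials):
        a, b, c, d = (rng.uniform(-5.0, 5.0) for _ in range(4))
        alpha = rng.uniform(0.0, 5.0)
        fw, fv = phi_p(a, b, p), phi_p(c, d, p)
        # (b): phi_p(w + w') <= phi_p(w) + phi_p(w')
        if phi_p(a + c, b + d, p) > fw + fv + tol:
            violations["subadditive"] += 1
        # (c): phi_p(alpha * w) == alpha * phi_p(w) for alpha >= 0
        if abs(phi_p(alpha * a, alpha * b, p) - alpha * fw) > tol * (1.0 + abs(alpha * fw)):
            violations["homogeneous"] += 1
        # (e): |phi_p(w) - phi_p(w')| <= L2 * ||w - w'||_p
        dist_p = (abs(a - c) ** p + abs(b - d) ** p) ** (1.0 / p)
        if abs(fw - fv) > L2 * dist_p + tol:
            violations["lipschitz"] += 1
    return violations


for p in (2, 3, 4, 5):
    print(p, spot_check_lemma_2_2(p))
```

The small tolerance only absorbs floating-point rounding; all three inequalities hold exactly in real arithmetic.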

Lemma 2.3 ([Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004), Prop. 3.2]) Let $\phi_p$ and $\psi_p$ be defined as in (7) and (8), respectively, where $p \ge 2$. Then

(a) $\psi_p$ is an NCP-function, i.e., it satisfies (2).

(b) $\psi_p(a,b) \ge 0$ for all $(a,b) \in \mathbb{R}^2$.

(c) $\psi_p$ is continuously differentiable everywhere. Moreover, $\nabla_a\psi_p(0,0) = \nabla_b\psi_p(0,0) = 0$ and

$$\nabla_a\psi_p(a,b) = \left(\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b), \qquad \nabla_b\psi_p(a,b) = \left(\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b) \tag{12}$$

for $(a,b) \ne (0,0)$ when p is even, whereas

$$\nabla_a\psi_p(a,b) = \left(\frac{\operatorname{sgn}(a)\,a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b), \qquad \nabla_b\psi_p(a,b) = \left(\frac{\operatorname{sgn}(b)\,b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\phi_p(a,b) \tag{13}$$

for $(a,b) \ne (0,0)$ when p is odd.

(d) $\nabla_a\psi_p(a,b)\cdot\nabla_b\psi_p(a,b) \ge 0$ for all $(a,b) \in \mathbb{R}^2$. The equality holds if and only if $\phi_p(a,b) = 0$.

(e) $\nabla_a\psi_p(a,b) = 0 \iff \nabla_b\psi_p(a,b) = 0 \iff \phi_p(a,b) = 0$.
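Since $\operatorname{sgn}(t)|t|^{p-1}$ equals $t^{p-1}$ for even p and $\operatorname{sgn}(t)\,t^{p-1}$ for odd p, formulas (12) and (13) can be evaluated by one expression. The Python sketch below does so and cross-checks the result against central finite differences of $\psi_p$; it also lets one probe the sign property in Lemma 2.3(d). The helper names (`grad_psi_p`, `fd_grad`), the step size and the sample points are hypothetical choices for illustration.

```python
def sgn(t):
    """Sign function with sgn(0) = 0."""
    return (t > 0) - (t < 0)


def phi_p(a, b, p):
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_p(a, b, p):
    return 0.5 * phi_p(a, b, p) ** 2


def grad_psi_p(a, b, p):
    """Gradient of psi_p from (12)/(13); sgn(t)*|t|^(p-1) covers both parities."""
    if a == 0 and b == 0:
        return 0.0, 0.0  # Lemma 2.3(c)
    n = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    phi = n - (a + b)
    ga = sgn(a) * abs(a) ** (p - 1) / n ** (p - 1) - 1.0
    gb = sgn(b) * abs(b) ** (p - 1) / n ** (p - 1) - 1.0
    return ga * phi, gb * phi


def fd_grad(a, b, p, h=1e-6):
    """Central finite differences of psi_p, used only as a cross-check."""
    da = (psi_p(a + h, b, p) - psi_p(a - h, b, p)) / (2 * h)
    db = (psi_p(a, b + h, p) - psi_p(a, b - h, p)) / (2 * h)
    return da, db


for p in (2, 3):
    for a, b in [(1.0, -2.0), (-0.5, 0.3)]:
        g = grad_psi_p(a, b, p)
        print(p, (a, b), g, fd_grad(a, b, p), "product:", g[0] * g[1])
```

Both factors in parentheses in (12)/(13) are nonpositive, because $|a|^{p-1} \le \|(a,b)\|_p^{p-1}$; this is the reason the product in Lemma 2.3(d) cannot be negative.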

Lemma 2.4 ([Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004), Prop. 3.5]) Let $\Psi_p : \mathbb{R}^n \to \mathbb{R}$ be defined as in (10), where $p \ge 2$. Assume that F is either strongly monotone or a uniform P-function. Then the level sets $\mathcal{L}(\Psi_p, \gamma)$ are bounded for all $\gamma \in \mathbb{R}$.

In addition to the above properties of $\phi_p$ and $\psi_p$, we need the following two lemmas for the analysis in the subsequent sections.

Lemma 2.5 ([12, (1.3)]) Let $x \in \mathbb{R}^n$ and $1 < p_1 < p_2$. Then

$$\|x\|_{p_2} \le \|x\|_{p_1} \le n^{1/p_1 - 1/p_2}\|x\|_{p_2}.$$
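Lemma 2.5 is used below with $n = 2$, $p_1 = 2$ and $p_2 = p$, which gives $\|x\|_2 \le 2^{1/2 - 1/p}\|x\|_p$. A small randomized Python check of the two-sided bound could look as follows; the function names, sample sizes and the relative slack $10^{-12}$ for rounding are our own assumptions. For $n = 2$, the point $x = (1, 1)$ attains the upper bound.

```python
import random


def norm(x, p):
    """p-norm of a list of floats."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def check_norm_sandwich(n, p1, p2, trials=5000, seed=1, rel=1e-12):
    """True if ||x||_{p2} <= ||x||_{p1} <= n^(1/p1 - 1/p2) ||x||_{p2} on all samples."""
    rng = random.Random(seed)
    c = n ** (1.0 / p1 - 1.0 / p2)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        small, large = norm(x, p2), norm(x, p1)
        if small > large * (1 + rel) or large > c * small * (1 + rel):
            return False
    return True


print(check_norm_sandwich(2, 2, 3), check_norm_sandwich(2, 2, 6))
# Equality case of the upper bound for n = 2, p1 = 2, p2 = 3: x = (1, 1).
print(norm([1.0, 1.0], 2), 2 ** (1 / 2 - 1 / 3) * norm([1.0, 1.0], 3))
```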

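Lemma 2.6 If $F : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ has a second derivative at each point of a convex set $D_0 \subseteq D$, then, for all $x, y \in D_0$,

$$\|\nabla F(y) - \nabla F(x)\| \le \sup_{0 \le t \le 1}\|\nabla^2 F(x + t(y - x))\|\cdot\|y - x\|.$$

Proof This is Theorem 3.3.5 of [16] (page 78). □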

3 The semismooth-related properties of the NCP and merit functions

In this section, we study some semismooth-related properties of $\phi_p$, including semismoothness and almost smoothness, as well as the SC¹ and LC¹ properties of $\psi_p$. The semismooth property is very important from the computational point of view. In particular, it plays a fundamental role in the superlinear convergence analysis of generalized Newton methods; see [19, 21, 29]. The classes of SC¹ and LC¹ functions have been a subject of interest in relation to the development of minimization algorithms. We will introduce their definitions later. We begin this section by showing that the functions $\phi_p$ and $\Phi_p$ are semismooth (in fact, they are strongly semismooth, as shown in Corollary 3.1). The proof is easy and routine.

Proposition 3.1 The function $\Phi_p : \mathbb{R}^n \to \mathbb{R}^n$ defined as in (9) is semismooth.

Proof We notice that $\phi_p$ is convex by Lemma 2.2(d), and hence is a semismooth function. We also observe that each component of $\Phi_p(x)$ is the composite of the convex function $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ and the continuously differentiable function $(x_i, F_i(x))^T : \mathbb{R}^n \to \mathbb{R}^2$. Since convex functions and continuously differentiable functions are semismooth, and the composition of semismooth functions is semismooth, it follows that $\Phi_p$ is semismooth. □

An important concept related to semismooth functions is that of an SC¹ function, whose definition we introduce next.

Definition 3.1 A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be an SC¹ function if f is continuously differentiable and its gradient is semismooth.

SC¹ functions can be viewed as functions lying between C¹ and C² functions. With the notion of SC¹ functions, many results regarding the minimization of C² functions can be extended to the minimization of SC¹ functions; see [18] and the references therein. For applications and more details of SC¹ functions, please refer to the excellent book [4]. Proposition 3.2 shows that $\psi_p$ is an SC¹ function; hence, if every $F_i$ is an SC¹ function, then so is $\Psi_p$. Before presenting its proof, we need a crucial technical lemma, which states that $\nabla\psi_p$ is globally Lipschitz continuous. The lemma will be used not only in the proof of Prop. 3.2 but also in the convergence analysis of the descent algorithm in Sect. 4.

Lemma 3.1 The gradient of the function $\psi_p$ defined as in (8) is Lipschitz continuous, that is, there exists $L > 0$ such that

$$\|\nabla\psi_p(a,b) - \nabla\psi_p(c,d)\| \le L\|(a,b) - (c,d)\| \tag{14}$$

for all $(a,b), (c,d) \in \mathbb{R}^2$.

Proof Starting from the gradient of $\psi_p$ given in (12) and (13) and applying the chain rule and the quotient rule (the computation is routine though tedious, so we omit the details), we have the following two cases.

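If p is even and $(a,b) \ne (0,0)$, then

$$\nabla^2_{aa}\psi_p(a,b) = \left(\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{(p-1)a^{p-2}b^p}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p - (a+b)\big),$$

$$\nabla^2_{ab}\psi_p(a,b) = \nabla^2_{ba}\psi_p(a,b) = \left(\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\left(\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right) - \frac{(p-1)a^{p-1}b^{p-1}}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p - (a+b)\big),$$

$$\nabla^2_{bb}\psi_p(a,b) = \left(\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{(p-1)a^p b^{p-2}}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p - (a+b)\big).$$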

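It is clear that $\frac{|a|^{p-1}}{\|(a,b)\|_p^{p-1}} \le 1$. It also follows that

$$|a|^{p-2}|b|^p \le \big(\max\{|a|, |b|\}\big)^{2p-2} \le \left(\sqrt[p]{|a|^p + |b|^p}\right)^{2p-2} = \|(a,b)\|_p^{2p-2},$$

so that

$$\frac{|a|^{p-2}|b|^p}{\|(a,b)\|_p^{2p-2}} \le 1. \quad \text{Similarly,} \quad \frac{|a|^p|b|^{p-2}}{\|(a,b)\|_p^{2p-2}} \le 1. \tag{15}$$

On the other hand, by the Cauchy–Schwarz inequality and Lemma 2.5, we have

$$|a| + |b| \le \sqrt{2}\sqrt{a^2 + b^2} = \sqrt{2}\,\|(a,b)\|_2 \le \sqrt{2}\cdot 2^{1/2 - 1/p}\|(a,b)\|_p = 2^{1 - 1/p}\|(a,b)\|_p.$$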

Applying all of the above, we obtain an upper bound for $|\nabla^2_{aa}\psi_p(a,b)|$ as follows:

$$\begin{aligned}
|\nabla^2_{aa}\psi_p(a,b)| &\le \left(\frac{|a|^{p-1}}{\|(a,b)\|_p^{p-1}} + 1\right)^2 + \frac{(p-1)|a|^{p-2}|b|^p}{\|(a,b)\|_p^{2p-2}} + \frac{(p-1)|a|^{p-2}|b|^p(|a| + |b|)}{\|(a,b)\|_p^{2p-1}} \\
&\le 4 + (p-1) + \frac{(p-1)|a|^{p-2}|b|^p\cdot 2^{1-1/p}\|(a,b)\|_p}{\|(a,b)\|_p^{2p-1}} \\
&\le 4 + (p-1) + (p-1)2^{1-1/p} \\
&= 4 + (p-1)\left(1 + 2^{1-1/p}\right),
\end{aligned}$$

where the last inequality holds due to (15). By the same arguments, we also have

$$|\nabla^2_{bb}\psi_p(a,b)| \le 4 + (p-1)\left(1 + 2^{1-1/p}\right).$$

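Now, we estimate an upper bound for $|\nabla^2_{ab}\psi_p(a,b)| = |\nabla^2_{ba}\psi_p(a,b)|$ as follows:

$$\begin{aligned}
|\nabla^2_{ab}\psi_p(a,b)| &\le \left|\frac{a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right|\cdot\left|\frac{b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right| + \frac{(p-1)|a|^{p-1}|b|^{p-1}}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p + (|a| + |b|)\big) \\
&\le \left(\frac{|a|^{p-1}}{\|(a,b)\|_p^{p-1}} + 1\right)\left(\frac{|b|^{p-1}}{\|(a,b)\|_p^{p-1}} + 1\right) + \frac{(p-1)|a|^{p-1}|b|^{p-1}}{\|(a,b)\|_p^{2p-2}} + \frac{(p-1)|a|^{p-1}|b|^{p-1}(|a| + |b|)}{\|(a,b)\|_p^{2p-1}} \\
&\le 4 + (p-1) + \frac{(p-1)|a|^{p-1}|b|^{p-1}\cdot 2^{1-1/p}\|(a,b)\|_p}{\|(a,b)\|_p^{2p-1}} \\
&\le 4 + (p-1) + (p-1)2^{1-1/p} \\
&= 4 + (p-1)\left(1 + 2^{1-1/p}\right),
\end{aligned}$$

where the third and fourth inequalities follow from a result similar to (15), namely,

$$\frac{|a|^{p-1}|b|^{p-1}}{\|(a,b)\|_p^{2p-2}} \le 1.$$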

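If p is odd and $(a,b) \ne (0,0)$, then we obtain

$$\nabla^2_{aa}\psi_p(a,b) = \left(\frac{\operatorname{sgn}(a)\,a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{\operatorname{sgn}(a)\operatorname{sgn}(b)(p-1)a^{p-2}b^p}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p - (a+b)\big),$$

$$\nabla^2_{ab}\psi_p(a,b) = \nabla^2_{ba}\psi_p(a,b) = \left(\frac{\operatorname{sgn}(a)\,a^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)\left(\frac{\operatorname{sgn}(b)\,b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right) - \frac{\operatorname{sgn}(a)\operatorname{sgn}(b)(p-1)a^{p-1}b^{p-1}}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p - (a+b)\big),$$

$$\nabla^2_{bb}\psi_p(a,b) = \left(\frac{\operatorname{sgn}(b)\,b^{p-1}}{\|(a,b)\|_p^{p-1}} - 1\right)^2 + \frac{\operatorname{sgn}(a)\operatorname{sgn}(b)(p-1)a^p b^{p-2}}{\|(a,b)\|_p^{2p-1}}\big(\|(a,b)\|_p - (a+b)\big).$$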

In fact, the upper bounds for $|\nabla^2_{aa}\psi_p(a,b)|$, $|\nabla^2_{ab}\psi_p(a,b)|$ and $|\nabla^2_{bb}\psi_p(a,b)|$ remain the same, by following exactly the same steps as in the case where p is even. Thus, there exists a constant $L > 0$ independent of $(a,b)$ such that

$$\|\nabla^2\psi_p(a,b)\| \le L, \quad \forall\,(a,b) \ne (0,0).$$

Then, by Lemma 2.6, we have

$$\|\nabla\psi_p(a,b) - \nabla\psi_p(c,d)\| \le L\|(a,b) - (c,d)\| \tag{16}$$

for all $(a,b), (c,d) \in \mathbb{R}^2$ with $(0,0) \notin [(a,b),(c,d)]$, the line segment joining $(a,b)$ and $(c,d)$. Moreover, (16) also holds in the case $(a,b) = (c,d) = (0,0)$, since $\nabla_a\psi_p(0,0) = \nabla_b\psi_p(0,0) = 0$. Therefore, it remains to consider segments passing through the origin, and we may assume $(a,b) \ne (0,0)$. By Lemma 2.3(c), $\psi_p$ is continuously differentiable on $\mathbb{R}^2$ with $\nabla\psi_p(0,0) = (0,0)$; hence, by a continuity argument, (16) remains true for all $(c,d) \in \mathbb{R}^2$. Thus, (16) holds for all $(a,b), (c,d) \in \mathbb{R}^2$, which says that $\nabla\psi_p$ is globally Lipschitz continuous. □
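The entrywise bound $4 + (p-1)(1 + 2^{1-1/p})$ derived in the proof can be compared with sampled second derivatives. In the Python sketch below, `hessian_psi_p` is our own transcription of the even- and odd-case formulas into a single expression (using $a^{p-2}b^p = |a|^{p-2}|b|^p$ for even p and $\operatorname{sgn}(a)\operatorname{sgn}(b)a^{p-2}b^p = |a|^{p-2}|b|^p$ for odd p, and similarly for the other terms); the helper names and the sampling scheme are assumptions. Because every second derivative of $\psi_p$ is positively homogeneous of degree zero, sampling in the square $[-1,1]^2$ loses no generality.

```python
import random


def sgn(t):
    """Sign function with sgn(0) = 0."""
    return (t > 0) - (t < 0)


def hessian_psi_p(a, b, p):
    """Second derivatives (aa, ab, bb) of psi_p at (a, b) != (0, 0), any p >= 2."""
    n = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    phi = n - (a + b)
    ga = sgn(a) * abs(a) ** (p - 1) / n ** (p - 1) - 1.0
    gb = sgn(b) * abs(b) ** (p - 1) / n ** (p - 1) - 1.0
    scale = (p - 1) * phi / n ** (2 * p - 1)
    haa = ga * ga + scale * abs(a) ** (p - 2) * abs(b) ** p
    hbb = gb * gb + scale * abs(a) ** p * abs(b) ** (p - 2)
    hab = ga * gb - scale * sgn(a) * sgn(b) * abs(a) ** (p - 1) * abs(b) ** (p - 1)
    return haa, hab, hbb


def max_entry(p, trials=20_000, seed=2):
    """Largest |second derivative| observed on random points of [-1, 1]^2."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        worst = max(worst, *(abs(h) for h in hessian_psi_p(a, b, p)))
    return worst


for p in (2, 3, 4, 5):
    bound = 4 + (p - 1) * (1 + 2 ** (1 - 1 / p))
    print(p, round(max_entry(p), 4), round(bound, 4))
```

The sampled maxima are expected to sit below the bound; the proof does not claim the bound is tight.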

Proposition 3.2 The function $\psi_p$ defined as in (8) is an SC¹ function. Hence, if every $F_i$ is an SC¹ function, then the function $\Psi_p$ given as in (10) is also an SC¹ function.

Proof By Lemma 2.3(c), $\psi_p$ is continuously differentiable, so it remains to show that the gradient of $\psi_p$ is semismooth. By Lemma 3.1, $\nabla\psi_p$ is Lipschitz continuous, and hence strictly continuous (locally Lipschitz continuous). Therefore, to check the semismoothness of $\nabla\psi_p$, we only need to show that $\nabla\psi_p$ satisfies Lemma 2.1(b).

More specifically, we only need to check semismoothness at $(0,0)$, because at other points $\nabla\psi_p$ is continuously differentiable (see the proof of Lemma 3.1) and hence semismooth. For this purpose, we verify that the equation in Lemma 2.1(b) is satisfied, i.e., for any $h = (h_1, h_2) \in \mathbb{R}^2$ such that $\nabla\psi_p$ is differentiable at $(h_1, h_2)$, we have

$$\nabla\psi_p(h_1,h_2) - \nabla\psi_p(0,0) - \nabla^2\psi_p(h_1,h_2)\cdot h = o(\|(h_1,h_2)\|). \tag{17}$$

To prove (17), we consider two cases: p even and p odd.

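For p even, we denote by $(\Delta_1, \Delta_2)^T$ the left-hand side of (17). Then we have

$$\begin{pmatrix}\Delta_1 \\ \Delta_2\end{pmatrix} := \begin{pmatrix}k_1 \\ k_2\end{pmatrix}\phi_p(h_1,h_2) - \begin{pmatrix}0 \\ 0\end{pmatrix} - \begin{pmatrix} k_1^2 + \dfrac{(p-1)h_1^{p-2}h_2^p}{\|(h_1,h_2)\|_p^{2p-1}}\phi_p(h_1,h_2) & k_1k_2 - k_3\phi_p(h_1,h_2) \\ k_1k_2 - k_3\phi_p(h_1,h_2) & k_2^2 + \dfrac{(p-1)h_1^p h_2^{p-2}}{\|(h_1,h_2)\|_p^{2p-1}}\phi_p(h_1,h_2) \end{pmatrix}\begin{pmatrix}h_1 \\ h_2\end{pmatrix}, \tag{18}$$

where

$$k_1 = \frac{h_1^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1, \qquad k_2 = \frac{h_2^{p-1}}{\|(h_1,h_2)\|_p^{p-1}} - 1, \qquad k_3 = \frac{(p-1)h_1^{p-1}h_2^{p-1}}{\|(h_1,h_2)\|_p^{2p-1}}. \tag{19}$$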

By plugging (19) into (18) and writing out $\Delta_1$ and $\Delta_2$, we obtain $\Delta_1 = 0$ and $\Delta_2 = 0$. To see this, write $N := \|(h_1,h_2)\|_p$ for brevity and compute $\Delta_1$ as follows:

$$\begin{aligned}
\Delta_1 &= k_1\phi_p(h_1,h_2) - k_1^2h_1 - \frac{(p-1)h_1^{p-1}h_2^p}{N^{2p-1}}\phi_p(h_1,h_2) - k_1k_2h_2 + \frac{(p-1)h_1^{p-1}h_2^p}{N^{2p-1}}\phi_p(h_1,h_2) \\
&= \phi_p(h_1,h_2)\left(k_1 - \frac{(p-1)h_1^{p-1}h_2^p}{N^{2p-1}} + \frac{(p-1)h_1^{p-1}h_2^p}{N^{2p-1}}\right) - k_1^2h_1 - k_1k_2h_2 \\
&= k_1\phi_p(h_1,h_2) - k_1^2h_1 - k_1k_2h_2 \\
&= k_1\big(\phi_p(h_1,h_2) - k_1h_1 - k_2h_2\big) \\
&= k_1\left(N - \frac{h_1^p + h_2^p}{N^{p-1}}\right) \\
&= k_1\cdot 0 = 0,
\end{aligned}$$

where the second-to-last equality holds since $h_1^p + h_2^p = N^p$ when p is even. Similarly,

$$\begin{aligned}
\Delta_2 &= k_2\phi_p(h_1,h_2) - k_2^2h_2 - \frac{(p-1)h_1^p h_2^{p-1}}{N^{2p-1}}\phi_p(h_1,h_2) - k_1k_2h_1 + \frac{(p-1)h_1^p h_2^{p-1}}{N^{2p-1}}\phi_p(h_1,h_2) \\
&= \phi_p(h_1,h_2)\left(k_2 - \frac{(p-1)h_1^p h_2^{p-1}}{N^{2p-1}} + \frac{(p-1)h_1^p h_2^{p-1}}{N^{2p-1}}\right) - k_2^2h_2 - k_1k_2h_1 \\
&= k_2\phi_p(h_1,h_2) - k_2^2h_2 - k_1k_2h_1 \\
&= k_2\big(\phi_p(h_1,h_2) - k_1h_1 - k_2h_2\big) \\
&= k_2\left(N - \frac{h_1^p + h_2^p}{N^{p-1}}\right) \\
&= k_2\cdot 0 = 0,
\end{aligned}$$

where again the second-to-last equality holds since $h_1^p + h_2^p = N^p$ when p is even.

From the above two expressions for $\Delta_1$ and $\Delta_2$, it follows that (17) is satisfied. Thus, $\nabla\psi_p$ is semismooth at $(0,0)$ in the case where p is even.

For odd p, the same arguments lead to the same verifications. This completes the proof that $\nabla\psi_p$ is semismooth, and hence that $\psi_p$ is an SC¹ function. The second statement follows immediately from this result. □

We point out that, for p = 2, $\psi_p$ was already proved to be an SC¹ function in [4, 5] (indeed, it was first formally shown in [5]). Proposition 3.2 is a general extension to any $p \ge 2$, and its proof is considerably more complicated than in the case p = 2.
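The exact cancellation $\Delta_1 = \Delta_2 = 0$ is an instance of Euler's identity: since $\psi_p(\alpha h) = \alpha^2\psi_p(h)$ for $\alpha > 0$, the gradient $\nabla\psi_p$ is positively homogeneous of degree one, so $\nabla^2\psi_p(h)h = \nabla\psi_p(h)$ wherever the Hessian exists, for even and odd p alike. The Python sketch below evaluates the left-hand side of (17) at random points; the helper names and the sampling are our own assumptions, and the printed residuals should be at the level of floating-point rounding.

```python
import random


def sgn(t):
    """Sign function with sgn(0) = 0."""
    return (t > 0) - (t < 0)


def grad_and_hessian(a, b, p):
    """Gradient (12)/(13) and second derivatives (aa, ab, bb) of psi_p at (a, b) != (0, 0)."""
    n = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    phi = n - (a + b)
    ga = sgn(a) * abs(a) ** (p - 1) / n ** (p - 1) - 1.0
    gb = sgn(b) * abs(b) ** (p - 1) / n ** (p - 1) - 1.0
    scale = (p - 1) * phi / n ** (2 * p - 1)
    haa = ga * ga + scale * abs(a) ** (p - 2) * abs(b) ** p
    hbb = gb * gb + scale * abs(a) ** p * abs(b) ** (p - 2)
    hab = ga * gb - scale * sgn(a) * sgn(b) * abs(a) ** (p - 1) * abs(b) ** (p - 1)
    return (ga * phi, gb * phi), (haa, hab, hbb)


def residual_17(h1, h2, p):
    """Left-hand side of (17), using grad psi_p(0, 0) = (0, 0) from Lemma 2.3(c)."""
    (g1, g2), (haa, hab, hbb) = grad_and_hessian(h1, h2, p)
    return g1 - (haa * h1 + hab * h2), g2 - (hab * h1 + hbb * h2)


rng = random.Random(3)
for p in (2, 3, 4, 5):
    worst = max(
        max(map(abs, residual_17(rng.uniform(-1, 1), rng.uniform(-1, 1), p)))
        for _ in range(1000)
    )
    print(p, worst)
```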

In addition to SC¹ functions, we also introduce LC¹ functions here.

Definition 3.2 A function $f : \mathbb{R}^n \to \mathbb{R}$ is called an LC¹ function if f is continuously differentiable and its gradient is locally Lipschitz continuous.

The class of LC¹ minimization problems was studied in [20], where the local superlinear convergence of an approximate Newton method was established under a semismoothness assumption on the gradient at a solution point. It is obvious that any SC¹ function is an LC¹ function. With the results of Lemma 3.1 and Prop. 3.2, we therefore have the following corollaries.

Corollary 3.1 If every $F_i$ is an LC¹ function, then the function $\Phi_p$ given as in (9) is strongly semismooth.

Proof We know that $\phi_p$ is semismooth; indeed, it is strongly semismooth. This can be seen from Lemma 2.2(c), Lemma 3.1 and Theorem 7 of [Qi, L., Tseng, P.: Math. Oper. Res., Submitted (2002)]. Moreover, every LC¹ function is strongly semismooth. Thus, the result follows. □

Corollary 3.2 The function $\psi_p$ defined as in (8) is an LC¹ function. Hence, if every $F_i$ is an LC¹ function, then the function $\Psi_p$ given as in (10) is also an LC¹ function.

Other issues related to semismooth functions are the concepts of piecewise smooth and almost smooth functions. It is well known that piecewise smooth functions are examples of semismooth functions, and other examples of semismooth functions that are not piecewise smooth have recently emerged; see [Qi, L., Tseng, P.: Math. Oper. Res., Submitted (2002)] and the references therein. In particular, these examples include the p-norm function with $1 < p < \infty$ defined on $\mathbb{R}^n$ where $n \ge 2$, the Euclidean norm function, pseudo-smooth NCP-functions, smoothing functions, etc. To close this section, we point out that the NCP-function studied in this paper and in [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)] is indeed strongly almost smooth, since it is based on the p-norm function. We briefly state the definition of almost smooth functions and the result below.

Definition 3.3 Almost smooth (respectively, strongly almost smooth) functions are functions that are semismooth (respectively, strongly semismooth) on the whole space $\mathbb{R}^n$ and smooth everywhere except on sets of "dimension" less than $n - 1$, in the sense that these sets do not locally partition $\mathbb{R}^n$ into multiple connected components.

By applying Lemma 2.2(c), 3.1 and a result in [Qi, L., Tseng, P.: Math. Oper. Res., Submitted (2002)], we immediately obtain an interesting property related to the strong almost smoothness of $\Phi_p$. For more details on almost smooth and strongly almost smooth functions, please refer to the recent paper [Qi, L., Tseng, P.: Math. Oper. Res., Submitted (2002)].

Proposition 3.3 If every $F_i$ is an LC¹ function, then the function $\Phi_p$ defined as in (9) is a strongly almost smooth function.

Proof This result follows from Lemma 2.2(c), Prop. 3.1, and Theorem 7 of [Qi, L., Tseng, P.: Math. Oper. Res., Submitted (2002)]. □

4 A descent method

In this section, we study almost the same descent method as in Sect. 4 of [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)] for solving the unconstrained minimization problem (11), which does not require the derivative of F involved in the NCP. In fact, we use the same search direction for the algorithm as in [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)]:

$$d^k := -\nabla_b\psi_p(x^k, F(x^k)), \tag{20}$$

that is, $d^k_i = -\nabla_b\psi_p(x^k_i, F_i(x^k))$ for $i = 1, \ldots, n$; only the way of obtaining the step size is slightly different (see Step 3). Such a step-size rule can also be found in the literature, for instance in [10]. Using the global Lipschitz continuity of $\nabla\psi_p$ (see Lemma 3.1), we give an alternative proof of the convergence result for the same descent method considered in [Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004)]. We state the detailed steps below.

Algorithm 4.1

(Step 0) Choose $x^0 \in \mathbb{R}^n$, $\varepsilon \ge 0$, $\sigma \in (0,1)$, $\beta \in (0,1)$ and set $k := 0$.

(Step 1) If $\Psi_p(x^k) \le \varepsilon$, then stop.

(Step 2) Let $d^k := -\nabla_b\psi_p(x^k, F(x^k))$.

(Step 3) Compute a step size $t_k := \beta^{m_k}$, where $m_k$ is the smallest nonnegative integer m satisfying the Armijo-type condition

$$\Psi_p(x^k + \beta^m d^k) \le \Psi_p(x^k) - \sigma\beta^{2m}\|d^k\|^2. \tag{21}$$

(Step 4) Set $x^{k+1} := x^k + t_k d^k$, $k := k + 1$, and go to Step 1.
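The steps of Algorithm 4.1 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the author's implementation: vectors are plain lists, the parameters $\sigma = 0.1$, $\beta = 0.5$, $\varepsilon = 10^{-16}$ are hypothetical choices, and two safeguards absent from the algorithm as stated (the caps `max_iter` and `max_backtracks`) keep the loops finite. Only values of F are evaluated, never its Jacobian, which is the point of the derivative-free direction (20). The test mapping `F_affine` uses the hypothetical data $M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (eigenvalues 1 and 3, so F is strongly monotone with modulus 1) and $q = (-1, 1)^T$, whose unique NCP solution is $x^* = (0.5, 0)$.

```python
def sgn(t):
    return (t > 0) - (t < 0)


def phi_p(a, b, p):
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def merit(x, fx, p):
    """Psi_p from (10), given x and the precomputed value F(x)."""
    return sum(0.5 * phi_p(xi, fi, p) ** 2 for xi, fi in zip(x, fx))


def dpsi_db(a, b, p):
    """Partial derivative of psi_p in its second argument, (12)/(13)."""
    if a == 0 and b == 0:
        return 0.0
    n = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    return (sgn(b) * abs(b) ** (p - 1) / n ** (p - 1) - 1.0) * (n - (a + b))


def descent_ncp(F, x0, p=2, eps=1e-16, sigma=0.1, beta=0.5,
                max_iter=1000, max_backtracks=60):
    x = list(x0)
    for k in range(max_iter):
        fx = F(x)
        val = merit(x, fx, p)
        if val <= eps:                                        # Step 1
            return x, k
        d = [-dpsi_db(xi, fi, p) for xi, fi in zip(x, fx)]    # Step 2, (20)
        dd = sum(di * di for di in d)
        t = 1.0                                               # Step 3: t = beta**m
        for _ in range(max_backtracks):
            trial = [xi + t * di for xi, di in zip(x, d)]
            if merit(trial, F(trial), p) <= val - sigma * t * t * dd:  # (21)
                break
            t *= beta
        else:
            return x, k  # safeguard: no step accepted within the cap
        x = trial        # Step 4
    return x, max_iter


def F_affine(x):
    """Hypothetical strongly monotone test mapping F(x) = Mx + q."""
    return [2 * x[0] + x[1] - 1, x[0] + 2 * x[1] + 1]


for p in (2, 3):
    x, iters = descent_ncp(F_affine, [0.0, 0.0], p=p)
    print(p, iters, x)
```

Started from $x^0 = (0, 0)$, the second component essentially never moves in this example, because $\phi_p(0, F_2(x)) = 0$ whenever $F_2(x) > 0$; the run therefore reduces to a one-dimensional search in $x_1$. Other starting points or mappings may converge much more slowly.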

We wish to establish global convergence of Algorithm 4.1 under the assumption that F is strongly monotone. The following lemmas, together with Lemma 3.1, yield the convergence result for the algorithm. In what follows, we assume that the parameter $\varepsilon$ used in Algorithm 4.1 is set to zero and that Algorithm 4.1 generates an infinite sequence $\{x^k\}$.

Lemma 4.1 ([Chen, J.-S.: J. Optimiz. Theory Appl., Submitted (2004), Lem. 4.1]) Let $x^k \in \mathbb{R}^n$ and let F be a monotone function. Then the search direction defined in (20) satisfies the descent condition $\nabla\Psi_p(x^k)^T d^k < 0$ as long as $x^k$ is not a solution of the NCP. Moreover, if F is strongly monotone with modulus $\mu > 0$, then $\nabla\Psi_p(x^k)^T d^k \le -\mu\|d^k\|^2$.

Lemma 4.2 If F is strongly monotone, then the NCP has at most one solution.

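Proof Suppose there are two distinct solutions $\zeta^*, x^* \in \mathbb{R}^n$, so that

$$\langle F(\zeta^*), \zeta^* \rangle = 0, \quad F(\zeta^*) \ge 0, \quad \zeta^* \ge 0 \qquad \text{and} \qquad \langle F(x^*), x^* \rangle = 0, \quad F(x^*) \ge 0, \quad x^* \ge 0.$$

Since F is strongly monotone and $\zeta^* \ne x^*$, we have $\langle F(\zeta^*) - F(x^*), \zeta^* - x^* \rangle \ge \mu\|\zeta^* - x^*\|^2 > 0$. However,

$$\begin{aligned}
\langle F(\zeta^*) - F(x^*), \zeta^* - x^* \rangle &= \langle F(\zeta^*), \zeta^* \rangle + \langle F(x^*), x^* \rangle - \langle F(\zeta^*), x^* \rangle - \langle F(x^*), \zeta^* \rangle \\
&= -\langle F(\zeta^*), x^* \rangle - \langle F(x^*), \zeta^* \rangle \\
&\le 0,
\end{aligned}$$

where the inequality holds because $F(\zeta^*)$, $\zeta^*$, $F(x^*)$ and $x^*$ are all nonnegative. This is a contradiction, and therefore the NCP has at most one solution. □

Proposition 4.1 Suppose that F is continuously differentiable and strongly monotone with modulus $\mu > 0$. Let $x^0 \in \mathbb{R}^n$ be any starting point and let $\mathcal{L}(x^0)$ denote its level set. Assume that $\nabla F$ is Lipschitz continuous on $\mathcal{L}(x^0)$. Then the sequence $\{x^k\}$ generated by Algorithm 4.1 converges to the unique solution of the NCP.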

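Proof From Lemma 3.1 and the assumption that $\nabla F$ is Lipschitz continuous, we obtain that $\nabla\Psi_p$ is also Lipschitz continuous on $\mathcal{L}(x^0)$, i.e.,

$$\|\nabla\Psi_p(x) - \nabla\Psi_p(y)\| \le L\|x - y\|, \quad \forall\, x, y \in \mathcal{L}(x^0)$$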