Applied Mathematics and Computation


Geometric views of the generalized Fischer–Burmeister function and its induced merit function

Huai-Yin Tsai, Jein-Shan Chen

^⇑^,1

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

Article info

Keywords: Curvature; Surface; Level curve; NCP-function; Merit function

Abstract

In this paper, we study geometric properties of the surfaces of the generalized Fischer–Burmeister function and its induced merit function. We then propose a visualization to explain how the convergence behavior is influenced by two descent directions in the merit function approach. Based on the geometric properties and the visualization, we gain more intuitive ideas about how the convergence behavior is affected by changing the parameter. Furthermore, the geometric view indicates how to improve the algorithm by setting a proper value of the parameter in the merit function approach.

1. Introduction

The nonlinear complementarity problem (NCP) is to find a point $x \in \mathbb{R}^n$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad \langle x, F(x) \rangle = 0, \qquad (1)$$

where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product and $F = (F_1, \ldots, F_n)^T$ is a map from $\mathbb{R}^n$ to $\mathbb{R}^n$. We assume that $F$ is continuously differentiable throughout this paper. The NCP has attracted much attention because of its wide applications in the fields of economics, engineering, and operations research [8,11,16], to name a few.

Many methods have been proposed to solve the NCP; see [1,14,16,20,22,25] and the references therein. One of the most powerful and popular approaches is to reformulate the NCP as a system of nonlinear equations [21,23,28] or as an unconstrained minimization problem [9,10,12,15,18,19,24,27]. The objective function of such an equivalent unconstrained minimization problem is called a merit function; its global minima coincide with the solutions of the original NCP. To construct a merit function, a class of functions, called NCP-functions and defined below, plays a significant role.

A function $\phi : \mathbb{R}^2 \to \mathbb{R}$ is called an NCP-function if it satisfies

$$\phi(a, b) = 0 \iff a \ge 0, \ b \ge 0, \ ab = 0. \qquad (2)$$

Equivalently, $\phi$ is an NCP-function if the set of its zeros is the two nonnegative semiaxes. An important NCP-function, which plays a central role in the development of efficient algorithms for the solution of the NCP, is the well-known Fischer–Burmeister (FB) NCP-function [12,13] defined as

$$\phi(a, b) = \sqrt{a^2 + b^2} - (a + b). \qquad (3)$$

⇑ Corresponding author.

E-mail addresses: tasiwhyin@gmail.com (H.-Y. Tsai), jschen@math.ntnu.edu.tw (J.-S. Chen).

1 Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is supported by Ministry of Science and Technology, Taiwan.

With an NCP-function $\phi$, we can obtain an equivalent formulation of the NCP as a system of equations:

$$\Phi(x) = \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix} = 0. \qquad (4)$$

In other words, we have

$$x \text{ solves the NCP} \iff \Phi(x) = 0.$$

In view of this, we define a real-valued function $\Psi : \mathbb{R}^n \to \mathbb{R}_+$ by

$$\Psi(x) := \frac{1}{2}\|\Phi(x)\|^2 = \frac{1}{2}\sum_{i=1}^{n} \phi^2(x_i, F_i(x)). \qquad (5)$$

It is known that $\Psi$ is a merit function of the NCP, i.e., the NCP is equivalent to the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} \Psi(x). \qquad (6)$$

Merit functions are frequently used in designing numerical algorithms for solving the NCP. In particular, we can apply an iterative algorithm to minimize the merit function in the hope of obtaining its global minimum.
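As a minimal sketch (not the authors' code), the FB function (3) and the merit function (5) can be written in a few lines of Python. The names `fb` and `merit` and the toy map `F` are hypothetical and chosen only for illustration.

```python
import math

def fb(a, b):
    """Fischer-Burmeister function (3): sqrt(a^2 + b^2) - (a + b)."""
    return math.hypot(a, b) - (a + b)

def merit(x, F):
    """Merit function (5): half the squared Euclidean norm of Phi(x)."""
    return 0.5 * sum(fb(xi, Fi) ** 2 for xi, Fi in zip(x, F(x)))

# Hypothetical toy map F(x) = x - 1 (componentwise); x = (1, 1) solves its NCP.
def F(x):
    return [xi - 1.0 for xi in x]

print(merit([1.0, 1.0], F))  # vanishes at the solution
print(merit([0.5, 2.0], F))  # strictly positive away from it
```

The merit value is zero exactly when every pair $(x_i, F_i(x))$ lies on the nonnegative semiaxes, which is the equivalence stated above.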

Recently, the so-called generalized Fischer–Burmeister function was proposed in [3,4]. More specifically, the authors considered $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ given by

$$\phi_p(a, b) := \|(a, b)\|_p - (a + b), \qquad (7)$$

where $p > 1$ is an arbitrary fixed real number and $\|(a, b)\|_p$ denotes the $p$-norm of $(a, b)$, i.e., $\|(a, b)\|_p = \sqrt[p]{|a|^p + |b|^p}$. In other words, in the function $\phi_p$, the 2-norm of $(a, b)$ in the FB function is replaced by a more general $p$-norm. The function $\phi_p$ is still an NCP-function, which naturally induces another NCP-function $\psi_p : \mathbb{R}^2 \to \mathbb{R}_+$ given by

$$\psi_p(a, b) := \frac{1}{2}|\phi_p(a, b)|^2. \qquad (8)$$

For any given $p > 1$, the function $\psi_p$ is shown to possess all favorable properties of the FB function $\psi$; see [2–4]. It plays an important part in our study throughout the paper. Like $\Phi$, the operator $\Phi_p : \mathbb{R}^n \to \mathbb{R}^n$ defined as

$$\Phi_p(x) = \begin{pmatrix} \phi_p(x_1, F_1(x)) \\ \vdots \\ \phi_p(x_n, F_n(x)) \end{pmatrix} \qquad (9)$$

yields a family of merit functions $\Psi_p : \mathbb{R}^n \to \mathbb{R}_+$ for the NCP:

$$\Psi_p(x) := \frac{1}{2}\|\Phi_p(x)\|^2 = \sum_{i=1}^{n} \psi_p(x_i, F_i(x)). \qquad (10)$$

Analogously, the NCP is equivalent to the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} \Psi_p(x). \qquad (11)$$
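The definitions (7), (8) and (10) translate directly into code. The following Python sketch is illustrative only; the function names `p_norm`, `phi_p`, `psi_p` and `merit_p` are hypothetical.

```python
def p_norm(a, b, p):
    """p-norm of the pair (a, b) for a real number p > 1."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p)

def phi_p(a, b, p):
    """Generalized FB function (7)."""
    return p_norm(a, b, p) - (a + b)

def psi_p(a, b, p):
    """Induced NCP-function (8)."""
    return 0.5 * phi_p(a, b, p) ** 2

def merit_p(x, F, p):
    """Merit function (10): sum of psi_p over the components."""
    return sum(psi_p(xi, Fi, p) for xi, Fi in zip(x, F(x)))

# p = 2 recovers the classical FB function: ||(3, 4)||_2 - (3 + 4).
print(phi_p(3.0, 4.0, 2.0))
# On the nonnegative semiaxes the value is numerically zero for any p > 1.
print(phi_p(0.0, 5.0, 1.5), phi_p(2.0, 0.0, 7.0))
```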

It was shown that if $F$ is monotone [15] or a $P_0$-function [10], then any stationary point of $\Psi$ is a global minimizer of the unconstrained minimization $\min_{x \in \mathbb{R}^n} \Psi(x)$, and hence solves the NCP. Similar results were generalized to the $\Psi_p$ case in [4]. On the other hand, many classical iterative methods can be applied to this unconstrained minimization reformulation of the NCP.

Derivative-free methods [29] are suitable for problems where the derivatives of $F$ are not available or are expensive to compute. Some derivative-free algorithms with global convergence results were proposed to solve the NCP based on the generalized Fischer–Burmeister merit function. For example, [4,5] pointed out that the performance of the algorithm is influenced by the parameter $p$. In addition, some phenomena have been observed in the derivative-free algorithm studied in [5]. More specifically, a kind of "cliff" occurs in the convergence behavior, as depicted in Fig. 1.

Over the years, we have frequently been asked what the main factor causing this is and how the parameter $p$ affects the convergence behavior. In light of our earlier numerical experience, we find that figuring out the geometric properties of $\phi_p$ and $\psi_p$ may be a key way to answer these questions. With this motivation, we carry out an analysis from a geometric viewpoint in this paper. More specifically, the objective of this paper is to study the relation between the convergence behavior and the parameter $p$ via geometry, in which the graphs of $\phi_p$ and $\psi_p$ are regarded as families of surfaces embedded in $\mathbb{R}^3$.

This paper is organized as follows. In Section 2, we present some geometric properties of $\phi_p$ and illustrate its surface structure by figures. In Section 3, we study properties of $\psi_p$ and summarize the comparison between $\phi_p$ and $\psi_p$. In Section 4, we investigate a geometric visualization to see the possible convergence behavior with different $p$ through a few examples. Finally, we state the conclusion.

2. Geometric view of $\phi_p$

In this section, we study some geometric properties of $\phi_p$ and interpret their meanings. We present the family of surfaces of $\phi_p(a, b)$ where $p \in (1, +\infty)$; see Figs. 2 and 3. When we fix a real number $p$ with $1 < p < +\infty$, Fig. 3 gives an intuitive picture that the surface shape is indeed influenced by the value of $p$. From the definition of the $p$-norm, we know that $\|(a, b)\|_1 := |a| + |b|$ and $\|(a, b)\|_\infty := \max\{|a|, |b|\}$. It is trivial that $\phi_p(a, b) \to \phi_1(a, b) := |a| + |b| - (a + b)$ pointwise as $p \to 1$; see Fig. 3(a) and (b). On the other hand, $\phi_p(a, b) \to \phi_\infty(a, b) := \max\{|a|, |b|\} - (a + b)$ pointwise as $p \to +\infty$; see Fig. 3(e) and (f). Note that $\phi_1$ is not an NCP-function because $\phi_1(a, b) = 0$ whenever $a > 0$ and $b > 0$, whereas $\phi_\infty$ is an NCP-function but is not differentiable when $a = b$.
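The two pointwise limits (as $p \to 1$ and as $p \to +\infty$) can be observed numerically. The sketch below is illustrative; the helper names `phi_1` and `phi_inf` are hypothetical labels for the limiting functions built from the 1-norm and the max-norm, and the sample points are arbitrary.

```python
def phi_p(a, b, p):
    """Generalized FB function: p-norm of (a, b) minus (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)

def phi_1(a, b):
    """Limit as p -> 1 (1-norm)."""
    return abs(a) + abs(b) - (a + b)

def phi_inf(a, b):
    """Limit as p -> +infinity (max-norm)."""
    return max(abs(a), abs(b)) - (a + b)

for a, b in [(-2.0, 3.0), (2.0, 3.0)]:
    # p close to 1 approaches phi_1, large p approaches phi_inf
    print((a, b), phi_p(a, b, 1.001), phi_1(a, b),
          phi_p(a, b, 200.0), phi_inf(a, b))
```

At the first-quadrant point $(2, 3)$ the 1-norm limit vanishes although neither coordinate is zero, which illustrates why that limit fails to be an NCP-function.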

Next, we give some lemmas which will be used in subsequent analysis.

Lemma 2.1 [6, Lemma 3.1]. If $a > 0$ and $b > 0$, then $(a + b)^p > a^p + b^p$ for all $p \in (1, +\infty)$.

Fig. 1. "Cliff" phenomenon that appears in some derivative-free algorithm.

Fig. 2. The surface of $z = \phi_2(a, b)$ with $(a, b) \in [-10, 10] \times [-10, 10]$ (axes: $a$, $b$, $z$).

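Lemma 2.2 [17, Lemma 1.3]. Let $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $\|x\|_p := \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$. If $1 < p_1 < p_2$, then

$$\|x\|_{p_2} \le \|x\|_{p_1} \le n^{\left(\frac{1}{p_1} - \frac{1}{p_2}\right)} \|x\|_{p_2}.$$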

Lemma 2.3 [5, Lemma 3.2]. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,

$$\left(2 - 2^{1/p}\right) |\min\{a, b\}| \le |\phi_p(a, b)| \le \left(2 + 2^{1/p}\right) |\min\{a, b\}|.$$
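The two-sided bound of Lemma 2.3 can be spot-checked on a grid. This is only a numerical sanity check under an arbitrary choice of grid and of the values of $p$, not a proof; the helper name `lemma_2_3_holds` is hypothetical.

```python
def phi_p(a, b, p):
    """Generalized FB function: p-norm of (a, b) minus (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)

def lemma_2_3_holds(a, b, p, tol=1e-9):
    """Check (2 - 2^(1/p)) |min{a,b}| <= |phi_p(a,b)| <= (2 + 2^(1/p)) |min{a,b}|."""
    m = abs(min(a, b))
    value = abs(phi_p(a, b, p))
    lower = (2.0 - 2.0 ** (1.0 / p)) * m
    upper = (2.0 + 2.0 ** (1.0 / p)) * m
    return lower - tol <= value <= upper + tol

grid = [0.5 * k for k in range(-10, 11)]  # points in [-5, 5]
ok = all(lemma_2_3_holds(a, b, p)
         for p in (1.1, 2.0, 5.0) for a in grid for b in grid)
print(ok)
```

The small tolerance absorbs rounding at points where one of the inequalities holds with equality, for example on the line $a = b$.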

Fig. 3. The surface of $z = \phi_p(a, b)$ with different $p$ (six panels (a)–(f); axes: $a$, $b$, $z$).

Proposition 2.1. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,

(a) $(a > 0 \text{ and } b > 0) \iff \phi_p(a, b) < 0$;

(b) $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \iff \phi_p(a, b) = 0$;

(c) $b = 0$ and $a < 0 \implies \phi_p(a, b) = -2a > 0$;

(d) $a = 0$ and $b < 0 \implies \phi_p(a, b) = -2b > 0$.

Proof.

(a) If $a > 0$ and $b > 0$, it is easy to see that $\phi_p(a, b) < 0$ by Lemma 2.1. Conversely, suppose $\phi_p(a, b) < 0$. Because $\sqrt[p]{|a|^p + |b|^p} \ge |a|$ and $\sqrt[p]{|a|^p + |b|^p} \ge |b|$, we have $\sqrt[p]{|a|^p + |b|^p} \ge \max\{|a|, |b|\}$. If $a \le 0$ or $b \le 0$, then $\max\{|a|, |b|\} \ge a + b$, which implies $\phi_p(a, b) \ge 0$. This is a contradiction.

(b) By the definition of $\phi_p(a, b)$, we know

$$\phi_p(a, 0) = |a| - a = \begin{cases} 0, & a \ge 0, \\ -2a, & a < 0, \end{cases} \qquad \phi_p(0, b) = |b| - b = \begin{cases} 0, & b \ge 0, \\ -2b, & b < 0, \end{cases}$$

which says that $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0)$ implies $\phi_p(a, b) = 0$. Conversely, suppose $\phi_p(a, b) = 0$. If $a < 0$ or $b < 0$, mimicking the arguments of part (a) yields

$$\sqrt[p]{|a|^p + |b|^p} \ge \max\{|a|, |b|\} > a + b,$$

which implies $\phi_p(a, b) > 0$. Thus, there must hold $a \ge 0$ and $b \ge 0$. Furthermore, one of $a$ and $b$ must be 0 by part (a).

The proofs of (c) and (d) follow directly from the proof of part (b). □

Proposition 2.1(a) shows that $\phi_p(a, b)$ is negative on the first quadrant of the $\mathbb{R}^2$-plane (see Fig. 4), while Proposition 2.1(b) shows that $\phi_p(a, b) = 0$ can only happen on the nonnegative semiaxes (i.e., $a \ge 0, b = 0$ or $a = 0, b \ge 0$). In fact, this proposition is equivalent to saying that $\phi_p(a, b)$ is an NCP-function. In addition, Proposition 2.1(b)–(d) indicate that the value of $p$ does not affect the value of $\phi_p(a, b)$ on the $a$-axis and $b$-axis.

Proposition 2.2. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,

(a) $\phi_p(a, b) = \phi_p(b, a)$;

(b) $\phi_p$ is convex, i.e., $\phi_p(\alpha w + (1 - \alpha) w') \le \alpha \phi_p(w) + (1 - \alpha) \phi_p(w')$ for all $w, w' \in \mathbb{R}^2$ and $\alpha \in [0, 1]$;

(c) if $1 < p_1 < p_2$, then $\phi_{p_1}(a, b) \ge \phi_{p_2}(a, b)$.

Proof. The verifications of parts (a) and (b) are straightforward, so we omit them. Part (c) follows by applying Lemma 2.2. □

Proposition 2.2(a) shows the symmetry of $\phi_p(a, b)$: any two points of the plane that are mirror images of each other across the line $a = b$ have the same height. In other words, the surface $z = \phi_p(a, b)$ has the same structure on the second and fourth quadrants of the plane; see Figs. 4–6. Proposition 2.2(b) says that the shape of the surface is convex because the function $\phi_p$ is convex, while Proposition 2.2(c) implies that the value of $\phi_p$ decreases as the value of $p$ increases. In summary, the value of $p$ affects the geometric structure.

Proposition 2.3. If $\{(a^k, b^k)\} \subseteq \mathbb{R}^2$ with $(a^k \to -\infty)$ or $(b^k \to -\infty)$ or $(a^k \to +\infty \text{ and } b^k \to +\infty)$, then $|\phi_p(a^k, b^k)| \to +\infty$ as $k \to +\infty$.

Proof. This can be found in [26, p. 20]. □

Proposition 2.3 indicates the increasing direction on the surface. This can be seen from the contour graph of $z = \phi_p(a, b)$ plotted in Fig. 4, where a deeper color represents a lower height. In order to understand the structure of the surface, it is natural to investigate special curves on the surface. We consider a family of curves $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ defined as follows:

$$\alpha_{r,p}(t) := \left(r + t, \ r - t, \ \phi_p(r + t, r - t)\right), \qquad (12)$$

where $r \in \mathbb{R}$ and $p \in (1, +\infty)$ are two arbitrary fixed real numbers. These curves can be viewed as the intersection of the surface $z = \phi_p(a, b)$ and the plane $a + b = 2r$; see Fig. 6. We study some properties of these special curves.

Lemma 2.4. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Fix any $r \in \mathbb{R}$ and define $f : \mathbb{R} \to \mathbb{R}$ as $f(t) := \phi_p(r + t, r - t)$. Then $f$ is a convex function.

Proof. We know that $\phi_p$ is a convex function by Proposition 2.2(b) and observe that $f$ is a composition of $\phi_p$ and an affine function. Thus, $f$ is convex since it is a composition of a convex function and an affine function (the composition of two convex functions is not necessarily convex; however, our case does guarantee convexity because one of them is affine). □

Fig. 4. Level curves of $z = \phi_p(a, b)$ with different $p$ (six panels; axes: $a$, $b$).

Theorem 2.1. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Suppose $a$ and $b$ are constrained to the curve determined by $a + b = 2r$ ($r \in \mathbb{R}$) and the surface. Then, $\phi_p(a, b)$ attains its minimum $\phi_p(r, r) = 2^{1/p}|r| - 2r$ along this curve at $(a, b) = (r, r)$.

Fig. 5. The surface of $z = \phi_2(a, b)$ with $(a, b) \in [0, 10] \times [0, 10]$ (axes: $a$, $b$, $z$).

Fig. 6. The curve intersected by the surface $z = \phi_p(a, b)$ and the plane $a + b = 2r$.

Proof. We know that $\phi_p(a, b)$ is differentiable except at $(0, 0)$; therefore, we discuss two cases as follows.

(i) Case (1): $r = 0$. Because $a + b = 0$, $a$ and $b$ have opposite signs unless $a = b = 0$. From Proposition 2.1, we know $\phi_p(a, b) \ge 0$ in this case. Thus, $\phi_p(a, b)$ attains its minimum zero at $(a, b) = (0, 0)$.

(ii) Case (2): $r \ne 0$. Fix $r$ and $p > 1$. Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be respectively defined as

$$f(t) := \phi_p(r + t, r - t), \qquad g(t) := |r + t|^p + |r - t|^p.$$

Then $f(t) = g(t)^{1/p} - 2r$, and we calculate that

$$f'(t) = \frac{g'(t)}{p\,(g(t))^{\frac{p-1}{p}}} \quad \text{and} \quad g'(t) = p\left[\operatorname{sgn}(r + t)\,|r + t|^{p-1} - \operatorname{sgn}(r - t)\,|r - t|^{p-1}\right].$$

We know $g(t) > 0$ for all $t \in \mathbb{R}$. It is clear that $g'(0) = 0$, and hence $f'(0) = 0$. By Lemma 2.4, $f(t)$ is convex on $\mathbb{R}$. In addition, it is continuous; therefore, the critical point $t = 0$ of $f(t)$ is also a global minimizer of $f(t)$. The proof is complete since $a = b = r$ and $\phi_p(r, r) = 2^{1/p}|r| - 2r$ when $t = 0$. □
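A brute-force scan along the line $a + b = 2r$ illustrates Theorem 2.1. The sketch below is illustrative only: the function name `min_along_line`, the scan window and the sample values of $r$ and $p$ are arbitrary choices.

```python
def phi_p(a, b, p):
    """Generalized FB function: p-norm of (a, b) minus (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)

def min_along_line(r, p, half_width=5.0, steps=2001):
    """Scan f(t) = phi_p(r + t, r - t) on a uniform grid and return (argmin, min)."""
    best_t, best_val = None, float("inf")
    for k in range(steps):
        t = -half_width + 2.0 * half_width * k / (steps - 1)
        val = phi_p(r + t, r - t, p)
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val

for r, p in [(1.5, 3.0), (-2.0, 1.5)]:
    t_min, f_min = min_along_line(r, p)
    closed_form = 2.0 ** (1.0 / p) * abs(r) - 2.0 * r
    print(r, p, t_min, f_min, closed_form)
```

With an odd number of grid points the value $t = 0$ is sampled exactly, so the scan can reproduce the closed-form minimum $2^{1/p}|r| - 2r$ up to rounding.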

Lemma 2.4 and Theorem 2.1 show that the curve determined by the plane $a + b = 2r$ and the surface $z = \phi_p(a, b)$ is convex and attains its minimum when $a = b$; see Fig. 7. We now study the curvature of the family of curves $\alpha_{r,p}$ defined as in (12) at the point $\left(r, r, \phi_p(r, r)\right)$. Because the function $\phi_p$ is not differentiable at $(a, b) = (0, 0)$ (i.e., $r = 0$), we choose the two points $\left(t_0, t_0, \phi_p(t_0, t_0)\right)$ and $\left(-t_0, -t_0, \phi_p(-t_0, -t_0)\right)$, where $t_0 > 0$, and calculate the cosine of the angle between the corresponding vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$; see Fig. 8.

Proposition 2.4. Let $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (12), and let $\cos_p(\theta)$ be the cosine of the angle between the two vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$ where $t_0 > 0$. Then,

(a) $\cos_p(\theta) = \dfrac{2^{2/p} - 6}{\sqrt{\left(2^{2/p} - 2\right)^2 + 32}}$;

(b) $\cos_p(\theta) \to -\dfrac{1}{3}$ as $p \to 1$, and $\cos_p(\theta) \to -\dfrac{5}{\sqrt{33}}$ as $p \to +\infty$;

(c) if $1 < p_1 < p_2$, then $\cos_{p_1}(\theta) > \cos_{p_2}(\theta)$.

Proof.

(a) By direct computation, we obtain

$$\cos_p(\theta) = \frac{\alpha_{0,p}(t_0) \cdot \alpha_{0,p}(-t_0)}{\|\alpha_{0,p}(t_0)\| \, \|\alpha_{0,p}(-t_0)\|} = \frac{2^{2/p} - 6}{\sqrt{2^{2/p} + 6 + 2^{\frac{1}{p} + 2}} \, \sqrt{2^{2/p} + 6 - 2^{\frac{1}{p} + 2}}} = \frac{2^{2/p} - 6}{\sqrt{\left(2^{2/p} - 2\right)^2 + 32}}.$$

(b) From part (a), let $f : (1, +\infty) \to \mathbb{R}$ be $f(p) := \cos_p(\theta)$. Then $f(p)$ is continuous on $(1, +\infty)$. Since $2^{2/p} \to 4$ as $p \to 1$ and $2^{2/p} \to 1$ as $p \to +\infty$, taking the limit gives $\cos_p(\theta) \to -\frac{1}{3}$ as $p \to 1$ and $\cos_p(\theta) \to -\frac{5}{\sqrt{33}}$ as $p \to +\infty$.

(c) With $f$ as in part (b), differentiating with respect to $p$ gives

$$f'(p) = -\frac{8 \ln 2 \cdot 2^{2/p}\left(2^{2/p} + 6\right)}{p^2 \left(\left(2^{2/p} - 2\right)^2 + 32\right)^{3/2}},$$

which implies $f'(p) < 0$ for all $p > 1$. Therefore, $f(p)$ is strictly decreasing on $(1, +\infty)$, in agreement with the two limits in part (b). □
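The closed form in Proposition 2.4(a) can be compared with a direct computation of the cosine. The sketch below assumes that the two vectors are the position vectors of the points $(t_0, t_0, \phi_p(t_0, t_0))$ and $(-t_0, -t_0, \phi_p(-t_0, -t_0))$, which is the reading consistent with the closed form; the function names are hypothetical.

```python
import math

def phi_p(a, b, p):
    """Generalized FB function: p-norm of (a, b) minus (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)

def cos_from_points(p, t0=1.0):
    """Cosine of the angle between the two position vectors (assumed reading)."""
    u = (t0, t0, phi_p(t0, t0, p))
    v = (-t0, -t0, phi_p(-t0, -t0, p))
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cos_closed_form(p):
    """Closed form of Proposition 2.4(a)."""
    s = 2.0 ** (2.0 / p)
    return (s - 6.0) / math.sqrt((s - 2.0) ** 2 + 32.0)

for p in (1.2, 2.0, 10.0):
    print(p, cos_from_points(p), cos_closed_form(p))

# Limiting values of part (b): -1/3 and -5/sqrt(33).
print(cos_closed_form(1.0 + 1e-9), cos_closed_form(1e9))
```

The cosine does not depend on $t_0$, since both vectors scale linearly with $t_0$.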

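Proposition 2.5. Let $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (12). Then the following hold.

(a) The curvature at the point $\alpha_{r,p}(0) = \left(r, r, \phi_p(r, r)\right)$ is $\kappa_p(0) = \dfrac{(p - 1)\,2^{\frac{1}{p} - 1}}{|r|}$.

(b) $\kappa_p(0) \to 0$ as $p \to 1$ and $\kappa_p(0) \to +\infty$ as $p \to +\infty$.

(c) If $1 < p_1 < p_2$, then $\kappa_{p_1}(0) < \kappa_{p_2}(0)$.

Proof.

(a) Because $\alpha_{r,p}(t) = \left(r + t, r - t, \phi_p(r + t, r - t)\right)$, we know

$$\alpha'_{r,p}(0) = (1, -1, 0) \quad \text{and} \quad \alpha''_{r,p}(0) = \left(0, \ 0, \ \frac{(p - 1)\,2^{1/p}}{|r|}\right).$$

Recall the formula for curvature

$$\kappa_p(t) = \frac{|\alpha'_{r,p}(t) \wedge \alpha''_{r,p}(t)|}{|\alpha'_{r,p}(t)|^3},$$

where the wedge operator $\wedge$ denotes the outer (cross) product of two vectors. Thus, we have

$$\kappa_p(0) = \frac{|\alpha'_{r,p}(0) \wedge \alpha''_{r,p}(0)|}{|\alpha'_{r,p}(0)|^3} = \frac{(p - 1)\,2^{\frac{1}{p} - 1}}{|r|}.$$

(b) Let $f : (1, +\infty) \to \mathbb{R}$ be defined as

$$f(p) := \kappa_p(0) = \frac{(p - 1)\,2^{\frac{1}{p} - 1}}{|r|};$$

then obviously $f(p)$ is continuous on $(1, +\infty)$. Thus, the desired result follows by taking the limit directly.

(c) From part (b), we compute that

$$f'(p) = \frac{2^{\frac{1}{p} - 1}}{|r|}\left(1 - \frac{\ln 2}{p} + \frac{\ln 2}{p^2}\right),$$

which implies $f'(p) > 0$ for all $p \in (1, +\infty)$. Then $f(p)$ is strictly increasing on $(1, +\infty)$. □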
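The curvature formula of Proposition 2.5(a) can be cross-checked with finite differences. The sketch below approximates the first and second derivatives of the curve by central differences and applies the cross-product formula for curvature; the step size and the function names are arbitrary illustrative choices.

```python
import math

def phi_p(a, b, p):
    """Generalized FB function: p-norm of (a, b) minus (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)

def alpha(t, r, p):
    """Curve (12): intersection of z = phi_p(a, b) with the plane a + b = 2r."""
    return (r + t, r - t, phi_p(r + t, r - t, p))

def curvature_at_zero(r, p, h=1e-4):
    """Finite-difference estimate of |alpha' x alpha''| / |alpha'|^3 at t = 0."""
    lo, mid, hi = alpha(-h, r, p), alpha(0.0, r, p), alpha(h, r, p)
    d1 = [(u - w) / (2.0 * h) for u, w in zip(hi, lo)]
    d2 = [(u - 2.0 * v + w) / (h * h) for u, v, w in zip(hi, mid, lo)]
    cross = (d1[1] * d2[2] - d1[2] * d2[1],
             d1[2] * d2[0] - d1[0] * d2[2],
             d1[0] * d2[1] - d1[1] * d2[0])
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(sum(c * c for c in d1)) ** 3

def curvature_closed_form(r, p):
    """Closed form of Proposition 2.5(a), valid for r != 0."""
    return (p - 1.0) * 2.0 ** (1.0 / p - 1.0) / abs(r)

for r, p in [(1.0, 3.0), (2.0, 1.5)]:
    print(r, p, curvature_at_zero(r, p), curvature_closed_form(r, p))
```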

Fig. 7. The curve $f(t) = \phi_p(r + t, r - t)$.

The above two propositions show how $p$ affects the geometric structure; see Fig. 9(a) and (b). Proposition 2.5(b) says that when $p \to 1$ the curve becomes a straight line; see Fig. 9(c). Note that when $p \to +\infty$ the curve becomes sharper and sharper at the point, and in the limit the curve is not differentiable at $t = 0$; see Fig. 9(d). To sum up, from all the properties presented in this section, we see that $p$ indeed affects the geometric behavior of the surface $z = \phi_p(a, b)$ both locally and globally.

3. Geometric view of $\psi_p$

In the previous section, we saw that the generalized FB function $\phi_p$ is convex and differentiable everywhere except at $(0, 0)$. In contrast, the function $\psi_p(a, b)$ defined as in (8) is nonconvex, but continuously differentiable everywhere. Nonetheless, $\phi_p$ and $\psi_p$ have many similar geometric properties, as will be seen later. In this section, we study properties analogous to those in Section 2 and compare $\psi_p$ with $\phi_p$ (see Figs. 10 and 11).

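Proposition 3.1. Let $\psi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (8) where $p \in (1, +\infty)$. Then,

(a) $\psi_p(a, b) \ge 0, \ \forall (a, b) \in \mathbb{R}^2$;

(b) $\psi_p(a, b) = \psi_p(b, a), \ \forall (a, b) \in \mathbb{R}^2$;

(c) $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \iff \psi_p(a, b) = 0$;

(d) $b = 0$ and $a < 0 \implies \psi_p(a, b) = 2a^2 > 0$;

(e) $a = 0$ and $b < 0 \implies \psi_p(a, b) = 2b^2 > 0$;

(f) $\psi_p$ is continuously differentiable everywhere.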

Fig. 8. Angle between vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$ (legend: $p = 1.1, 1.5, 2, 3, 10$).

Proof. Parts (d) and (e) come from Proposition 2.1(c) and (d); please see [2–4] for the rest. □

Proposition 2.2(c) says that the value of $\phi_p$ is decreasing with respect to $p$. In contrast, $\psi_p$ does not have such a property in general. More specifically, $\psi_p$ has this property only on certain quadrants.

Proposition 3.2. Suppose $1 < p_1 < p_2$ and $(a, b) \in \mathbb{R}^2$. Then,

(a) if $a < 0$ or $b < 0$, then $\psi_{p_1}(a, b) \ge \psi_{p_2}(a, b)$;

(b) if $a > 0$ and $b > 0$, then $\psi_{p_1}(a, b) \le \psi_{p_2}(a, b)$.

Proof.

(a) When $a < 0$ or $b < 0$, we have $\phi_p(a, b) > 0$ by Proposition 2.1, so squaring preserves the order given by Proposition 2.2(c).

(b) Suppose $a > 0$ and $b > 0$. From Proposition 2.1(a), we have $\phi_p(a, b) < 0$. Then Proposition 2.2(c) yields $0 > \phi_{p_1}(a, b) \ge \phi_{p_2}(a, b)$, and hence $\phi^2_{p_1}(a, b) \le \phi^2_{p_2}(a, b)$. □

Since $\psi_p$ is not convex in general, the counterpart of Theorem 2.1 reads as follows.

Theorem 3.1. Let $\psi_p(a, b)$ be defined as in (8) with $a + b = 2r$. Then, the following hold.

(a) If $r > 0$ and $a > 0, b > 0$, then $\psi_p(a, b)$ attains its maximum $\left(2^{\frac{2}{p} - 1} - 2^{\frac{1}{p} + 1} + 2\right) r^2$ when $(a, b) = (r, r)$.

(b) If $r \le 0$, then $\psi_p(a, b)$ attains its minimum $\left(2^{\frac{2}{p} - 1} + 2^{\frac{1}{p} + 1} + 2\right) r^2$ when $(a, b) = (r, r)$.

Fig. 9. The curvature $\kappa_p(0)$ at point $\alpha_{r,p}(0)$ (legend: $p = 1.1, 1.5, 2, 3, 10$).

Proof.

(a) When $a > 0$ and $b > 0$, Proposition 2.1(a) says that $\phi_p(a, b) < 0$. Since $\psi_p = \frac{1}{2}\phi_p^2$, by Theorem 2.1 the minimum of $\phi_p(a, b)$ becomes the maximum of $\psi_p(a, b)$.

(b) This is a consequence of Theorem 2.1. □
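Theorem 3.1 can be illustrated numerically by sampling $\psi_p$ along the line $a + b = 2r$. The sketch below is a sanity check under arbitrary sample values; the helper name `psi_on_diagonal` is hypothetical and simply evaluates the closed-form extreme values stated in the theorem.

```python
def psi_p(a, b, p):
    """psi_p = 0.5 * phi_p^2 with phi_p the generalized FB function."""
    phi = (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)
    return 0.5 * phi * phi

def psi_on_diagonal(r, p):
    """Closed-form value of psi_p(r, r) from Theorem 3.1."""
    sign = 1.0 if r > 0 else -1.0
    return (2.0 ** (2.0 / p - 1.0) - sign * 2.0 ** (1.0 / p + 1.0) + 2.0) * r * r

p = 3.0

# Case (a): r > 0, restricted to a > 0 and b > 0, i.e. |t| < r.
r = 2.0
inner = [psi_p(r + 0.1 * k, r - 0.1 * k, p) for k in range(-19, 20)]
print(max(inner), psi_p(r, r, p), psi_on_diagonal(r, p))

# Case (b): r <= 0, any t.
r = -1.0
outer = [psi_p(r + 0.1 * k, r - 0.1 * k, p) for k in range(-50, 51)]
print(min(outer), psi_p(r, r, p), psi_on_diagonal(r, p))
```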

The aforementioned results show that $\psi_p$ shares many properties with $\phi_p$; see Figs. 11 and 12, where we denote $\psi_1(a, b) := \frac{1}{2}|\phi_1(a, b)|^2$ and $\psi_\infty(a, b) := \frac{1}{2}|\phi_\infty(a, b)|^2$. However, there are still some differences between $\phi_p$ and $\psi_p$. For example, $\psi_p$ is not convex whereas $\phi_p$ is. Fig. 13 depicts the increasing direction of $\psi_p$. Note that $\psi_p(a, b)$ is nonnegative and behaves differently when $a > 0$ and $b > 0$; see Fig. 11.

In order to further understand the geometric properties, we define a family of curves as follows:

$$\beta_{r,p}(t) := \left(r + t, \ r - t, \ \psi_p(r + t, r - t)\right), \qquad (13)$$

where $r$ is a fixed real number and $t \in \mathbb{R}$. This family of curves can be regarded as the intersection of the plane $a + b = 2r$ and the surface $z = \psi_p(a, b)$; see Fig. 14.

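Proposition 3.3. Let $\beta_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (13). Then the following hold.

(a) The curvature at the point $\beta_{r,p}(0) = \left(r, r, \psi_p(r, r)\right)$ (for $r > 0$) is $\kappa_p(0) = (p - 1)\,2^{1/p}\left(1 - 2^{\frac{1}{p} - 1}\right)$.

(b) $\kappa_p(0) \to 0$ as $p \to 1$ and $\kappa_p(0) \to +\infty$ as $p \to +\infty$.

(c) If $1 < p_1 < p_2$, then $\kappa_{p_1}(0) < \kappa_{p_2}(0)$.

Proof.

(a) From $\beta_{r,p}(t) = \left(r + t, r - t, \psi_p(r + t, r - t)\right)$, we know

$$\beta'_{r,p}(0) = (1, -1, 0) \quad \text{and} \quad \beta''_{r,p}(0) = \left(0, \ 0, \ (p - 1)\,2^{2/p} - \operatorname{sgn}(r)(p - 1)\,2^{\frac{1}{p} + 1}\right),$$

which yields, for $r > 0$,

$$\kappa_p(0) = \frac{|\beta'_{r,p}(0) \wedge \beta''_{r,p}(0)|}{|\beta'_{r,p}(0)|^3} = (p - 1)\,2^{1/p}\left(1 - 2^{\frac{1}{p} - 1}\right).$$

(b) Let $f : (1, +\infty) \to \mathbb{R}$ be defined as $f(p) := \kappa_p(0) = (p - 1)\,2^{1/p}\left(1 - 2^{\frac{1}{p} - 1}\right)$. Then the result follows by taking the limit directly.

(c) From part (b), it can be verified that $f'(p) > 0$ for all $p \in (1, +\infty)$. Thus, $f(p)$ is strictly increasing on $(1, +\infty)$. □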
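A finite-difference estimate gives a quick check of the curvature of the curve (13). The sketch below is illustrative only and assumes $r > 0$; it compares the estimate at $t = 0$ with the value $(p - 1)\,2^{1/p}(1 - 2^{1/p - 1})$ and also evaluates the curvature at $t = \pm r$, where the curve meets the nonnegative semiaxes. Function names and the step size are arbitrary choices.

```python
import math

def psi_p(a, b, p):
    """psi_p = 0.5 * phi_p^2 with phi_p the generalized FB function."""
    phi = (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)
    return 0.5 * phi * phi

def beta_curve(t, r, p):
    """Curve (13): intersection of z = psi_p(a, b) with the plane a + b = 2r."""
    return (r + t, r - t, psi_p(r + t, r - t, p))

def curvature(t, r, p, h=1e-4):
    """Finite-difference estimate of |beta' x beta''| / |beta'|^3 at parameter t."""
    lo, mid, hi = beta_curve(t - h, r, p), beta_curve(t, r, p), beta_curve(t + h, r, p)
    d1 = [(u - w) / (2.0 * h) for u, w in zip(hi, lo)]
    d2 = [(u - 2.0 * v + w) / (h * h) for u, v, w in zip(hi, mid, lo)]
    cross = (d1[1] * d2[2] - d1[2] * d2[1],
             d1[2] * d2[0] - d1[0] * d2[2],
             d1[0] * d2[1] - d1[1] * d2[0])
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(sum(c * c for c in d1)) ** 3

r, p = 1.0, 3.0
closed_form = (p - 1.0) * 2.0 ** (1.0 / p) * (1.0 - 2.0 ** (1.0 / p - 1.0))
print(curvature(0.0, r, p), closed_form)

# Curvature where the curve touches the semiaxes (t = r and t = -r), here with p = 2.
print(curvature(1.0, 1.0, 2.0), curvature(-1.0, 1.0, 2.0))
```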

Fig. 14 depicts how the curve changes with the value of $p$; in particular, we can see the change of curvature when $p$ is close to one or infinity. We state an addendum to part (a) here: the curvature at the other two special points $\beta_{r,p}(-r) = (0, 2r, 0)$ and $\beta_{r,p}(r) = (2r, 0, 0)$ is the same, namely, $\kappa_p(-r) = \kappa_p(r) = \frac{1}{2}$. Note that although $\psi_p$ is differentiable everywhere, the mean curvature at $(0, 0)$ does not exist. To end this section, we summarize the similarities and differences between $\phi_p$ and $\psi_p$ below.

Fig. 10. The surface of $z = \psi_2(a, b)$ with $(a, b) \in [-10, 10] \times [-10, 10]$ (axes: $a$, $b$, $z$).

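| | $\phi_p(a, b)$ | $\psi_p(a, b)$ |
|---|---|---|
| Difference | Convex | Nonconvex |
| | Differentiable everywhere except $(0, 0)$ | Differentiable everywhere |
| | $\phi_p(a, b) < 0$ when $a > 0$ and $b > 0$ | $\psi_p(a, b) \ge 0, \ \forall (a, b) \in \mathbb{R}^2$ |

Similarity (shared by both functions):

(1) NCP-function

(2) Symmetry (i.e., $\phi_p(a, b) = \phi_p(b, a)$ and $\psi_p(a, b) = \psi_p(b, a)$)

(3) The function is not affected by $p$ on the axes

(4) When $(a^k \to -\infty)$ or $(b^k \to -\infty)$ or $(a^k, b^k \to +\infty)$, we have $|\phi_p(a^k, b^k)| \to \infty$ and $|\psi_p(a^k, b^k)| \to \infty$

(5) Non-coercive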

4. Geometric analysis of merit function in descent algorithms

In this section, we employ the derivative-free descent algorithms presented in [4,5] to solve the unconstrained minimization problem (11) by using the merit function (10). We then compare the two algorithms and study their convergence behavior by means of an intuitive visualization. We first list these two algorithms below.

Algorithm 4.1 [4, Algorithm 4.1].

(Step 0) Given a real number $p > 1$ and a starting point $x^0 \in \mathbb{R}^n$. Choose the parameters $\sigma \in (0, 1)$, $\beta \in (0, 1)$ and $\varepsilon \ge 0$. Set $k := 0$.

(Step 1) If $\Psi_p(x^k) \le \varepsilon$, then stop.

(Step 2) Let $m_k$ be the smallest nonnegative integer $m$ satisfying

$$\Psi_p(x^k + \beta^m d^k) \le (1 - \sigma \beta^{2m}) \Psi_p(x^k),$$

where

$$d^k := -\nabla_b \psi_p(x^k, F(x^k))$$

and

$$\nabla_b \psi_p(x, F(x)) := \left(\nabla_b \psi_p(x_1, F_1(x)), \ldots, \nabla_b \psi_p(x_n, F_n(x))\right)^T.$$

(Step 3) Set $x^{k+1} := x^k + \beta^{m_k} d^k$, $k := k + 1$ and go to Step 1.
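The steps of Algorithm 4.1 can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in [4]: the partial derivative $\nabla_b \psi_p$ is computed from the chain rule applied to (7) and (8), the parameter values, the iteration caps and the starting point are arbitrary assumptions, and the test map $F(x) = (x - 3)^3 + 1$ is the one-dimensional map of Example 4.1.

```python
def p_norm(a, b, p):
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p)

def phi_p(a, b, p):
    return p_norm(a, b, p) - (a + b)

def grad_b_psi_p(a, b, p):
    """Partial derivative of psi_p = 0.5 * phi_p**2 with respect to b (chain rule)."""
    nrm = p_norm(a, b, p)
    if nrm == 0.0:
        return 0.0  # the gradient of psi_p vanishes at the origin
    sign_b = (b > 0) - (b < 0)
    dphi_db = sign_b * abs(b) ** (p - 1.0) / nrm ** (p - 1.0) - 1.0
    return phi_p(a, b, p) * dphi_db

def merit_p(x, F, p):
    return sum(0.5 * phi_p(xi, Fi, p) ** 2 for xi, Fi in zip(x, F(x)))

def algorithm_4_1(F, x0, p, sigma=1e-4, beta=0.5, eps=1e-10,
                  max_iter=500, max_backtracks=60):
    x = list(x0)
    for k in range(max_iter):
        value = merit_p(x, F, p)
        if value <= eps:                      # Step 1: stopping test
            return x, k
        d = [-grad_b_psi_p(xi, Fi, p) for xi, Fi in zip(x, F(x))]
        for m in range(max_backtracks):       # Step 2: smallest acceptable m
            trial = [xi + beta ** m * di for xi, di in zip(x, d)]
            if merit_p(trial, F, p) <= (1.0 - sigma * beta ** (2 * m)) * value:
                break
        else:
            return x, k                       # safeguard (assumption): no step accepted
        x = trial                             # Step 3: update the iterate
    return x, max_iter

def F_example(x):
    return [(x[0] - 3.0) ** 3 + 1.0]

x_final, iterations = algorithm_4_1(F_example, [2.5], p=2.0)
print(x_final, iterations, merit_p(x_final, F_example, 2.0))
```

The caps `max_iter` and `max_backtracks` are safeguards added for the sketch; they are not part of the algorithm as stated.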

Fig. 11. The surface of $z = \psi_2(a, b)$ with $(a, b) \in [0, 10] \times [0, 10]$ (axes: $a$, $b$, $z$).

Fig. 12. The surface of $z = \psi_p(a, b)$ with different $p$ (six panels; axes: $a$, $b$, $z$).

Algorithm 4.2 [5, Algorithm 4.1].

(Step 0) Given real numbers $p > 1$ and $\alpha \ge 0$ and a starting point $x^0 \in \mathbb{R}^n$. Choose the parameters $\sigma \in (0, 1)$, $\beta \in (0, 1)$, $\gamma \in (0, 1)$ and $\varepsilon \ge 0$. Set $k := 0$.

(Step 1) If $\Psi_{\alpha,p}(x^k) \le \varepsilon$, then stop.

(Step 2) Let $m_k$ be the smallest nonnegative integer $m$ satisfying

$$\Psi_{\alpha,p}(x^k + \beta^m d^k(\gamma^m)) \le (1 - \sigma \beta^{2m}) \Psi_{\alpha,p}(x^k),$$

where

$$d^k(\gamma^m) := -\nabla_b \psi_{\alpha,p}(x^k, F(x^k)) - \gamma^m \nabla_a \psi_{\alpha,p}(x^k, F(x^k))$$

and

$$\nabla_a \psi_{\alpha,p}(x, F(x)) := \left(\nabla_a \psi_{\alpha,p}(x_1, F_1(x)), \ldots, \nabla_a \psi_{\alpha,p}(x_n, F_n(x))\right)^T,$$

$$\nabla_b \psi_{\alpha,p}(x, F(x)) := \left(\nabla_b \psi_{\alpha,p}(x_1, F_1(x)), \ldots, \nabla_b \psi_{\alpha,p}(x_n, F_n(x))\right)^T.$$

(Step 3) Set $x^{k+1} := x^k + \beta^{m_k} d^k(\gamma^{m_k})$, $k := k + 1$ and go to Step 1.
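Algorithm 4.2 differs from Algorithm 4.1 only in the search direction, which adds the scaled term $\gamma^m \nabla_a \psi$. The following Python sketch covers the case $\alpha = 0$ only, so that the underlying NCP-function is $\psi_p$; it is not the implementation of [5]. Parameter values, iteration caps and the starting point are arbitrary assumptions, and the partial derivatives are obtained from the chain rule.

```python
def p_norm(a, b, p):
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p)

def phi_p(a, b, p):
    return p_norm(a, b, p) - (a + b)

def grad_psi_p(a, b, p):
    """Return (d psi_p / da, d psi_p / db) for psi_p = 0.5 * phi_p**2."""
    nrm = p_norm(a, b, p)
    if nrm == 0.0:
        return 0.0, 0.0  # the gradient of psi_p vanishes at the origin
    sign = lambda z: (z > 0) - (z < 0)
    da = sign(a) * abs(a) ** (p - 1.0) / nrm ** (p - 1.0) - 1.0
    db = sign(b) * abs(b) ** (p - 1.0) / nrm ** (p - 1.0) - 1.0
    value = phi_p(a, b, p)
    return value * da, value * db

def merit_p(x, F, p):
    return sum(0.5 * phi_p(xi, Fi, p) ** 2 for xi, Fi in zip(x, F(x)))

def algorithm_4_2(F, x0, p, sigma=1e-4, beta=0.5, gamma=0.9, eps=1e-10,
                  max_iter=500, max_backtracks=60):
    """Sketch of Algorithm 4.2 restricted to alpha = 0."""
    x = list(x0)
    for k in range(max_iter):
        value = merit_p(x, F, p)
        if value <= eps:                          # Step 1: stopping test
            return x, k
        grads = [grad_psi_p(xi, Fi, p) for xi, Fi in zip(x, F(x))]
        for m in range(max_backtracks):           # Step 2: smallest acceptable m
            d = [-gb - gamma ** m * ga for ga, gb in grads]
            trial = [xi + beta ** m * di for xi, di in zip(x, d)]
            if merit_p(trial, F, p) <= (1.0 - sigma * beta ** (2 * m)) * value:
                break
        else:
            return x, k                           # safeguard (assumption): no step accepted
        x = trial                                 # Step 3: update the iterate
    return x, max_iter

def F_example(x):
    return [(x[0] - 3.0) ** 3 + 1.0]  # one-dimensional map of Example 4.1

x_final, iterations = algorithm_4_2(F_example, [2.5], p=2.0)
print(x_final, iterations, merit_p(x_final, F_example, 2.0))
```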

Fig. 13. Level curves of $z = \psi_p(a, b)$ with different $p$ (six panels; axes: $a$, $b$).

In Algorithm 4.2, $\psi_{\alpha,p} : \mathbb{R}^2 \to \mathbb{R}_+$ is an NCP-function defined by

$$\psi_{\alpha,p}(a, b) := \frac{\alpha}{2}\left(\max\{0, ab\}\right)^2 + \psi_p(a, b) = \frac{\alpha}{2}(ab)_+^2 + \frac{1}{2}\left(\|(a, b)\|_p - (a + b)\right)^2$$

with $\alpha \ge 0$ being a real parameter. When $\alpha = 0$, the function $\psi_{\alpha,p}$ reduces to $\psi_p$. To compare these two algorithms, we take $\alpha = 0$ when we use Algorithm 4.2 in this section. Note that the descent direction in Algorithm 4.1 lacks a certain symmetry, whereas Algorithm 4.2 adopts a symmetric search direction. Under the assumption of monotonicity, i.e.,

$$\langle x - y, F(x) - F(y) \rangle \ge 0 \quad \text{for all } x, y \in \mathbb{R}^n,$$

an error bound is proposed and Algorithm 4.2 is shown to have a locally R-linear convergence rate in [5]. In other words, there exists a positive constant $\kappa_2$ such that

$$\|x^k - x^*\| \le \kappa_2 \max\left\{\Psi_{\alpha,p}(x^k), \sqrt{\Psi_{\alpha,p}(x^k)}\right\}^{\frac{1}{2}} \quad \text{when } \alpha = 0.$$

Furthermore, the convergence rate of Algorithm 4.2 is closely related to the constant

$$\log_{\gamma}\left(\frac{\beta}{L_1 + \sigma}\right) C(B, \alpha, p), \quad \text{where} \quad C(B, \alpha, p) = \frac{\left(2 - 2^{1/p}\right)^4}{\alpha B^2 + \left(2 + 2^{1/p}\right)^2}.$$

Therefore, when the value of $p$ decreases, the convergence rate of Algorithm 4.2 becomes worse and worse; see Remark 4.1 in [5].

Recall that the merit function $\Psi_p(x)$ is the sum of $n$ nonnegative functions $\psi_p$, i.e.,

$$\Psi_p(x) = \sum_{i=1}^{n} \psi_p(x_i, F_i(x)).$$

This encourages us to view each component $\psi_p(x_i^k, F_i(x^k))$, $i = 1, 2, \ldots, n$, as a motion with a different velocity on the same surface $z = \psi_p(a, b)$ at each iteration. Building on our study in Sections 2 and 3, we obtain a visualization that helps us understand the convergence behavior in detail. Fig. 20 depicts the visualization for the four-dimensional NCP in Example 4.3. The merit function of this NCP is $\Psi_p(x) = \sum_{i=1}^{4} \psi_p(x_i, F_i(x))$. We plot the point sequences $(x_i^k, F_i(x^k))$ for $i = 1, 2, 3, 4$ in different colors, together with the level curves of the surface $\psi_{1.1}(a, b)$, in Fig. 20(a). The vertical line represents the value of $x_i$, the horizontal line represents the value of $F_i(x)$, and the skew line means $x_i = F_i(x)$. We take the initial point $x^0 = (0, 0, 0, 0)$, which implies $F(x^0) = (-6, -2, -1, -3)$, and observe the convergence behavior separately for each $i$ from the initial point to the solution $x^* = (\sqrt{6}/2, 0, 0, 1/2)$, which is on the horizontal line in this figure. Furthermore, we observe the position of the point sequence on the surface in Fig. 20(a) and the merit function, which is the sum of their heights at each iteration, as shown in Fig. 20(b).

In a one-dimensional NCP, $F$ is continuously differentiable and there is only one variable $x$ in $F$, so $(x, F(x))$ is a continuous curve in $\mathbb{R}^2$ and the merit function $\Psi_p(x) = \psi_p(x, F(x))$ is obviously a curve on the surface $z = \psi_p(a, b)$; see Fig. 16(a) and (b). Therefore, the point sequence in a one-dimensional problem can only lie on the curve $\left(x, F(x), \psi_p(x, F(x))\right)$.

Fig. 14. The curve intersected by the surface $z = \psi_p(a, b)$ and the plane $a + b = 2r$ (axes: $x$, $y$, $z$).

Example 4.1. Consider the NCP, where $F : \mathbb{R} \to \mathbb{R}$ is given by

$$F(x) = (x - 3)^3 + 1.$$

Fig. 15. The curvature $\kappa_p(0)$ at point $\beta_{r,p}(0)$.

The unique solution of this NCP is $x^* = 2$. Note that $F$ is strictly monotone; see the geometric view of this NCP in Fig. 16.

The value of the merit function at each iteration is plotted in Fig. 16(c), which presents the different behavior of the functions with different values of $p$ near the solution. Fig. 17(a)–(d) depict the convergence behavior of Algorithm 4.1 from two directions with two different initial points, and Fig. 17(e) and (f) show the convergence behavior with different $p$. Fig. 19(a)–(d) depict the convergence behavior of Algorithm 4.2 from two directions with two different initial points. We found that Algorithm 4.2 always produces a point sequence in or close to the boundary of the feasible set, i.e., $\{(x, F(x)) : x \ge 0 \text{ and } F(x) \ge 0\}$. Based on Proposition 3.2, the speed of decrease of the merit function in Algorithm 4.1 differs between initial points when we increase $p$, but it is similar for different initial points in Algorithm 4.2. This phenomenon is consistent with the geometric properties studied in Section 3.

To show the importance of the inflection point, we give an extreme example as follows.

Example 4.2. Consider the NCP, where $F : \mathbb{R} \to \mathbb{R}$ is given by

$$F(x) = 1.$$

The unique solution of this NCP is $x^* = 0$. From the above discussion, we know that the point sequence is on the curve $\left(x, 1, \psi_p(x, 1)\right)$; see Fig. 18(a). Fig. 18(c) shows that there is a rapid decrease of the merit function from the 80th to the 120th iteration. Fig. 18(b) shows the behavior during the 80th to 120th iterations. Observing the width of the level curves in Fig. 18(b), we found that the rapid decrease may arise from the existence of an inflection point on the surface. Figs. 18(c)–(f) and Fig. 19(e) and (f) show that the position of the inflection point may change with different $p$.

Fig. 16. Geometric view of NCP in Example 4.1 (axes: $x$, $F(x)$ and $x$, $\Psi_p(x, F(x))$; legend: $p = 1.1, 1.5, 2, 3, 100$).

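Example 4.3. Consider the NCP, where $F : \mathbb{R}^4 \to \mathbb{R}^4$ is given by

$$F(x) = \begin{pmatrix} 3x_1^2 + 2x_1 x_2 + 2x_2^2 + x_3 + 3x_4 - 6 \\ 2x_1^2 + x_1 + x_2^2 + 3x_3 + 2x_4 - 2 \\ 3x_1^2 + x_1 x_2 + 2x_2^2 + 2x_3 + 3x_4 - 1 \\ x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3 \end{pmatrix}.$$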
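As a quick check of Example 4.3, the following Python sketch evaluates $F$ at the starting point $x^0 = (0, 0, 0, 0)$ and at the solution $x^* = (\sqrt{6}/2, 0, 0, 1/2)$ quoted earlier in this section. The constant terms are read as subtracted, which is an assumption consistent with that solution; the variable names are hypothetical.

```python
import math

def F(x):
    """Map of Example 4.3 (constant terms read as subtracted)."""
    x1, x2, x3, x4 = x
    return [
        3 * x1 ** 2 + 2 * x1 * x2 + 2 * x2 ** 2 + x3 + 3 * x4 - 6,
        2 * x1 ** 2 + x1 + x2 ** 2 + 3 * x3 + 2 * x4 - 2,
        3 * x1 ** 2 + x1 * x2 + 2 * x2 ** 2 + 2 * x3 + 3 * x4 - 1,
        x1 ** 2 + 3 * x2 ** 2 + 2 * x3 + 3 * x4 - 3,
    ]

x_start = [0.0, 0.0, 0.0, 0.0]
x_star = [math.sqrt(6.0) / 2.0, 0.0, 0.0, 0.5]

F_start = F(x_start)
F_star = F(x_star)
# Complementarity gap <x*, F(x*)>, which should vanish at a solution.
gap = sum(xi * Fi for xi, Fi in zip(x_star, F_star))

print(F_start)
print(F_star, gap)
```

At $x^*$ the first and fourth components of $F$ vanish while the second and third are positive, matching the zero pattern of $x^*$.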

Fig. 17. Convergent behavior of Algorithm 4.1 and the value of merit function in Example 4.1 (axes: $x$, $F(x)$ and iteration, merit function).