# Applied Mathematics and Computation


## Geometric views of the generalized Fischer–Burmeister function and its induced merit function

### Huai-Yin Tsai, Jein-Shan Chen¹

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

Article info

Keywords: Curvature; Surface; Level curve; NCP-function; Merit function

Abstract

In this paper, we study geometric properties of the surfaces of the generalized Fischer–Burmeister function and its induced merit function. Then, a visualization is proposed to explain how the convergence behavior is influenced by two descent directions in the merit function approach. Based on the geometric properties and the visualization, we gain more intuitive ideas about how the convergence behavior is affected by changing the parameter. Furthermore, the geometric view indicates how to improve the algorithm by setting a proper value of the parameter in the merit function approach.

1. Introduction

The nonlinear complementarity problem (NCP) is to find a point $x \in \mathbb{R}^n$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad \langle x, F(x) \rangle = 0, \tag{1}$$

where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product and $F = (F_1, \ldots, F_n)^T$ is a map from $\mathbb{R}^n$ to $\mathbb{R}^n$. We assume that $F$ is continuously differentiable throughout this paper. The NCP has attracted much attention because of its wide applications in the fields of economics, engineering, and operations research [8,11,16], to name a few.

Many methods have been proposed to solve the NCP; see [1,14,16,20,22,25] and the references therein. One of the most powerful and popular approaches is to reformulate the NCP as a system of nonlinear equations [21,23,28], or as an unconstrained minimization problem [9,10,12,15,18,19,24,27]. The objective function of such an equivalent unconstrained minimization problem is called a merit function; its global minima coincide with the solutions of the original NCP. To construct a merit function, a class of functions, called NCP-functions and defined below, plays a significant role.

A function $\phi : \mathbb{R}^2 \to \mathbb{R}$ is called an NCP-function if it satisfies

$$\phi(a,b) = 0 \iff a \ge 0, \ b \ge 0, \ ab = 0. \tag{2}$$

Equivalently, $\phi$ is an NCP-function if the set of its zeros is the two nonnegative semiaxes. An important NCP-function, which plays a central role in the development of efficient algorithms for the solution of the NCP, is the well-known Fischer–Burmeister (FB) NCP-function [12,13] defined as

$$\phi(a,b) = \sqrt{a^2 + b^2} - (a + b). \tag{3}$$

Corresponding author.

¹ Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is supported by Ministry of Science and Technology, Taiwan.

Journal homepage: www.elsevier.com/locate/amc

With the NCP-function, we can obtain an equivalent formulation of the NCP as a system of equations:

$$\Phi(x) = \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix} = 0. \tag{4}$$

In other words, we have

$$x \text{ solves the NCP} \iff \Phi(x) = 0.$$

In view of this, we define a real-valued function $\Psi : \mathbb{R}^n \to \mathbb{R}_+$ by

$$\Psi(x) := \frac{1}{2}\|\Phi(x)\|^2 = \frac{1}{2}\sum_{i=1}^{n} \phi^2(x_i, F_i(x)). \tag{5}$$

It is known that $\Psi$ is a merit function of the NCP, i.e., the NCP is equivalent to the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} \Psi(x). \tag{6}$$

Merit functions are frequently used in designing numerical algorithms for solving the NCP. In particular, we can apply an iterative algorithm to minimize the merit function in the hope of obtaining its global minimum.

Recently, the so-called generalized Fischer–Burmeister function was proposed in [3,4]. More specifically, the authors considered $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ with

$$\phi_p(a,b) := \|(a,b)\|_p - (a + b), \tag{7}$$

where $p > 1$ is an arbitrary fixed real number and $\|(a,b)\|_p$ denotes the $p$-norm of $(a,b)$, i.e., $\|(a,b)\|_p = \sqrt[p]{|a|^p + |b|^p}$. In other words, in the function $\phi_p$, the 2-norm of $(a,b)$ in the FB function is replaced by a more general $p$-norm. The function $\phi_p$ is still an NCP-function, which naturally induces another NCP-function $\psi_p : \mathbb{R}^2 \to \mathbb{R}_+$ given by

$$\psi_p(a,b) := \frac{1}{2}|\phi_p(a,b)|^2. \tag{8}$$

For any given $p > 1$, the function $\psi_p$ is shown to possess all favorable properties of the FB function $\psi$; see [2–4]. It plays an important part in our study throughout the paper. Like $\Phi$, the operator $\Phi_p : \mathbb{R}^n \to \mathbb{R}^n$ defined as

$$\Phi_p(x) = \begin{pmatrix} \phi_p(x_1, F_1(x)) \\ \vdots \\ \phi_p(x_n, F_n(x)) \end{pmatrix} \tag{9}$$

yields a family of merit functions $\Psi_p : \mathbb{R}^n \to \mathbb{R}_+$ for the NCP:

$$\Psi_p(x) := \frac{1}{2}\|\Phi_p(x)\|^2 = \sum_{i=1}^{n} \psi_p(x_i, F_i(x)). \tag{10}$$

Analogously, the NCP is equivalent to the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} \Psi_p(x). \tag{11}$$
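As a minimal sketch of definitions (7), (8) and (10), the following Python code evaluates $\phi_p$, $\psi_p$ and $\Psi_p$. The map `F_toy` and the point `x_star` are hypothetical illustrations chosen here, not examples from the paper.

```python
def phi_p(a, b, p=2.0):
    """Generalized Fischer-Burmeister function (7): ||(a, b)||_p - (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_p(a, b, p=2.0):
    """Induced NCP-function (8): 0.5 * phi_p(a, b)^2."""
    return 0.5 * phi_p(a, b, p) ** 2


def merit(x, F, p=2.0):
    """Merit function (10): sum of psi_p over the pairs (x_i, F_i(x))."""
    return sum(psi_p(xi, fi, p) for xi, fi in zip(x, F(x)))


def F_toy(x):
    """Hypothetical map used only for illustration: F(x) = (x1 - 1, x2 + 2)."""
    return [x[0] - 1.0, x[1] + 2.0]


if __name__ == "__main__":
    x_star = [1.0, 0.0]  # satisfies x >= 0, F(x) >= 0, <x, F(x)> = 0 for F_toy
    for p in (1.5, 2.0, 3.0):
        # The merit value vanishes at the solution and is positive elsewhere.
        print(p, merit(x_star, F_toy, p), merit([0.5, 0.5], F_toy, p))
```

With $p = 2$ the function `phi_p` reduces to the FB function (3).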

It was shown that if $F$ is monotone [15] or a $P_0$-function [10], then any stationary point of $\Psi$ is a global minimum of the unconstrained minimization $\min_{x \in \mathbb{R}^n} \Psi(x)$, and hence solves the NCP. Similar results were generalized to the $\Psi_p$ case in [4]. On the other hand, many classical iterative methods can be applied to this unconstrained minimization reformulation of the NCP.

Derivative-free methods [29] are suitable for problems where the derivatives of $F$ are not available or are expensive to compute. Some derivative-free algorithms with global convergence results were proposed to solve the NCP based on the generalized Fischer–Burmeister merit function. For example, [4,5] pointed out that the performance of the algorithm is influenced by the parameter $p$. In addition, some phenomena have been observed in the derivative-free algorithm studied in [5]. More specifically, a kind of "cliff" occurs in the convergence behavior, as depicted in Fig. 1.

Over the years, we have frequently been asked what the main factor causing this is and how the parameter $p$ affects the convergence behavior. In light of our earlier numerical experience, we find that figuring out the geometric properties of $\phi_p$ and $\psi_p$ may be a key way to answer these puzzles. With this motivation, we aim to carry out an analysis from a geometric view in this paper. More specifically, the objective of this paper is to study the relation between the convergence behavior and the parameter $p$ from the viewpoint of geometry, in which the graphs of $\phi_p$ and $\psi_p$ can be regarded as families of surfaces embedded in $\mathbb{R}^3$.

This paper is organized as follows. In Section 2, we propose some geometric properties of $\phi_p$ and present its surface structure by figures. In Section 3, we study properties of $\psi_p$ and summarize the comparison between $\phi_p$ and $\psi_p$. In Section 4, we investigate a geometric visualization to see the possible convergence behavior with different $p$ through a few examples. Finally, we state the conclusion.

2. Geometric view of $\phi_p$

In this section, we study some geometric properties of $\phi_p$ and interpret their meanings. We present the family of surfaces of $\phi_p(a,b)$ where $p \in (1, +\infty)$, see Figs. 2 and 3. When we fix a real number $p$ with $1 < p < +\infty$, Fig. 3 gives us an intuitive image that the surface shape is indeed influenced by the value of $p$. From the definition of the $p$-norm, we know that $\|(a,b)\|_1 := |a| + |b|$ and $\|(a,b)\|_\infty := \max\{|a|, |b|\}$. It is trivial that $\phi_p(a,b) \to \phi_1(a,b) := |a| + |b| - (a + b)$ pointwise as $p \to 1$, see Fig. 3(a) and (b). On the other hand, $\phi_p(a,b) \to \phi_\infty(a,b) := \max\{|a|, |b|\} - (a + b)$ pointwise as $p \to +\infty$, see Fig. 3(e) and (f). Note that $\phi_1(a,b)$ is not an NCP-function because $\phi_1(a,b) = 0$ whenever $a > 0$ and $b > 0$, whereas $\phi_\infty(a,b)$ is an NCP-function but is not differentiable when $a = b$.

Next, we give some lemmas which will be used in the subsequent analysis.

Lemma 2.1 [6, Lemma 3.1]. If $a > 0$ and $b > 0$, then $(a + b)^p > a^p + b^p$ for all $p \in (1, +\infty)$.

Fig. 1. "Cliff" phenomenon that appears in some derivative-free algorithm.

Fig. 2. The surface of $z = \phi_2(a,b)$ with $(a,b) \in [-10, 10] \times [-10, 10]$.

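Lemma 2.2 [17, Lemma 1.3]. Let $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $\|x\|_p := \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$. If $1 < p_1 < p_2$, then

$$\|x\|_{p_2} \le \|x\|_{p_1} \le n^{\left(\frac{1}{p_1} - \frac{1}{p_2}\right)} \|x\|_{p_2}.$$

Lemma 2.3 [5, Lemma 3.2]. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,

$$\left(2 - 2^{\frac{1}{p}}\right) |\min\{a, b\}| \le |\phi_p(a,b)| \le \left(2 + 2^{\frac{1}{p}}\right) |\min\{a, b\}|.$$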
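The two-sided bound in Lemma 2.3 can be probed numerically. The following Python sketch samples random points and counts violations; the sampling ranges, the seed and the tolerance are assumptions made for this illustration only, and such a check is no substitute for the proof in [5].

```python
import random


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def lemma_2_3_bounds(a, b, p):
    """Lower and upper bounds of Lemma 2.3 for |phi_p(a, b)|."""
    m = abs(min(a, b))
    root = 2.0 ** (1.0 / p)
    return (2.0 - root) * m, (2.0 + root) * m


def count_violations(num_samples=2000, seed=0, tol=1e-9):
    """Count sampled points where the bound fails beyond a rounding tolerance."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(num_samples):
        a = rng.uniform(-10.0, 10.0)
        b = rng.uniform(-10.0, 10.0)
        p = rng.uniform(1.05, 20.0)
        lower, upper = lemma_2_3_bounds(a, b, p)
        value = abs(phi_p(a, b, p))
        if not (lower - tol <= value <= upper + tol):
            violations += 1
    return violations


if __name__ == "__main__":
    print("violations:", count_violations())
```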

Fig. 3. The surface of $z = \phi_p(a,b)$ with different $p$.

Proposition 2.1. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,

(a) $(a > 0 \text{ and } b > 0) \iff \phi_p(a,b) < 0$;

(b) $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \iff \phi_p(a,b) = 0$;

(c) $b = 0$ and $a < 0 \implies \phi_p(a,b) = -2a > 0$;

(d) $a = 0$ and $b < 0 \implies \phi_p(a,b) = -2b > 0$.

Proof.

(a) If $a > 0$ and $b > 0$, it is easy to see $\phi_p(a,b) < 0$ by Lemma 2.1. Conversely, because $\sqrt[p]{|a|^p + |b|^p} \ge |a|$ and $\sqrt[p]{|a|^p + |b|^p} \ge |b|$, we have $\sqrt[p]{|a|^p + |b|^p} \ge \max\{|a|, |b|\}$. Suppose $a \le 0$ or $b \le 0$; then $\max\{|a|, |b|\} \ge a + b$, which implies $\phi_p(a,b) \ge 0$. This is a contradiction.

(b) By the definition of $\phi_p(a,b)$, we know

$$\phi_p(a, 0) = |a| - a = \begin{cases} 0, & a \ge 0, \\ -2a, & a < 0, \end{cases} \qquad \phi_p(0, b) = |b| - b = \begin{cases} 0, & b \ge 0, \\ -2b, & b < 0, \end{cases}$$

which says that $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \implies \phi_p(a,b) = 0$. Conversely, suppose $\phi_p(a,b) = 0$. If $a < 0$ or $b < 0$, mimicking the arguments of part (a) yields

$$\sqrt[p]{|a|^p + |b|^p} \ge \max\{|a|, |b|\} > a + b,$$

which implies $\phi_p(a,b) > 0$, a contradiction. Thus, there must hold $a \ge 0$ and $b \ge 0$. Furthermore, one of $a$ and $b$ must be $0$ by part (a).

The proofs of (c) and (d) follow directly from the proof of part (b). □

Proposition 2.1(a) shows that $\phi_p(a,b)$ is negative on the first quadrant of the $\mathbb{R}^2$-plane, see Fig. 4, while Proposition 2.1(b) shows that $\phi_p(a,b) = 0$ can only happen on the nonnegative semiaxes (i.e., $a \ge 0, b = 0$ or $a = 0, b \ge 0$). In fact, this proposition is also equivalent to saying that $\phi_p(a,b)$ is an NCP-function. In addition, Proposition 2.1(b)–(d) indicate that the value of $p$ does not affect the value of $\phi_p(a,b)$ on the $a$-axis and $b$-axis.

Proposition 2.2. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,

(a) $\phi_p(a,b) = \phi_p(b,a)$;

(b) $\phi_p$ is convex, i.e., $\phi_p(\alpha w + (1 - \alpha) w') \le \alpha \phi_p(w) + (1 - \alpha) \phi_p(w')$ for all $w, w' \in \mathbb{R}^2$ and $\alpha \in [0, 1]$;

(c) if $1 < p_1 < p_2$, then $\phi_{p_1}(a,b) \ge \phi_{p_2}(a,b)$.

Proof. The verifications of parts (a) and (b) are straightforward, so we omit them. Part (c) follows by applying Lemma 2.2. □

Proposition 2.2(a) shows the symmetry of $\phi_p(a,b)$, which means that any two points of the plane that are mirror images of each other with respect to the line $a = b$ have the same height. In other words, the surface $z = \phi_p(a,b)$ has the same structure on the second and fourth quadrants of the plane, see Figs. 4–6. Proposition 2.2(b) says that the shape of the surface is convex because the function $\phi_p$ is convex, while Proposition 2.2(c) implies that the value of $\phi_p$ decreases as the value of $p$ increases. In summary, the value of $p$ affects the geometric structure.
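The three parts of Proposition 2.2 lend themselves to a quick randomized sanity check. The Python sketch below is only an illustration under assumed sampling ranges, seed and tolerance; it tests symmetry, the convexity inequality along random segments, and the ordering with respect to $p$.

```python
import random


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def check_proposition_2_2(num_samples=1000, seed=1, tol=1e-9):
    """Return True if no sampled point contradicts Proposition 2.2(a)-(c)."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        a, b, c, d = (rng.uniform(-10.0, 10.0) for _ in range(4))
        p1 = rng.uniform(1.05, 5.0)
        p2 = p1 + rng.uniform(0.1, 5.0)
        lam = rng.random()
        # (a) symmetry with respect to the line a = b
        if abs(phi_p(a, b, p1) - phi_p(b, a, p1)) > tol:
            return False
        # (b) convexity inequality on the segment between (a, b) and (c, d)
        mixed = phi_p(lam * a + (1 - lam) * c, lam * b + (1 - lam) * d, p1)
        if mixed > lam * phi_p(a, b, p1) + (1 - lam) * phi_p(c, d, p1) + tol:
            return False
        # (c) phi_{p1} >= phi_{p2} whenever p1 < p2
        if phi_p(a, b, p1) < phi_p(a, b, p2) - tol:
            return False
    return True


if __name__ == "__main__":
    print(check_proposition_2_2())
```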

Proposition 2.3. If $\{(a^k, b^k)\} \subseteq \mathbb{R}^2$ with $(a^k \to -\infty)$ or $(b^k \to -\infty)$ or $(a^k \to +\infty \text{ and } b^k \to +\infty)$, then $|\phi_p(a^k, b^k)| \to +\infty$ as $k \to +\infty$.

Proof. This can be found in [26, p. 20]. □

Proposition 2.3 indicates the increasing direction on the surface. This can be seen from the contour graph of $z = \phi_p(a,b)$ plotted in Fig. 4, where a deeper color represents a lower height. In order to understand the structure of the surface, it is natural to investigate special curves on the surface. We consider a family of curves $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ defined as follows:

$$\alpha_{r,p}(t) := \left(r + t, \ r - t, \ \phi_p(r + t, r - t)\right), \tag{12}$$

where $r \in \mathbb{R}$ and $p \in (1, +\infty)$ are two arbitrary fixed real numbers. These curves can be viewed as the intersection of the surface $z = \phi_p(a,b)$ and the plane $a + b = 2r$, see Fig. 6. We study some properties regarding these special curves.

Lemma 2.4. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Fix any $r \in \mathbb{R}$ and define $f : \mathbb{R} \to \mathbb{R}$ as $f(t) := \phi_p(r + t, r - t)$. Then $f$ is a convex function.

Proof. We know that $\phi_p$ is a convex function by Proposition 2.2(b) and observe that $f$ is a composition of $\phi_p$ and an affine function. Thus, $f$ is convex since it is a composition of a convex function and an affine function (the composition of two convex functions is not necessarily convex; however, our case does guarantee convexity because one of them is affine). □

Fig. 4. Level curves of $z = \phi_p(a,b)$ with different $p$.

Theorem 2.1. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Suppose $a$ and $b$ are constrained to the curve determined by the plane $a + b = 2r$ ($r \in \mathbb{R}$) and the surface. Then, $\phi_p(a,b)$ attains its minimum $\phi_p(r, r) = 2^{\frac{1}{p}} |r| - 2r$ along this curve at $(a,b) = (r, r)$.

Proof. We know that $\phi_p(a,b)$ is differentiable except at $(0,0)$; therefore we discuss two cases as follows.

(i) Case (1): $r = 0$. Because $a + b = 0$, $a$ and $b$ have opposite signs except when $a = b = 0$. From Proposition 2.1, we know $\phi_p(a,b) \ge 0$ in this case. Thus, $\phi_p(a,b)$ attains its minimum zero at $(a,b) = (0,0)$.

(ii) Case (2): $r \ne 0$. Fix $r$ and $p > 1$. Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be respectively defined as

$$f(t) := \phi_p(r + t, r - t), \qquad g(t) := |r + t|^p + |r - t|^p.$$

Then, we calculate that

$$f'(t) = \frac{g'(t)}{p \, (g(t))^{\frac{p-1}{p}}} \quad \text{and} \quad g'(t) = p \left[ \operatorname{sgn}(r + t) |r + t|^{p-1} - \operatorname{sgn}(r - t) |r - t|^{p-1} \right].$$

We know $g(t) > 0$ for all $t \in \mathbb{R}$. It is clear that $g'(0) = 0$, and hence $f'(0) = 0$. By Lemma 2.4, $f(t)$ is convex on $\mathbb{R}$. In addition, it is also continuous; therefore, the critical point $t = 0$ of $f(t)$ is also a global minimizer of $f(t)$. The proof is complete since $a = b = r$ and $\phi_p(r, r) = 2^{\frac{1}{p}} |r| - 2r$ when $t = 0$. □

Fig. 5. The surface of $z = \phi_2(a,b)$ with $(a,b) \in [0, 10] \times [0, 10]$.

Fig. 6. The curve intersected by surface $z = \phi_p(a,b)$ and plane $a + b = 2r$.
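Theorem 2.1 can be illustrated by scanning $f(t) = \phi_p(r + t, r - t)$ on a grid and comparing the smallest value found with the predicted minimum $2^{1/p}|r| - 2r$. The Python sketch below does this; the grid width and resolution are assumptions made for illustration.

```python
def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def curve_minimum(r, p, half_width=5.0, steps=2001):
    """Grid search for the minimum of f(t) = phi_p(r + t, r - t) on [-half_width, half_width]."""
    best_t, best_val = None, float("inf")
    for i in range(steps):
        t = -half_width + 2.0 * half_width * i / (steps - 1)
        val = phi_p(r + t, r - t, p)
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val


def predicted_minimum(r, p):
    """Minimum value stated in Theorem 2.1, attained at t = 0."""
    return 2.0 ** (1.0 / p) * abs(r) - 2.0 * r


if __name__ == "__main__":
    for r, p in [(1.5, 2.0), (-2.0, 3.0), (0.7, 1.5)]:
        print(r, p, curve_minimum(r, p), predicted_minimum(r, p))
```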

Lemma 2.4 and Theorem 2.1 show that the curve determined by the plane $a + b = 2r$ and the surface $z = \phi_p(a,b)$ is convex and attains its minimum when $a = b$, see Fig. 7. We now study the curvature of the family of curves $\alpha_{r,p}$ defined as in (12) at the point $\left(r, r, \phi_p(r, r)\right)$. Because the function $\phi_p$ is not differentiable at $(a,b) = (0,0)$ (i.e., $r = 0$), we choose two points $\alpha_{0,p}(t_0) = \left(t_0, -t_0, \phi_p(t_0, -t_0)\right)$ and $\alpha_{0,p}(-t_0) = \left(-t_0, t_0, \phi_p(-t_0, t_0)\right)$, where $t_0 > 0$, and calculate the cosine of the angle between $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$, see Fig. 8.

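Proposition 2.4. Let $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (12), and let $\cos_p(\theta)$ be the cosine of the angle between the two vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$, where $t_0 > 0$. Then,

(a) $\cos_p(\theta) = \dfrac{2^{\frac{2}{p}} - 6}{\sqrt{\left(2^{\frac{2}{p}} - 2\right)^2 + 32}}$;

(b) $\cos_p(\theta) \to -\dfrac{1}{3}$ as $p \to 1$, and $\cos_p(\theta) \to -\dfrac{5}{\sqrt{33}}$ as $p \to +\infty$;

(c) if $1 < p_1 < p_2$, then $\cos_{p_1}(\theta) > \cos_{p_2}(\theta)$.

Proof.

(a) By direct computation, we obtain

$$\cos_p(\theta) = \frac{\alpha_{0,p}(t_0) \cdot \alpha_{0,p}(-t_0)}{\|\alpha_{0,p}(t_0)\| \, \|\alpha_{0,p}(-t_0)\|} = \frac{2^{\frac{2}{p}} - 6}{\sqrt{\left(2^{\frac{2}{p}} + 6\right) + 2^{\frac{1}{p}+2}} \ \sqrt{\left(2^{\frac{2}{p}} + 6\right) - 2^{\frac{1}{p}+2}}} = \frac{2^{\frac{2}{p}} - 6}{\sqrt{\left(2^{\frac{2}{p}} - 2\right)^2 + 32}}.$$

(b) From part (a), let $f : (1, +\infty) \to \mathbb{R}$ be $f(p) := \cos_p(\theta)$. Then $f(p)$ is continuous on $(1, +\infty)$. Since $2^{\frac{2}{p}} \to 4$ as $p \to 1$ and $2^{\frac{2}{p}} \to 1$ as $p \to +\infty$, taking the limit gives $\cos_p(\theta) \to -\frac{1}{3}$ as $p \to 1$, and $\cos_p(\theta) \to -\frac{5}{\sqrt{33}}$ as $p \to +\infty$.

(c) Differentiating the expression in part (a), we obtain

$$f'(p) = -\frac{8 \ln 2 \cdot 2^{\frac{2}{p}} \left(2^{\frac{2}{p}} + 6\right)}{p^2 \left[\left(2^{\frac{2}{p}} - 2\right)^2 + 32\right]^{\frac{3}{2}}},$$

which implies $f'(p) < 0$ for all $p > 1$. Therefore, $f(p)$ is a strictly decreasing function on $(1, +\infty)$, in agreement with the limits in part (b). □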

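Proposition 2.5. Let $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (12). Then the following hold.

(a) The curvature at the point $\alpha_{r,p}(0) = \left(r, r, \phi_p(r, r)\right)$ is $\kappa_p(0) = \dfrac{(p - 1) \, 2^{\frac{1}{p} - 1}}{|r|}$.

(b) $\kappa_p(0) \to 0$ as $p \to 1$ and $\kappa_p(0) \to +\infty$ as $p \to +\infty$.

(c) If $1 < p_1 < p_2$, then $\kappa_{p_1}(0) < \kappa_{p_2}(0)$.

Proof.

(a) Because $\alpha_{r,p}(t) = \left(r + t, r - t, \phi_p(r + t, r - t)\right)$, we know

$$\alpha'_{r,p}(0) = (1, -1, 0) \quad \text{and} \quad \alpha''_{r,p}(0) = \left(0, 0, \frac{(p - 1) \, 2^{\frac{1}{p}}}{|r|}\right).$$

Recall the formula for the curvature

$$\kappa_p(t) = \frac{|\alpha'_{r,p}(t) \wedge \alpha''_{r,p}(t)|}{|\alpha'_{r,p}(t)|^3},$$

where the wedge operator $\wedge$ denotes the outer product of two vectors. Thus, we have

$$\kappa_p(0) = \frac{|\alpha'_{r,p}(0) \wedge \alpha''_{r,p}(0)|}{|\alpha'_{r,p}(0)|^3} = \frac{(p - 1) \, 2^{\frac{1}{p} - 1}}{|r|}.$$

(b) Let $f : (1, +\infty) \to \mathbb{R}$ be defined as

$$f(p) := \kappa_p(0) = \frac{(p - 1) \, 2^{\frac{1}{p} - 1}}{|r|};$$

then $f(p)$ is obviously continuous on $(1, +\infty)$. Thus, the desired result follows by taking the limit directly.

(c) From part (b), we compute that

$$f'(p) = \frac{2^{\frac{1}{p} - 1}}{|r|} \left(1 - \frac{\ln 2}{p} + \frac{\ln 2}{p^2}\right),$$

which implies $f'(p) > 0$ for all $p \in (1, +\infty)$. Then $f(p)$ is strictly increasing on $(1, +\infty)$. □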
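The closed-form curvature in Proposition 2.5(a) can be compared with a finite-difference estimate. The following Python sketch approximates $f'(0)$ and $f''(0)$ for $f(t) = \phi_p(r + t, r - t)$ by central differences and evaluates the space-curve curvature formula; the step size `h` and the test values of $r$ and $p$ are assumptions made for illustration.

```python
import math


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def curvature_at_zero(r, p, h=1e-4):
    """Finite-difference curvature of alpha_{r,p} at t = 0 (requires r != 0)."""
    f = lambda t: phi_p(r + t, r - t, p)
    d1 = (f(h) - f(-h)) / (2.0 * h)             # approximates f'(0)
    d2 = (f(h) - 2.0 * f(0.0) + f(-h)) / h**2   # approximates f''(0)
    # alpha' = (1, -1, d1) and alpha'' = (0, 0, d2), so |alpha' x alpha''| = sqrt(2) * |d2|
    return math.sqrt(2.0) * abs(d2) / (2.0 + d1 * d1) ** 1.5


def curvature_closed_form(r, p):
    """Curvature stated in Proposition 2.5(a)."""
    return (p - 1.0) * 2.0 ** (1.0 / p - 1.0) / abs(r)


if __name__ == "__main__":
    for r, p in [(1.5, 3.0), (-2.0, 1.5), (1.0, 2.0)]:
        print(r, p, curvature_at_zero(r, p), curvature_closed_form(r, p))
```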

Fig. 7. The curve $f(t) = \phi_p(r + t, r - t)$.

The above two propositions show how $p$ affects the geometric structure, see Fig. 9(a) and (b). Proposition 2.5(b) says that when $p \to 1$ the curve becomes a straight line, see Fig. 9(c). Note that when $p \to +\infty$ the curve becomes sharper and sharper at the point; in the limit, the curve is not differentiable at $t = 0$, see Fig. 9(d). To sum up, from all the properties presented in this section, we realize that $p$ indeed affects the geometric behavior of the surface $z = \phi_p(a,b)$ both locally and globally.

Fig. 8. Angle between the vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$ (curves shown for $p = 1.1, 1.5, 2, 3, 10$).

3. Geometric view of $\psi_p$

In the previous section, we saw that the generalized FB function $\phi_p$ is convex and differentiable everywhere except at $(0,0)$. In contrast, the function $\psi_p(a,b)$ defined as in (8) is non-convex, but continuously differentiable everywhere. Nonetheless, $\phi_p$ and $\psi_p$ have many similar geometric properties, as will be seen later. In this section, we study properties similar to those of Section 2 and compare $\psi_p$ with $\phi_p$ (see Figs. 10 and 11).

Proposition 3.1. Let $\psi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (8) where $p \in (1, +\infty)$. Then,

(a) $\psi_p(a,b) \ge 0$ for all $(a,b) \in \mathbb{R}^2$;

(b) $\psi_p(a,b) = \psi_p(b,a)$ for all $(a,b) \in \mathbb{R}^2$;

(c) $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \iff \psi_p(a,b) = 0$;

(d) $b = 0$ and $a < 0 \implies \psi_p(a,b) = 2a^2 > 0$;

(e) $a = 0$ and $b < 0 \implies \psi_p(a,b) = 2b^2 > 0$;

(f) $\psi_p$ is continuously differentiable everywhere.

Proof. Parts (d) and (e) come from Proposition 2.1(c) and (d); please see [2–4] for the rest. □

Proposition 2.2(c) says that the value of $\phi_p$ is decreasing with respect to $p$. In contrast, $\psi_p$ does not have such a property in general. More specifically, $\psi_p$ has this property only on certain quadrants.

Proposition 3.2. Suppose $1 < p_1 < p_2$ and $(a,b) \in \mathbb{R}^2$. Then,

(a) if $a < 0$ or $b < 0$, then $\psi_{p_1}(a,b) \ge \psi_{p_2}(a,b)$;

(b) if $a > 0$ and $b > 0$, then $\psi_{p_1}(a,b) \le \psi_{p_2}(a,b)$.

Proof.

(a) This is clear from Proposition 2.2(c), because $\phi_p(a,b) \ge 0$ in this case by Proposition 2.1.

(b) Suppose $a > 0$ and $b > 0$. From Proposition 2.1(a), we have $\phi_p(a,b) < 0$. Then Proposition 2.2(c) yields $0 > \phi_{p_1}(a,b) \ge \phi_{p_2}(a,b)$, and hence $\phi_{p_1}^2(a,b) \le \phi_{p_2}^2(a,b)$. □

Since $\psi_p$ is not convex in general, the counterpart of Theorem 2.1 reads as follows.

Theorem 3.1. Let $\psi_p(a,b)$ be defined as in (8) with $a + b = 2r$. Then, the following hold.

(a) If $r \in \mathbb{R}_+$ and $a > 0$, $b > 0$, then $\psi_p(a,b)$ attains its maximum $\left(2^{\frac{2}{p} - 1} - 2^{\frac{1}{p} + 1} + 2\right) r^2$ when $(a,b) = (r, r)$.

(b) If $r \in \mathbb{R}_- \cup \{0\}$, then $\psi_p(a,b)$ attains its minimum $\left(2^{\frac{2}{p} - 1} + 2^{\frac{1}{p} + 1} + 2\right) r^2$ when $(a,b) = (r, r)$.

Proof.

(a) When $a > 0$ and $b > 0$, Proposition 2.1(a) says that $\phi_p(a,b) < 0$. Since $\phi_p^2(a,b) > 0$, by Theorem 2.1, the minimum of $\phi_p(a,b)$ becomes the maximum of $\psi_p(a,b)$.

(b) This is a consequence of Theorem 2.1. □

Fig. 9. The curvature $\kappa_p(0)$ at the point $\alpha_{r,p}(0)$ (curves shown for $p = 1.1, 1.5, 2, 3, 10$).
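The extreme values in Theorem 3.1 can be illustrated by scanning $\psi_p$ along the line $a + b = 2r$. In the Python sketch below, the grids and the sample values of $r$ and $p$ are assumptions chosen for illustration; for $r > 0$ the scan is restricted to $|t| < r$ so that $a > 0$ and $b > 0$.

```python
def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_p(a, b, p):
    """Induced NCP-function (8)."""
    return 0.5 * phi_p(a, b, p) ** 2


def max_on_positive_part(r, p, steps=2001):
    """For r > 0: largest psi_p(r + t, r - t) over a grid of t in (-r, r); returns (value, t)."""
    ts = [-r + 2.0 * r * (i + 1) / (steps + 1) for i in range(steps)]
    return max((psi_p(r + t, r - t, p), t) for t in ts)


def min_on_line(r, p, half_width=5.0, steps=2001):
    """For r <= 0: smallest psi_p(r + t, r - t) over a grid of t; returns (value, t)."""
    ts = [-half_width + 2.0 * half_width * i / (steps - 1) for i in range(steps)]
    return min((psi_p(r + t, r - t, p), t) for t in ts)


def predicted_value(r, p):
    """Extreme value from Theorem 3.1: the sign of the middle term depends on the sign of r."""
    sign = -1.0 if r > 0 else 1.0
    return (2.0 ** (2.0 / p - 1.0) + sign * 2.0 ** (1.0 / p + 1.0) + 2.0) * r * r


if __name__ == "__main__":
    print(max_on_positive_part(1.5, 2.0), predicted_value(1.5, 2.0))
    print(min_on_line(-2.0, 3.0), predicted_value(-2.0, 3.0))
```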

The aforementioned results show that $\psi_p$ shares many properties with $\phi_p$, see Figs. 11 and 12, where we denote $\psi_1(a,b) := \frac{1}{2}|\phi_1(a,b)|^2$ and $\psi_\infty(a,b) := \frac{1}{2}|\phi_\infty(a,b)|^2$. However, there are still some differences between $\phi_p$ and $\psi_p$. For example, $\psi_p$ is not convex whereas $\phi_p$ is. Fig. 13 depicts the increasing direction of $\psi_p$. Note that $\psi_p(a,b)$ is nonnegative and has different properties when $a > 0$ and $b > 0$, see Fig. 11.

In order to further understand the geometric properties, we define a family of curves as follows:

$$\beta_{r,p}(t) := \left(r + t, \ r - t, \ \psi_p(r + t, r - t)\right), \tag{13}$$

where $r$ is a fixed real number and $t \in \mathbb{R}$. This family of curves can be regarded as the intersection of the plane $a + b = 2r$ and the surface $z = \psi_p(a,b)$, see Fig. 14.

Proposition 3.3. Let $\beta_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (13). Then the following hold.

(a) The curvature at the point $\beta_{r,p}(0) = \left(r, r, \psi_p(r, r)\right)$ is $\bar{\kappa}_p(0) = (p - 1) \left| 2^{\frac{2}{p} - 1} - \operatorname{sgn}(r) \, 2^{\frac{1}{p}} \right|$.

(b) $\bar{\kappa}_p(0) \to 0$ as $p \to 1$ and $\bar{\kappa}_p(0) \to +\infty$ as $p \to +\infty$.

(c) If $1 < p_1 < p_2$, then $\bar{\kappa}_{p_1}(0) < \bar{\kappa}_{p_2}(0)$.

Proof.

(a) From $\beta_{r,p}(t) = \left(r + t, r - t, \psi_p(r + t, r - t)\right)$, we know

$$\beta'_{r,p}(0) = (1, -1, 0) \quad \text{and} \quad \beta''_{r,p}(0) = \left(0, 0, (p - 1) \, 2^{\frac{2}{p}} - \operatorname{sgn}(r) (p - 1) \, 2^{\frac{1}{p} + 1}\right),$$

which yields

$$\bar{\kappa}_p(0) = \frac{|\beta'_{r,p}(0) \wedge \beta''_{r,p}(0)|}{|\beta'_{r,p}(0)|^3} = (p - 1) \left| 2^{\frac{2}{p} - 1} - \operatorname{sgn}(r) \, 2^{\frac{1}{p}} \right|.$$

(b) Let $f : (1, +\infty) \to \mathbb{R}$ be defined as $f(p) := \bar{\kappa}_p(0) = (p - 1) \left| 2^{\frac{2}{p} - 1} - \operatorname{sgn}(r) \, 2^{\frac{1}{p}} \right|$. Then the result follows by taking the limit directly.

(c) From part (b), it can be verified that $f'(p) > 0$ for all $p \in (1, +\infty)$. Thus, $f(p)$ is strictly increasing on $(1, +\infty)$. □

Fig. 14 depicts how the curve changes with the value of $p$; in particular, we can see the change of curvature when $p$ is close to one or to infinity. We state an addendum to part (a) here: the curvature at the two other special points $\beta_{r,p}(-r) = (0, 2r, 0)$ and $\beta_{r,p}(r) = (2r, 0, 0)$ is the same, namely, $\bar{\kappa}_p(-r) = \bar{\kappa}_p(r) = \frac{1}{2}$. Note that although $\psi_p$ is differentiable everywhere, the mean curvature at $(0,0)$ does not exist. To end this section, we summarize the similarities and differences between $\phi_p$ and $\psi_p$ below.

Fig. 10. The surface of $z = \psi_2(a,b)$ with $(a,b) \in [-10, 10] \times [-10, 10]$.

| Difference | $\phi_p(a,b)$ | $\psi_p(a,b)$ |
|---|---|---|
| Convexity | Convex | Nonconvex |
| Differentiability | Differentiable everywhere except $(0,0)$ | Differentiable everywhere |
| Sign | $\phi_p(a,b) < 0$ when $a > 0$ and $b > 0$ | $\psi_p(a,b) \ge 0$ for all $(a,b) \in \mathbb{R}^2$ |

Similarity (both $\phi_p$ and $\psi_p$):

(1) NCP-function

(2) Symmetry (i.e., $\phi_p(a,b) = \phi_p(b,a)$ and $\psi_p(a,b) = \psi_p(b,a)$)

(3) The function is not affected by $p$ on the axes

(4) When $(a^k \to -\infty)$ or $(b^k \to -\infty)$ or $(a^k, b^k \to +\infty)$, we have $|\phi_p(a^k, b^k)| \to \infty$ and $|\psi_p(a^k, b^k)| \to \infty$

(5) Non-coercive

4. Geometric analysis of merit function in descent algorithms

In this section, we employ the derivative-free descent algorithms presented in [4,5] to solve the unconstrained minimization problem (11) by using the merit function (10). We then compare the two algorithms and study their convergence behavior by means of an intuitive visualization. We first list these two algorithms below.

Algorithm 4.1 [4, Algorithm 4.1].

(Step 0) Given a real number $p > 1$ and a starting point $x^0 \in \mathbb{R}^n$. Choose the parameters $\sigma \in (0, 1)$, $\beta \in (0, 1)$ and $\varepsilon \ge 0$. Set $k := 0$.

(Step 1) If $\Psi_p(x^k) \le \varepsilon$, then stop.

(Step 2) Let $m_k$ be the smallest nonnegative integer $m$ satisfying

$$\Psi_p(x^k + \beta^m d^k) \le (1 - \sigma \beta^{2m}) \Psi_p(x^k),$$

where

$$d^k := -\nabla_b \psi_p(x^k, F(x^k))$$

and

$$\nabla_b \psi_p(x, F(x)) := \left(\nabla_b \psi_p(x_1, F_1(x)), \ldots, \nabla_b \psi_p(x_n, F_n(x))\right)^T.$$

(Step 3) Set $x^{k+1} := x^k + \beta^{m_k} d^k$, $k := k + 1$ and go to Step 1.
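The steps of Algorithm 4.1 can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in [4]: the parameter values (`sigma`, `beta`, `eps`), the iteration caps and the starting point are assumptions, and the partial derivative $\nabla_b \psi_p(a,b) = \left(\operatorname{sgn}(b)|b|^{p-1} / \|(a,b)\|_p^{p-1} - 1\right) \phi_p(a,b)$, with value $0$ at the origin, is obtained here by differentiating (8) directly.

```python
def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def grad_b_psi_p(a, b, p):
    """Partial derivative of psi_p with respect to b (taken as 0 at the origin)."""
    norm = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    if norm == 0.0:
        return 0.0
    sgn_b = (b > 0) - (b < 0)
    return (sgn_b * abs(b) ** (p - 1) / norm ** (p - 1) - 1.0) * (norm - (a + b))


def merit(x, F, p):
    """Merit function (10)."""
    return sum(0.5 * phi_p(xi, fi, p) ** 2 for xi, fi in zip(x, F(x)))


def algorithm_4_1(F, x0, p=2.0, sigma=1e-4, beta=0.5, eps=1e-10,
                  max_iter=500, max_backtracks=60):
    """Derivative-free descent sketch; returns the last iterate and the merit history."""
    x = list(x0)
    history = [merit(x, F, p)]
    for _ in range(max_iter):
        value = history[-1]
        if value <= eps:                      # Step 1
            break
        d = [-grad_b_psi_p(xi, fi, p) for xi, fi in zip(x, F(x))]
        for m in range(max_backtracks):       # Step 2: smallest m passing the test
            trial = [xi + beta ** m * di for xi, di in zip(x, d)]
            trial_value = merit(trial, F, p)
            if trial_value <= (1.0 - sigma * beta ** (2 * m)) * value:
                x = trial                     # Step 3
                history.append(trial_value)
                break
        else:
            break                             # no acceptable step within the cap
    return x, history


if __name__ == "__main__":
    F1 = lambda x: [(x[0] - 3.0) ** 3 + 1.0]  # the map of Example 4.1
    x_final, hist = algorithm_4_1(F1, [3.0], p=2.0, max_iter=50)
    print(x_final, hist[0], hist[-1])
```

Note that no Jacobian of $F$ is evaluated: the search direction uses only function values of $F$, which is what makes the method derivative-free.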

Algorithm 4.2 [5, Algorithm 4.1].

(Step 0) Given real numbers $p > 1$ and $\alpha \ge 0$ and a starting point $x^0 \in \mathbb{R}^n$. Choose the parameters $\sigma \in (0, 1)$, $\beta \in (0, 1)$, $\gamma \in (0, 1)$ and $\varepsilon \ge 0$. Set $k := 0$.

(Step 1) If $\Psi_{\alpha,p}(x^k) \le \varepsilon$, then stop.

(Step 2) Let $m_k$ be the smallest nonnegative integer $m$ satisfying

$$\Psi_{\alpha,p}(x^k + \beta^m d^k(\gamma^m)) \le (1 - \sigma \beta^{2m}) \Psi_{\alpha,p}(x^k),$$

where

$$d^k(\gamma^m) := -\nabla_b \psi_{\alpha,p}(x^k, F(x^k)) - \gamma^m \nabla_a \psi_{\alpha,p}(x^k, F(x^k))$$

and

$$\nabla_a \psi_{\alpha,p}(x, F(x)) := \left(\nabla_a \psi_{\alpha,p}(x_1, F_1(x)), \ldots, \nabla_a \psi_{\alpha,p}(x_n, F_n(x))\right)^T,$$

$$\nabla_b \psi_{\alpha,p}(x, F(x)) := \left(\nabla_b \psi_{\alpha,p}(x_1, F_1(x)), \ldots, \nabla_b \psi_{\alpha,p}(x_n, F_n(x))\right)^T.$$

(Step 3) Set $x^{k+1} := x^k + \beta^{m_k} d^k(\gamma^{m_k})$, $k := k + 1$ and go to Step 1.

Fig. 11. The surface of $z = \psi_2(a,b)$ with $(a,b) \in [0, 10] \times [0, 10]$.

Fig. 12. The surface of $z = \psi_p(a,b)$ with different $p$.

Fig. 13. Level curves of $z = \psi_p(a,b)$ with different $p$.

In Algorithm 4.2, $\psi_{\alpha,p} : \mathbb{R}^2 \to \mathbb{R}_+$ is an NCP-function defined by

$$\psi_{\alpha,p}(a,b) := \frac{\alpha}{2} \left(\max\{0, ab\}\right)^2 + \psi_p(a,b) = \frac{\alpha}{2} (ab)_+^2 + \frac{1}{2} \left(\|(a,b)\|_p - (a + b)\right)^2$$

with $\alpha \ge 0$ being a real parameter. When $\alpha = 0$, the function $\psi_{\alpha,p}$ reduces to $\psi_p$. To compare these two algorithms, we take $\alpha = 0$ when we use Algorithm 4.2 in this section. Note that the descent direction in Algorithm 4.1 lacks a certain symmetry, whereas Algorithm 4.2 adopts a symmetric search direction. Under the assumption of monotonicity, i.e.,
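To make the search direction of Algorithm 4.2 concrete, the Python sketch below evaluates $\psi_{\alpha,p}$, its two partial derivatives, and the direction $d(\gamma^m) = -\nabla_b \psi_{\alpha,p} - \gamma^m \nabla_a \psi_{\alpha,p}$ componentwise. The gradient formulas are obtained here by differentiating the definition of $\psi_{\alpha,p}$ directly (with value $0$ at the origin); the function names and sample arguments are assumptions for illustration and are not taken from [5].

```python
def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_alpha_p(a, b, alpha, p):
    """psi_{alpha,p}(a, b) = alpha/2 * max(0, a*b)^2 + 1/2 * phi_p(a, b)^2."""
    return 0.5 * alpha * max(0.0, a * b) ** 2 + 0.5 * phi_p(a, b, p) ** 2


def grad_psi_alpha_p(a, b, alpha, p):
    """Partial derivatives of psi_{alpha,p} with respect to a and b."""
    norm = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    if norm == 0.0:
        return 0.0, 0.0
    phi = norm - (a + b)
    sgn = lambda z: (z > 0) - (z < 0)
    plus = max(0.0, a * b)
    ga = (sgn(a) * abs(a) ** (p - 1) / norm ** (p - 1) - 1.0) * phi + alpha * b * plus
    gb = (sgn(b) * abs(b) ** (p - 1) / norm ** (p - 1) - 1.0) * phi + alpha * a * plus
    return ga, gb


def direction(x, Fx, alpha, p, gamma, m):
    """Componentwise d(gamma^m) = -grad_b psi - gamma^m * grad_a psi."""
    d = []
    for xi, fi in zip(x, Fx):
        ga, gb = grad_psi_alpha_p(xi, fi, alpha, p)
        d.append(-gb - gamma ** m * ga)
    return d


if __name__ == "__main__":
    # With alpha = 0 the function reduces to psi_p, the case used in this section.
    print(direction([3.0], [1.0], alpha=0.0, p=2.0, gamma=0.5, m=0))
```

Compared with Algorithm 4.1, the extra term $-\gamma^m \nabla_a \psi_{\alpha,p}$ brings the derivative with respect to the first argument into the direction, which is the symmetry referred to above.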

$$\langle x - y, F(x) - F(y) \rangle \ge 0 \quad \text{for all } x, y \in \mathbb{R}^n,$$

an error bound is proposed and Algorithm 4.2 is shown to have a locally R-linear convergence rate in [5]. In other words, there exists a positive constant $\kappa_2$ such that

$$\|x^k - x^*\| \le \kappa_2 \left( \max\left\{ \Psi_{\alpha,p}(x^k), \sqrt{\Psi_{\alpha,p}(x^k)} \right\} \right)^{\frac{1}{2}} \quad \text{when } \alpha = 0.$$

Furthermore, the convergence rate of Algorithm 4.2 has a close relation with the constant

logc

b L1þ

CðB;

;pÞ



where

$$C(B, \alpha, p) = \frac{\left(2 - 2^{\frac{1}{p}}\right)^4}{\alpha B^2 + \left(2 + 2^{\frac{1}{p}}\right)^2}.$$

Therefore, when the value of $p$ decreases, the convergence rate of Algorithm 4.2 becomes worse and worse, see Remark 4.1 in [5].

Recall that the merit function $\Psi_p(x)$ is the sum of $n$ nonnegative functions $\psi_p$, i.e.,

$$\Psi_p(x) = \sum_{i=1}^{n} \psi_p(x_i, F_i(x)).$$

This encourages us to view each component $\psi_p(x_i^k, F_i(x^k))$, $i = 1, 2, \ldots, n$, as a motion with a different velocity on the same surface $z = \psi_p(a,b)$ at each iteration. Based on our study in Sections 2 and 3, we obtain a visualization that helps us understand the convergence behavior in detail. Fig. 20 depicts the visualization for the four-dimensional NCP in Example 4.3. The merit function of this NCP is $\Psi_p(x) = \sum_{i=1}^{4} \psi_p(x_i, F_i(x))$. We plot the point sequences $\{(x_i^k, F_i(x^k))\}$ for $i = 1, 2, 3, 4$ in different colors, together with the level curves of the surface $\psi_{1.1}(a,b)$, in Fig. 20(a). The vertical line represents the value of $x_i$, the horizontal line represents the value of $F_i(x)$, and the skew line means $x_i = F_i(x)$. We take the initial point $x^0 = (0, 0, 0, 0)$, which implies $F(x^0) = (-6, -2, -1, -3)$, and observe the convergence behavior separately for each $i$ from the initial point to the solution $x^* = (\sqrt{6}/2, 0, 0, 1/2)$, which is on the horizontal line in this figure. Furthermore, we observe the position of the point sequence on the surface in Fig. 20(a) and the merit function, which is the sum of their heights at each iteration, as shown in Fig. 20(b).

In a one-dimensional NCP, $F$ is continuously differentiable and there is only one variable $x$ in $F$, so $(x, F(x))$ is a continuous curve in $\mathbb{R}^2$ and the merit function $\Psi_p(x) = \psi_p(x, F(x))$ is obviously a curve on the surface $z = \psi_p(a,b)$, see Fig. 16(a) and (b). Therefore, the point sequence in a one-dimensional problem can only lie on the curve $\left(x, F(x), \psi_p(x, F(x))\right)$.

Fig. 14. The curve intersected by surface $z = \psi_p(a,b)$ and plane $a + b = 2r$.

Fig. 15. The curvature $\bar{\kappa}_p(0)$ at the point $\beta_{r,p}(0)$ (panels compare $p = 1.1, 1.5, 2, 3, 10$; $p = 1, 1.1, 1.2, 1.3, 1.4$; and $p = 3, 4, 5, 10, 100$).

Example 4.1. Consider the NCP, where $F : \mathbb{R} \to \mathbb{R}$ is given by

$$F(x) = (x - 3)^3 + 1.$$

The unique solution of this NCP is $x^* = 2$. Note that $F$ is strictly monotone; see the geometric view of this NCP in Fig. 16. The value of the merit function is plotted in Fig. 16(c), which presents the different behavior of the function for different values of $p$ near the solution. Fig. 17(a)–(d) depict the convergence behavior of Algorithm 4.1 from two directions with two different initial points, and Fig. 17(e) and (f) show the convergence behavior with different $p$. Fig. 19(a)–(d) depict the convergence behavior of Algorithm 4.2 from two directions with two different initial points. We found that Algorithm 4.2 always produces a point sequence in or close to the boundary of the feasible set, i.e., $\{(x, F(x)) : x \ge 0 \text{ and } F(x) \ge 0\}$. Based on Proposition 3.2, the speed at which the merit function decreases in Algorithm 4.1 differs between initial points when we increase $p$, but it is similar for different initial points in Algorithm 4.2. This phenomenon is consistent with the geometric properties studied in Section 3.

To show the importance of the inflection point, we give an extreme example as follows:

Example 4.2. Consider the NCP, where $F : \mathbb{R} \to \mathbb{R}$ is given by

$$F(x) = 1.$$

The unique solution of this NCP is $x^* = 0$. From the above discussion, we know that the point sequence lies on the curve $\left(x, 1, \psi_p(x, 1)\right)$, see Fig. 18(a). Fig. 18(c) shows that there is a rapid decrease of the merit function from the 80th to the 120th iteration. Fig. 18(b) shows the behavior during the 80th to 120th iterations. Observing the width of the level curves in Fig. 18(b), we found that the rapid decrease may arise from the existence of an inflection point on the surface. Figs. 18(c)–(f) and Fig. 19(e) and (f) show that the position of the inflection point may change with different $p$.
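The inflection point mentioned in Example 4.2 can be located numerically. The Python sketch below estimates the second derivative of $x \mapsto \psi_p(x, 1)$ by central differences and applies bisection to find where it changes sign; the bracket $[0.1, 2]$, the step size and the iteration count are assumptions made for this illustration, and the function raises an error if the bracket contains no sign change.

```python
def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_p(a, b, p):
    """Induced NCP-function (8)."""
    return 0.5 * phi_p(a, b, p) ** 2


def second_derivative(x, p, h=1e-4):
    """Central-difference estimate of the second derivative of x -> psi_p(x, 1)."""
    g = lambda s: psi_p(s, 1.0, p)
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)


def inflection_point(p, lo=0.1, hi=2.0, iters=40):
    """Bisection on the sign of the estimated second derivative over [lo, hi]."""
    f_lo, f_hi = second_derivative(lo, p), second_derivative(hi, p)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of the second derivative on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = second_derivative(mid, p)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("estimated inflection point for p = 2:", inflection_point(2.0))
```

On the left of the estimated point the curve is convex and on the right it is concave, so an iterate moving towards $x^* = 0$ crosses a region where the slope of the merit function changes character.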

Fig. 16. Geometric view of NCP in Example 4.1 (axes: $x$ and $F(x)$; the last panel plots $\Psi_p(x, F(x))$ against $x$ for $p = 1.1, 1.5, 2, 3, 100$).

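Example 4.3. Consider the NCP, where $F : \mathbb{R}^4 \to \mathbb{R}^4$ is given by

$$F(x) = \begin{pmatrix} 3x_1^2 + 2x_1 x_2 + 2x_2^2 + x_3 + 3x_4 - 6 \\ 2x_1^2 + x_1 + x_2^2 + 3x_3 + 2x_4 - 2 \\ 3x_1^2 + x_1 x_2 + 2x_2^2 + 2x_3 + 3x_4 - 1 \\ x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3 \end{pmatrix}.$$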
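As a quick check of the data used in the visualization, the following Python sketch evaluates the map of Example 4.3 at the initial point $x^0 = (0,0,0,0)$ and at the solution $x^* = (\sqrt{6}/2, 0, 0, 1/2)$ quoted earlier. The helper `ncp_residual`, based on the componentwise minimum of $x_i$ and $F_i(x)$, is a device introduced here for illustration and is not part of the paper.

```python
import math


def F_example_4_3(x):
    """The map F of Example 4.3."""
    x1, x2, x3, x4 = x
    return [
        3 * x1**2 + 2 * x1 * x2 + 2 * x2**2 + x3 + 3 * x4 - 6,
        2 * x1**2 + x1 + x2**2 + 3 * x3 + 2 * x4 - 2,
        3 * x1**2 + x1 * x2 + 2 * x2**2 + 2 * x3 + 3 * x4 - 1,
        x1**2 + 3 * x2**2 + 2 * x3 + 3 * x4 - 3,
    ]


def ncp_residual(x, Fx):
    """max_i |min(x_i, F_i(x))|, which is zero exactly at solutions of the NCP."""
    return max(abs(min(xi, fi)) for xi, fi in zip(x, Fx))


if __name__ == "__main__":
    x0 = [0.0, 0.0, 0.0, 0.0]
    x_star = [math.sqrt(6.0) / 2.0, 0.0, 0.0, 0.5]
    print("F(x0) =", F_example_4_3(x0))
    print("residual at x* =", ncp_residual(x_star, F_example_4_3(x_star)))
```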

Fig. 17. Convergent behavior of Algorithm 4.1 and the value of merit function in Example 4.1 (panels plot $F(x)$ against $x$, and the merit function against the iteration number).
