Geometric views of the generalized Fischer–Burmeister function and its induced merit function
Huai-Yin Tsai, Jein-Shan Chen ⇑,1
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
Article info
Keywords: Curvature; Surface; Level curve; NCP-function; Merit function
Abstract
In this paper, we study geometric properties of the surfaces of the generalized Fischer–Burmeister function and its induced merit function. We then propose a visualization to explain how the convergence behavior is influenced by two descent directions in the merit function approach. Based on the geometric properties and the visualization, we obtain more intuitive ideas about how the convergence behavior is affected by changing the parameter. Furthermore, the geometric view indicates how to improve the algorithm by choosing a proper value of the parameter in the merit function approach.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
The nonlinear complementarity problem (NCP) is to find a point $x \in \mathbb{R}^n$ such that
\[ x \ge 0, \quad F(x) \ge 0, \quad \langle x, F(x) \rangle = 0, \tag{1} \]
where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product and $F = (F_1, \ldots, F_n)^T$ is a map from $\mathbb{R}^n$ to $\mathbb{R}^n$. We assume that $F$ is continuously differentiable throughout this paper. The NCP has attracted much attention because of its wide applications in economics, engineering, and operations research [8,11,16], to name a few.
Many methods have been proposed to solve the NCP; see [1,14,16,20,22,25] and the references therein. One of the most powerful and popular approaches is to reformulate the NCP as a system of nonlinear equations [21,23,28] or as an unconstrained minimization problem [9,10,12,15,18,19,24,27]. The objective function of such an equivalent unconstrained minimization problem is called a merit function; its global minima coincide with the solutions of the original NCP. To construct a merit function, a class of functions called NCP-functions, defined below, plays a significant role.
A function $\phi : \mathbb{R}^2 \to \mathbb{R}$ is called an NCP-function if it satisfies
\[ \phi(a,b) = 0 \iff a \ge 0, \ b \ge 0, \ ab = 0. \tag{2} \]
Equivalently, $\phi$ is an NCP-function if the set of its zeros is the two nonnegative semiaxes. An important NCP-function, which plays a central role in the development of efficient algorithms for the solution of the NCP, is the well-known Fischer–Burmeister (FB) NCP-function [12,13] defined as
\[ \phi(a,b) = \sqrt{a^2 + b^2} - (a+b). \tag{3} \]
Applied Mathematics and Computation. http://dx.doi.org/10.1016/j.amc.2014.03.089
⇑ Corresponding author.
E-mail addresses: tasiwhyin@gmail.com (H.-Y. Tsai), jschen@math.ntnu.edu.tw (J.-S. Chen).
1 Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is supported by Ministry of Science and Technology, Taiwan.
With the NCP-function, we can obtain an equivalent formulation of the NCP as a system of equations:
\[ \Phi(x) = \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix} = 0. \tag{4} \]
In other words, we have
\[ x \text{ solves the NCP} \iff \Phi(x) = 0. \]
In view of this, we define a real-valued function $\Psi : \mathbb{R}^n \to \mathbb{R}_+$ by
\[ \Psi(x) := \frac{1}{2} \|\Phi(x)\|^2 = \frac{1}{2} \sum_{i=1}^{n} \phi^2(x_i, F_i(x)). \tag{5} \]
It is known that $\Psi$ is a merit function of the NCP, i.e., the NCP is equivalent to the unconstrained minimization problem
\[ \min_{x \in \mathbb{R}^n} \Psi(x). \tag{6} \]
Merit functions are frequently used in designing numerical algorithms for solving the NCP. In particular, we can apply an iterative algorithm to minimize the merit function in the hope of obtaining its global minimum.
Recently, the so-called generalized Fischer–Burmeister function was proposed in [3,4]. More specifically, the authors considered $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ with
\[ \phi_p(a,b) := \|(a,b)\|_p - (a+b), \tag{7} \]
where $p > 1$ is an arbitrary fixed real number and $\|(a,b)\|_p$ denotes the $p$-norm of $(a,b)$, i.e., $\|(a,b)\|_p = \sqrt[p]{|a|^p + |b|^p}$. In other words, in the function $\phi_p$, the 2-norm of $(a,b)$ in the FB function is replaced by a more general $p$-norm. The function $\phi_p$ is still an NCP-function, which naturally induces another NCP-function $\psi_p : \mathbb{R}^2 \to \mathbb{R}_+$ given by
\[ \psi_p(a,b) := \frac{1}{2} |\phi_p(a,b)|^2. \tag{8} \]
For any given $p > 1$, the function $\psi_p$ is shown to possess all favorable properties of the FB function $\psi$; see [2–4]. It plays an important part in our study throughout the paper. Like $\Phi$, the operator $\Phi_p : \mathbb{R}^n \to \mathbb{R}^n$ defined as
\[ \Phi_p(x) = \begin{pmatrix} \phi_p(x_1, F_1(x)) \\ \vdots \\ \phi_p(x_n, F_n(x)) \end{pmatrix} \tag{9} \]
yields a family of merit functions $\Psi_p : \mathbb{R}^n \to \mathbb{R}_+$ for the NCP:
\[ \Psi_p(x) := \frac{1}{2} \|\Phi_p(x)\|^2 = \sum_{i=1}^{n} \psi_p(x_i, F_i(x)). \tag{10} \]
Analogously, the NCP is equivalent to the unconstrained minimization problem
\[ \min_{x \in \mathbb{R}^n} \Psi_p(x). \tag{11} \]
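As a minimal sketch of definitions (7), (8) and (10), the following Python code evaluates $\phi_p$, $\psi_p$ and $\Psi_p$. The function names and the toy map `F_toy` are illustrative assumptions, not part of the paper; they only show that the merit function vanishes at a solution of the NCP and is positive elsewhere.

```python
def phi_p(a, b, p=2.0):
    """Generalized Fischer-Burmeister function (7): ||(a, b)||_p - (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_p(a, b, p=2.0):
    """Induced NCP-function (8): 0.5 * phi_p(a, b)^2."""
    return 0.5 * phi_p(a, b, p) ** 2


def merit(x, F, p=2.0):
    """Merit function (10): sum of psi_p(x_i, F_i(x)) over all components."""
    return sum(psi_p(xi, fi, p) for xi, fi in zip(x, F(x)))


def F_toy(x):
    """Hypothetical test map F(x) = x - 1 (componentwise); x = (1, 1) solves its NCP."""
    return [xi - 1.0 for xi in x]


if __name__ == "__main__":
    for p in (1.1, 2.0, 10.0):
        # First value: merit at the solution; second: merit at a non-solution.
        print(p, merit([1.0, 1.0], F_toy, p), merit([0.5, 2.0], F_toy, p))
```

Setting `p=2.0` recovers the classical FB function (3) and the merit function (5).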
It was shown that if $F$ is monotone [15] or a $P_0$-function [10], then any stationary point of $\Psi$ is a global minimum of the unconstrained minimization problem $\min_{x \in \mathbb{R}^n} \Psi(x)$, and hence solves the NCP. Similar results were generalized to the $\Psi_p$ case in [4]. On the other hand, many classical iterative methods can be applied to this unconstrained minimization reformulation of the NCP.
Derivative-free methods [29] are suitable for problems where the derivatives of $F$ are not available or are expensive to compute. Some derivative-free algorithms with global convergence results were proposed to solve the NCP based on the generalized Fischer–Burmeister merit function. For example, [4,5] pointed out that the performance of the algorithm is influenced by the parameter $p$. In addition, some phenomena have been observed in the derivative-free algorithm studied in [5]. More specifically, a kind of "cliff" occurs in the convergence behavior, as depicted in Fig. 1.
Over the years, we have frequently been asked what the main factor causing this phenomenon is and how the parameter $p$ affects the convergence behavior. In light of our earlier numerical experience, we find that understanding the geometric properties of $\phi_p$ and $\psi_p$ may be the key to answering these questions. With this motivation, we carry out an analysis from a geometric viewpoint in this paper. More specifically, the objective of this paper is to study the relation between the convergence behavior and the parameter $p$ via geometry, in which the graphs of $\phi_p$ and $\psi_p$ are regarded as families of surfaces embedded in $\mathbb{R}^3$.
This paper is organized as follows. In Section 2, we present some geometric properties of $\phi_p$ and illustrate its surface structure by figures. In Section 3, we study properties of $\psi_p$ and summarize the comparison between $\phi_p$ and $\psi_p$. In Section 4, we investigate a geometric visualization to see the possible convergence behavior with different $p$ through a few examples. Finally, we state the conclusion.
2. Geometric view of $\phi_p$
In this section, we study some geometric properties of $\phi_p$ and interpret their meaning. We present the family of surfaces of $\phi_p(a,b)$ with $p \in (1, +\infty)$; see Figs. 2 and 3. For a fixed real number $p$ with $1 < p < +\infty$, Fig. 3 gives the intuitive picture that the shape of the surface is indeed influenced by the value of $p$. From the definition of the $p$-norm, we know that $\|(a,b)\|_1 := |a| + |b|$ and $\|(a,b)\|_\infty := \max\{|a|, |b|\}$. It is easy to see that, as $p \to 1$, $\phi_p(a,b) \to \phi_1(a,b) := |a| + |b| - (a+b)$ pointwise; see Fig. 3(a) and (b). On the other hand, as $p \to +\infty$, $\phi_p(a,b) \to \phi_\infty(a,b) := \max\{|a|, |b|\} - (a+b)$ pointwise; see Fig. 3(e) and (f). Note that $\phi_1$ is not an NCP-function because $\phi_1(a,b) = 0$ whenever $a > 0$ and $b > 0$, whereas $\phi_\infty$ is an NCP-function but is not differentiable when $a = b$.
Next, we give some lemmas which will be used in the subsequent analysis.
Lemma 2.1 [6, Lemma 3.1]. If $a > 0$ and $b > 0$, then $(a+b)^p > a^p + b^p$ for all $p \in (1, +\infty)$.
Fig. 1. "Cliff" phenomenon that appears in some derivative-free algorithms.
Fig. 2. The surface of $z = \phi_2(a,b)$ with $(a,b) \in [-10,10] \times [-10,10]$.
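Lemma 2.2 [17, Lemma 1.3]. Let $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $\|x\|_p := \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$. If $1 < p_1 < p_2$, then
\[ \|x\|_{p_2} \le \|x\|_{p_1} \le n^{\frac{1}{p_1} - \frac{1}{p_2}} \|x\|_{p_2}. \]
Lemma 2.3 [5, Lemma 3.2]. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,
\[ \left( 2 - 2^{1/p} \right) |\min\{a,b\}| \le |\phi_p(a,b)| \le \left( 2 + 2^{1/p} \right) |\min\{a,b\}|. \]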
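The two-sided bound of Lemma 2.3 can be spot-checked numerically. The sketch below samples random points and exponents and tests the inequality with a small floating-point tolerance; the sampling ranges, the tolerance and the helper names are assumptions chosen only for illustration, and a passing check is of course no substitute for the proof in [5].

```python
import random


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def lemma_2_3_holds(a, b, p, tol=1e-9):
    """Check (2 - 2^(1/p))|min{a,b}| <= |phi_p(a,b)| <= (2 + 2^(1/p))|min{a,b}|."""
    m = abs(min(a, b))
    lower = (2.0 - 2.0 ** (1.0 / p)) * m
    upper = (2.0 + 2.0 ** (1.0 / p)) * m
    value = abs(phi_p(a, b, p))
    return lower - tol <= value <= upper + tol


def check_random_samples(samples=2000, seed=0):
    """Return True if the bound holds at every randomly sampled (a, b, p)."""
    rng = random.Random(seed)
    for _ in range(samples):
        a = rng.uniform(-10.0, 10.0)
        b = rng.uniform(-10.0, 10.0)
        p = rng.uniform(1.05, 20.0)
        if not lemma_2_3_holds(a, b, p):
            return False
    return True


if __name__ == "__main__":
    print(check_random_samples())
```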
Fig. 3. The surface of $z = \phi_p(a,b)$ with different $p$.
Proposition 2.1. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,
(a) $(a > 0 \text{ and } b > 0) \iff \phi_p(a,b) < 0$;
(b) $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \iff \phi_p(a,b) = 0$;
(c) $b = 0$ and $a < 0 \implies \phi_p(a,b) = -2a > 0$;
(d) $a = 0$ and $b < 0 \implies \phi_p(a,b) = -2b > 0$.
Proof.
(a) If $a > 0$ and $b > 0$, it is easy to see that $\phi_p(a,b) < 0$ by Lemma 2.1. Conversely, suppose $\phi_p(a,b) < 0$. Because $\sqrt[p]{|a|^p + |b|^p} \ge |a|$ and $\sqrt[p]{|a|^p + |b|^p} \ge |b|$, we have $\sqrt[p]{|a|^p + |b|^p} \ge \max\{|a|, |b|\}$. If $a \le 0$ or $b \le 0$, then $\max\{|a|, |b|\} \ge a + b$, which implies $\phi_p(a,b) \ge 0$. This is a contradiction.
(b) By the definition of $\phi_p(a,b)$, we know
\[ \phi_p(a,0) = |a| - a = \begin{cases} 0, & a \ge 0, \\ -2a, & a < 0, \end{cases} \qquad \phi_p(0,b) = |b| - b = \begin{cases} 0, & b \ge 0, \\ -2b, & b < 0, \end{cases} \]
which says that $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0)$ implies $\phi_p(a,b) = 0$. Conversely, suppose $\phi_p(a,b) = 0$. If $a < 0$ or $b < 0$, mimicking the arguments of part (a) yields
\[ \sqrt[p]{|a|^p + |b|^p} \ge \max\{|a|, |b|\} > a + b, \]
which implies $\phi_p(a,b) > 0$. Thus, there must hold $a \ge 0$ and $b \ge 0$. Furthermore, one of $a$ and $b$ must be $0$ by part (a).
The proofs of (c) and (d) follow directly from the proof of part (b). □
Proposition 2.1(a) shows that $\phi_p(a,b)$ is negative on the first quadrant of the $(a,b)$-plane (see Fig. 4), while Proposition 2.1(b) shows that $\phi_p(a,b) = 0$ can only happen on the nonnegative semiaxes (i.e., $a \ge 0, b = 0$ or $a = 0, b \ge 0$). In fact, this proposition is equivalent to saying that $\phi_p$ is an NCP-function. In addition, Proposition 2.1(b)–(d) indicate that the value of $p$ does not affect the value of $\phi_p(a,b)$ on the $a$-axis and $b$-axis.
Proposition 2.2. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Then,
(a) $\phi_p(a,b) = \phi_p(b,a)$;
(b) $\phi_p$ is convex, i.e., $\phi_p(\alpha w + (1-\alpha) w') \le \alpha \phi_p(w) + (1-\alpha) \phi_p(w')$ for all $w, w' \in \mathbb{R}^2$ and $\alpha \in [0,1]$;
(c) if $1 < p_1 < p_2$, then $\phi_{p_1}(a,b) \ge \phi_{p_2}(a,b)$.
Proof. The verifications of parts (a) and (b) are straightforward, so we omit them. Part (c) follows from Lemma 2.2. □
Proposition 2.2(a) shows the symmetry of $\phi_p(a,b)$: any two points of the plane that are mirror images of each other across the line $a = b$ have the same height. In other words, the surface $z = \phi_p(a,b)$ has the same structure on the second and fourth quadrants of the plane; see Figs. 4–6. Proposition 2.2(b) says that the surface is convex because the function $\phi_p$ is convex, while Proposition 2.2(c) implies that the value of $\phi_p$ decreases as the value of $p$ increases. In summary, the value of $p$ affects the geometric structure.
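Proposition 2.3. If $\{(a^k, b^k)\} \subseteq \mathbb{R}^2$ with $(a^k \to -\infty)$ or $(b^k \to -\infty)$ or $(a^k \to +\infty \text{ and } b^k \to +\infty)$, then $|\phi_p(a^k, b^k)| \to +\infty$ as $k \to +\infty$.
Proof. This can be found in [26, p. 20]. □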
Proposition 2.3 indicates the directions in which the surface increases. This can be seen from the contour graph of $z = \phi_p(a,b)$ plotted in Fig. 4, where darker color represents lower height. In order to understand the structure of the surface, it is natural to investigate special curves on the surface. We consider a family of curves $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ defined as follows:
\[ \alpha_{r,p}(t) := \left( r + t, \ r - t, \ \phi_p(r+t, r-t) \right), \tag{12} \]
where $r \in \mathbb{R}$ and $p \in (1, +\infty)$ are two arbitrary fixed real numbers. These curves can be viewed as the intersection of the surface $z = \phi_p(a,b)$ and the plane $a + b = 2r$; see Fig. 6. We study some properties of these special curves.
Lemma 2.4. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Fix any $r \in \mathbb{R}$ and define $f : \mathbb{R} \to \mathbb{R}$ as $f(t) := \phi_p(r+t, r-t)$. Then $f$ is a convex function.
Proof. We know that $\phi_p$ is a convex function by Proposition 2.2(b), and we observe that $f$ is the composition of $\phi_p$ with an affine function. Thus, $f$ is convex since it is the composition of a convex function and an affine function (the composition of two convex functions is not necessarily convex; our case does guarantee convexity because one of them is affine). □
Fig. 4. Level curves of $z = \phi_p(a,b)$ with different $p$.
Theorem 2.1. Let $\phi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (7) where $p \in (1, +\infty)$. Suppose $a$ and $b$ are constrained to the curve determined by the plane $a + b = 2r$ ($r \in \mathbb{R}$) and the surface. Then, $\phi_p(a,b)$ attains its minimum $\phi_p(r,r) = 2^{1/p} |r| - 2r$ along this curve at $(a,b) = (r,r)$.
Proof. We know that $\phi_p(a,b)$ is differentiable except at $(0,0)$; therefore we discuss two cases as follows.
Fig. 5. The surface of $z = \phi_2(a,b)$ with $(a,b) \in [0,10] \times [0,10]$.
Fig. 6. The curve intersected by the surface $z = \phi_p(a,b)$ and the plane $a + b = 2r$.
(i) Case 1: $r = 0$. Because $a + b = 0$, $a$ and $b$ have opposite signs unless $a = b = 0$. From Proposition 2.1, we know $\phi_p(a,b) \ge 0$ in this case. Thus, $\phi_p(a,b)$ attains its minimum value zero at $(a,b) = (0,0)$.
(ii) Case 2: $r \ne 0$. Fix $r$ and $p > 1$. Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be respectively defined as
\[ f(t) := \phi_p(r+t, r-t), \qquad g(t) := |r+t|^p + |r-t|^p. \]
Then, we calculate that
\[ f'(t) = \frac{g'(t)}{p \, (g(t))^{\frac{p-1}{p}}} \quad \text{and} \quad g'(t) = p \left[ \mathrm{sgn}(r+t) |r+t|^{p-1} - \mathrm{sgn}(r-t) |r-t|^{p-1} \right]. \]
We know $g(t) > 0$ for all $t \in \mathbb{R}$. It is clear that $g'(0) = 0$, and hence $f'(0) = 0$. By Lemma 2.4, $f$ is convex on $\mathbb{R}$. Since it is also continuous, the critical point $t = 0$ is a global minimizer of $f$. The proof is complete since $a = b = r$ and $\phi_p(r,r) = 2^{1/p} |r| - 2r$ when $t = 0$. □
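A quick numerical illustration of Theorem 2.1: the sketch below scans $f(t) = \phi_p(r+t, r-t)$ on a uniform grid and compares the grid minimum with the closed-form value $2^{1/p}|r| - 2r$. The grid width, resolution and function names are assumptions made for this example only.

```python
def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def min_along_line(r, p, half_width=5.0, steps=2001):
    """Grid search for the minimum of phi_p on the line a + b = 2r.

    Returns (t, value) for the grid point t in [-half_width, half_width]
    with the smallest phi_p(r + t, r - t). With an odd number of steps the
    grid contains t = 0 exactly.
    """
    best_t, best_v = None, float("inf")
    for i in range(steps):
        t = -half_width + 2.0 * half_width * i / (steps - 1)
        v = phi_p(r + t, r - t, p)
        if v < best_v:
            best_t, best_v = t, v
    return best_t, best_v


def closed_form_min(r, p):
    """Minimum value stated in Theorem 2.1: 2^(1/p)|r| - 2r."""
    return 2.0 ** (1.0 / p) * abs(r) - 2.0 * r


if __name__ == "__main__":
    for r in (-2.0, 0.0, 1.5):
        for p in (1.5, 2.0, 5.0):
            print(r, p, min_along_line(r, p), closed_form_min(r, p))
```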
Lemma 2.4 and Theorem 2.1 show that the curve determined by the plane $a + b = 2r$ and the surface $z = \phi_p(a,b)$ is convex and attains its minimum when $a = b$; see Fig. 7. We now study the curvature of the family of curves $\alpha_{r,p}$ defined as in (12) at the point $\alpha_{r,p}(0) = (r, r, \phi_p(r,r))$. Because the function $\phi_p$ is not differentiable at $(a,b) = (0,0)$ (i.e., $r = 0$), we choose the two points $(t_0, t_0, \phi_p(t_0, t_0))$ and $(-t_0, -t_0, \phi_p(-t_0, -t_0))$ with $t_0 > 0$, regard them as vectors, denoted (with a slight abuse of notation) by $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$, and calculate the cosine of the angle between them; see Fig. 8.
Proposition 2.4. Let $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (12), and let $\cos_p(\theta)$ be the cosine of the angle between the two vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$, where $t_0 > 0$. Then,
(a) $\cos_p(\theta) = \dfrac{2^{2/p} - 6}{\sqrt{\left( 2^{2/p} - 2 \right)^2 + 32}}$;
(b) $\cos_p(\theta) \to -\dfrac{1}{3}$ as $p \to 1$, and $\cos_p(\theta) \to -\dfrac{5}{\sqrt{33}}$ as $p \to +\infty$;
(c) if $1 < p_1 < p_2$, then $\cos_{p_1}(\theta) > \cos_{p_2}(\theta)$.
Proof.
(a) Since $\phi_p(t_0, t_0) = (2^{1/p} - 2) t_0$ and $\phi_p(-t_0, -t_0) = (2^{1/p} + 2) t_0$, direct computation gives
\[ \cos_p(\theta) = \frac{\langle \alpha_{0,p}(t_0), \alpha_{0,p}(-t_0) \rangle}{\|\alpha_{0,p}(t_0)\| \, \|\alpha_{0,p}(-t_0)\|} = \frac{2^{2/p} - 6}{\sqrt{2^{2/p} + 6 - 2^{\frac{1}{p}+2}} \, \sqrt{2^{2/p} + 6 + 2^{\frac{1}{p}+2}}} = \frac{2^{2/p} - 6}{\sqrt{\left( 2^{2/p} - 2 \right)^2 + 32}}. \]
(b) From part (a), let $f : (1, +\infty) \to \mathbb{R}$ be $f(p) := \cos_p(\theta)$. Then $f$ is continuous on $(1, +\infty)$. Taking the limits, in which $2^{2/p} \to 4$ as $p \to 1$ and $2^{2/p} \to 1$ as $p \to +\infty$, we have $\cos_p(\theta) \to -\frac{1}{3}$ as $p \to 1$ and $\cos_p(\theta) \to -\frac{5}{\sqrt{33}}$ as $p \to +\infty$.
(c) With $f$ as in part (b), we compute
\[ f'(p) = -\frac{8 \ln 2}{p^2} \cdot \frac{2^{2/p} \left( 2^{2/p} + 6 \right)}{\left[ \left( 2^{2/p} - 2 \right)^2 + 32 \right]^{3/2}}, \]
which implies $f'(p) < 0$ for all $p > 1$. Therefore, $f$ is strictly decreasing on $(1, +\infty)$, in agreement with the limits in part (b). □
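To make the dependence of the angle on $p$ concrete, the sketch below computes the cosine directly from the two vectors $(t_0, t_0, \phi_p(t_0,t_0))$ and $(-t_0, -t_0, \phi_p(-t_0,-t_0))$ and compares it with the closed form $(2^{2/p} - 6)/\sqrt{(2^{2/p}-2)^2 + 32}$. The function names are hypothetical; the tabulated values let the behavior for $p$ near $1$ and for large $p$ be inspected.

```python
import math


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def cos_angle_from_vectors(p, t0=1.0):
    """Cosine of the angle between (t0, t0, phi_p(t0, t0)) and (-t0, -t0, phi_p(-t0, -t0))."""
    u = (t0, t0, phi_p(t0, t0, p))
    v = (-t0, -t0, phi_p(-t0, -t0, p))
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm_u * norm_v)


def cos_angle_closed_form(p):
    """Closed-form expression (2^(2/p) - 6) / sqrt((2^(2/p) - 2)^2 + 32)."""
    s = 2.0 ** (2.0 / p)
    return (s - 6.0) / math.sqrt((s - 2.0) ** 2 + 32.0)


if __name__ == "__main__":
    for p in (1.1, 1.5, 2.0, 3.0, 10.0):
        print(p, cos_angle_from_vectors(p), cos_angle_closed_form(p))
```

The value does not depend on `t0`, since both vectors scale linearly with it.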
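Proposition 2.5. Let $\alpha_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (12) with $r \ne 0$. Then the following hold.
(a) The curvature at the point $\alpha_{r,p}(0) = (r, r, \phi_p(r,r))$ is $\kappa_p(0) = \dfrac{(p-1) \, 2^{\frac{1}{p}-1}}{|r|}$.
(b) $\kappa_p(0) \to 0$ as $p \to 1$ and $\kappa_p(0) \to +\infty$ as $p \to +\infty$.
(c) If $1 < p_1 < p_2$, then $\kappa_{p_1}(0) < \kappa_{p_2}(0)$.
Proof.
(a) Because $\alpha_{r,p}(t) = (r+t, r-t, \phi_p(r+t, r-t))$, we know
\[ \alpha'_{r,p}(0) = (1, -1, 0) \quad \text{and} \quad \alpha''_{r,p}(0) = \left( 0, 0, \frac{(p-1) \, 2^{1/p}}{|r|} \right). \]
Recall the formula for the curvature,
\[ \kappa_p(t) = \frac{|\alpha'_{r,p}(t) \wedge \alpha''_{r,p}(t)|}{|\alpha'_{r,p}(t)|^3}, \]
where the wedge operator denotes the outer (cross) product of two vectors. Thus, we have
\[ \kappa_p(0) = \frac{|\alpha'_{r,p}(0) \wedge \alpha''_{r,p}(0)|}{|\alpha'_{r,p}(0)|^3} = \frac{(p-1) \, 2^{\frac{1}{p}-1}}{|r|}. \]
(b) Let $f : (1, +\infty) \to \mathbb{R}$ be defined as
\[ f(p) := \kappa_p(0) = \frac{(p-1) \, 2^{\frac{1}{p}-1}}{|r|}; \]
then obviously $f$ is continuous on $(1, +\infty)$. Thus, the desired result follows by taking the limits directly.
(c) From part (b), we compute that
\[ f'(p) = \frac{2^{\frac{1}{p}-1}}{|r|} \left( 1 - \frac{\ln 2}{p} + \frac{\ln 2}{p^2} \right), \]
which implies $f'(p) > 0$ for all $p \in (1, +\infty)$. Then $f$ is strictly increasing on $(1, +\infty)$. □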
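The curvature formula $\kappa_p(0) = (p-1)2^{1/p-1}/|r|$ can be cross-checked with central finite differences applied to the curve $t \mapsto (r+t, r-t, \phi_p(r+t, r-t))$. This is only a sketch: the step size `h` and the function names are assumptions, and the finite-difference value is an approximation whose accuracy depends on `h`.

```python
import math


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def curvature_numeric(r, p, h=1e-3):
    """Approximate curvature of alpha_{r,p} at t = 0 via central differences.

    For the curve (r + t, r - t, f(t)) the velocity is (1, -1, f') and the
    acceleration is (0, 0, f''), so |velocity x acceleration| = sqrt(2)|f''|
    and |velocity|^3 = (2 + f'^2)^(3/2).
    """
    f = lambda t: phi_p(r + t, r - t, p)
    f1 = (f(h) - f(-h)) / (2.0 * h)
    f2 = (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)
    return math.sqrt(2.0) * abs(f2) / (2.0 + f1 * f1) ** 1.5


def curvature_closed_form(r, p):
    """Closed-form curvature (p - 1) 2^(1/p - 1) / |r| for r != 0."""
    return (p - 1.0) * 2.0 ** (1.0 / p - 1.0) / abs(r)


if __name__ == "__main__":
    for r, p in ((1.0, 2.0), (-2.0, 3.0), (0.5, 1.5)):
        print(r, p, curvature_numeric(r, p), curvature_closed_form(r, p))
```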
Fig. 7. The curve $f(t) = \phi_p(r+t, r-t)$.
The above two propositions show how $p$ affects the geometric structure; see Fig. 9(a) and (b). Proposition 2.5(b) says that when $p \to 1$ the curve becomes a straight line; see Fig. 9(c). Note that when $p \to +\infty$ the curve becomes sharper and sharper at the point, and the limiting curve is not differentiable at $t = 0$; see Fig. 9(d). To sum up, all the properties presented in this section show that $p$ indeed affects the geometric behavior of the surface $z = \phi_p(a,b)$ both locally and globally.
3. Geometric view of $\psi_p$
In the previous section, we saw that the generalized FB function $\phi_p$ is convex and differentiable everywhere except at $(0,0)$. In contrast, the function $\psi_p(a,b)$ defined as in (8) is nonconvex but continuously differentiable everywhere. Nonetheless, $\phi_p$ and $\psi_p$ share many geometric properties, as will be seen later. In this section, we study properties analogous to those in Section 2 and compare $\psi_p$ with $\phi_p$ (see Figs. 10 and 11).
Proposition 3.1. Let $\psi_p : \mathbb{R}^2 \to \mathbb{R}$ be given as in (8) where $p \in (1, +\infty)$. Then,
(a) $\psi_p(a,b) \ge 0$ for all $(a,b) \in \mathbb{R}^2$;
(b) $\psi_p(a,b) = \psi_p(b,a)$ for all $(a,b) \in \mathbb{R}^2$;
(c) $(a = 0 \text{ and } b \ge 0)$ or $(b = 0 \text{ and } a \ge 0) \iff \psi_p(a,b) = 0$;
(d) $b = 0$ and $a < 0 \implies \psi_p(a,b) = 2a^2 > 0$;
(e) $a = 0$ and $b < 0 \implies \psi_p(a,b) = 2b^2 > 0$;
(f) $\psi_p$ is continuously differentiable everywhere.
Fig. 8. Angle between the vectors $\alpha_{0,p}(t_0)$ and $\alpha_{0,p}(-t_0)$.
Proof. Parts (d) and (e) come from Proposition 2.1(c) and (d); please see [2–4] for the rest. □
Proposition 2.2(c) says that the value of $\phi_p$ is decreasing with respect to $p$. In contrast, $\psi_p$ does not have this property in general. More specifically, $\psi_p$ has this property only on certain quadrants.
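Proposition 3.2. Suppose $1 < p_1 < p_2$ and $(a,b) \in \mathbb{R}^2$. Then,
(a) if $a < 0$ or $b < 0$, then $\psi_{p_1}(a,b) \ge \psi_{p_2}(a,b)$;
(b) if $a > 0$ and $b > 0$, then $\psi_{p_1}(a,b) \le \psi_{p_2}(a,b)$.
Proof.
(a) This is clear from Proposition 2.2(c), because $\phi_{p_1}(a,b) \ge \phi_{p_2}(a,b) > 0$ when $a < 0$ or $b < 0$ by Proposition 2.1.
(b) Suppose $a > 0$ and $b > 0$. From Proposition 2.1(a), we have $\phi_p(a,b) < 0$. Then Proposition 2.2(c) yields $0 > \phi_{p_1}(a,b) \ge \phi_{p_2}(a,b)$, and hence $\phi_{p_1}^2(a,b) \le \phi_{p_2}^2(a,b)$. □
Since $\psi_p$ is not convex in general, the counterpart of Theorem 2.1 reads as follows.
Theorem 3.1. Let $\psi_p(a,b)$ be defined as in (8) with $a + b = 2r$. Then, the following hold.
(a) If $r \in \mathbb{R}_{++}$ and $a > 0$, $b > 0$, then $\psi_p(a,b)$ attains its maximum $\left( 2^{\frac{2}{p}-1} - 2^{\frac{1}{p}+1} + 2 \right) r^2$ at $(a,b) = (r,r)$.
(b) If $r \in \mathbb{R}_{--} \cup \{0\}$, then $\psi_p(a,b)$ attains its minimum $\left( 2^{\frac{2}{p}-1} + 2^{\frac{1}{p}+1} + 2 \right) r^2$ at $(a,b) = (r,r)$.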
Fig. 9. The curvature $\kappa_p(0)$ at the point $\alpha_{r,p}(0)$.
Proof.
(a) When $a > 0$ and $b > 0$, Proposition 2.1(a) says that $\phi_p(a,b) < 0$. Since $\phi_p^2(a,b) > 0$, by Theorem 2.1, the minimum of $\phi_p(a,b)$ becomes the maximum of $\psi_p(a,b)$.
(b) This is a consequence of Theorem 2.1. □
The aforementioned results show that $\psi_p$ shares many properties with $\phi_p$; see Figs. 11 and 12, where we denote $\psi_1(a,b) := \frac{1}{2} |\phi_1(a,b)|^2$ and $\psi_\infty(a,b) := \frac{1}{2} |\phi_\infty(a,b)|^2$. However, there are still some differences between $\phi_p$ and $\psi_p$. For example, $\psi_p$ is not convex whereas $\phi_p$ is. Fig. 13 depicts the increasing direction of $\psi_p$. Note that $\psi_p(a,b)$ is nonnegative and behaves differently from $\phi_p$ when $a > 0$ and $b > 0$; see Fig. 11.
In order to further understand the geometric properties, we define a family of curves as follows:
\[ \beta_{r,p}(t) := \left( r + t, \ r - t, \ \psi_p(r+t, r-t) \right), \tag{13} \]
where $r$ is a fixed real number and $t \in \mathbb{R}$. This family of curves can be regarded as the intersection of the plane $a + b = 2r$ and the surface $z = \psi_p(a,b)$; see Fig. 14.
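Proposition 3.3. Let $\beta_{r,p} : \mathbb{R} \to \mathbb{R}^3$ be defined as in (13) with $r \ne 0$. Then the following hold.
(a) The curvature at the point $\beta_{r,p}(0) = (r, r, \psi_p(r,r))$ is $\kappa_p(0) = (p-1) \, 2^{1/p} \left| \mathrm{sgn}(r) - 2^{\frac{1}{p}-1} \right|$, which for $r > 0$ reads $\kappa_p(0) = (p-1) \, 2^{1/p} \left( 1 - 2^{\frac{1}{p}-1} \right)$.
(b) $\kappa_p(0) \to 0$ as $p \to 1$ and $\kappa_p(0) \to +\infty$ as $p \to +\infty$.
(c) If $1 < p_1 < p_2$, then $\kappa_{p_1}(0) < \kappa_{p_2}(0)$.
Proof.
(a) From $\beta_{r,p}(t) = (r+t, r-t, \psi_p(r+t, r-t))$, we know
\[ \beta'_{r,p}(0) = (1, -1, 0) \quad \text{and} \quad \beta''_{r,p}(0) = \left( 0, 0, (p-1) \, 2^{2/p} - \mathrm{sgn}(r) (p-1) \, 2^{\frac{1}{p}+1} \right), \]
which yields
\[ \kappa_p(0) = \frac{|\beta'_{r,p}(0) \wedge \beta''_{r,p}(0)|}{|\beta'_{r,p}(0)|^3} = (p-1) \, 2^{1/p} \left| \mathrm{sgn}(r) - 2^{\frac{1}{p}-1} \right|. \]
(b) Let $f : (1, +\infty) \to \mathbb{R}$ be defined as $f(p) := \kappa_p(0)$. Then the result follows by taking the limits directly.
(c) From part (b), it can be verified that $f'(p) > 0$ for all $p \in (1, +\infty)$. Thus, $f$ is strictly increasing on $(1, +\infty)$. □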
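The curvature of the curve (13) at $t = 0$ can be cross-checked in the same finite-difference manner. In the sketch below, the reference value is taken as half the absolute value of the third component of $\beta''_{r,p}(0)$, namely $\frac{1}{2}\left| (p-1)2^{2/p} - \mathrm{sgn}(r)(p-1)2^{1/p+1} \right|$, which follows from $|\beta'_{r,p}(0)| = \sqrt{2}$. Function names and the step size are assumptions for illustration.

```python
import math


def psi_p(a, b, p):
    """NCP-function (8): 0.5 * (||(a, b)||_p - (a + b))^2."""
    phi = (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)
    return 0.5 * phi * phi


def curvature_beta_numeric(r, p, h=1e-3):
    """Finite-difference curvature of (r + t, r - t, psi_p(r + t, r - t)) at t = 0."""
    g = lambda t: psi_p(r + t, r - t, p)
    g1 = (g(h) - g(-h)) / (2.0 * h)
    g2 = (g(h) - 2.0 * g(0.0) + g(-h)) / (h * h)
    return math.sqrt(2.0) * abs(g2) / (2.0 + g1 * g1) ** 1.5


def curvature_beta_reference(r, p):
    """Half the absolute third component of the second derivative at t = 0 (r != 0)."""
    sgn = 1.0 if r > 0 else -1.0
    third = (p - 1.0) * 2.0 ** (2.0 / p) - sgn * (p - 1.0) * 2.0 ** (1.0 / p + 1.0)
    return 0.5 * abs(third)


if __name__ == "__main__":
    for r, p in ((1.0, 2.0), (-1.0, 2.0), (2.0, 3.0)):
        print(r, p, curvature_beta_numeric(r, p), curvature_beta_reference(r, p))
```

Note that, unlike the curvature of $\alpha_{r,p}$, this value depends on $r$ only through its sign.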
Fig. 14 depicts how the curve changes with the value of $p$; in particular, we can see the change of curvature when $p$ is close to one or to infinity. We state an addendum to part (a) here: the curvature at the two other special points $\beta_{r,p}(-r) = (0, 2r, 0)$ and $\beta_{r,p}(r) = (2r, 0, 0)$ is the same, namely, $\kappa_p(-r) = \kappa_p(r) = \frac{1}{2}$. Note that although $\psi_p$ is differentiable everywhere, the mean curvature at $(0,0)$ does not exist. To end this section, we summarize the similarities and differences between $\phi_p$ and $\psi_p$ below.
Fig. 10. The surface of $z = \psi_2(a,b)$ with $(a,b) \in [-10,10] \times [-10,10]$.
| | $\phi_p(a,b)$ | $\psi_p(a,b)$ |
|---|---|---|
| Difference | Convex | Nonconvex |
| | Differentiable everywhere except $(0,0)$ | Differentiable everywhere |
| | $\phi_p(a,b) < 0$ when $a > 0$ and $b > 0$ | $\psi_p(a,b) \ge 0$ for all $(a,b) \in \mathbb{R}^2$ |

Similarity (both functions):
(1) NCP-function
(2) Symmetry (i.e., $\phi_p(a,b) = \phi_p(b,a)$ and $\psi_p(a,b) = \psi_p(b,a)$)
(3) The function is not affected by $p$ on the axes
(4) When $(a^k \to -\infty)$ or $(b^k \to -\infty)$ or $(a^k, b^k \to +\infty)$, there hold $|\phi_p(a^k, b^k)| \to \infty$ and $|\psi_p(a^k, b^k)| \to \infty$
(5) Non-coercive
4. Geometric analysis of merit function in descent algorithms
In this section, we employ the derivative-free descent algorithms presented in [4,5] to solve the unconstrained minimization problem (11) by using the merit function (10). We then compare the two algorithms and study their convergence behavior by means of an intuitive visualization. We first list these two algorithms below.
Algorithm 4.1 [4, Algorithm 4.1].
(Step 0) Given a real number $p > 1$ and a starting point $x^0 \in \mathbb{R}^n$. Choose the parameters $\sigma \in (0,1)$, $\beta \in (0,1)$ and $\varepsilon \ge 0$. Set $k := 0$.
(Step 1) If $\Psi_p(x^k) \le \varepsilon$, then stop.
(Step 2) Let $m_k$ be the smallest nonnegative integer $m$ satisfying
\[ \Psi_p(x^k + \beta^m d^k) \le (1 - \sigma \beta^{2m}) \Psi_p(x^k), \]
where
\[ d^k := -\nabla_b \psi_p(x^k, F(x^k)) \]
and
\[ \nabla_b \psi_p(x, F(x)) := \left( \nabla_b \psi_p(x_1, F_1(x)), \ldots, \nabla_b \psi_p(x_n, F_n(x)) \right)^T. \]
(Step 3) Set $x^{k+1} := x^k + \beta^{m_k} d^k$, $k := k + 1$ and go to Step 1.
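The steps of Algorithm 4.1 can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in [4]: the parameter defaults (`sigma`, `beta`, `eps`), the iteration caps `max_iter` and `max_backtracks`, and all function names are assumptions. The partial derivative $\nabla_b \psi_p(a,b) = \phi_p(a,b) \left( \mathrm{sgn}(b)|b|^{p-1} / \|(a,b)\|_p^{p-1} - 1 \right)$, taken as $0$ at the origin, is obtained by differentiating (7) and (8).

```python
import math


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def grad_b_psi_p(a, b, p):
    """Partial derivative of psi_p = 0.5 * phi_p^2 with respect to b (0 at the origin)."""
    norm = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    if norm == 0.0:
        return 0.0
    dnorm_db = math.copysign(abs(b) ** (p - 1.0), b) / norm ** (p - 1.0)
    return phi_p(a, b, p) * (dnorm_db - 1.0)


def merit(x, F, p):
    """Merit function (10)."""
    return sum(0.5 * phi_p(xi, fi, p) ** 2 for xi, fi in zip(x, F(x)))


def algorithm_4_1(F, x0, p=2.0, sigma=0.1, beta=0.5, eps=1e-10,
                  max_iter=500, max_backtracks=60):
    """Derivative-free descent following Steps 0-3; returns (x, merit history)."""
    x = list(x0)
    history = [merit(x, F, p)]
    for _ in range(max_iter):
        value = history[-1]
        if value <= eps:                      # Step 1: stopping test
            break
        Fx = F(x)
        d = [-grad_b_psi_p(xi, fi, p) for xi, fi in zip(x, Fx)]   # Step 2: direction
        for m in range(max_backtracks):       # smallest m passing the descent test
            trial = [xi + beta ** m * di for xi, di in zip(x, d)]
            trial_value = merit(trial, F, p)
            if trial_value <= (1.0 - sigma * beta ** (2 * m)) * value:
                x = trial                     # Step 3: accept the step
                history.append(trial_value)
                break
        else:
            break                             # safeguard only; not part of the algorithm
    return x, history


if __name__ == "__main__":
    # One-dimensional NCP with F(x) = 1, whose solution is x = 0.
    x, hist = algorithm_4_1(lambda x: [1.0], [1.0])
    print(x, hist[0], hist[-1], len(hist))
```

By construction the recorded merit values never increase; how fast they decrease depends on $p$ and on the starting point.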
Algorithm 4.2 [5, Algorithm 4.1].
(Step 0) Given real numbers $p > 1$ and $\alpha \ge 0$ and a starting point $x^0 \in \mathbb{R}^n$. Choose the parameters $\sigma \in (0,1)$, $\beta \in (0,1)$, $\gamma \in (0,1)$ and $\varepsilon \ge 0$. Set $k := 0$.
(Step 1) If $\Psi_{\alpha,p}(x^k) \le \varepsilon$, then stop.
(Step 2) Let $m_k$ be the smallest nonnegative integer $m$ satisfying
\[ \Psi_{\alpha,p}(x^k + \beta^m d^k(\gamma^m)) \le (1 - \sigma \beta^{2m}) \Psi_{\alpha,p}(x^k), \]
where
\[ d^k(\gamma^m) := -\nabla_b \psi_{\alpha,p}(x^k, F(x^k)) - \gamma^m \nabla_a \psi_{\alpha,p}(x^k, F(x^k)) \]
and
\[ \nabla_a \psi_{\alpha,p}(x, F(x)) := \left( \nabla_a \psi_{\alpha,p}(x_1, F_1(x)), \ldots, \nabla_a \psi_{\alpha,p}(x_n, F_n(x)) \right)^T, \]
\[ \nabla_b \psi_{\alpha,p}(x, F(x)) := \left( \nabla_b \psi_{\alpha,p}(x_1, F_1(x)), \ldots, \nabla_b \psi_{\alpha,p}(x_n, F_n(x)) \right)^T. \]
(Step 3) Set $x^{k+1} := x^k + \beta^{m_k} d^k(\gamma^{m_k})$, $k := k + 1$ and go to Step 1.
Fig. 11. The surface of $z = \psi_2(a,b)$ with $(a,b) \in [0,10] \times [0,10]$.
Fig. 12. The surface of $z = \psi_p(a,b)$ with different $p$.
Fig. 13. Level curves of $z = \psi_p(a,b)$ with different $p$.
In Algorithm 4.2, $\psi_{\alpha,p} : \mathbb{R}^2 \to \mathbb{R}_+$ is an NCP-function defined by
\[ \psi_{\alpha,p}(a,b) := \frac{\alpha}{2} \left( \max\{0, ab\} \right)^2 + \psi_p(a,b) = \frac{\alpha}{2} (ab)_+^2 + \frac{1}{2} \left( \|(a,b)\|_p - (a+b) \right)^2 \]
with $\alpha \ge 0$ being a real parameter. When $\alpha = 0$, the function $\psi_{\alpha,p}$ reduces to $\psi_p$. To compare these two algorithms, we take $\alpha = 0$ when we use Algorithm 4.2 in this section. Note that the descent direction in Algorithm 4.1 lacks a certain symmetry, whereas Algorithm 4.2 adopts a symmetric search direction. Under the assumption of monotonicity, i.e.,
\[ \langle x - y, F(x) - F(y) \rangle \ge 0 \quad \text{for all } x, y \in \mathbb{R}^n, \]
an error bound is established and Algorithm 4.2 is shown to have a locally R-linear convergence rate in [5]. In other words, there exists a positive constant $\kappa_2$ such that
\[ \|x^k - x^*\| \le \kappa_2 \max\left\{ \Psi_{\alpha,p}(x^k), \sqrt{\Psi_{\alpha,p}(x^k)} \right\}^{1/2} \quad \text{when } \alpha = 0. \]
Furthermore, the convergence rate of Algorithm 4.2 has a close relation with the constant
logc
b L1þ
r
CðB;
a
;pÞ
where CðB;
a
;pÞ ¼ 2 21p4a
B2þ 2 þ 2 1p2:
Therefore, when the value of $p$ decreases, the convergence rate of Algorithm 4.2 becomes worse and worse; see Remark 4.1 in [5].
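To illustrate the search direction of Algorithm 4.2, the sketch below evaluates $\psi_{\alpha,p}(a,b) = \frac{\alpha}{2}(\max\{0,ab\})^2 + \psi_p(a,b)$, its two partial derivatives, and the direction $d(\gamma^m) = -\nabla_b \psi_{\alpha,p} - \gamma^m \nabla_a \psi_{\alpha,p}$ componentwise. The gradient expressions are obtained by differentiating this definition (with the gradient taken as zero at the origin); the function names are hypothetical and no line search is included.

```python
import math


def phi_p(a, b, p):
    """Generalized Fischer-Burmeister function (7)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_alpha_p(a, b, p, alpha):
    """psi_{alpha,p}(a, b) = (alpha / 2) * max(0, ab)^2 + 0.5 * phi_p(a, b)^2."""
    return 0.5 * alpha * max(0.0, a * b) ** 2 + 0.5 * phi_p(a, b, p) ** 2


def grad_psi_alpha_p(a, b, p, alpha):
    """Return the pair (d/da, d/db) of psi_{alpha,p}; (0, 0) at the origin."""
    norm = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    if norm == 0.0:
        return 0.0, 0.0
    phi = norm - (a + b)
    dphi_da = math.copysign(abs(a) ** (p - 1.0), a) / norm ** (p - 1.0) - 1.0
    dphi_db = math.copysign(abs(b) ** (p - 1.0), b) / norm ** (p - 1.0) - 1.0
    pos = max(0.0, a * b)
    return alpha * pos * b + phi * dphi_da, alpha * pos * a + phi * dphi_db


def direction_4_2(x, Fx, p, alpha, gamma, m):
    """Search direction d(gamma^m) = -grad_b - gamma^m * grad_a, componentwise."""
    d = []
    for xi, fi in zip(x, Fx):
        ga, gb = grad_psi_alpha_p(xi, fi, p, alpha)
        d.append(-gb - gamma ** m * ga)
    return d


if __name__ == "__main__":
    # With alpha = 0 and m = 0 both partial derivatives enter with equal weight.
    print(direction_4_2([1.0, 0.5], [1.0, -2.0], 2.0, 0.0, 0.5, 0))
```

With `alpha=0.0` the function reduces to $\psi_p$, which is the setting used for the comparison in this section. As `m` grows, the weight `gamma ** m` on the $\nabla_a$ term shrinks and the direction approaches the one used in Algorithm 4.1.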
Recall that the merit function $\Psi_p(x)$ is the sum of $n$ nonnegative functions $\psi_p$, i.e.,
\[ \Psi_p(x) = \sum_{i=1}^{n} \psi_p(x_i, F_i(x)). \]
This encourages us to view each component $\psi_p(x_i^k, F_i(x^k))$, $i = 1, 2, \ldots, n$, as a motion with its own velocity on the same surface $z = \psi_p(a,b)$ at each iteration. Building on our study in Sections 2 and 3, we obtain a visualization that helps us understand the convergence behavior in detail. Fig. 20 depicts the visualization for a four-dimensional NCP in Example 4.3. The merit function of this NCP is $\Psi_p(x) = \sum_{i=1}^{4} \psi_p(x_i, F_i(x))$. We plot the point sequences $(x_i^k, F_i(x^k))$ for $i = 1, 2, 3, 4$ in different colors, together with the level curves of the surface $\psi_{1.1}(a,b)$, in Fig. 20(a). The vertical line represents the value of $x_i$, the horizontal line represents the value of $F_i(x)$, and the skew line means $x_i = F_i(x)$. We take the initial point $x^0 = (0,0,0,0)$, which implies $F(x^0) = (-6,-2,-1,-3)$, and observe the convergence behavior separately for each $i$ from the initial point to the solution $x^* = (\sqrt{6}/2, 0, 0, 1/2)$, which is on the horizontal line in this figure. Furthermore, we observe the position of the point sequence on the surface in Fig. 20(a) and the merit function, which is the sum of their heights at each iteration, in Fig. 20(b).
In a one-dimensional NCP, $F$ is continuously differentiable and there is only one variable $x$ in $F$, so $(x, F(x))$ is a continuous curve in $\mathbb{R}^2$ and the merit function $\Psi_p(x) = \psi_p(x, F(x))$ is a curve on the surface $z = \psi_p(a,b)$; see Fig. 16(a) and (b).
Therefore, the point sequence in a one-dimensional problem can only lie on the curve $(x, F(x), \psi_p(x, F(x)))$.
Fig. 14. The curve intersected by the surface $z = \psi_p(a,b)$ and the plane $a + b = 2r$.
Example 4.1. Consider the NCP, where $F : \mathbb{R} \to \mathbb{R}$ is given by
\[ F(x) = (x-3)^3 + 1. \]
Fig. 15. The curvature $\kappa_p(0)$ at the point $\beta_{r,p}(0)$.
The unique solution of this NCP is $x^* = 2$. Note that $F$ is strictly monotone; see the geometric view of this NCP in Fig. 16.
The value of the merit function at each iteration is plotted in Fig. 16(c), which shows the different behavior of the function near the solution for different values of $p$. Fig. 17(a)–(d) depict the convergence behavior of Algorithm 4.1 from two directions with two different initial points, and Fig. 17(e) and (f) show the convergence behavior with different $p$. Fig. 19(a)–(d) depict the convergence behavior of Algorithm 4.2 from two directions with two different initial points. We found that Algorithm 4.2 always produces a point sequence in or close to the boundary of the feasible set, i.e., $\{(x, F(x)) : x \ge 0 \text{ and } F(x) \ge 0\}$. Based on Proposition 3.2, the speed at which the merit function decreases in Algorithm 4.1 differs between initial points when we increase $p$, but it is similar for different initial points in Algorithm 4.2. This phenomenon is consistent with the geometric properties studied in Section 3.
To show the importance of the inflection point, we give an extreme example as follows.
Example 4.2. Consider the NCP, where $F : \mathbb{R} \to \mathbb{R}$ is given by
\[ F(x) = 1. \]
The unique solution of this NCP is $x^* = 0$. From the above discussion, we know that the point sequence is on the curve $(x, 1, \psi_p(x, 1))$; see Fig. 18(a). Fig. 18(c) shows that there is a rapid decrease of the merit function from the 80th to the 120th iteration. Fig. 18(b) shows the behavior during the 80th to 120th iterations. Observing the width of the level curves in Fig. 18(b), we found that the rapid decrease may arise from the existence of an inflection point on the surface. Figs. 18(c)–(f) and Fig. 19(e) and (f) show that the position of the inflection point may change with different $p$.
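The inflection point mentioned in Example 4.2 can be located numerically on the profile $x \mapsto \psi_p(x, 1)$ for $x \ge 0$. The sketch below scans a uniform grid and returns the first grid point where the discrete second difference changes from positive to nonpositive. The grid range, the resolution and the function names are assumptions for illustration, and the returned location is only accurate to about one grid spacing.

```python
def psi_p(a, b, p):
    """NCP-function (8): 0.5 * (||(a, b)||_p - (a + b))^2."""
    phi = (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)
    return 0.5 * phi * phi


def first_inflection(p, x_max=10.0, n=10001):
    """First grid point in (0, x_max) where x -> psi_p(x, 1) turns from convex to concave.

    Returns None if no sign change of the second difference is found on the grid.
    """
    h = x_max / (n - 1)
    xs = [i * h for i in range(n)]
    vals = [psi_p(x, 1.0, p) for x in xs]
    prev = None
    for i in range(1, n - 1):
        second = vals[i - 1] - 2.0 * vals[i] + vals[i + 1]
        if prev is not None and prev > 0.0 >= second:
            return xs[i]
        prev = second
    return None


if __name__ == "__main__":
    for p in (1.1, 1.5, 2.0, 3.0, 10.0):
        print(p, first_inflection(p))
```

Printing the result for several values of `p` gives a rough picture of how the location of the inflection point moves with $p$.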
Fig. 16. Geometric view of the NCP in Example 4.1.
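Example 4.3. Consider the NCP, where $F : \mathbb{R}^4 \to \mathbb{R}^4$ is given by
\[ F(x) = \begin{pmatrix} 3x_1^2 + 2x_1 x_2 + 2x_2^2 + x_3 + 3x_4 - 6 \\ 2x_1^2 + x_1 + x_2^2 + 3x_3 + 2x_4 - 2 \\ 3x_1^2 + x_1 x_2 + 2x_2^2 + 2x_3 + 3x_4 - 1 \\ x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3 \end{pmatrix}. \]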
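As a sanity check on Example 4.3, the sketch below codes the map $F$ exactly as written above and evaluates it at the initial point $x^0 = (0,0,0,0)$ and at the solution $x^* = (\sqrt{6}/2, 0, 0, 1/2)$ quoted earlier in this section. The merit function should vanish at $x^*$ up to rounding for any $p > 1$. The function names are illustrative assumptions.

```python
import math


def F(x):
    """The map F of Example 4.3."""
    x1, x2, x3, x4 = x
    return [
        3 * x1 ** 2 + 2 * x1 * x2 + 2 * x2 ** 2 + x3 + 3 * x4 - 6,
        2 * x1 ** 2 + x1 + x2 ** 2 + 3 * x3 + 2 * x4 - 2,
        3 * x1 ** 2 + x1 * x2 + 2 * x2 ** 2 + 2 * x3 + 3 * x4 - 1,
        x1 ** 2 + 3 * x2 ** 2 + 2 * x3 + 3 * x4 - 3,
    ]


def merit(x, p):
    """Merit function (10) for this NCP."""
    total = 0.0
    for xi, fi in zip(x, F(x)):
        phi = (abs(xi) ** p + abs(fi) ** p) ** (1.0 / p) - (xi + fi)
        total += 0.5 * phi * phi
    return total


if __name__ == "__main__":
    x0 = [0.0, 0.0, 0.0, 0.0]
    x_star = [math.sqrt(6.0) / 2.0, 0.0, 0.0, 0.5]
    print(F(x0))        # components of F at the initial point
    print(F(x_star))    # first and last components vanish up to rounding
    for p in (1.1, 2.0, 10.0):
        print(p, merit(x0, p), merit(x_star, p))
```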
Fig. 17. Convergent behavior of Algorithm 4.1 and the value of the merit function in Example 4.1.