Neurocomputing

A novel generalization of the natural residual function and a neural network approach for the NCP

Jan Harold Alcantara, Jein-Shan Chen ⇑,1

Department of Mathematics, National Taiwan Normal University, Taiwan

Article info

Article history: Received 30 January 2020; Revised 20 April 2020; Accepted 11 June 2020; Available online 23 June 2020; Communicated by Q. Wei

MSC [2010]: 37-N40, 65-K10, 65-K15

Keywords: Complementarity functions; Natural residual function; Nonlinear complementarity problem

Abstract

The natural residual (NR) function is a mapping often used to solve nonlinear complementarity problems (NCPs). Recently, three discrete-type families of complementarity functions with parameter $p \ge 3$ (where $p$ is odd) based on the NR function were proposed. Using a neural network approach based on these families, it was observed from some preliminary numerical experiments that lower values of $p$ provide better convergence rates. Moreover, higher values of $p$ require larger computational time for the test problems considered. Hence, the value $p = 3$ is recommended for numerical simulations, which is rather unfortunate since we cannot exploit the wide range of values for the parameter $p$ of the family of NCP functions. This paper is a follow-up study on the aforementioned results. Motivated by previously reported numerical results, we formulate a continuous-type generalization of the NR function and two corresponding symmetrizations. The new families admit a continuous parameter $p > 0$, giving us a wider range of choices for $p$ and smooth NCP functions when $p > 1$. Moreover, the generalization subsumes the discrete-type generalization initially proposed. The numerical simulations show that, in general, increased stability and better numerical performance can be achieved by taking values of $p$ in the interval $(1, 3)$. This is indeed a significant improvement over preceding studies.

1. Motivation

The nonlinear complementarity problem (NCP) is very important in engineering and economic applications [11], as well as in operations research [8]. In particular, given a mapping $F:\mathbb{R}^n\to\mathbb{R}^n$, the problem consists of finding a vector $x\in\mathbb{R}^n$ satisfying the conditions

$$x \ge 0,\quad F(x) \ge 0 \quad\text{and}\quad \langle x, F(x)\rangle = 0.$$

This problem will be denoted by NCP(F). The solution set of this problem is denoted by $\mathrm{SOL}(F)$ and the feasible region is denoted by $\Omega_F := \{x\in\mathbb{R}^n \mid x\ge 0,\ F(x)\ge 0\}$. Some solution methods for NCP(F) can be found in [1,6,9,10,13,14,21,17,18,25,27]. A natural reformulation of NCP(F) is to consider the fixed-point problem

$$x = P_K(x - F(x)),$$

where $P_K$ denotes the projection onto $K$ with $K=\mathbb{R}^n_+$. Consequently, NCP(F) is equivalent to solving the equation

$$\begin{pmatrix}\phi_{\mathrm{NR}}(x_1,F_1(x))\\ \vdots\\ \phi_{\mathrm{NR}}(x_n,F_n(x))\end{pmatrix} = 0,$$

where

$$\phi_{\mathrm{NR}}(a,b) = a-(a-b)_+, \tag{1}$$

and $t_+ := \max\{t,0\}$. The function $\phi_{\mathrm{NR}}$ is called the natural residual (NR) function. In fact, $\phi_{\mathrm{NR}}$ can be replaced by any other function $\phi:\mathbb{R}^2\to\mathbb{R}$ with the property that

$$\phi(a,b)=0 \iff a\ge 0,\ b\ge 0,\ ab=0, \tag{2}$$

that is, NCP(F) and the system

$$\Phi_F(x) := \begin{pmatrix}\phi(x_1,F_1(x))\\ \vdots\\ \phi(x_n,F_n(x))\end{pmatrix} = 0 \tag{3}$$

⇑Corresponding author.

E-mail addresses: [email protected] (J.H. Alcantara), [email protected] (J.-S. Chen).

1 The author’s work is supported by Ministry of Science and Technology, Taiwan.


are equivalent. A function satisfying (2) is known in the literature as an NCP-function. Other than the NR function, the generalized Fischer-Burmeister (GFB) function

$$\phi^p_{\mathrm{FB}}(a,b) = \|(a,b)\|_p - (a+b),\quad p>1 \tag{4}$$

is another popular NCP function used in dealing with the complementarity problem. The GFB function is known as a "continuous" extension of the famous Fischer-Burmeister (FB) function given by

$$\phi_{\mathrm{FB}}(a,b) = \sqrt{a^2+b^2} - (a+b),$$

which can be obtained by taking $p=2$ in expression (4). The generalization is considered continuous since $p$ can take on any value in the interval $(1,\infty)$. Motivated by this extension, a generalization of the NR function (1) was formulated in [5], which is given by

$$\phi^p_{\mathrm{NR}}(a,b) = a^p - (a-b)_+^p, \tag{5}$$

where $p$ is an odd integer. Indeed, taking $p=1$ yields the NR function (1). This generalization is considered to be of "discrete" type since $p$ can only take odd integral values. Note that $\phi^p_{\mathrm{NR}}$ is a twice continuously differentiable function for $p\ge 3$ but its surface is not symmetric. To resolve this, two symmetrizations were proposed in [3], which are given by

$$\phi^p_{\mathrm{SNR}}(a,b) = \begin{cases} a^p-(a-b)^p & \text{if } a\ge b,\\ b^p-(b-a)^p & \text{if } a<b,\end{cases} \tag{6}$$

and

$$\psi^p_{\mathrm{SNR}}(a,b) = \begin{cases} a^pb^p-(a-b)^pb^p & \text{if } a\ge b,\\ a^pb^p-(b-a)^pa^p & \text{if } a<b,\end{cases} \tag{7}$$

where $p\ge 3$ is an odd integer. Properties of these three discrete-type families are elaborated in [2,15].

The first attempt to use the above three discrete-type functions in designing solution methods for NCP was a neural network approach, which was presented in our previous work [2]. To construct the neural network, note that by taking $\Phi_F$ as defined in (3), the unconstrained minimization problem $\min_{x\in\mathbb{R}^n}\Psi_F(x)$, where

$$\Psi_F(x) = \frac12\|\Phi_F(x)\|^2 = \frac12\sum_{j=1}^n \phi(x_j,F_j(x))^2, \tag{8}$$

is equivalent to NCP(F). Then the gradient dynamical system

$$\frac{dx}{dt} = -\rho\,\nabla\Psi_F(x(t)),\quad x(0)=x^0 \tag{9}$$

is a natural neural network to be considered to deal with NCP(F).

In [2], the discrete-type functions (5), (6), and (7) were used to form the merit function $\Psi_F$. Preliminary numerical experiments conducted in [2] showed that lower values of the parameter $p$ result in faster convergence, although theoretical evidence for this phenomenon is yet to be established. Moreover, longer computation time is usually required when a higher value of $p$ is used. There are also test instances in which larger values of $p$ lead to ill-conditioning. In turn, the choice $p=3$ may seem optimal in practice. In other words, the results suggest that higher values of $p$ need not be considered. Consequently, the discrete-type generalization appears to be of limited use, in the sense that only one member of each family is useful for numerical purposes. This motivates us to explore whether there exists a continuous generalization of the NR function, i.e., a generalization parametrized by $p$ which assumes values on some interval. This would provide more values to consider for the tunable parameter, instead of just the odd integers with value at least 3.

We provide an affirmative answer to this problem. More precisely, the main contributions of this paper are as follows:

(i) We propose a continuous-type generalization of the NR function. The proposed function does not have a symmetric surface, but we provide two symmetrizations which also admit a continuous parameter $p$. This generalizes the results in [3,5].

(ii) We establish several properties of these newly formulated NCP functions which are prerequisite to designing solution methods for the complementarity problem, and which are not limited to the neural network approach. These properties extend the results in [15].

(iii) Stability properties of the neural network (9) will be established as important extensions of the results in [2].

More importantly,

(iv) We illustrate that the proposed continuous generalization is meaningful. In particular, it provides a wider range of values of $p$ which offer better convergence rates than the ones based on the discrete-type generalization and their symmetrizations illustrated in [2].

(v) We provide theoretical evidence for the performance dependence on $p$ of the gradient dynamical systems based on the three new families of NCP functions. This was not accomplished in [2].

(vi) This work is a significant improvement of the numerical results that were initially presented in [2], since the proposed families not only provide faster convergence rates but also higher stability. That is, the proposed generalizations yield neural networks which are less sensitive to initial conditions, which is one of the main issues encountered in [2].

In summary, this paper can be viewed as an important extension of the works presented in [2,3,5,15], where the discrete-type generalization and two discrete-type symmetrizations of the NR function were studied.

This paper is organized as follows: In Section 2, we present our proposed continuous generalization of the NR function. We also prove important properties of the obtained generalization, which are extensions of the results given in [2,3,5,15]. The theoretical properties proved in this section will later be used in the analysis of the neural network, which will be presented in Section 3. Results of several numerical experiments are presented in Section 4 and discussed in detail in Section 5. Concluding remarks are presented in Section 6.

2. Continuous generalization

Our proposed generalization of the NR function is defined as

$$\tilde\phi^p_{\mathrm{NR}}(a,b) = \operatorname{sgn}(a)|a|^p - (a-b)_+^p. \tag{10}$$

Here, we assume that $p$ is any number in $(0,\infty)$ and

$$\operatorname{sgn}(t) := \begin{cases} 1 & \text{if } t>0,\\ 0 & \text{if } t=0,\\ -1 & \text{if } t<0.\end{cases}$$

Observe that $\tilde\phi^p_{\mathrm{NR}}$ is an NCP function. Indeed, note that $\tilde\phi^p_{\mathrm{NR}}(a,b) = f(a) - f((a-b)_+)$, where $f(t) = \operatorname{sgn}(t)|t|^p$, which is a bijective function. It follows that

$$\tilde\phi^p_{\mathrm{NR}}(a,b)=0 \iff f(a)=f((a-b)_+) \iff a=(a-b)_+ \iff \phi_{\mathrm{NR}}(a,b)=0.$$

Note that if $p$ is odd, then $\tilde\phi^p_{\mathrm{NR}} = \phi^p_{\mathrm{NR}}$ and so the above generalization subsumes the discrete-type extension given by (5). We note herein that the transformation employed on $\phi_{\mathrm{NR}}$ via the monotonic function $f$ can always be applied to any NCP function of the form $\phi = \phi_1 - \phi_2$. This fact has also been noted in [12].
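As a quick illustration of definition (10), here is a small sketch in plain Python. The function names (`sgn`, `signed_power`, `phi_nr_p`) are hypothetical and chosen only for this example; the sample points are arbitrary.

```python
def sgn(t):
    """Sign function: 1, 0 or -1."""
    return (t > 0) - (t < 0)

def signed_power(t, p):
    """f(t) = sgn(t) * |t|^p, the monotone map used in (10)."""
    return sgn(t) * abs(t) ** p

def phi_nr(a, b):
    """Natural residual function (1)."""
    return a - max(a - b, 0.0)

def phi_nr_p(a, b, p):
    """Continuous generalization (10): f(a) - f((a - b)_+), for p > 0."""
    return signed_power(a, p) - max(a - b, 0.0) ** p

if __name__ == "__main__":
    # Zeros coincide with those of the NR function (complementary pairs).
    for a, b in [(2.0, 0.0), (0.0, 3.0), (0.0, 0.0)]:
        print((a, b), phi_nr(a, b), phi_nr_p(a, b, 1.5))
    # For odd p the value agrees with the discrete formula a^p - (a-b)_+^p.
    print(phi_nr_p(-2.0, -5.0, 3), (-2.0) ** 3 - 3.0 ** 3)
```

Because `signed_power` is strictly increasing, the sign of `phi_nr_p(a, b, p)` always matches the sign of `phi_nr(a, b)`, which is the observation behind the NCP property.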

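It is easy to see that the function (10) does not have a symmetric surface. Employing the same strategy as in [3], we propose two symmetrizations of $\tilde\phi^p_{\mathrm{NR}}$ as

$$\tilde\phi^p_{\mathrm{SNR}}(a,b) = \begin{cases} \operatorname{sgn}(a)|a|^p - (a-b)^p & \text{if } a\ge b,\\ \operatorname{sgn}(b)|b|^p - (b-a)^p & \text{if } a<b,\end{cases} \tag{11}$$

and

$$\tilde\psi^p_{\mathrm{SNR}}(a,b) = \begin{cases} \operatorname{sgn}(a)\operatorname{sgn}(b)|a|^p|b|^p - \operatorname{sgn}(b)(a-b)^p|b|^p & \text{if } a\ge b,\\ \operatorname{sgn}(a)\operatorname{sgn}(b)|a|^p|b|^p - \operatorname{sgn}(a)(b-a)^p|a|^p & \text{if } a<b,\end{cases} \tag{12}$$

where $p>0$. Notice that $\tilde\phi^p_{\mathrm{SNR}} = \phi^p_{\mathrm{SNR}}$ and $\tilde\psi^p_{\mathrm{SNR}} = \psi^p_{\mathrm{SNR}}$ whenever $p$ is odd.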
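The symmetry of (11) and (12) can be checked numerically. The sketch below is illustrative only; the function names and sample points are assumptions of this example, and (10) is included to show the asymmetry that motivates the symmetrizations.

```python
def signed_power(t, p):
    """f(t) = sgn(t) * |t|^p."""
    return ((t > 0) - (t < 0)) * abs(t) ** p

def phi_nr_p(a, b, p):
    """Generalized NR function (10)."""
    return signed_power(a, p) - max(a - b, 0.0) ** p

def phi_snr_p(a, b, p):
    """First symmetrization (11)."""
    if a >= b:
        return signed_power(a, p) - (a - b) ** p
    return signed_power(b, p) - (b - a) ** p

def psi_snr_p(a, b, p):
    """Second symmetrization (12)."""
    fa, fb = signed_power(a, p), signed_power(b, p)
    if a >= b:
        return fa * fb - fb * (a - b) ** p
    return fa * fb - fa * (b - a) ** p

if __name__ == "__main__":
    p = 2.5
    for a, b in [(2.0, 1.0), (-1.0, 3.0), (-2.0, -0.5)]:
        print((a, b),
              phi_nr_p(a, b, p), phi_nr_p(b, a, p),     # generally differ
              phi_snr_p(a, b, p), phi_snr_p(b, a, p),   # equal
              psi_snr_p(a, b, p), psi_snr_p(b, a, p))   # equal
```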

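Proposition 1. For any $p>0$, the functions $\tilde\phi^p_{\mathrm{NR}}$, $\tilde\phi^p_{\mathrm{SNR}}$, and $\tilde\psi^p_{\mathrm{SNR}}$ are NCP functions. Moreover, $\tilde\phi^p_{\mathrm{NR}}(a,b)>0$ ($\tilde\phi^p_{\mathrm{SNR}}(a,b)>0$) if and only if $a>0$ and $b>0$, while $\tilde\psi^p_{\mathrm{SNR}}(a,b)\ge 0$ for all $(a,b)\in\mathbb{R}^2$.

Proof. That $\tilde\phi^p_{\mathrm{NR}}$ is an NCP function follows from the above discussion. Moreover, note that $a>0$ and $b>0$ if and only if $a>(a-b)_+$. Since $f(t)=\operatorname{sgn}(t)|t|^p$ is strictly increasing, we see that $a>0$ and $b>0$ if and only if $\operatorname{sgn}(a)|a|^p > \operatorname{sgn}((a-b)_+)\,|(a-b)_+|^p$, i.e., $\tilde\phi^p_{\mathrm{NR}}(a,b)>0$. On the other hand, observe that

$$\tilde\phi^p_{\mathrm{SNR}}(a,b) = \begin{cases} \tilde\phi^p_{\mathrm{NR}}(a,b) & \text{if } a\ge b,\\ \tilde\phi^p_{\mathrm{NR}}(b,a) & \text{if } a<b,\end{cases} \tag{13}$$

and

$$\tilde\psi^p_{\mathrm{SNR}}(a,b) = \begin{cases} \operatorname{sgn}(b)|b|^p\,\tilde\phi^p_{\mathrm{NR}}(a,b) & \text{if } a\ge b,\\ \operatorname{sgn}(a)|a|^p\,\tilde\phi^p_{\mathrm{NR}}(b,a) & \text{if } a<b.\end{cases} \tag{14}$$

Using the above identities and the fact that $\tilde\phi^p_{\mathrm{NR}}$ is an NCP function, it follows that $\tilde\phi^p_{\mathrm{SNR}}$ and $\tilde\psi^p_{\mathrm{SNR}}$ are also NCP functions with algebraic signs as specified in the proposition. □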

In view of the above proposition, we may then view the functions $\tilde\phi^p_{\mathrm{NR}}$, $\tilde\phi^p_{\mathrm{SNR}}$ and $\tilde\psi^p_{\mathrm{SNR}}$ as continuous generalizations of the functions $\phi^p_{\mathrm{NR}}$, $\phi^p_{\mathrm{SNR}}$ and $\psi^p_{\mathrm{SNR}}$. Now, we establish some properties of the above functions which will later be used in the neural network approach. We begin with smoothness properties.

By $C^1(\Omega)$ and $C^2(\Omega)$, we mean the class of continuously differentiable and twice continuously differentiable functions defined on $\Omega\subseteq\mathbb{R}^n$, respectively.

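Proposition 2. The following results hold:

(a) If $p>1$, the function $\tilde\phi^p_{\mathrm{NR}}\in C^1(\mathbb{R}^2)$, whose gradient is given by

$$\nabla\tilde\phi^p_{\mathrm{NR}}(a,b) = p\begin{bmatrix} |a|^{p-1} - (a-b)^{p-1}\operatorname{sgn}((a-b)_+)\\ (a-b)^{p-1}\operatorname{sgn}((a-b)_+)\end{bmatrix}.$$

If $p>2$, then $\tilde\phi^p_{\mathrm{NR}}\in C^2(\mathbb{R}^2)$, whose Hessian is given by

$$\nabla^2\tilde\phi^p_{\mathrm{NR}}(a,b) = p(p-1)\begin{bmatrix} \operatorname{sgn}(a)|a|^{p-2} - (a-b)^{p-2}\operatorname{sgn}((a-b)_+) & (a-b)^{p-2}\operatorname{sgn}((a-b)_+)\\ (a-b)^{p-2}\operatorname{sgn}((a-b)_+) & -(a-b)^{p-2}\operatorname{sgn}((a-b)_+)\end{bmatrix}.$$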

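(b) If $p>1$, the function $\tilde\phi^p_{\mathrm{SNR}}\in C^1(\Omega)$, where $\Omega := \{(a,b)\mid a\ne b\}$. In this case, the gradient of $\tilde\phi^p_{\mathrm{SNR}}$ is given by

$$\nabla\tilde\phi^p_{\mathrm{SNR}}(a,b) = \begin{cases} p\left[\,|a|^{p-1}-(a-b)^{p-1},\ (a-b)^{p-1}\right]^T & \text{if } a>b,\\ p\left[\,(b-a)^{p-1},\ |b|^{p-1}-(b-a)^{p-1}\right]^T & \text{if } a<b.\end{cases}$$

Further, $\tilde\phi^p_{\mathrm{SNR}}$ is differentiable at $(0,0)$ with $\nabla\tilde\phi^p_{\mathrm{SNR}}(0,0) = [0,0]^T$. If $p>2$, then $\tilde\phi^p_{\mathrm{SNR}}\in C^2(\Omega)$ with Hessian given by

$$\nabla^2\tilde\phi^p_{\mathrm{SNR}}(a,b) = \begin{cases} p(p-1)\begin{bmatrix} \operatorname{sgn}(a)|a|^{p-2}-(a-b)^{p-2} & (a-b)^{p-2}\\ (a-b)^{p-2} & -(a-b)^{p-2}\end{bmatrix} & \text{if } a>b,\\[2ex] p(p-1)\begin{bmatrix} -(b-a)^{p-2} & (b-a)^{p-2}\\ (b-a)^{p-2} & \operatorname{sgn}(b)|b|^{p-2}-(b-a)^{p-2}\end{bmatrix} & \text{if } a<b.\end{cases}$$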

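(c) If $p>1$, then $\tilde\psi^p_{\mathrm{SNR}}\in C^1(\mathbb{R}^2)$, whose gradient is given by

$$\nabla\tilde\psi^p_{\mathrm{SNR}}(a,b) = \begin{cases} p\begin{bmatrix} \operatorname{sgn}(b)|b|^p\left[\,|a|^{p-1}-(a-b)^{p-1}\right]\\ \operatorname{sgn}(a)|a|^p|b|^{p-1} - (a-b)^p|b|^{p-1} + \operatorname{sgn}(b)(a-b)^{p-1}|b|^p\end{bmatrix} & \text{if } a>b,\\[2ex] p|a|^{2p-1}\begin{bmatrix}1\\ 1\end{bmatrix} & \text{if } a=b,\\[2ex] p\begin{bmatrix} \operatorname{sgn}(b)|a|^{p-1}|b|^p - (b-a)^p|a|^{p-1} + \operatorname{sgn}(a)(b-a)^{p-1}|a|^p\\ \operatorname{sgn}(a)|a|^p\left[\,|b|^{p-1}-(b-a)^{p-1}\right]\end{bmatrix} & \text{if } a<b.\end{cases}$$

If $p>2$, then $\tilde\psi^p_{\mathrm{SNR}}\in C^2(\mathbb{R}^2)$, whose Hessian $\nabla^2\tilde\psi^p_{\mathrm{SNR}}(a,b)$ is given as follows. If $a>b$,

$$\nabla^2\tilde\psi^p_{\mathrm{SNR}}(a,b) = p\begin{bmatrix} (p-1)\operatorname{sgn}(a)\operatorname{sgn}(b)|a|^{p-2}|b|^p - (p-1)(a-b)^{p-2}\operatorname{sgn}(b)|b|^p & (p-1)(a-b)^{p-2}\operatorname{sgn}(b)|b|^p + p\left[\,|a|^{p-1}-(a-b)^{p-1}\right]|b|^{p-1}\\[1ex] (p-1)(a-b)^{p-2}\operatorname{sgn}(b)|b|^p + p\left[\,|a|^{p-1}-(a-b)^{p-1}\right]|b|^{p-1} & (p-1)\operatorname{sgn}(a)\operatorname{sgn}(b)|a|^p|b|^{p-2} - (p-1)(a-b)^p\operatorname{sgn}(b)|b|^{p-2} + 2p(a-b)^{p-1}|b|^{p-1} - (p-1)(a-b)^{p-2}\operatorname{sgn}(b)|b|^p\end{bmatrix}.$$

If $a=b$,

$$\nabla^2\tilde\psi^p_{\mathrm{SNR}}(a,b) = p\begin{bmatrix} (p-1)\operatorname{sgn}(a)\operatorname{sgn}(b)|a|^{p-2}|b|^p & p|a|^{p-1}|b|^{p-1}\\ p|a|^{p-1}|b|^{p-1} & (p-1)\operatorname{sgn}(a)\operatorname{sgn}(b)|a|^p|b|^{p-2}\end{bmatrix}.$$

If $a<b$,

$$\nabla^2\tilde\psi^p_{\mathrm{SNR}}(a,b) = p\begin{bmatrix} (p-1)\operatorname{sgn}(a)\operatorname{sgn}(b)|a|^{p-2}|b|^p - (p-1)(b-a)^p\operatorname{sgn}(a)|a|^{p-2} + 2p(b-a)^{p-1}|a|^{p-1} - (p-1)(b-a)^{p-2}\operatorname{sgn}(a)|a|^p & (p-1)(b-a)^{p-2}\operatorname{sgn}(a)|a|^p + p\left[\,|b|^{p-1}-(b-a)^{p-1}\right]|a|^{p-1}\\[1ex] (p-1)(b-a)^{p-2}\operatorname{sgn}(a)|a|^p + p\left[\,|b|^{p-1}-(b-a)^{p-1}\right]|a|^{p-1} & (p-1)\operatorname{sgn}(a)\operatorname{sgn}(b)|a|^p|b|^{p-2} - (p-1)(b-a)^{p-2}\operatorname{sgn}(a)|a|^p\end{bmatrix}.$$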

Proof. Note that $f(t)=\operatorname{sgn}(t)|t|^p$ is continuously differentiable when $p>1$ with $f'(t)=p|t|^{p-1}$. Moreover, $f$ is twice continuously differentiable when $p>2$ with $f''(t)=p(p-1)\operatorname{sgn}(t)|t|^{p-2}$. Using these and the alternative formulas given in (13) and (14), the gradients and Hessians can be easily obtained. The calculations are omitted. □

The above proposition is a generalization of [5, Proposition 2.2] and [15, Proposition 4.3]. On the other hand, the following result is an extension of [15, Proposition 3.4, Proposition 4.5, and Proposition 5.4].
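The gradient formula in Proposition 2(a) can be sanity-checked against central finite differences. The sketch below is a numerical illustration only, not a proof; the sample points, the value p = 1.5 and the step size h are arbitrary assumptions, and the points are chosen away from a = 0 and a = b where the finite-difference error would be larger.

```python
def phi_nr_p(a, b, p):
    """Generalized NR function (10)."""
    sign_a = (a > 0) - (a < 0)
    return sign_a * abs(a) ** p - max(a - b, 0.0) ** p

def grad_formula(a, b, p):
    """Gradient from Proposition 2(a), valid for p > 1.

    max(a - b, 0)^(p-1) equals (a-b)^(p-1) * sgn((a-b)_+): it vanishes for a <= b.
    """
    d = max(a - b, 0.0) ** (p - 1)
    return (p * (abs(a) ** (p - 1) - d), p * d)

def grad_fd(a, b, p, h=1e-6):
    """Central finite-difference gradient."""
    da = (phi_nr_p(a + h, b, p) - phi_nr_p(a - h, b, p)) / (2.0 * h)
    db = (phi_nr_p(a, b + h, p) - phi_nr_p(a, b - h, p)) / (2.0 * h)
    return (da, db)

if __name__ == "__main__":
    p = 1.5
    for a, b in [(1.3, 0.4), (-0.7, 0.5), (-0.2, -1.0)]:
        print((a, b), grad_formula(a, b, p), grad_fd(a, b, p))
```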

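Proposition 3. Let $p>1$. Then, the following hold:

(a) $$\nabla_a\tilde\phi^p_{\mathrm{NR}}(a,b)\cdot\nabla_b\tilde\phi^p_{\mathrm{NR}}(a,b)\ \begin{cases} >0 & \text{on } \{(a,b)\mid a>b>0 \text{ or } a>b>2a\},\\ =0 & \text{on } \{(a,b)\mid a\le b \text{ or } a>b=2a \text{ or } a>b=0\},\\ <0 & \text{otherwise.}\end{cases}$$

(b) $$\nabla_a\tilde\phi^p_{\mathrm{SNR}}(a,b)\cdot\nabla_b\tilde\phi^p_{\mathrm{SNR}}(a,b)\ \begin{cases} >0 & \text{on } \{(a,b)\mid a>b>0 \text{ or } a>b>2a\} \text{ and on } \{(a,b)\mid b>a>0 \text{ or } b>a>2b\},\\ =0 & \text{on } \{(a,b)\mid \tilde\phi^p_{\mathrm{SNR}}(a,b)=0 \text{ or } a>b=2a \text{ or } b>a=2b\},\\ <0 & \text{otherwise.}\end{cases}$$

(c) $\nabla_a\tilde\psi^p_{\mathrm{SNR}}(a,b)\cdot\nabla_b\tilde\psi^p_{\mathrm{SNR}}(a,b)>0$ on the first quadrant $\mathbb{R}^2_{++}$, and $\tilde\psi^p_{\mathrm{SNR}}(a,b)=0 \iff \nabla\tilde\psi^p_{\mathrm{SNR}}(a,b)=0$.

Proof. Using Proposition 2(a),

$$\nabla_a\tilde\phi^p_{\mathrm{NR}}(a,b)\cdot\nabla_b\tilde\phi^p_{\mathrm{NR}}(a,b) = p^2\left[\,|a|^{p-1}-(a-b)^{p-1}\operatorname{sgn}((a-b)_+)\right](a-b)^{p-1}\operatorname{sgn}((a-b)_+) = \begin{cases} p^2\left[\,|a|^{p-1}-(a-b)^{p-1}\right](a-b)^{p-1} & \text{if } a>b,\\ 0 & \text{if } a\le b.\end{cases}$$

Suppose now that $a>b$. Since $g(t):=t^{p-1}$ is a strictly increasing function on $[0,\infty)$, $|a|^{p-1}-(a-b)^{p-1}>0$ if and only if $|a|>a-b$, which happens if and only if $b>0$ or $b>2a$. This establishes Proposition 3(a). Statement (b) easily follows from (a), while (c) can be easily verified using the result of Proposition 2(c). □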
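The three sign regions in Proposition 3(a) can be probed at a few sample points. This is only an illustrative spot check under the assumption p = 2.5; the helper names are hypothetical.

```python
def grad_phi_nr_p(a, b, p):
    """Gradient of the generalized NR function from Proposition 2(a), p > 1."""
    d = max(a - b, 0.0) ** (p - 1)  # zero whenever a <= b
    return (p * (abs(a) ** (p - 1) - d), p * d)

def product_sign(a, b, p):
    """Sign (+1, 0, -1) of the product of the two partial derivatives."""
    ga, gb = grad_phi_nr_p(a, b, p)
    prod = ga * gb
    return (prod > 0) - (prod < 0)

if __name__ == "__main__":
    p = 2.5
    samples = {
        "a > b > 0":   (2.0, 1.0),    # expected sign +1
        "a > b > 2a":  (-1.0, -1.5),  # expected sign +1
        "a <= b":      (1.0, 2.0),    # expected sign 0
        "a > b = 0":   (2.0, 0.0),    # expected sign 0
        "a > b = 2a":  (-1.0, -2.0),  # expected sign 0
        "otherwise":   (2.0, -1.0),   # expected sign -1
    }
    for label, (a, b) in samples.items():
        print(label, product_sign(a, b, p))
```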

We now establish the growth behavior of the proposed families of functions. We first establish the following simple lemma.

Lemma 1. For any $x\in[0,1]$ and any $p>0$, we have

$$(1-x)^p \le \frac{1}{1+px}.$$

Proof. Define $f:[0,1]\to\mathbb{R}$ by $f(x)=(1-x)^p(1+px)$. A simple calculation yields $f'(x)=-p(p+1)x(1-x)^{p-1}$. Then, $f$ monotonically decreases on $[0,1]$ from $f(0)=1$ to $f(1)=0$. Consequently, $0\le f(x)\le 1$. This completes the proof. □
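A brief numerical look at Lemma 1: the sketch below evaluates the gap between the two sides of the inequality on a grid. The grid resolution and the list of p values are arbitrary choices for illustration.

```python
def lemma1_gap(x, p):
    """Right-hand side minus left-hand side of Lemma 1: 1/(1+px) - (1-x)^p."""
    return 1.0 / (1.0 + p * x) - (1.0 - x) ** p

# Smallest gap over a grid of x in [0, 1] and several p > 0.
P_VALUES = (0.1, 0.5, 1.0, 1.5, 2.0, 3.0, 7.0, 20.0)
worst_gap = min(lemma1_gap(i / 200.0, p) for p in P_VALUES for i in range(201))

print(worst_gap)
```

The gap vanishes at x = 0, where both sides equal 1, so the inequality is tight there.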

Proposition 4. Let $\phi\in\{\tilde\phi^p_{\mathrm{NR}},\ \tilde\phi^p_{\mathrm{SNR}},\ \tilde\psi^p_{\mathrm{SNR}}\}$. Then $|\phi(a^k,b^k)|\to\infty$ for any sequence $\{(a^k,b^k)\}_{k=1}^{\infty}$ in $\mathbb{R}^2$ such that $|a^k|\to\infty$ and $|b^k|\to\infty$.

Proof. The proposition follows from the preceding lemma and analogous arguments in the proof of [2, Lemma 5.1]. □

3. Stability analysis

In this section, we consider the neural network given by (9) using the functions $\tilde\phi^p_{\mathrm{NR}}$, $\tilde\phi^p_{\mathrm{SNR}}$ and $\tilde\psi^p_{\mathrm{SNR}}$. The corresponding merit functions defined by (8) will be denoted, respectively, by $\Psi^p_{\mathrm{NR}}$, $\Psi^p_{\mathrm{S1NR}}$ and $\Psi^p_{\mathrm{S2NR}}$. The case when $p$ is an odd integer greater than 1 is the neural network studied in [2].

We note that since the results presented in Section 2 generalize the results for the discrete families originally formulated in [3,5,15], the discussion presented in [2] can be extended to establish the properties of the induced merit functions $\Psi^p_{\mathrm{NR}}$, $\Psi^p_{\mathrm{S1NR}}$ and $\Psi^p_{\mathrm{S2NR}}$ corresponding to the continuous generalization. In the following proposition, we summarize the properties of these merit functions. For conciseness and clarity, we present a shortened proof of the following result, pointing out the arguments that need to be modified in the proofs of the results in [2]. We refer the reader to the monograph [8] for definitions and properties of nonlinear mappings ($P_0$-functions, monotone functions, etc.) and the book [22] for standard results in the theory of ordinary differential equations.

Proposition 5. Let $p>1$. Then the following hold:

(a) If $(\nabla F - I)$ is a $P$-matrix, then every stationary point of $\Psi^p_{\mathrm{NR}}$ is a global minimizer.

(b) If $F(x^*)\ge 0$, $(\nabla F(x^*) - I)$ is a $P_0$-matrix and $x^*$ is a stationary point of $\Psi^p_{\mathrm{NR}}$, then $x^*$ is a global minimizer of $\Psi^p_{\mathrm{NR}}$.

(c) Suppose that $x^*\in\Omega_F$ and $\nabla F(x^*)$ is a $P_0$-matrix. If $x^*$ is a stationary point of $\Psi^p_{\mathrm{S1NR}}$ or $\Psi^p_{\mathrm{S2NR}}$, then $x^*$ is a global minimizer.

Proof. To prove (a) and (b), we define two diagonal matrices $A(x^*)$ and $B(x^*)$ where

$$A_{ii}(x^*) = |x^*_i|^{p-1} \quad\text{and}\quad B_{ii}(x^*) = \big(x^*_i - F_i(x^*)\big)^{p-1}\operatorname{sgn}\big((x^*_i - F_i(x^*))_+\big),$$

where $x^*$ is an equilibrium point of (9) with $\Psi_F=\Psi^p_{\mathrm{NR}}$. Then, analogous arguments as in the proof of [2, Proposition 4.4 and Remark 4.1] lead to the desired conclusion. To prove (c), we proceed as in the proof of [2, Proposition 4.5]. That is, we verify the following properties:

(P1) $\forall (a,b)\in\mathbb{R}^2_+$, we have $\nabla_a\psi(a,b)\cdot\nabla_b\psi(a,b)\ge 0$; and

(P2) $\forall (a,b)\in\mathbb{R}^2_+$, we have $\nabla_a\psi(a,b)=0 \iff \nabla_b\psi(a,b)=0 \iff \phi(a,b)=0$,

where $\psi := \frac12\phi^2$ and $\phi\in\{\tilde\phi^p_{\mathrm{SNR}},\ \tilde\psi^p_{\mathrm{SNR}}\}$. Property (P1) can be easily verified. To show (P2), we only need to show that given $a,b\ge 0$, the following hold:

(i) $\nabla_a\tilde\phi^p_{\mathrm{SNR}}(a,b)=0$ implies $\tilde\phi^p_{\mathrm{SNR}}(a,b)=0$; and

(ii) $\nabla_a\tilde\psi^p_{\mathrm{SNR}}(a,b)=0$ implies $\tilde\psi^p_{\mathrm{SNR}}(a,b)=0$.

We first prove (i). If $\nabla_a\tilde\phi^p_{\mathrm{SNR}}(a,b)=0$, then we see from Proposition 2(b) that we must have $a>b$ or $a=b=0$. Otherwise, $\nabla_a\tilde\phi^p_{\mathrm{SNR}}(a,b)=p(b-a)^{p-1}$ would be positive. If $a=b=0$, then $\tilde\phi^p_{\mathrm{SNR}}(a,b)=0$ as desired. If $a>b$, then $0=\frac1p\nabla_a\tilde\phi^p_{\mathrm{SNR}}(a,b)=a^{p-1}-(a-b)^{p-1}$. Since $t\mapsto t^{p-1}$ is strictly increasing on $[0,\infty)$, then $a=a-b$, i.e., $b=0$. Then $\tilde\phi^p_{\mathrm{SNR}}(a,b)=0$ since $a>b=0$ and $\tilde\phi^p_{\mathrm{SNR}}$ is an NCP function. To prove (ii), assume that $\nabla_a\tilde\psi^p_{\mathrm{SNR}}(a,b)=0$. From Proposition 2(c), we must have

$$0=\frac1p\nabla_a\tilde\psi^p_{\mathrm{SNR}}(a,b) = \begin{cases} a^{p-1}b^p-(a-b)^{p-1}b^p & \text{if } a\ge b,\\ a^{p-1}b^p-(b-a)^pa^{p-1}+(b-a)^{p-1}a^p & \text{if } a<b.\end{cases}$$

If $a\ge b$, then we can proceed as in [2, Proposition 4.5]. If $a<b$, then

$$0 = a^{p-1}b^p-(b-a)^pa^{p-1}+(b-a)^{p-1}a^p = a^{p-1}\left[\,b^p-(b-a)^p+(b-a)^{p-1}a\right]. \tag{15}$$

From here, we conclude that $a=0$. Otherwise, we must have $b^p>(b-a)^p$ and so $b^p-(b-a)^p+(b-a)^{p-1}a>(b-a)^{p-1}a>0$. This contradicts (15). Hence, $a=0$ and since $b>a=0$, we obtain that $\tilde\psi^p_{\mathrm{SNR}}(a,b)=0$ by definition of an NCP function.

In view of the above proposition and the stability analysis presented in [2], we present herein analogous stability results. The proofs are similar to the corresponding propositions for the discrete generalization established in [2], and are thus omitted. In particular, Proposition 6(a) follows from [2, Theorem 5.1], Proposition 6(b) and (c) follow from [2, Theorem 5.2], and Proposition 6(d) is a consequence of [2, Theorem 5.2].

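Proposition 6. Let $x^*$ be an equilibrium point of (9).

(a) If $\Psi_F\in\{\Psi^p_{\mathrm{NR}},\ \Psi^p_{\mathrm{S1NR}},\ \Psi^p_{\mathrm{S2NR}}\}$ and $F$ is a uniformly $P$-function, then the solution to (9) through any $x^0\in\mathbb{R}^n$ converges to $x^*$.

(b) If $\Psi_F=\Psi^p_{\mathrm{NR}}$, then $x^*\in\mathrm{SOL}(F)$ provided that $(\nabla F - I)$ is a $P$-matrix. If $x^*$ is isolated, then it is asymptotically stable.

(c) If $x^*\in\Omega_F$ and $\Psi_F=\Psi^p_{\mathrm{S1NR}}$ or $\Psi_F=\Psi^p_{\mathrm{S2NR}}$, then $x^*\in\mathrm{SOL}(F)$ provided that $F$ is a $P_0$-function. If $x^*$ is isolated, then it is asymptotically stable.

(d) If $\nabla\Phi_F(x^*)$ is nonsingular, where $\phi\in\{\tilde\phi^p_{\mathrm{NR}},\ \tilde\phi^p_{\mathrm{SNR}},\ \tilde\psi^p_{\mathrm{SNR}}\}$, and $x^*$ is isolated, then $x^*\in\mathrm{SOL}(F)$ and $x^*$ is exponentially stable.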

The parameter $p$ has a very significant influence on the rate of convergence of the neural network. For the discrete-type families, a small set of test problems was considered in [2], where the numerical experiments revealed that a lower value of $p\in\{3,5,7,\ldots\}$ often provides faster convergence. However, there is no theoretical evidence yet for this phenomenon.

In fact, as we shall see in Section 4, different convergence behaviors can be observed when we vary the values of $p$. In particular, a lower value of $p$ does not always lead to faster convergence. There are test instances in which a higher value of $p$ offers a faster convergence rate. The numerical experiments presented in the next section suggest that there is no simple relation that can be obtained regarding the performance dependence on $p$ of the neural network (9) with $\Psi_F\in\{\Psi^p_{\mathrm{NR}},\ \Psi^p_{\mathrm{S1NR}},\ \Psi^p_{\mathrm{S2NR}}\}$. Moreover, the simulations suggest that initial conditions have a significant influence on the performance of the neural network and its dependence on $p$. To make sense of these phenomena, we establish the following theorem. The first part of the proof is a derivation of an error bound for NCP(F) (see inequality (18)), where $F$ is a locally Lipschitz uniformly $P$-function. The technique employed in the derivation is similar to the idea used in [8, Proposition 6.3.1].

Theorem 1. Consider the neural network (9) with $\Psi_F=\Psi^p_{\mathrm{S1NR}}$ for a given $p>1$. Suppose that $x^*\in\mathrm{SOL}(F)$ is exponentially stable and $F$ is a uniformly $P$-function that is locally Lipschitz continuous. Then there exist positive constants $K$, $\omega$ and $\delta$ such that for all $t\ge 0$, we have

$$\|x(t)-x^*\| \le K\left[\frac{p+1}{p}\sqrt{2\Psi^p_{\mathrm{S1NR}}(x^0)}\right]^{\frac1p} e^{-\omega t}\qquad \forall x^0\in\Omega_F\cap N_\delta(x^*),$$

where $N_\delta(x^*)=\{y : \|y-x^*\|<\delta\}$.

Proof. Suppose $F$ is uniformly $P$ with modulus $\kappa>0$. Given $x\in\mathbb{R}^n$, let $j\in\{1,\ldots,n\}$ be such that

$$(x_j-x^*_j)\big(F_j(x)-F_j(x^*)\big) \ge (x_i-x^*_i)\big(F_i(x)-F_i(x^*)\big)\qquad \forall i=1,\ldots,n.$$

Then

$$\kappa\|x-x^*\|^2 \le (x_j-x^*_j)\big(F_j(x)-F_j(x^*)\big) = -x_jF_j(x^*) - (x^*_j-x_j)F_j(x). \tag{16}$$

Meanwhile, note that $(s-t_+)(t_+-t)\ge 0$ for any $s\ge 0$ and $t\in\mathbb{R}$. Since $\min\{x_j,F_j(x)\} = x_j-(x_j-F_j(x))_+$, then taking $s=x^*_j\ge 0$ and $t=x_j-F_j(x)$, we have

$$\big(x^*_j-x_j+\min\{x_j,F_j(x)\}\big)\big(F_j(x)-\min\{x_j,F_j(x)\}\big)\ge 0,$$

which implies that

$$(x^*_j-x_j)F_j(x) \ge (x^*_j-x_j)\min\{x_j,F_j(x)\} - F_j(x)\min\{x_j,F_j(x)\}. \tag{17}$$

Since $x_j\ge\min\{x_j,F_j(x)\}$ and $F_j(x^*)\ge 0$, we have from inequalities (16) and (17) that

$$\kappa\|x-x^*\|^2 \le \big[F_j(x)-F_j(x^*)-(x^*_j-x_j)\big]\min\{x_j,F_j(x)\} \le \big(\|F(x)-F(x^*)\|+\|x-x^*\|\big)\,\big|\min\{x_j,F_j(x)\}\big|.$$

Since $F$ is locally Lipschitz, we conclude that given any $x\in\mathbb{R}^n$ in some neighborhood of $x^*$, there exist an index $j=j(x)$ and $L>0$ such that

$$\kappa\|x-x^*\|^2 \le (1+L)\,\big|\min\{x_j,F_j(x)\}\big|\,\|x-x^*\|. \tag{18}$$

Now, let $x^0\in\Omega_F$. We have from part (a) of the proof of [2, Lemma 5.1] and using Lemma 1 that $\tilde\phi^p_{\mathrm{SNR}}(a,b)\ge\frac{p}{p+1}(\min\{a,b\})^p$ for any $a,b\ge 0$. By (18), there exists $j=j(x^0)\in\{1,\ldots,n\}$ such that

$$\kappa\|x^0-x^*\| \le (1+L)\left[\frac{p+1}{p}\,\tilde\phi^p_{\mathrm{SNR}}\big(x^0_j,F_j(x^0)\big)\right]^{\frac1p}. \tag{19}$$

Since $x^*$ is exponentially stable, there exist positive constants $\delta$, $c$ and $\omega$ such that for any $t\ge 0$, $\|x(t)-x^*\|\le c\,e^{-\omega t}\|x^0-x^*\|$ for all $x^0\in N_\delta(x^*)$. This, together with inequality (19), gives the desired result with $K:=\frac{c}{\kappa}(1+L)$. □

Similarly, we get the following error bound result for the other two merit functions $\Psi^p_{\mathrm{NR}}$ and $\Psi^p_{\mathrm{S2NR}}$.

Theorem 2. Consider the neural network (9) for a given $p>1$, and let $x^*\in\mathrm{SOL}(F)$ be exponentially stable. Suppose that $F$ is a uniformly $P$-function and locally Lipschitz continuous. Then

(a) If $\Psi_F=\Psi^p_{\mathrm{NR}}$, there exist positive constants $K$, $\omega$ and $\delta$ such that for all $t\ge 0$, we have

$$\|x(t)-x^*\| \le K\left[\frac{p+1}{p}\sqrt{2\Psi^p_{\mathrm{NR}}(x^0)}\right]^{\frac1p} e^{-\omega t}\qquad \forall x^0\in\Omega_F\cap N_\delta(x^*).$$

(b) If $\Psi_F=\Psi^p_{\mathrm{S2NR}}$, there exist positive constants $K$, $\omega$ and $\delta$ such that for all $t\ge 0$, we have

$$\|x(t)-x^*\| \le K\left[\frac{p+1}{p}\sqrt{2\Psi^p_{\mathrm{S2NR}}(x^0)}\right]^{\frac{1}{2p}} e^{-\omega t}\qquad \forall x^0\in\Omega_F\cap N_\delta(x^*).$$

Proof. For $a\ge b\ge 0$, we have $\tilde\phi^p_{\mathrm{NR}}(a,b)=\tilde\phi^p_{\mathrm{SNR}}(a,b)\ge\frac{p}{p+1}b^p$ as in part (a) of the proof of [2, Lemma 5.1]. When $0\le a<b$, we have $\tilde\phi^p_{\mathrm{NR}}(a,b)=a^p\ge\frac{p}{p+1}a^p$. It follows that $\tilde\phi^p_{\mathrm{NR}}(a,b)\ge\frac{p}{p+1}(\min\{a,b\})^p$. On the other hand, using the identity (14) and the fact that $\tilde\phi^p_{\mathrm{SNR}}(a,b)\ge\frac{p}{p+1}(\min\{a,b\})^p$ for any $a,b\ge 0$, we derive that $\tilde\psi^p_{\mathrm{SNR}}(a,b)\ge\frac{p}{p+1}(\min\{a,b\})^{2p}$. Using these inequalities and the same arguments as in Theorem 1, we get the desired inequalities. □
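The three lower bounds used in the proofs of Theorems 1 and 2 can be spot-checked on a grid of nonnegative arguments. The following sketch is a numerical illustration only; the grid, the p values and the function names are assumptions of this example.

```python
def signed_power(t, p):
    """f(t) = sgn(t) * |t|^p."""
    return ((t > 0) - (t < 0)) * abs(t) ** p

def phi_nr_p(a, b, p):
    """Generalized NR function (10)."""
    return signed_power(a, p) - max(a - b, 0.0) ** p

def phi_snr_p(a, b, p):
    """First symmetrization, via identity (13)."""
    return phi_nr_p(a, b, p) if a >= b else phi_nr_p(b, a, p)

def psi_snr_p(a, b, p):
    """Second symmetrization, via identity (14)."""
    if a >= b:
        return signed_power(b, p) * phi_nr_p(a, b, p)
    return signed_power(a, p) * phi_nr_p(b, a, p)

def smallest_margins(p, points):
    """Minimum of (function value - lower bound) for the three families."""
    c = p / (p + 1.0)
    m_nr = min(phi_nr_p(a, b, p) - c * min(a, b) ** p for a, b in points)
    m_snr = min(phi_snr_p(a, b, p) - c * min(a, b) ** p for a, b in points)
    m_psi = min(psi_snr_p(a, b, p) - c * min(a, b) ** (2 * p) for a, b in points)
    return m_nr, m_snr, m_psi

GRID = [(0.25 * i, 0.25 * j) for i in range(21) for j in range(21)]

if __name__ == "__main__":
    for p in (1.01, 1.5, 2.0, 3.0, 4.5):
        print(p, smallest_margins(p, GRID))
```

A nonnegative margin at every grid point is consistent with the bounds; it does not replace the argument via Lemma 1.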

Fig. 1. Graph of the upper bound $g_{a,b}(p)$ for the error term $\|x(t)-x^*\|$ for some values of $a$ and $b$ with $a,b\ge 0$ and $a>b$. (Horizontal axis: $p$ from 0 to 25; curves for $a=4$ with $b=0.5$, $b=3$ and $b=2$; marked point $(2.0748, 4.2431)$.)

Fig. 2. Graph of $g_{4,0.5}(p)$ on the interval $[30,40]$. (Marked point $(34.4458, 4.0022)$.)
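The figures plot a quantity labelled $g_{a,b}(p)$ whose formula is not stated in this part of the text. As an assumption inferred from the bound in Theorem 1, the sketch below takes $g_{a,b}(p) = \left[\frac{p+1}{p}\big(a^p-(a-b)^p\big)\right]^{1/p}$ for $a > b \ge 0$; under that assumption it reproduces the two marked points read off the figures to about three decimals.

```python
def g(a, b, p):
    """Assumed form of g_{a,b}(p): [((p+1)/p) * (a^p - (a-b)^p)]^(1/p), a > b >= 0.

    This formula is an inference from Theorem 1, not quoted from the text.
    """
    return ((p + 1.0) / p * (a ** p - (a - b) ** p)) ** (1.0 / p)

if __name__ == "__main__":
    print(g(4.0, 2.0, 2.0748))    # compare with the marked point in Fig. 1
    print(g(4.0, 0.5, 34.4458))   # compare with the marked point in Fig. 2
```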

Table 1

Numerical results for NCP1 and NCP2 using the neural networks based on $\tilde\phi^p_{\mathrm{NR}}$, $\tilde\phi^p_{\mathrm{SNR}}$ and $\tilde\psi^p_{\mathrm{SNR}}$ for different values of $p$.

p     NCP1                                                  NCP2
      CT1      Gap1     CT2      Gap2     CT3      Gap3     CT1      Gap1     CT2      Gap2     CT3      Gap3
1.0   3.0E+0   1.4E-7   3.0E+0   7.6E-8   1.4E+3   6.1E-3   1.9E+1   6.7E-7   1.7E+1   3.4E-7   6.7E+3   3.1E-2
1.1   3.2E+0   1.5E-7   2.7E+0   9.5E-8   1.8E+3   8.4E-3   5.1E+1   1.6E-5   2.0E+1   3.1E-7   7.9E+3   3.7E-2
1.5   5.1E+0   1.9E-7   3.1E+0   1.8E-7   3.9E+3   2.5E-2   1.1E+3   2.4E-3   4.5E+1   6.0E-7   1.4E+4   7.5E-2
1.9   1.0E+1   4.1E-7   6.5E+0   3.6E-7   6.9E+3   5.4E-2   4.7E+3   1.9E-2   1.2E+2   1.9E-6   2.0E+4   1.3E-1
2.0   1.2E+1   4.2E-7   8.0E+0   4.3E-7   7.8E+3   6.4E-2   6.0E+3   2.7E-2   1.6E+2   2.6E-6   2.1E+4   1.5E-1
2.1   1.5E+1   6.1E-7   1.0E+1   5.4E-7   8.7E+3   7.5E-2   7.3E+3   3.6E-2   2.1E+2   3.7E-6   2.2E+4   1.7E-1
2.5   3.5E+1   1.5E-6   2.5E+1   1.5E-6   1.3E+4   1.3E-1   1.3E+4   9.1E-2   6.1E+2   1.5E-5   2.6E+4   2.5E-1
2.9   8.9E+1   4.0E-6   6.7E+1   4.0E-6   1.7E+4   2.0E-1   1.9E+4   1.7E-1   1.6E+3   6.2E-5   2.9E+4   3.4E-1
3.0   1.1E+2   5.2E-6   8.6E+1   5.2E-6   1.7E+4   2.2E-1   2.0E+4   1.9E-1   1.9E+3   8.8E-5   3.0E+4   3.7E-1
3.5   3.9E+2   2.0E-5   3.1E+2   2.0E-5   2.1E+4   3.4E-1   2.7E+4   3.3E-1   4.2E+3   4.2E-4   3.3E+4   5.0E-1
4.0   1.4E+3   8.0E-5   1.1E+3   8.0E-5   2.3E+4   4.8E-1   3.2E+4   4.8E-1   6.1E+3   1.3E-3   3.4E+4   6.3E-1
4.5   5.1E+3   3.4E-4   4.2E+3   3.4E-4   2.3E+4   6.3E-1   3.5E+4   6.5E-1   7.4E+3   2.9E-3   3.5E+4   7.7E-1
5.0   1.8E+4   1.7E-3   1.5E+4   1.7E-3   2.4E+4   7.8E-1   3.8E+4   8.3E-1   8.3E+3   5.1E-3   3.6E+4   9.1E-1
5.5   2.8E+4   2.8E-2   1.7E+4   2.8E-2   2.4E+4   9.2E-1   4.0E+4   1.0E+0   9.0E+3   7.9E-3   3.6E+4   1.0E+0
6.0   2.8E+4   6.6E-2   4.6E+3   5.5E-2   2.3E+4   1.1E+0   4.1E+4   1.2E+0   9.4E+3   1.1E-2   3.6E+4   1.2E+0
6.5   2.8E+4   1.1E-1   1.2E+4   5.7E-2   2.3E+4   1.2E+0   4.2E+4   1.4E+0   9.8E+3   1.5E-2   3.5E+4   1.3E+0
7.0   2.9E+4   1.6E-1   2.9E+4   6.1E-2   2.2E+4   1.3E+0   4.3E+4   1.5E+0   1.0E+4   1.9E-2   3.5E+4   1.4E+0
20.0  3.7E+4   3.3E+0   1.5E+4   1.4E+0   1.2E+4   3.1E+0   3.1E+4   4.4E+0   *        *        1.5E+4   2.9E+0
50.0  1.9E+4   6.9E+0   5.6E+3   2.2E+0   5.5E+3   4.1E+0   1.9E+4   6.3E+0   *        *        6.3E+3   3.5E+0

Table 2

p     NCP3                                                  NCP4
1.01  1.5E+1   5.2E-6   1.4E+1   1.2E-6   5.1E+3   4.5E-2   1.6E+1   2.3E-6   1.4E+1   3.3E-7   5.1E+3   1.9E-3
1.1   4.3E+1   3.8E-5   2.7E+1   1.9E-10  5.8E+3   4.9E-2   4.3E+1   1.8E-5   1.1E+1   5.2E-8   5.6E+3   2.5E-3
1.5   9.7E+2   3.7E-3   2.4E+2   3.3E-7   8.7E+3   7.6E-2   8.3E+2   1.6E-3   3.5E+0   2.0E-7   7.6E+3   7.1E-3
1.9   4.0E+3   2.7E-2   7.2E+2   9.1E-6   1.1E+4   1.1E-1   3.2E+3   1.2E-2   1.4E+0   1.2E-7   9.3E+3   1.4E-2
2     5.0E+3   3.7E-2   8.6E+2   1.6E-5   1.1E+4   1.2E-1   4.0E+3   1.6E-2   1.1E+0   9.5E-8   9.7E+3   1.7E-2
2.1   6.1E+3   5.0E-2   1.3E+1   6.0E-1   1.1E+4   1.3E-1   4.9E+3   2.1E-2   9.0E-1   7.0E-8   1.0E+4   1.9E-2
2.5   1.1E+4   1.1E-1   8.2E+0   6.9E-1   1.2E+4   1.7E-1   8.5E+3   5.1E-2   4.0E-1   2.5E-8   1.1E+4   1.8E-2
2.9   1.6E+4   1.9E-1   6.9E+0   7.4E-1   1.3E+4   2.1E-1   1.2E+4   9.3E-2   1.9E-1   6.2E-9   1.2E+4   1.7E-2
3     1.7E+4   2.1E-1   6.8E+0   7.5E-1   1.3E+4   2.2E-1   1.3E+4   1.0E-1   1.6E-1   3.6E-9   1.3E+4   2.4E-2
3.5   2.2E+4   3.1E-1   6.6E+0   8.0E-1   1.3E+4   2.7E-1   1.6E+4   1.7E-1   7.0E-2   1.6E-10  1.4E+4   5.3E-2
4     2.6E+4   3.8E-1   7.4E+0   8.2E-1   1.3E+4   3.2E-1   1.9E+4   2.4E-1   4.0E-2   1.1E-10  1.4E+4   8.0E-2
4.5   2.8E+4   4.3E-1   8.9E+0   8.4E-1   1.3E+4   3.6E-1   2.1E+4   3.2E-1   3.0E-2   3.7E-10  1.5E+4   1.1E-1
5     3.0E+4   4.6E-1   1.1E+1   8.6E-1   1.3E+4   4.1E-1   2.3E+4   4.0E-1   3.0E-2   2.5E-13  1.5E+4   1.3E-1
5.5   3.1E+4   4.8E-1   1.4E+1   8.7E-1   1.3E+4   4.5E-1   2.4E+4   4.8E-1   2.0E-2   5.3E-10  1.5E+4   1.6E-1
6     3.1E+4   4.8E-1   1.8E+1   8.8E-1   1.2E+4   4.9E-1   2.5E+4   5.6E-1   2.0E-2   4.5E-13  1.5E+4   1.9E-1
6.5   2.9E+4   4.7E-1   2.4E+1   8.9E-1   1.0E+4   5.4E-1   2.5E+4   6.3E-1   2.0E-2   1.3E-15  1.5E+4   2.1E-1
7     2.6E+4   4.8E-1   3.1E+1   9.0E-1   8.2E+3   6.0E-1   2.5E+4   7.1E-1   2.0E-2   1.5E-15  1.5E+4   2.4E-1
20    *        *        *        *        *        *        4.6E+4   1.9E+0   9.1E+4   4.5E-16  1.1E+4   6.8E-1
50    *        *        *        *        *        *        *        *        *        *        5.2E+3   9.7E-1

Table 3

p     NCP5                                                  NCP6
1.01  2.4E+2   7.9E-6   2.4E+2   7.9E-6   3.6E+4   1.6E-2   2.8E+1   4.0E-6   2.4E+1   3.0E-6   6.3E+3   5.9E-2
1.1   1.6E+2   5.6E-6   1.6E+2   5.6E-6   3.7E+4   1.6E-2   7.6E+1   2.6E-5   1.4E+1   1.5E-6   6.5E+3   6.9E-2
1.5   3.7E+1   1.1E-6   3.7E+1   1.1E-6   4.1E+4   1.9E-2   1.1E+3   5.6E-3   5.9E+0   4.6E-8   7.5E+3   8.5E-2
1.9   1.1E+1   1.3E-7   1.1E+1   1.3E-7   4.3E+4   3.8E-2   4.4E+3   4.3E-2   4.6E+0   1.6E-7   9.7E+3   7.1E-2
2     8.7E+0   4.7E-8   8.7E+0   4.7E-8   4.4E+4   5.3E-2   5.6E+3   6.0E-2   4.7E+0   1.7E-7   1.0E+4   6.2E-2
2.1   7.0E+0   1.7E-9   7.0E+0   1.7E-9   4.4E+4   6.7E-2   6.8E+3   8.0E-2   *        *        1.1E+4   5.4E-2
2.5   3.4E+0   7.6E-8   3.4E+0   7.6E-8   4.7E+4   1.1E-1   1.2E+4   1.9E-1   *        *        1.2E+4   2.8E-2
2.9   2.1E+0   1.1E-7   2.1E+0   1.1E-7   4.9E+4   1.4E-1   1.8E+4   3.5E-1   *        *        1.4E+4   2.1E-2
3     2.0E+0   5.5E-8   2.0E+0   5.5E-8   4.9E+4   1.5E-1   1.9E+4   4.0E-1   *        *        1.4E+4   2.2E-2
3.5   1.4E+0   3.7E-8   1.4E+0   3.7E-8   5.1E+4   2.0E-1   2.5E+4   6.5E-1   *        *        1.6E+4   3.7E-2
4     1.0E+0   6.0E-8   1.0E+0   6.0E-8   5.2E+4   2.4E-1   3.0E+4   9.2E-1   *        *        1.7E+4   6.2E-2
4.5   8.0E-1   3.1E-8   8.0E-1   3.1E-8   5.2E+4   2.9E-1   3.4E+4   1.2E+0   *        *        1.8E+4   8.8E-2
5     9.0E-1   4.1E-11  9.0E-1   4.1E-11  5.2E+4   3.3E-1   3.8E+4   1.5E+0   *        *        1.8E+4   1.1E-1
5.5   7.0E-1   1.2E-10  7.0E-1   1.2E-10  5.2E+4   3.7E-1   4.0E+4   1.7E+0   *        *        1.9E+4   1.3E-1
6     5.0E+2   8.9E-16  1.0E+3   8.9E-16  5.1E+4   4.0E-1   3.9E+4   2.0E+0   *        *        1.9E+4   1.4E-1
6.5   5.0E-1   8.5E-11  5.0E-1   8.5E-11  5.1E+4   4.3E-1   3.3E+4   2.2E+0   *        *        1.9E+4   1.6E-1
7     4.0E-1   3.3E-10  4.0E-1   3.3E-10  5.0E+4   4.7E-1   4.6E+4   2.4E+0   *        *        1.9E+4   1.7E-1
20    1.0E-1   1.5E-12  1.0E-1   1.5E-12  3.1E+4   8.4E-1   3.6E+4   6.8E+0   *        *        1.4E+4   7.6E-1
50    *        *        *        *        *        *        1.7E+4   1.5E+1   *        *        7.2E+3   1.5E+0