Journal of Computational and Applied Mathematics
A new class of neural networks for NCPs using smooth perturbations of the natural residual function

Jan Harold Alcantara, Jein-Shan Chen∗,1
Department of Mathematics, National Taiwan Normal University, Taiwan
Article history: Received 10 February 2020; Received in revised form 4 January 2022

MSC: 37N40; 65K10; 65K15

Keywords: Complementarity problem; Smoothing method; Neural network; Stability

Abstract. We present a new class of neural networks for solving nonlinear complementarity problems (NCPs) based on a family of real-valued functions (denoted by ℱ) that can be used to construct smooth perturbations of the level curve defined by Φ_NR(x, y) = 0, where Φ_NR is the natural residual function (also called the ''min'' function). We introduce two important subclasses of ℱ, which deserve particular attention because of their significantly different theoretical and numerical properties. One of these subfamilies yields a smoothing function for Φ_NR, while the other yields only a smoothing curve for Φ_NR(x, y) = 0. We also propose a simple framework for generating functions from these subclasses. Using the smoothing approach, we build two types of neural networks and provide sufficient conditions to guarantee asymptotic and exponential stability of equilibrium solutions. Finally, we present extensive numerical experiments to validate the theoretical results and to illustrate the difference in numerical performance of functions from the two subclasses. Numerical comparisons with existing neural networks for NCPs are also demonstrated.

© 2022 Elsevier B.V. All rights reserved.
1. Introduction
A nonlinear complementarity problem (NCP) consists of a system of nonlinear inequalities together with nonnegativity and orthogonality conditions involving a given multivariate function. Such problems arise in constrained optimization and equilibrium problems [1]. Moreover, applications of NCPs in operations research, engineering, and economics have motivated significant research efforts in the past decades [1–3], which resulted in various numerical techniques, including the merit function approach [4–6], the nonsmooth Newton method [7–9], and the regularization approach [10,11]. We refer the interested reader to the monograph [1] and the paper [12] for a survey and thorough discussion of solution methods for complementarity problems.
Smoothing methods belong to another class of solution methods that have been extensively used in solving complementarity problems [10,13–17]. A natural smoothing technique for the NCP is to construct a differentiable approximation of

φ_NR(s, t) := min{s, t} = s − (s − t)_+    (1.1)

using a smoothing function for the plus function. Meanwhile, Haddou and Maheux [18] recently introduced a novel smoothing approach to handle NCPs. In their approach, they utilized smooth perturbations of φ_NR that do not necessarily correspond to smoothing functions for φ_NR. These perturbations can be viewed as smooth approximations of the level curve φ_NR(s, t) = 0, which are then used to obtain approximate solutions of the NCP. Unfortunately, there have been no further studies on algorithmic procedures for NCPs following this smoothing framework. Moreover, the choice of smooth perturbations, as well as a procedure for controlling the perturbation parameter, are numerical issues that were left for future study in [18].

Meanwhile, neural network (NN) approaches for complementarity problems have also been explored in [19–21], mainly because it is desirable to have a real-time solution of the NCP, which may not be attainable with the usual approaches mentioned above. Neural networks are hardware-implementable (e.g., via integrated circuits) and therefore exhibit real-time processing. Hopfield and Tank originally introduced neural networks for optimization [22,23], and they have since been used in solving linear and nonlinear programming problems and variational inequalities [24–30]. Notice, however, that the NCP is not an optimization problem. Nevertheless, the NCP can be reformulated as a smooth minimization problem by constructing a merit function, usually through the use of NCP-functions [4–6,19–21]. A nonnegative merit function usually serves as an energy function, which is then used to formulate a steepest-descent dynamical system whose equilibrium solutions correspond to NCP solutions under suitable conditions.

∗ Corresponding author.
E-mail addresses: 80640005s@ntnu.edu.tw (J.H. Alcantara), jschen@math.ntnu.edu.tw (J.-S. Chen).
1 The author's work is supported by the Ministry of Science and Technology, Taiwan.
https://doi.org/10.1016/j.cam.2022.114092
A neural network based on the Fischer–Burmeister (FB) function was designed to handle P_0-NCPs in [21]. In [20], these results were extended to the generalized Fischer–Burmeister (GFB) function, an NCP-function that involves a parameter p ∈ (1, ∞). It was shown that for the latter NN, better numerical performance of the network can be achieved by choosing a larger value of p. These neural networks have good stability and convergence properties and are not very sensitive to initial conditions. Aside from the FB and GFB neural networks, three classes of NNs based on the discrete generalization of φ_NR given in (1.1) were recently formulated in [19]. These neural networks have relatively better convergence speed than the FB and GFB neural networks, but their stability requires stricter assumptions and they are usually more sensitive to initial conditions.

In this paper, we build a new class of neural networks based on the smoothing method for NCPs introduced by Haddou and Maheux [18] using a family ℱ of smoothing functions. We introduce two subclasses of ℱ and propose a simple method to generate functions from these subclasses. Interestingly, the aforementioned issues of how to choose the smooth perturbations of φ_NR and how to control the decrease of the smoothing parameter can be best addressed by considering these two subclasses. Sufficient conditions to guarantee stability of the neural networks are provided and are illustrated through several numerical experiments. Finally, we compare the present neural networks with the well-known FB and GFB neural networks in solving P_0- and non-P_0-NCPs.

This paper is organized as follows: In Section 2, we recall the smoothing method proposed in [18] and introduce the family ℱ. In Section 3, we discuss a simple idea on how to generate smoothing functions. We also prove a characterization result for the two subfamilies of ℱ introduced. In Section 4, two neural networks for NCPs are presented. We provide several sufficient conditions for Lyapunov and exponential stability of the neural networks. Several numerical simulations are presented in Section 5 to understand how to choose a smoothing function for the NCP. A comparative analysis with the FB and GFB neural networks is also given. Conclusions and some recommendations for future studies are presented in Section 6.
The notation used in this study is as follows. R^n denotes the n-dimensional Euclidean space endowed with the usual inner product, and R^{m×n} denotes the space of m × n real matrices. We let R^n_+ and R^n_{++} denote the nonnegative and positive orthants of R^n, respectively. M^T denotes the transpose of a matrix M. M_{ij} and M_{·j} denote the (i, j)-entry and the jth column of M, respectively. For M ∈ R^{n×n} and Λ ⊆ {1, . . . , n}, we denote by M_Λ the principal submatrix of M indexed by Λ (i.e., the submatrix of M corresponding to the rows and columns indexed by Λ). Λ^c denotes the complement of Λ. For any differentiable function f : R² → R, ∇_a f(s, t) and ∇_b f(s, t) denote the partial derivatives of f with respect to s and t, respectively. Given a differentiable mapping F = (F_1, . . . , F_m)^T : R^n → R^m, ∇F(x) = [∇F_1(x) · · · ∇F_m(x)] ∈ R^{n×m} denotes the transposed Jacobian of F at x, where ∇F_i(x) denotes the gradient of F_i at x. Given a family of real-valued functions {φ_r : r > 0} on R^n, we denote by φ_0 the pointwise limit lim_{r↘0} φ_r, whenever it exists.

2. Smoothing approach for NCP
In this section, we present the smoothing approach for NCP proposed by Haddou and Maheux [18] and introduce two important classes of smoothing functions. We also recall some concepts related to nonlinear mappings.
Let F : R^n → R^n be given. The nonlinear complementarity problem, which we denote by NCP(F), is to find a point x ∈ R^n such that

x ≥ 0,  F(x) ≥ 0  and  ⟨x, F(x)⟩ = 0.    (2.1)

The set of all solutions of NCP(F) is denoted by SOL(F), which we assume to be nonempty. One can also show the equivalence of NCP(F) and the fixed-point equation x = P_K(x − F(x)), where K = R^n_+ and P_K denotes the projection onto K. In light of this equivalence, we see that x solves (2.1) if and only if Φ_NR(x, F(x)) = 0, where Φ_NR : R^n × R^n → R^n is the nonsmooth mapping given by

Φ_NR(x, y) = (φ_NR(x_1, y_1), . . . , φ_NR(x_n, y_n))^T.
In this paper, we will consider nonlinear monotone and P0-functions. We recall some basic types of nonlinear mappings.
Definition 2.1. Let F = (F_1, . . . , F_n)^T : R^n → R^n. Then, the mapping F is said to be

(i) monotone if ⟨x − y, F(x) − F(y)⟩ ≥ 0 for all x, y ∈ R^n;
(ii) strictly monotone if ⟨x − y, F(x) − F(y)⟩ > 0 for all x, y ∈ R^n with x ≠ y;
(iii) strongly monotone with modulus μ > 0 if ⟨x − y, F(x) − F(y)⟩ ≥ μ‖x − y‖² for all x, y ∈ R^n;
(iv) a P_0-function if max_{1≤i≤n, x_i≠y_i} (x_i − y_i)(F_i(x) − F_i(y)) ≥ 0 for all x, y ∈ R^n with x ≠ y;
(v) a P-function if max_{1≤i≤n} (x_i − y_i)(F_i(x) − F_i(y)) > 0 for all x, y ∈ R^n with x ≠ y;
(vi) a uniform P-function with modulus κ > 0 if max_{1≤i≤n} (x_i − y_i)(F_i(x) − F_i(y)) ≥ κ‖x − y‖² for all x, y ∈ R^n.

A matrix is called a P_0-matrix (resp. P-matrix) if all its principal minors are nonnegative (resp. positive). Note that F is a P_0-function if and only if ∇F(x) is a P_0-matrix for all x ∈ R^n. If ∇F(x) is a P-matrix for all x ∈ R^n, then F is a P-function. However, the converse is not necessarily true.
We now recall the smoothing approach described in [18]. First, we consider a continuously differentiable, strictly increasing function θ such that

lim_{t→−∞} θ(t) = −∞,  θ(0) = 0  and  lim_{t→+∞} θ(t) = 1.    (2.2)

We also impose the condition that θ(t) should approach 1 fast enough, with some uniformity, for large values of t. This condition is made precise as follows.

Definition 2.2. Let a ∈ (0, 1) and suppose θ(t) is a strictly increasing C¹ function such that θ(0) = 0 and lim_{t→∞} θ(t) = 1. We say that θ satisfies condition (Ha) if there exists t_a > 0 such that

1/2 + (1/2)θ(at) ≤ θ(t)  for all t ≥ t_a.

We say that θ belongs to the class ℱ if it satisfies condition (Ha) for some a ∈ (0, 1).

For instance, the following functions were considered in [18]:

θ^(1)(t) = t/(t+1) if t ≥ 0, and θ^(1)(t) = t if t < 0;  θ^(2)(t) = 1 − e^{−t}.    (2.3)

Here, θ^(1) satisfies (Ha) for all a ∈ (0, 1/2), while θ^(2) satisfies (Ha) for all a ∈ (0, 1).

For each r > 0, we define the function φ_r : R² → R by

φ_r(s, t) := r θ^{−1}( θ(s/r) + θ(t/r) − 1 ).    (2.4)

Note that strict monotonicity of θ was imposed to guarantee its invertibility, as required in (2.4). We summarize here some important facts proved in [18].

Lemma 2.1. Let θ be a strictly increasing C¹ function satisfying conditions (2.2). Then, the following hold.

(i) φ_r(s, t) ≤ min{s, t} for all s, t ∈ R and r > 0.
(ii) If θ ∈ ℱ, then lim_{r↘0} φ_r(s, t) = 0 ⟺ min{s, t} = 0, for all s, t ∈ R.
(iii) If θ satisfies condition (Ha) for all a ∈ (0, 1), then lim_{r↘0} φ_r(s, t) = min{s, t} for all s, t ∈ R_{++}.
Let G_r : R^n × R^n → R^n be given by

G_r(x, y) = (φ_r(x_1, y_1), . . . , φ_r(x_n, y_n))^T.    (2.5)
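As a quick numerical illustration of Lemma 2.1 (our own sketch, not part of [18]), the two generators in (2.3) can be plugged into (2.4) directly. For θ^(2), formula (2.4) has the closed form φ_r(s, t) = −r ln(e^{−s/r} + e^{−t/r}), which we evaluate in an overflow-free way; the contrast between θ^(2) (which satisfies (Ha) for every a) and θ^(1) (which does not) is visible in the limit r ↘ 0 on the diagonal s = t:

```python
import math

# The two generators from (2.3)
def theta1(t):       # t/(t+1) for t >= 0, t for t < 0
    return t / (t + 1.0) if t >= 0 else t

def theta1_inv(u):   # inverse of theta1 (defined for u < 1)
    return u / (1.0 - u) if u >= 0 else u

def theta2(t):       # 1 - e^{-t}
    return 1.0 - math.exp(-t)

def theta2_inv(u):
    return -math.log(1.0 - u)

def phi_r(s, t, r, theta, theta_inv):
    """Generic evaluation of (2.4): r * theta^{-1}(theta(s/r) + theta(t/r) - 1)."""
    return r * theta_inv(theta(s / r) + theta(t / r) - 1.0)

def phi_r_theta2(s, t, r):
    """For theta2, (2.4) collapses to -r*ln(e^{-s/r} + e^{-t/r});
    evaluated here in a numerically stable (log-sum-exp) form."""
    return min(s, t) - r * math.log(1.0 + math.exp(-abs(s - t) / r))

# Lemma 2.1(i): phi_r(s, t) <= min{s, t}
for (s, t) in [(1.0, 2.0), (0.5, 0.5), (-1.0, 3.0)]:
    for r in (1.0, 0.5, 0.1):
        assert phi_r(s, t, r, theta1, theta1_inv) <= min(s, t) + 1e-9
        assert phi_r(s, t, r, theta2, theta2_inv) <= min(s, t) + 1e-9

# theta2 recovers min{s, t} on R_{++} as r -> 0 (Lemma 2.1(iii)) ...
print(phi_r_theta2(1.0, 1.0, 1e-4))               # close to min{1, 1} = 1
# ... while theta1 leaves a gap on the diagonal: the limit is 1/2
print(phi_r(1.0, 1.0, 1e-4, theta1, theta1_inv))  # close to 1/2
```

The function names and sample points are ours; only the formulas come from (2.3)–(2.4).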
By Lemma 2.1(iii), G_r is a smoothing function for Φ_NR over R^n_{++} × R^n_{++} if θ satisfies (Ha) for all a ∈ (0, 1). In contrast, we may not obtain a smoothing function for Φ_NR if there exists a ∈ (0, 1) for which (Ha) does not hold. For instance, when θ = θ^(1), one can verify that lim_{r↘0} G_r(x, y) < Φ_NR(x, y) (see also [18]). In view of these contrasting properties, we introduce two subclasses of ℱ.

Definition 2.3. Let θ ∈ ℱ. We say that θ belongs to the class ℱ_1 if it satisfies condition (Ha) for all a ∈ (0, 1). Otherwise, we say that θ belongs to ℱ_2.

In Section 3, we will prove that the result of Lemma 2.1(iii) holds only for functions in ℱ_1. Although we do not obtain a smoothing function for Φ_NR when θ ∈ ℱ_2, Lemma 2.1(ii) is a very useful result for applying Haddou and Maheux's smoothing strategy. To see this, define the functions

Φ_r(x) := G_r(x, F(x))  and  Φ_0(x) := lim_{r↘0} Φ_r(x),

provided that the limit exists. Then, by Lemma 2.1(ii), (2.1) is equivalent to solving the system of equations Φ_0(x) = 0. Moreover, it is shown in [18] that whenever F is a P_0-function, Φ_r is a P-function for any r > 0, while Φ_0 is a P_0-function whenever it exists. Consequently, one may use Gowda and Tawhid's theory for P_0-equations given in [31] by viewing Φ_0 as a continuous perturbation of Φ_r. Indeed, Lemma 2.3, which was given in [18], is an easy consequence of Theorem 4 of [31]. Before stating the convergence result, we make some assumptions.
Assumption 1. SOL(F) is nonempty and bounded, θ ∈ ℱ, and Φ_0 exists.

Some conditions which guarantee the existence of Φ_0 are given in the following result, whose proof can be found in [18].
Lemma 2.2. Let s, t > 0. Then, φ_0(s, t) := lim_{r↘0} φ_r(s, t) exists and φ_0(s, t) ≤ min{s, t} if either of the following conditions holds:

(i) There exists ε > 0 such that (∂/∂r) φ_r(s, t) ≤ 0 for all r ∈ (0, ε);
(ii) V := (−ψ′ ∘ ψ^{−1}) × ψ^{−1} is locally subadditive at 0+; i.e., there exists η > 0 such that for all α, β > 0 with α + β < η, we have V(α + β) ≤ V(α) + V(β), where ψ := 1 − θ.
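For concreteness (our own check, not from [18]): with θ = θ^(2) we have ψ(t) = e^{−t}, ψ^{−1}(u) = −ln u, and −ψ′(ψ^{−1}(u)) = u, so V(u) = −u ln u, whose local subadditivity at 0+ is easy to verify numerically:

```python
import math
import random

def V(u):
    # V = (-psi' o psi^{-1}) * psi^{-1} for psi(t) = e^{-t} (theta = theta2):
    # psi^{-1}(u) = -ln(u) and -psi'(psi^{-1}(u)) = u, hence V(u) = -u*ln(u)
    return -u * math.log(u)

# local subadditivity at 0+: V(a + b) <= V(a) + V(b) for small a, b > 0
random.seed(0)
for _ in range(1000):
    a, b = random.uniform(1e-9, 0.1), random.uniform(1e-9, 0.1)
    assert V(a + b) <= V(a) + V(b)
```

(In fact V(α) + V(β) − V(α + β) = α ln((α+β)/α) + β ln((α+β)/β) > 0, so subadditivity here holds for all positive α, β.)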
Lemma 2.3. Suppose Assumption 1 holds and that F is a P_0-function. Then:

(i) There exists r̂ > 0 such that for any r ∈ (0, r̂), Φ_r(x) = 0 has a unique solution x(r), and the mapping r ↦ x(r) is continuous on (0, r̂).
(ii) lim_{r↘0} inf_{x*∈SOL(F)} ‖x(r) − x*‖ = 0.

Lemma 2.3(i) implies the well-posedness of the equation Φ_r(x) = 0, while Lemma 2.3(ii) suggests the following typical algorithm employed in smoothing techniques, which is the one presented in [18].

Algorithm 1.
Let ε > 0 and x⁰ ∈ R^n_{++}. Set r_0 = max{1, √(max_{1≤i≤n} |x⁰_i F_i(x⁰)|)} and k = 0.

For k = 0, 1, . . . until max_{1≤i≤n} |x^k_i F_i(x^k)| ≤ ε, do:

1. Solve Φ_{r_k}(x) = G_{r_k}(x, F(x)) = 0, where G_r(·, ·) is defined in (2.5), to get a solution x^k.
2. Set r_{k+1} = min{0.1 r_k, (r_k)², √(max_{1≤i≤n} |x^k_i F_i(x^k)|)}.
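A minimal sketch of Algorithm 1 (our own illustration, not the authors' implementation) can be written with θ^(2), for which (2.4) reduces to φ_r(s, t) = −r ln(e^{−s/r} + e^{−t/r}), and a damped-Newton inner solver. The toy monotone map F(x) = (2x₁ + x₂ − 2, x₁ + 2x₂ + 1) is ours; its NCP solution is x* = (1, 0):

```python
import math

def sig(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def phi(s, t, r):
    # For theta2, (2.4) reduces to -r*ln(e^{-s/r} + e^{-t/r}); stable form below.
    return min(s, t) - r * math.log(1.0 + math.exp(-abs(s - t) / r))

def dphi(s, t, r):
    # partials w.r.t. s and t; both lie in (0, 1) and sum to 1
    return sig((t - s) / r), sig((s - t) / r)

def F(x):
    # hypothetical toy monotone map; NCP solution is x* = (1, 0)
    return [2.0 * x[0] + x[1] - 2.0, x[0] + 2.0 * x[1] + 1.0]

JF = [[2.0, 1.0], [1.0, 2.0]]  # Jacobian of F (constant)

def Phi(x, r):
    Fx = F(x)
    return [phi(x[i], Fx[i], r) for i in range(2)]

def newton_solve(x, r, iters=100):
    """Inner solver for Step 1: damped Newton on Phi_r(x) = 0 (n = 2)."""
    for _ in range(iters):
        P = Phi(x, r)
        nP = max(abs(v) for v in P)
        if nP < 1e-12:
            break
        Fx = F(x)
        J = [[0.0, 0.0], [0.0, 0.0]]
        for i in range(2):          # J_ij = a_i*delta_ij + b_i*dF_i/dx_j
            a, b = dphi(x[i], Fx[i], r)
            for j in range(2):
                J[i][j] = (a if i == j else 0.0) + b * JF[i][j]
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        d0 = (J[1][1] * P[0] - J[0][1] * P[1]) / det
        d1 = (J[0][0] * P[1] - J[1][0] * P[0]) / det
        lam = 1.0
        while lam > 1e-8:           # simple backtracking safeguard
            xn = [x[0] - lam * d0, x[1] - lam * d1]
            if max(abs(v) for v in Phi(xn, r)) < nP:
                break
            lam *= 0.5
        x = xn
    return x

def residual(x):
    Fx = F(x)
    return max(abs(x[i] * Fx[i]) for i in range(2))

eps = 1e-8
x = [1.0, 1.0]                                       # x0 in R^n_{++}
r = max(1.0, math.sqrt(residual(x)))                 # r_0
while residual(x) > eps:
    x = newton_solve(x, r)                           # Step 1
    r = min(0.1 * r, r * r, math.sqrt(residual(x)))  # Step 2
    r = max(r, 1e-10)                                # guard against r = 0
print(x, residual(x))
```

The stopping test and the update of r follow Algorithm 1 verbatim; the Newton safeguard and the tiny lower bound on r are practical additions of ours.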
We remark that if x(r) solves Φ_r(x) = 0, then x(r) and F(x(r)) are both in R^n_{++}.

3. Designing smoothing functions

In Section 2, we introduced two subclasses of ℱ based on whether or not condition (Ha) is satisfied for all a ∈ (0, 1). In this section, we discuss some important remarks and results related to condition (Ha). Moreover, we present a simple and intuitive way to generate functions θ in ℱ. Under suitable assumptions, the construction also provides information on the classification of θ, i.e., whether θ belongs to ℱ_1 or ℱ_2.
We first consider a function θ satisfying (2.2) restricted to R_+. Observe that there is a one-to-one correspondence between these functions and probability density functions. Indeed, if p(t) := θ′(t), then p(t) ≥ 0 and

∫_0^∞ p(t) dt = lim_{t→∞} ∫_0^t θ′(τ) dτ = lim_{t→∞} (θ(t) − θ(0)) = 1.

Hence, the function θ corresponds to a probability density function on R_+. Conversely, one natural way to generate θ is to consider a probability density function p : R_+ → R_+ and take

θ(t) = ∫_0^t p(τ) dτ  for all t ∈ [0, ∞).

This construction was also mentioned in [32].

Another natural and intuitive way to generate θ with the desired asymptotic behavior at infinity is to consider a differential equation with an equilibrium solution at 1. Consider, for instance, the separable autonomous differential equation

dθ/dt = f(θ),  θ(0) = 0.    (3.1)

Throughout the paper, the following are our assumptions on f:

Assumption 2. (i) f is C¹ on an open interval containing [0, 1]. (ii) f > 0 on [0, 1) and f(1) = 0.

Assumption 2(i) is important in ensuring that the initial-value problem (3.1) has a unique solution. On the other hand, Assumption 2(ii) is required to obtain an increasing function θ(t) which approaches 1 as t → ∞.
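As a quick sanity check of this construction (our own, not part of the paper), integrating (3.1) numerically with a classical Runge–Kutta scheme for f(θ) = (1 − θ)^k reproduces the closed-form generators in (2.3): k = 1 gives θ^(2)(t) = 1 − e^{−t}, and k = 2 gives θ^(1)(t) = t/(t + 1) on t ≥ 0:

```python
import math

def solve_theta(f, T=10.0, h=1e-3):
    """Integrate theta' = f(theta), theta(0) = 0 on [0, T] with classical RK4."""
    theta, t, traj = 0.0, 0.0, [(0.0, 0.0)]
    for _ in range(int(T / h)):
        k1 = f(theta)
        k2 = f(theta + 0.5 * h * k1)
        k3 = f(theta + 0.5 * h * k2)
        k4 = f(theta + h * k3)
        theta += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        traj.append((t, theta))
    return traj

# k = 1: f(theta) = 1 - theta   ->  theta(t) = 1 - e^{-t}   (theta^(2) in (2.3))
traj1 = solve_theta(lambda u: 1.0 - u)
err1 = max(abs(th - (1.0 - math.exp(-t))) for t, th in traj1)

# k = 2: f(theta) = (1 - theta)^2  ->  theta(t) = t/(t + 1)  (theta^(1) in (2.3))
traj2 = solve_theta(lambda u: (1.0 - u) ** 2)
err2 = max(abs(th - t / (t + 1.0)) for t, th in traj2)

print(err1, err2)   # both tiny: RK4 matches the closed forms
```

Function names and step sizes are our choices; the differential equations themselves are exactly (3.1) with the factorized f used later in Theorem 3.1.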
Proposition 3.1. Let f be any function satisfying Assumption 2. Then the initial-value problem (3.1) has a unique solution θ(t), which is defined for all t ≥ 0 and satisfies lim_{t→∞} θ(t) = 1.

Proof. The existence and uniqueness of the solution follow from Assumption 2(i) (see [33]). Let [0, T) be the maximal interval of existence of the solution θ(t) of (3.1). Since f(θ) > 0 for all θ ∈ (0, 1), θ(t) is an increasing function. Thus, we may let L = lim_{t→∞} θ(t). Note also that the constant function 1 is a solution of the differential equation dθ/dt = f(θ). Since the solution through a point is unique, θ(t) < 1 for all t ∈ [0, T). Consequently, L ≤ 1 and {θ(t) : t ∈ [0, T)} ⊆ [0, 1]. Hence, θ(t) is defined for all t ≥ 0 (see [33]). Finally, note that from (3.1), we have

θ(t) = ∫_0^t f(θ(τ)) dτ.

Since θ(t) → L, we conclude that ∫_0^∞ f(θ(τ)) dτ is convergent. It follows that f(L) = lim_{t→∞} f(θ(t)) = 0. By Assumption 2(ii), we conclude that L = 1, as desired. □

It was mentioned in the preceding section that satisfying condition (Ha) is very important to guarantee the applicability of the smoothing approach. The benefit of constructing θ using (3.1) is that condition (Ha) can be easily deduced if f has a specific form. We describe this in the following theorem.

Theorem 3.1. Let f̄ be a continuously differentiable function on an open interval containing [0, 1] such that f̄ > 0 on [0, 1]. Suppose that f takes the form f(θ) = f̄(θ)(1 − θ)^k, where k ≥ 1. If θ(t) denotes the unique solution of (3.1), then θ ∈ ℱ. In particular:

(i) If k = 1, then θ ∈ ℱ_1.
(ii) If k > 1, then θ ∈ ℱ_2. Moreover, condition (Ha) is satisfied when a ∈ (0, 1/2^{k−1}) and is not satisfied when a ∈ (1/2^{k−1}, 1).

Before we prove Theorem 3.1, we need the following lemma, which will also be useful later in Theorems 4.3 and 4.4.

Lemma 3.1. Under the hypotheses of Theorem 3.1, for all a ∈ (0, 1) we have

lim_{t→∞} (1 − θ(t))/(1 − θ(at)) = 0 if k = 1,  and  lim_{t→∞} (1 − θ(t))/(1 − θ(at)) = a^{1/(k−1)} if k > 1.
Proof. Let a ∈ (0, 1) and define ψ(t) := 1 − θ(t). Note that

(d/dt) ψ(t) = −dθ/dt = −f̄(θ(t))(1 − θ(t))^k = −f̄(θ(t))(ψ(t))^k.    (3.2)

Suppose first that k = 1. Since θ(0) = 0, we have ψ(0) = 1. Moreover, we have from (3.2) that

1 − θ(t) = ψ(t) = exp( −∫_0^t f̄(θ(τ)) dτ ).

Consequently, for any a ∈ (0, 1),

(1 − θ(t))/(1 − θ(at)) = exp(−∫_0^t f̄(θ(τ)) dτ) / exp(−∫_0^{at} f̄(θ(τ)) dτ) = exp( −∫_{at}^t f̄(θ(τ)) dτ ).    (3.3)

Since f̄(θ) > 0 on [0, 1], there exists m > 0 such that f̄(θ) > m for all θ ∈ [0, 1]. Continuing from (3.3),

(1 − θ(t))/(1 − θ(at)) ≤ exp(−m(1 − a)t)  for all t ≥ 0.

Letting t → ∞ gives the desired limit for the first case.

Now, suppose that k > 1. From (3.2), we have

(1 − θ(t))^{1−k} = (ψ(t))^{1−k} = 1 + (k − 1)∫_0^t f̄(θ(τ)) dτ.

Note that since f̄(1) ≠ 0, we have ∫_0^∞ f̄(θ(τ)) dτ = ∞. This leads to

lim_{t→∞} ( (1 − θ(t))/(1 − θ(at)) )^{k−1} = lim_{t→∞} [1 + (k − 1)∫_0^{at} f̄(θ(τ)) dτ] / [1 + (k − 1)∫_0^t f̄(θ(τ)) dτ] = lim_{t→∞} a f̄(θ(at))/f̄(θ(t)) = a,

where the last equality follows from the fact that f̄(1) ≠ 0 and θ(t) → 1 by Proposition 3.1. □

Proof of Theorem 3.1. Fix a ∈ (0, 1) and define h_a(t) = 1/2 + (1/2)θ(at) − θ(t) for all t ≥ 0. Differentiating h_a, we obtain

h′_a(t) = (a/2)θ′(at) − θ′(t) = (a/2) f(θ(at)) − f(θ(t)).    (3.4)

Since f̄(1) ≠ 0 and θ(t) → 1, we have from Lemma 3.1 that

lim_{t→∞} f(θ(t))/f(θ(at)) = lim_{t→∞} [f̄(θ(t))(1 − θ(t))^k] / [f̄(θ(at))(1 − θ(at))^k] = 0 if k = 1,  and  = a^{k/(k−1)} if k > 1.    (3.5)

Suppose now that k = 1. By (3.5), we can find t_a > 0 such that f(θ(t)) < (a/2) f(θ(at)) for all t > t_a. From (3.4), we see that h′_a(t) > 0 for all t > t_a, and therefore h_a(t) is strictly increasing on (t_a, ∞). But clearly, h_a(t) approaches 0 as t → ∞. Thus, h_a(t) < 0 for all t > t_a. That is, θ(t) satisfies condition (Ha) for all a ∈ (0, 1).

On the other hand, suppose k > 1. If 0 < a < 1/2^{k−1}, then

a/2 − a^{k/(k−1)} = a(1/2 − a^{1/(k−1)}) > 0.

Then, by (3.5), we can find t_a > 0 such that f(θ(t))/f(θ(at)) < a/2 for all t > t_a. As in the preceding case, θ(t) satisfies condition (Ha) for all a ∈ (0, 1/2^{k−1}). If 1/2^{k−1} < a < 1, then a^{k/(k−1)} − a/2 > 0, and so there exists t_a > 0 for which f(θ(t))/f(θ(at)) > a/2 for all t > t_a. It follows that h_a(t) > 0 on (t_a, ∞). This completes the proof. □

Remark 3.1. Given a function f satisfying Assumption 2, we may apply Theorem 3.1 provided that there exists k
≥
1 such that L:=
limθ→1f (θ
)(1− θ
)−k̸=
0. In this case, we define¯
f (θ
):=
f (θ
)(1− θ
)−kforθ ∈ [
0,
1) and definef (1)¯ :=
L.We then obtain the desired factorization f (
θ
)= ¯
f (θ
)(1− θ
)k.In general, this factorization is not always achievable. To see this, consider the function g(t)
=
t3(
1
+
sin1 t)
+
t(1−
cos t).
It is easy to verify that g is continuously differentiable on R. Moreover, since the first term of g is nonnegative on (0
,
1]
while the second term is greater than 0 on (0,
1]
, then g is strictly positive on (0,
1]
. In addition,lim
t→0+
g(t)
tk
=
0∀
k∈ [
1,
3),
6
and the above limit does not exist for k
≥
3. Defining f (θ
):=
g(1− θ
), we see that f satisfiesAssumption 2. But, there is no k≥
1 such that limθ→1−f (θ
)(1− θ
)−kexists and is nonzero.The following is an easy consequence ofTheorem 3.1and the above remark.
Corollary 3.1. If f′(1)
<
0, thenθ ∈
F1.Example 3.1. We generate five functions from(3.1), two of which are the ones used in [18] given by Eq.(2.3). It can be easily verified that V given inLemma 2.2is locally subadditive at 0+for each of the functions below.
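As a numerical illustration of Theorem 3.1(ii) (our own check, not from the paper): θ^(1)(t) = t/(t + 1) solves (3.1) with f(θ) = (1 − θ)², i.e. k = 2 and threshold 1/2^{k−1} = 1/2, so the sign of h_a(t) = 1/2 + θ(at)/2 − θ(t) for large t flips as a crosses 1/2, while for θ^(2) (k = 1) h_a is eventually negative for every a ∈ (0, 1). To avoid floating-point cancellation we evaluate h_a through the complement ψ = 1 − θ, using the identity h_a(t) = ψ(t) − ψ(at)/2:

```python
import math

def psi1(t):   # 1 - theta1(t) for t >= 0, where theta1(t) = t/(t+1); k = 2
    return 1.0 / (t + 1.0)

def psi2(t):   # 1 - theta2(t), where theta2(t) = 1 - e^{-t}; k = 1
    return math.exp(-t)

def h(a, t, psi):
    # h_a(t) = 1/2 + theta(a t)/2 - theta(t) = psi(t) - psi(a t)/2,
    # computed via the complement to avoid cancellation for large t
    return psi(t) - 0.5 * psi(a * t)

# theta1 (k = 2): threshold 1/2^{k-1} = 1/2
assert h(0.4, 1e6, psi1) < 0        # a < 1/2: (Ha) holds eventually
assert h(0.6, 1e6, psi1) > 0        # a > 1/2: (Ha) fails
# theta2 (k = 1, class F1): h_a(t) < 0 for t > ln(2)/(1 - a), any a in (0, 1)
for a in (0.1, 0.5, 0.9, 0.99):
    assert h(a, 200.0, psi2) < 0
```

The sample values of a and t are ours; the threshold 1/2 is exactly the bound 1/2^{k−1} from Theorem 3.1(ii).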
The parameter k plays a significant role in numerical simulations. Notice that (1 − θ)^k decreases in value as k increases when θ ∈ (0, 1). In view of (3.1) and the form of f given in Theorem 3.1, k controls the growth rate of the function θ(t). In Section 5, it is shown that functions with a faster growth rate yield faster convergence times for the neural networks. However, the rate of increase of θ should not be too fast, so as to avoid ill-conditioning. Hence, the parameter k is useful when we take into account the conditioning of the problem. That is, a higher value of k may be adopted when the problem is ill-conditioned.

We now give an example of a function θ that does not belong to ℱ.

Example 3.2. Define θ(t) = (2/π) ∫_0^t (sin² τ)/τ² dτ. Then θ is a strictly increasing, continuously differentiable function such that θ(0) = 0 and θ(t) → 1 as t → ∞. However, the function h_a defined in the proof of Theorem 3.1 oscillates between positive and negative values for arbitrarily large t, and therefore θ does not satisfy condition (Ha) for any a ∈ (0, 1).

Remark 3.2. One can observe that class ℱ_1 functions can be viewed, in some sense, as the limit case of class ℱ_2 functions. First, notice that in Lemma 3.1, the zero limit obtained for class ℱ_1 functions can be viewed as the limit of a^{1/(k−1)} as k ↘ 1, where a^{1/(k−1)} is the calculated limit for class ℱ_2 functions. In Theorem 3.1, we see that for θ ∈ ℱ_2, condition (Ha) is satisfied only for a ∈ I_k := (0, 2^{1−k}). Note that I_k ↗ (0, 1) as k ↘ 1 and that condition (Ha) holds for all a ∈ (0, 1) for class ℱ_1 functions.

Remark 3.3. We mention some remarks regarding the definition of θ over (−∞, 0). In order for G_r given by (2.5) to be smooth over R^n × R^n for any r > 0, we must use a differentiable extension of θ for t < 0. In this paper, we let θ(t) = t f(0) for t < 0, similar to the construction of θ^(1) in (2.3).

We conclude this section with the following proposition, which implies that the limit in Lemma 2.1(iii) is in fact a characterization of functions in ℱ_1. That is, a smoothing function of φ_NR over the positive orthant is obtained if and only if θ ∈ ℱ_1.
θ ∈
F2and leta¯ ∈
inf S, where S= {
a∈
(0,
1)|
condition (Ha) does not hold} .
Let P¯a= {
(s,
t)∈
R2++: ¯
as<
t<
as¯}
. Then, we havelim
r↘0
φ
r(s,
t)<
min{
s,
t} ∀
(s,
t)∈
P¯a,
whenever the limit exists.Proof. We note first that (a
¯ ,
1)⊆
S. Indeed, the set S is nonempty and¯
a∈
(0,
1) sinceθ ∈
F2. Ifa˜ ∈
S, for any u>
0,θ
(v
)<
12+
12θ
(a˜ v
) for somev ≥
u. Sinceθ
is increasing, we see that condition (Ha) also does not hold for any a∈
(˜
a,
1).Now, fix (s
,
t)∈
Pa¯ and suppose s=
min{
s,
t}
. Given any a∈
(a¯ ,
1), then by the above note, there exists a sequence{
rn}
such that rn↘
0 andθ (
trn
)
<
1 2+
12
θ (
atrn
) .
Since
θ
andθ
−1are increasing and s≤
t, the above inequality yieldsφ
rn(s,
t)=
rnθ
−1( θ
(
s rn) + θ
(
t rn)
−
1)
≤
rnθ
−1(
2
θ (
trn
)
−
1)
<
atLetting n
→ ∞
, we obtain limn→∞φ
rn(s,
t)≤
at for any a∈
(a¯ ,
1). Since (s,
t)∈
Pa¯, then s> ¯
at and by letting a↘ ¯
a, we obtainlim
n→∞
φ
rn(s,
t)≤ ¯
at<
s=
min{
s,
t}
which is the desired result. □It is interesting to note that as inRemark 3.2, classF1can again be viewed as the limit case of classF2whena
¯ ↘
0.More precisely, the limit inLemma 2.1(iii) holds on
⋃
a¯∈(0,1)Pa¯
=
R2++whenθ ∈
F1.7
4. Smoothed neural networks
In this section, we present two gradient dynamical systems to solve the NCP (2.1) using the smoothing approach presented in Section 2. Similar to the approach used in [19,21], the stability analysis of our neural networks relies on the use of Lyapunov functions. For more details, we refer the reader to [33,34].
4.1. The first neural network
We first consider a neural network which can be used to obtain approximate solutions of NCP(F). From Lemma 2.3, solutions of (2.1) can be obtained by successively solving, for decreasing values of r, the equation Φ_r(x) = 0, which is exactly the motivation of Algorithm 1. Note that solving this equation, if a solution exists, is equivalent to solving the smooth minimization problem

min_{x∈R^n} Ψ_r(x) := (1/2)‖Φ_r(x)‖².    (4.1)

This motivates the steepest-descent-based neural network

dx/dt = −ρ∇Ψ_r(x),  x(0) = x⁰,    (NN1)

where ρ > 0 is a time-scaling parameter and r > 0 is a sufficiently small fixed number. Consequently, neural network (NN1) can be used to deal with Step 1 of Algorithm 1.

Observe that a solution of (4.1) is an equilibrium point of (NN1). However, the converse is not necessarily true. We are interested in conditions that can be imposed on F so that an equilibrium point of the neural network (NN1) is also an optimal solution of (4.1), and consequently an approximate solution of (2.1). To this end, we first establish an important property of ∇Φ_r.

Theorem 4.1. Let F be a P_0-function, r > 0, and suppose that θ′(u) ≠ 0 for all u ∈ R. Then ∇Φ_r(x) is nonsingular for any x ∈ R^n and r > 0.
Proof. First, we note that

∇_a φ_r(s, t) = (∂/∂s) φ_r(s, t) = (θ^{−1})′( θ(s/r) + θ(t/r) − 1 ) θ′(s/r) = θ′(s/r) / θ′( θ^{−1}( θ(s/r) + θ(t/r) − 1 ) ) > 0,    (4.2)

where we used the fact that θ′(θ^{−1}(u)) · (θ^{−1})′(u) = 1. By symmetry, we also have ∇_b φ_r(s, t) := (∂/∂t) φ_r(s, t) > 0.

Let x ∈ R^n and r > 0, and denote by A_r(x) and B_r(x) the n × n diagonal matrices such that

(A_r(x))_{ii} = ∇_a φ_r(x_i, F_i(x))  and  (B_r(x))_{ii} = ∇_b φ_r(x_i, F_i(x)).

Note that A_r(x) and B_r(x) are invertible since ∇_a φ_r(x_i, F_i(x)) and ∇_b φ_r(x_i, F_i(x)) are both strictly positive for all i, as noted above. Furthermore, we have by the chain rule that

∇Φ_r(x) = A_r(x) + ∇F(x) · B_r(x) = (D_r(x) + ∇F(x)) · B_r(x),    (4.3)

where D_r(x) = A_r(x) B_r(x)^{−1}. Since D_r(x) is a diagonal matrix,

det(D_r(x) + ∇F(x)) = Σ_{Λ⊆{1,...,n}} det(D_Λ) det(∇F(x)_{Λ^c}),    (4.4)

where D_Λ denotes the principal submatrix of D_r(x) indexed by Λ (see, for instance, Chapter 2 of [35]). Each term in the above summation is nonnegative since F is a P_0-function and the diagonal entries of D_r(x) are positive. In particular, the term corresponding to Λ = {1, . . . , n} is precisely det(D_r(x)) > 0. Thus, D_r(x) + ∇F(x) is nonsingular. Since B_r(x) is also nonsingular, we conclude from (4.3) that ∇Φ_r(x) is nonsingular. □

Observe that the hypothesis on θ in the above theorem holds if θ is generated from (3.1) using Assumption 2 and Remark 3.3. We now state some important consequences of the above result.

Corollary 4.1. Under the hypotheses of Theorem 4.1, the following hold:
(i) Every equilibrium point of (NN1) is an optimal solution of (4.1).
(ii) An isolated equilibrium point of (NN1) is exponentially stable.
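To illustrate (NN1) numerically (our own sketch, not the paper's implementation), we can integrate dx/dt = −ρ∇Ψ_r(x) by forward Euler for the θ^(2)-based smoothing on the hypothetical toy monotone map F(x) = (2x₁ + x₂ − 2, x₁ + 2x₂ + 1), whose NCP solution is x* = (1, 0). Here ∇Ψ_r(x) = ∇Φ_r(x)Φ_r(x), with ∇Φ_r assembled as in (4.3):

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def phi(s, t, r):
    # phi_r for theta2(t) = 1 - e^{-t}: -r*ln(e^{-s/r} + e^{-t/r}), stable form
    return min(s, t) - r * math.log(1.0 + math.exp(-abs(s - t) / r))

def F(x):
    return [2.0 * x[0] + x[1] - 2.0, x[0] + 2.0 * x[1] + 1.0]

JF = [[2.0, 1.0], [1.0, 2.0]]   # constant Jacobian of F

def grad_Psi(x, r):
    # grad Psi_r(x) = J(x)^T Phi_r(x), where J_ij = a_i*delta_ij + b_i*dF_i/dx_j
    Fx = F(x)
    P = [phi(x[i], Fx[i], r) for i in range(2)]
    g = [0.0, 0.0]
    for i in range(2):
        a, b = sig((Fx[i] - x[i]) / r), sig((x[i] - Fx[i]) / r)
        for j in range(2):
            g[j] += ((a if i == j else 0.0) + b * JF[i][j]) * P[i]
    return g, P

# forward-Euler integration of (NN1): dx/dt = -rho * grad Psi_r(x)
rho, r, h = 1.0, 0.01, 0.02
x = [0.5, 0.5]
for _ in range(20000):
    g, P = grad_Psi(x, r)
    x = [x[0] - h * rho * g[0], x[1] - h * rho * g[1]]

print(x, P)   # x settles at the equilibrium x(r), close to (1, 0); Phi_r(x) ~ 0
```

The step size h and the number of iterations are ad hoc choices for this 2-dimensional example; an actual circuit realization or a stiff ODE integrator would replace the Euler loop.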