Original Research
https://doi.org/10.1007/s12190-019-01262-1
Neural network based on systematically generated smoothing functions for absolute value equation
B. Saheya¹ · Chieu Thanh Nguyen² · Jein-Shan Chen²
Received: 4 January 2019 / Published online: 20 April 2019
© Korean Society for Informatics and Computational Applied Mathematics 2019
Abstract
In this paper, we summarize several systematic ways of constructing smoothing functions and illustrate eight smoothing functions accordingly. Then, based on these systematically generated smoothing functions, a unified neural network model is proposed for solving the absolute value equation. The issues regarding the equilibrium point, the trajectory, and the stability properties of the neural network are addressed. Moreover, numerical experiments with comparison are presented, which suggest what kind of smoothing functions work well along with the neural network approach.
Keywords Absolute value equations · Neural network · Smoothing function
Mathematics Subject Classification 65K10 · 93B40 · 26D07
1 Introduction
The main problem that we tackle in this paper is the so-called absolute value equation (AVE for short). The original standard AVE is described by
B. Saheya: The author's work is supported by National Key R&D Program of China (Award No. 2017YFC1405605) and the Foundation of Inner Mongolia Normal University (Award No. 2017YJRC003).
J.-S. Chen: The author's work is supported by the Ministry of Science and Technology, Taiwan.
Corresponding author: Jein-Shan Chen, jschen@math.ntnu.edu.tw
B. Saheya: saheya@imnu.edu.cn
Chieu Thanh Nguyen: thanhchieu90@gmail.com
1 College of Mathematical Science, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, People’s Republic of China
2 Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
\[ Ax - |x| = b, \tag{1} \]
where A ∈ R^{n×n} and b ∈ R^n. Here |x| denotes the componentwise absolute value of the vector x ∈ R^n. Another, generalized absolute value equation has the form
\[ Ax + B|x| = b, \tag{2} \]
where B ∈ R^{n×n}, B ≠ O. When B = −I, where I is the identity matrix, the AVE (2) reduces to the special form (1).
The AVE (1) was first introduced by Rohn in [29] and has recently been investigated by many researchers, for example, Hu and Huang [7], Saheya and Chen [32], Jiang and Zhang [9], Ketabchi and Moosaei [10], Mangasarian [14–18], Mangasarian and Meyer [21], Prokopyev [25], and Rohn [31]. In particular, Mangasarian and Meyer [21] show that the AVE (1) is equivalent to the bilinear program, the generalized LCP (linear complementarity problem), and the standard LCP provided 1 is not an eigenvalue of A. With these equivalent reformulations, they also prove that the AVE (1) is NP-hard in its general form and provide existence results. Prokopyev [25] further obtains an improvement indicating that the AVE (1) can be equivalently recast as a (larger) LCP without any assumption on A and B, and also provides a relationship with mixed integer programming. It is known that, if solvable, the AVE (1) can have either a unique solution or multiple (e.g., exponentially many) solutions. Indeed, various sufficient conditions on solvability and non-solvability of the AVE (1) with unique and multiple solutions are discussed in [21,25,30]. Some variants of the AVE, including the absolute value equation associated with second-order cone (SOCAVE) and absolute value programs, are investigated in [8] and [38], respectively. Furthermore, other types of absolute value equations, extending the AVE (2), are considered in [8,19,20].
Roughly speaking, there are three approaches for dealing with the AVEs (1)–(2). The first is to reformulate the AVEs (1)–(2) as a complementarity problem and then solve it accordingly. The second is to recast the AVEs (1)–(2) as a system of nonsmooth equations and then tackle the nonsmooth equations by applying a nonsmooth Newton algorithm [26] or a smoothing Newton algorithm [27]. The third is to apply the neural network approach. In this paper, we follow the third idea for solving the AVEs (1)–(2). Inspired by our recent work [24], we combine various smoothing functions with the neural network approach. Different from [24,32], the smoothing functions studied in this paper are not constructed in only one way; they are generated by several systematic ways. Accordingly, this paper can be viewed as a follow-up of [24,32].
Now, we briefly review the neural network approach, which is different from traditional optimization methods. The main reason to consider this approach lies in the real-time solution of optimization problems, which is sometimes required in practice. It is well known that the neural network approach is very promising for solving real-time optimization problems. In general, neural networks can be implemented using integrated circuits and were first introduced in the 1980s by Hopfield and Tank [6,34] for optimization problems. Since then, significant research results have been achieved for various optimization problems, including linear programming [39], quadratic programming [1], linear complementarity problems [12], nonlinear complementarity problems [13], and nonlinear programming [5]. In general, the essence of the neural network approach is to construct a nonnegative energy function and establish a dynamic system, represented by a first-order differential equation, that describes an artificial neural network. It is expected that, starting from an initial point, the dynamic system converges to its static state (or an equilibrium point), which corresponds to the solution of the underlying optimization problem.
Although a similar idea was employed by Wang, Yu and Guo in [36], only one smoothing function was studied therein. In this paper, we present systematic ways to construct smoothing functions for the AVE (1) and illustrate eight smoothing functions accordingly. After that, we design a gradient-descent neural network model by using these eight different smoothing functions. We not only discuss the stability of the neural networks, but also give a numerical comparison of these smoothing functions.
In fact, the new contribution of this paper lies in the numerical comparison, which suggests what kind of smoothing functions work well along with the neural network approach.
2 Preliminaries
By looking into the mathematical format of the aforementioned AVEs, it is observed that the absolute value function |x| is the key component. Indeed, the absolute value function also plays an important role in many applications, such as machine learning and image processing. However, the absolute value function |x| is not differentiable at x = 0, which limits its use in analysis and applications. To overcome this hurdle, researchers consider smoothing methods and construct smoothing functions for it. We summarize several systematic ways to construct smoothing functions for |x| below. For more details, please refer to [2,4,11,23,28,35].
1. Smoothing by the convex conjugate
Let X be a real topological vector space, and let X* be the dual space of X. For any function f : dom f → R, its convex conjugate f* : (dom f)* → R is defined in terms of the supremum by
\[ f^*(y) := \sup_{x \in \operatorname{dom} f} \left\{ x^T y - f(x) \right\}. \]
In light of this, one can build a smooth approximation of f, denoted by f_μ, by adding a strongly convex component to its conjugate g := f*, namely,
\[ f_\mu(x) = \sup_{z \in \operatorname{dom} g} \left\{ z^T x - g(z) - \mu\, d(z) \right\} = (g + \mu d)^*(x), \]
for some 1-strongly convex and continuous function d(·) (called a proximity function). Here, d(·) being 1-strongly convex means that it satisfies
\[ d\big((1-t)x + t y\big) \le (1-t)\, d(x) + t\, d(y) - \tfrac{1}{2}\, t(1-t)\, \|x - y\|^2, \]
for all x, y and t ∈ (0, 1). Note that |x| = \sup_{|z| \le 1} zx. If we take d(z) := z²/2, then the smoothing function constructed via conjugation is
\[ \phi_1(\mu, x) = \sup_{|z| \le 1} \left\{ zx - \frac{\mu}{2} z^2 \right\} = \begin{cases} \dfrac{x^2}{2\mu}, & \text{if } |x| \le \mu, \\[4pt] |x| - \dfrac{\mu}{2}, & \text{otherwise}, \end{cases} \tag{3} \]
which is the traditional Huber function.
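As a quick illustration of this construction, the following is a minimal sketch in Python/NumPy (not the Mathematica environment used later in the paper); it evaluates the closed form (3) and, purely for illustration, approximates the supremum over |z| ≤ 1 on a grid. The function names and grid size are ours, not the paper's.

```python
import numpy as np

def phi1(mu, x):
    """Huber smoothing function (3): x^2/(2*mu) if |x| <= mu, else |x| - mu/2."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= mu, x**2 / (2.0 * mu), np.abs(x) - mu / 2.0)

def phi1_via_conjugate(mu, x, n_grid=2001):
    """Grid approximation of sup_{|z|<=1} { z*x - (mu/2)*z^2 }, the conjugate construction."""
    z = np.linspace(-1.0, 1.0, n_grid)
    return np.max(z * x - 0.5 * mu * z**2)

if __name__ == "__main__":
    mu = 0.3
    for x in (-1.0, -0.1, 0.0, 0.2, 2.0):
        # the two values should agree up to the grid resolution
        print(x, float(phi1(mu, x)), phi1_via_conjugate(mu, x))
```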
It is also possible to consider another expression:
\[ |x| = \sup_{\substack{z_1 + z_2 = 1 \\ z_1, z_2 \ge 0}} (z_1 - z_2)\, x. \]
In this case, if we take d(z) := z_1 \log z_1 + z_2 \log z_2 + \log 2, the smoothing function constructed by conjugation becomes
\[ \phi_2(\mu, x) = \mu \log\!\left( \cosh\!\left( \frac{x}{\mu} \right) \right), \tag{4} \]
where \cosh(x) := \dfrac{e^{x} + e^{-x}}{2}. Alternatively, choosing d(y) := 1 - \sqrt{1 - y^2} gives another smoothing function:
\[ \phi_3(\mu, x) = \sup_{-1 \le y \le 1} \left\{ x y + \mu \sqrt{1 - y^2} \right\} - \mu = \sqrt{x^2 + \mu^2} - \mu. \tag{5} \]
2. The Moreau proximal smoothing
Suppose that E is a Euclidean space and f : E → (−∞, ∞] is a closed and proper convex function. A natural tool for generating a smooth approximation is the so-called proximal map introduced by Moreau [22].
The Moreau proximal approximation yields a family of approximations {f_μ^{px}}_{μ>0} given by
\[ f_\mu^{px}(x) = \inf_{u \in E} \left\{ f(u) + \frac{1}{2\mu}\, \|u - x\|^2 \right\}. \tag{6} \]
It is known that the Moreau proximal approximation f_μ^{px} is convex, continuous, finite-valued, and differentiable with gradient ∇f_μ^{px}, which is Lipschitz continuous with constant 1/μ; see [22]. When the Moreau proximal smoothing is applied to construct a smoothing function for the absolute value function |x|, it also yields the Huber smoothing function φ₁(μ, x) via the Moreau envelope [2].
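To make the connection explicit, here is a small sketch (Python/NumPy, illustrative only and not part of the paper) that approximates the Moreau envelope (6) of f(u) = |u| by a grid search over u and compares it with the Huber function φ₁(μ, x).

```python
import numpy as np

def moreau_envelope_abs(mu, x, half_width=5.0, n_grid=20001):
    """Grid approximation of inf_u { |u| + (1/(2*mu)) * (u - x)^2 }, i.e. (6) with f = |.|."""
    u = np.linspace(x - half_width, x + half_width, n_grid)
    return np.min(np.abs(u) + (u - x) ** 2 / (2.0 * mu))

def huber(mu, x):
    """Huber function (3)."""
    return x**2 / (2.0 * mu) if abs(x) <= mu else abs(x) - mu / 2.0

if __name__ == "__main__":
    mu = 0.3
    for x in (-2.0, -0.2, 0.0, 0.15, 1.0):
        # the two columns should be nearly identical
        print(x, moreau_envelope_abs(mu, x), huber(mu, x))
```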
3. Nesterov’s smoothing
There is a class of nonsmooth convex functions considered in [23], which is given by
\[ q(x) = \max\{ \langle u, Ax \rangle - \phi(u) \mid u \in Q \}, \quad x \in E, \]
where E, V are finite-dimensional vector spaces, Q ⊆ V* is compact and convex, φ is a continuous convex function on Q, and A : E → V is a linear map. The smooth approximation of q suggested in [23] is the convex function
\[ q_\mu(x) = \max\{ \langle u, Ax \rangle - \phi(u) - \mu\, d(u) \mid u \in Q \}, \quad x \in E, \tag{7} \]
where d(·) is a prox-function for Q. It was proved in [23, Theorem 1] that the convex function q_μ is C^{1,1}(E). More specifically, its gradient mapping is Lipschitz continuous with constant L_μ = \frac{\|A\|^2}{\sigma \mu}, and the gradient is given by ∇q_μ(x) = A^* u_μ(x), where u_μ(x) is the unique solution of the maximization problem in (7).
For the absolute value function, take q(x) = |x| with x ∈ R, and let A = 1, b = 0, E = R, Q = {u ∈ R : |u| ≤ 1} and d(u) := \tfrac{1}{2} u^2. Then, we have
\[ q_\mu(x) = \max_{u \in Q} \left\{ \langle Ax - b, u \rangle - \mu\, d(u) \right\} = \max_{|u| \le 1} \left\{ x u - \frac{\mu}{2} u^2 \right\} = \begin{cases} \dfrac{x^2}{2\mu}, & \text{if } |x| \le \mu, \\[4pt] |x| - \dfrac{\mu}{2}, & \text{otherwise}. \end{cases} \]
As we see, this approximation also yields the Huber smoothing function φ₁(μ, x) defined in (3).
4. The infimal-convolution smoothing technique
Suppose that E is a finite-dimensional vector space and f, g : E → (−∞, ∞]. The infimal convolution of f and g, f □ g : E → [−∞, +∞], is defined by
\[ (f \,\square\, g)(x) = \inf_{y \in E} \{ f(y) + g(x - y) \}. \]
In light of the concept of infimal convolution, one can also construct smoothing approximation functions. More specifically, we consider a closed proper convex function f : E → (−∞, ∞] and let ω : E → R be a C^{1,1} convex function whose gradient has Lipschitz constant 1/σ (σ > 0). Suppose that for any μ > 0 and any x ∈ E, the following infimal convolution is finite:
\[ f_\mu^{ic}(x) = \inf_{u \in E} \left\{ f(u) + \mu\, \omega\!\left( \frac{x - u}{\mu} \right) \right\} = (f \,\square\, \omega_\mu)(x), \tag{8} \]
where ω_μ(·) = μ ω(·/μ). Then, f_μ^{ic} is called the infimal-convolution μ-smooth approximation of f. In particular, when μ ∈ R_{++} and p ∈ (1, +∞), the infimal convolution of a convex function and a power of the norm is given by
\[ \left( f \,\square\, \frac{1}{\mu p}\, \|\cdot\|^p \right)(x) = \inf_{u \in E} \left\{ f(u) + \frac{1}{\mu p}\, \|x - u\|^p \right\}. \tag{9} \]
For the absolute value function, it can be verified that f_μ(x) = \left( |\cdot| \,\square\, \frac{1}{\mu p}\, |\cdot|^p \right)(x) is the Huber function of order p, i.e.,
\[ f_\mu(x) = \begin{cases} |x| - \dfrac{p-1}{p}\, \mu^{\frac{1}{p-1}}, & \text{if } |x| > \mu^{\frac{1}{p-1}}, \\[4pt] \dfrac{|x|^p}{\mu p}, & \text{if } |x| \le \mu^{\frac{1}{p-1}}. \end{cases} \tag{10} \]
Fig. 1  |x| and the Huber function of order p (μ = 0.3)
Note that when p = 2, the Huber function of order p in (10) reduces to the Huber function φ₁(μ, x) shown in (3). Figure 1 depicts the Huber function of order p for various values of p. Moreover, plugging p = 2 into the infimal convolution formula (9) yields the Moreau approximation (6). For more details about the infimal convolution and the approximation functions it induces, please refer to [2,3].
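The closed form (10) can also be checked against the infimal convolution (9) numerically; the sketch below (Python/NumPy, with illustrative names and grid sizes of our own choosing) compares the two for f = |·|.

```python
import numpy as np

def huber_order_p(mu, x, p):
    """Huber function of order p, closed form (10)."""
    thresh = mu ** (1.0 / (p - 1.0))
    if abs(x) > thresh:
        return abs(x) - (p - 1.0) / p * thresh
    return abs(x) ** p / (p * mu)

def inf_conv_abs(mu, x, p, half_width=5.0, n_grid=40001):
    """Grid approximation of inf_u { |u| + |x - u|^p / (p*mu) }, i.e. (9) with f = |.|."""
    u = np.linspace(x - half_width, x + half_width, n_grid)
    return np.min(np.abs(u) + np.abs(x - u) ** p / (p * mu))

if __name__ == "__main__":
    mu, p = 0.3, 3.0
    for x in (-1.5, -0.3, 0.0, 0.4, 2.0):
        # the two values should agree up to the grid resolution
        print(x, huber_order_p(mu, x, p), inf_conv_abs(mu, x, p))
```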
5. The convolution smoothing technique
The smoothing approximation via convolution for the absolute value function is a popular approach, which can be found in [4,11,28,35]. Its construction idea is described as follows. First, one constructs a smoothing approximation for the plus function (x)_+ = max{0, x}. To this end, we consider a piecewise continuous function d(x) with a finite number of pieces which is a density (kernel) function, that is, it satisfies
\[ d(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{+\infty} d(x)\, dx = 1. \]
Next, define \hat{s}(\mu, x) := \frac{1}{\mu}\, d\!\left( \frac{x}{\mu} \right), where μ is a positive parameter. Suppose that \int_{-\infty}^{+\infty} |x|\, d(x)\, dx < +\infty. Then a smoothing approximation (denoted by \hat{p}(\mu, x)) of (x)_+ is obtained as
\[ \hat{p}(\mu, x) = \int_{-\infty}^{+\infty} (x - s)_+\, \hat{s}(\mu, s)\, ds = \int_{-\infty}^{x} (x - s)\, \hat{s}(\mu, s)\, ds. \]
The following are four well-known smoothing functions for the plus function [4,28]:
\[ \hat{\phi}_1(\mu, x) = x + \mu \log\!\left( 1 + e^{-\frac{x}{\mu}} \right). \tag{11} \]
\[ \hat{\phi}_2(\mu, x) = \begin{cases} x, & \text{if } x \ge \frac{\mu}{2}, \\[4pt] \dfrac{1}{2\mu}\left( x + \dfrac{\mu}{2} \right)^2, & \text{if } -\frac{\mu}{2} < x < \frac{\mu}{2}, \\[4pt] 0, & \text{if } x \le -\frac{\mu}{2}. \end{cases} \tag{12} \]
\[ \hat{\phi}_3(\mu, x) = \frac{\sqrt{4\mu^2 + x^2} + x}{2}. \tag{13} \]
\[ \hat{\phi}_4(\mu, x) = \begin{cases} x - \dfrac{\mu}{2}, & \text{if } x > \mu, \\[4pt] \dfrac{x^2}{2\mu}, & \text{if } 0 \le x \le \mu, \\[4pt] 0, & \text{if } x < 0, \end{cases} \tag{14} \]
where their corresponding kernel functions are
\[ d_1(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \qquad d_2(x) = \begin{cases} 1, & \text{if } -\frac{1}{2} \le x \le \frac{1}{2}, \\ 0, & \text{otherwise}, \end{cases} \]
\[ d_3(x) = \frac{2}{(x^2 + 4)^{3/2}}, \qquad d_4(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1, \\ 0, & \text{otherwise}. \end{cases} \]
Using the fact that |x| = (x)_+ + (−x)_+, the smoothing function of |x| via convolution can be written as
\[ \hat{p}(\mu, |x|) = \hat{p}(\mu, x) + \hat{p}(\mu, -x) = \int_{-\infty}^{+\infty} |x - s|\, \hat{s}(\mu, s)\, ds. \]
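The convolution formula can be checked numerically. The sketch below (Python with SciPy; an illustration of the construction, not the paper's code) convolves |x| with the scaled kernel ŝ(μ, ·) built from d₁ and compares the result with the closed form φ₄ in (15) that this kernel produces.

```python
import numpy as np
from scipy.integrate import quad

def phi4(mu, x):
    """Closed form (15), associated with the kernel d1(s) = e^{-s}/(1+e^{-s})^2."""
    return mu * (np.log1p(np.exp(-x / mu)) + np.log1p(np.exp(x / mu)))

def smooth_abs_by_convolution(mu, x):
    """Numerical evaluation of int |x - s| * (1/mu) * d1(s/mu) ds."""
    # d1(t) = e^{-t}/(1+e^{-t})^2 = sech(t/2)^2 / 4, written in a numerically stable form
    kernel = lambda s: 0.25 / (mu * np.cosh(s / (2.0 * mu)) ** 2)
    cutoff = abs(x) + 60.0 * mu          # the kernel tails are negligible beyond this range
    val, _ = quad(lambda s: abs(x - s) * kernel(s), -cutoff, cutoff, points=[x])
    return val

if __name__ == "__main__":
    mu = 0.3
    for x in (-1.0, -0.2, 0.0, 0.5, 2.0):
        print(x, phi4(mu, x), smooth_abs_by_convolution(mu, x))  # should agree closely
```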
Analogous to (11)–(14), we reach the following smoothing functions for |x|:
\[ \phi_4(\mu, x) = \mu \left[ \log\!\left( 1 + e^{-\frac{x}{\mu}} \right) + \log\!\left( 1 + e^{\frac{x}{\mu}} \right) \right]. \tag{15} \]
\[ \phi_5(\mu, x) = \begin{cases} x, & \text{if } x \ge \frac{\mu}{2}, \\[4pt] \dfrac{x^2}{\mu} + \dfrac{\mu}{4}, & \text{if } -\frac{\mu}{2} < x < \frac{\mu}{2}, \\[4pt] -x, & \text{if } x \le -\frac{\mu}{2}. \end{cases} \tag{16} \]
\[ \phi_6(\mu, x) = \sqrt{4\mu^2 + x^2}, \tag{17} \]
as well as the Huber function (3). If we take the Epanechnikov kernel function
\[ K(x) = \begin{cases} \frac{3}{4}\,(1 - x^2), & \text{if } |x| \le 1, \\ 0, & \text{otherwise}, \end{cases} \]
we obtain the smoothing function for |x|:
\[ \phi_7(\mu, x) = \begin{cases} x, & \text{if } x > \mu, \\[4pt] -\dfrac{x^4}{8\mu^3} + \dfrac{3x^2}{4\mu} + \dfrac{3\mu}{8}, & \text{if } -\mu \le x \le \mu, \\[4pt] -x, & \text{if } x < -\mu. \end{cases} \tag{18} \]
Moreover, if we take the Gaussian kernel function K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} for all x ∈ R, then it yields
\[ \hat{s}(\mu, x) := \frac{1}{\mu}\, K\!\left( \frac{x}{\mu} \right) = \frac{1}{\sqrt{2\pi}\,\mu}\, e^{-\frac{x^2}{2\mu^2}}. \]
Hence, we obtain the following smoothing function [35] for |x|:
\[ \phi_8(\mu, x) = x\, \operatorname{erf}\!\left( \frac{x}{\sqrt{2}\,\mu} \right) + \sqrt{\frac{2}{\pi}}\, \mu\, e^{-\frac{x^2}{2\mu^2}}, \tag{19} \]
where the error function is defined by
\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\, du, \quad \forall x \in \mathbb{R}. \]
To sum up, we have eight smoothing functions in total through the above constructions. Figure 2 depicts the graphs of all the aforementioned smoothing functions φ_i, i = 1, ..., 8, together with the absolute value function. Not only is it clear from the geometric point of view that φ_i, i = 1, ..., 8, are smoothing functions of |x|; this can also be verified theoretically in Proposition 2.1.
Fig. 2  The graphs of |x| and the smoothing functions φ_i, i = 1, ..., 8 (μ = 0.3)
From Fig. 2, we see that the local behavior of all eight smoothing functions can be described as
\[ \phi_3 \le \phi_2 \le \phi_1 \le |x| \le \phi_5 \le \phi_7 \le \phi_8 \le \phi_4 \le \phi_6. \tag{20} \]
In particular, the three smoothing functions φ₁, φ₂, φ₃ approach |x| from below with φ₁ ≥ φ₂ ≥ φ₃. In contrast, the other five smoothing functions φ₄, φ₅, φ₆, φ₇, φ₈ approach |x| from above with φ₅ ≤ φ₇ ≤ φ₈ ≤ φ₄ ≤ φ₆. Apparently, the smoothing functions φ₁ and φ₅ are the closest to |x| among these smoothing functions.
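Before turning to the algebraic proof, the ordering (20) can be spot-checked numerically. The following sketch (Python; our own helper names, not code from the paper) implements the eight smoothing functions and evaluates them at a few sample points with μ = 0.3; each row should be nondecreasing from left to right.

```python
import numpy as np
from math import erf

def phi1(mu, x): return x**2/(2*mu) if abs(x) <= mu else abs(x) - mu/2          # (3)
def phi2(mu, x): return mu*np.log(np.cosh(x/mu))                                 # (4)
def phi3(mu, x): return np.sqrt(x**2 + mu**2) - mu                               # (5)
def phi4(mu, x): return mu*(np.log1p(np.exp(-x/mu)) + np.log1p(np.exp(x/mu)))    # (15)
def phi5(mu, x): return abs(x) if abs(x) >= mu/2 else x**2/mu + mu/4             # (16)
def phi6(mu, x): return np.sqrt(4*mu**2 + x**2)                                  # (17)
def phi7(mu, x): return abs(x) if abs(x) > mu else -x**4/(8*mu**3) + 3*x**2/(4*mu) + 3*mu/8   # (18)
def phi8(mu, x): return x*erf(x/(np.sqrt(2)*mu)) + np.sqrt(2/np.pi)*mu*np.exp(-x**2/(2*mu**2))  # (19)

if __name__ == "__main__":
    mu = 0.3
    # ordering (20): phi3 <= phi2 <= phi1 <= |x| <= phi5 <= phi7 <= phi8 <= phi4 <= phi6
    for x in (0.0, 0.1, 0.25, 0.5, 1.0, 2.0):
        row = [phi3(mu, x), phi2(mu, x), phi1(mu, x), abs(x),
               phi5(mu, x), phi7(mu, x), phi8(mu, x), phi4(mu, x), phi6(mu, x)]
        ok = all(a <= b + 1e-9 for a, b in zip(row, row[1:]))
        print("x = %.2f:" % x, ["%.4f" % v for v in row], "nondecreasing:", ok)
```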
Besides the geometric observation, we also provide algebraic analysis for (20).
Noting that each function φ_i(μ, x), for i = 1, 2, ..., 8, is symmetric in x, we only need to prove (20) for x ≥ 0. To proceed, for fixed μ > 0, we let y = x/μ. The verification consists of seven parts.
Part (1): φ₃(μ, x) ≤ φ₂(μ, x). To verify this inequality, we consider
\[ f(y) = \log\!\left( \frac{e^{y} + e^{-y}}{2} \right) - \sqrt{y^2 + 1} + 1. \]
Then, we compute the derivative of f(y) as follows:
\[ \begin{aligned} f'(y) &= \frac{e^{y} - e^{-y}}{e^{y} + e^{-y}} - \frac{y}{\sqrt{y^2+1}} = \frac{e^{2y} - 1}{e^{2y} + 1} - \frac{y}{\sqrt{y^2+1}} \\ &= 1 - \frac{2}{e^{2y}+1} - 1 + \frac{\sqrt{y^2+1} - y}{\sqrt{y^2+1}} = \frac{1}{\sqrt{y^2+1}\left(\sqrt{y^2+1} + y\right)} - \frac{2}{e^{2y}+1} \\ &= \frac{e^{2y} - 1 - 2y^2 - 2y\sqrt{y^2+1}}{\left(e^{2y}+1\right)\sqrt{y^2+1}\left(\sqrt{y^2+1} + y\right)}. \end{aligned} \]
For convenience, we denote g(y) = e^{2y} − 1 − 2y² − 2y\sqrt{y^2+1}. It is known that the function e^x can be expressed as
\[ e^{x} = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \tag{21} \]
which indicates that e^x − 1 ≥ x + \frac{x^2}{2} + \frac{x^3}{6} for x ≥ 0. Then, it follows that
\[ \begin{aligned} g(y) &\ge 2y + \frac{(2y)^2}{2} + \frac{(2y)^3}{6} - 2y^2 - 2y\sqrt{y^2+1} \\ &= \frac{4y^3}{3} + 2y\left(1 - \sqrt{y^2+1}\right) = \frac{4y^3}{3} - \frac{2y^3}{1 + \sqrt{y^2+1}} \\ &= 2y^3 \left( \frac{2}{3} - \frac{1}{1 + \sqrt{y^2+1}} \right) \ge 2y^3 \left( \frac{2}{3} - \frac{1}{2} \right) \ge 0, \quad \forall y \ge 0. \end{aligned} \]
This implies that f'(y) ≥ 0 for all y ≥ 0. Thus, f is monotonically nondecreasing, which yields f(y) ≥ f(0) = 0. This verifies the assertion that φ₃(μ, x) ≤ φ₂(μ, x).
Part (2): φ2(μ, x) ≤ φ1(μ, x). In order to prove this inequality, we discuss two cases.
(i) For 0 ≤ x ≤ μ, this implies that 0 ≤ y ≤ 1. Considering
\[ f(y) = \frac{y^2}{2} - \log\!\left( \frac{e^{y} + e^{-y}}{2} \right) \]
yields
\[ f'(y) = y - \frac{e^{2y} - 1}{e^{2y} + 1} = \frac{y e^{2y} + y - e^{2y} + 1}{e^{2y} + 1}. \]
Denoting g(y) := y e^{2y} + y − e^{2y} + 1 and using (21) leads to
\[ \begin{aligned} g(y) &= y e^{2y} + y - e^{2y} + 1 \\ &= y \sum_{n=0}^{\infty} \frac{(2y)^n}{n!} - \sum_{n=0}^{\infty} \frac{(2y)^n}{n!} + y + 1 \\ &= y \sum_{n=0}^{\infty} \frac{(2y)^n}{n!} - \left( \sum_{n=0}^{\infty} \frac{(2y)^{n+1}}{(n+1)!} + 1 \right) + y + 1 \\ &= y \sum_{n=0}^{\infty} \frac{(2y)^n}{n!} \left( 1 - \frac{2}{n+1} \right) + y \\ &\ge 0, \quad \forall y \in [0, 1]. \end{aligned} \]
Therefore, we obtain that f'(y) ≥ 0 for all y ∈ [0, 1].
(ii) For x > μ, this implies that y > 1. Considering
\[ f(y) = y - \frac{1}{2} - \log\!\left( \frac{e^{y} + e^{-y}}{2} \right) \]
gives
\[ f'(y) = 1 - \frac{e^{2y} - 1}{e^{2y} + 1} > 0. \]
To sum up, we obtain that f'(y) ≥ 0 in both cases. Following the same arguments as in Part (1), we conclude that φ₂(μ, x) ≤ φ₁(μ, x).
Part (3):φ1(μ, x) ≤ |x| and |x| ≤ φ5(μ, x). It is easy to verify these inequalities.
We omit the verification.
Part (4):φ5(μ, x) ≤ φ7(μ, x). We will prove this inequality by discussing three cases.
(i) For x> μ, it is easy to see that φ5(μ, x) = φ7(μ, x) = x.
(ii) For μ/2 ≤ x ≤ μ, it means that 1/2 ≤ y ≤ 1. Considering
\[ \begin{aligned} f(y) &= -\frac{y^4}{8} + \frac{3y^2}{4} + \frac{3}{8} - y = \frac{-y^4 + 6y^2 - 8y + 6}{8} \\ &= \frac{-(y^2 - 1)^2 + 4(y - 1)^2 + 3}{8} = \frac{(y-1)^2 \left[ 4 - (y+1)^2 \right] + 3}{8}, \end{aligned} \]
and using the facts that 1/2 ≤ y ≤ 1 and 9/4 ≤ (y+1)² ≤ 4, it follows that f(y) ≥ 0.
(iii) For 0 ≤ x < μ/2, we have 0 ≤ y < 1/2. Considering
\[ \begin{aligned} f(y) &= -\frac{y^4}{8} + \frac{3y^2}{4} + \frac{3}{8} - y^2 - \frac{1}{4} = -\frac{y^4}{8} - \frac{y^2}{4} + \frac{1}{8} \\ &= \frac{-y^4 - 2y^2 + 1}{8} = \frac{2 - (y^2 + 1)^2}{8}, \end{aligned} \]
and applying the facts that 0 ≤ y < 1/2 and 1 ≤ (y²+1)² ≤ 25/16, it follows that f(y) > 0. From all the above, we conclude that φ₅(μ, x) ≤ φ₇(μ, x).
Part (5): φ₇(μ, x) ≤ φ₈(μ, x). To prove this assertion, we discuss two cases.
(i) For 0 ≤ x ≤ μ, this implies that 0 ≤ y ≤ 1. Consider
\[ f(y) = y\, \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} + \frac{y^4}{8} - \frac{3y^2}{4} - \frac{3}{8}. \]
By applying [35, Lemma 2.5], we have \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) \ge \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}}. Then, it implies that
\[ f(y) \ge y \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}} + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} + \frac{y^4}{8} - \frac{3y^2}{4} - \frac{3}{8} := g(y).
\]
It is easy to verify g(y) is monotonically decreasing on [0, 1] which indicates that g(y) ≥ g(1) > 0. Hence, we obtain f (y) > 0 and φ7(μ, x) ≤ φ8(μ, x) is proved.
(ii) For x > μ, it means that y > 1. Consider
\[ f(y) = y\, \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} - y, \]
which yields f'(y) = \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) - 1 < 0. Hence, f(y) is monotonically decreasing on [1, +∞). Moreover, using [35, Lemma 2.5] gives
\[ f(y) \ge y \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}} + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} - y. \]
Taking the limit in this inequality, we obtain \lim_{y\to\infty} f(y) \ge 0. Therefore, f(y) \ge \lim_{y\to\infty} f(y) \ge 0, which shows the assertion φ₇(μ, x) ≤ φ₈(μ, x).
Part (6): φ₈(μ, x) ≤ φ₄(μ, x). Consider
\[ f(y) = \log\!\left( 1 + e^{-y} \right) + \log\!\left( 1 + e^{y} \right) - y\, \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) - \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}}. \]
Then, we have
\[ f'(y) = \frac{e^{y} - 1}{e^{y} + 1} - \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) = 1 - \left( \frac{2}{e^{y}+1} + \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) \right) := 1 - g(y). \]
By [35, Lemma 2.5], \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) \ge \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}}. Moreover, comparing the series expansions (see (21)) and using (2n)! ≥ 2^n n! gives \cosh\!\left( \frac{y}{2} \right) \le e^{\frac{y^2}{8}}, so that
\[ e^{-\frac{y^2}{2}} \le \frac{1}{\cosh^2\!\left( \frac{y}{2} \right)} = 1 - \left( \frac{e^{y} - 1}{e^{y} + 1} \right)^2, \quad \text{and hence} \quad \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}} \ge \frac{e^{y} - 1}{e^{y} + 1} = 1 - \frac{2}{e^{y}+1}. \]
Therefore, g(y) ≥ 1 for all y ≥ 0, which says that f'(y) = 1 − g(y) ≤ 0, i.e., f is monotonically nonincreasing on [0, +∞). On the other hand, since \log(1+e^{y}) - y = \log(1+e^{-y}) \to 0, y\left( 1 - \operatorname{erf}\!\left( \frac{y}{\sqrt{2}} \right) \right) \to 0 and e^{-\frac{y^2}{2}} \to 0 as y → ∞, we obtain \lim_{y\to\infty} f(y) = 0. It follows that f(y) ≥ \lim_{y\to\infty} f(y) = 0 for all y ≥ 0, and hence φ₈(μ, x) ≤ φ₄(μ, x) is proved.
Part (7): φ₄(μ, x) ≤ φ₆(μ, x). Consider
\[ f(y) = \sqrt{4 + y^2} - \log\!\left( 1 + e^{-y} \right) - \log\!\left( 1 + e^{y} \right), \]
which gives
\[ \begin{aligned} f'(y) &= \frac{y}{\sqrt{4+y^2}} - \frac{e^{y} - 1}{e^{y} + 1} = \frac{y}{\sqrt{4+y^2}} - 1 + \frac{2}{e^{y}+1} \\ &= \frac{y - \sqrt{4+y^2}}{\sqrt{4+y^2}} + \frac{2}{e^{y}+1} = \frac{-4(1+e^{y}) + 2\sqrt{4+y^2}\left( y + \sqrt{4+y^2} \right)}{\sqrt{4+y^2}\left( y + \sqrt{4+y^2} \right)(1+e^{y})}. \end{aligned} \]
For convenience, we denote
\[ g(y) := -2(1+e^{y}) + \sqrt{4+y^2}\left( y + \sqrt{4+y^2} \right) = -2e^{y} + 2 + y^2 + y\sqrt{4+y^2}. \]
Because e^{y} − 1 ≥ y + \frac{y^2}{2} + \frac{y^3}{6} for y ≥ 0 (see (21)), it yields
\[ g(y) \le -2y - \frac{y^3}{3} + y\sqrt{4+y^2} = y \left( \sqrt{4+y^2} - 2 - \frac{y^2}{3} \right) \le 0, \quad \forall y \ge 0, \]
where the last inequality follows from \sqrt{4+y^2} \le 2 + \frac{y^2}{3} (square both sides).
This means that f'(y) ≤ 0, i.e., f(y) is monotonically nonincreasing on [0, +∞). On the other hand, we know that
\[ \begin{aligned} \lim_{y\to\infty} f(y) &= \lim_{y\to\infty} \left[ \sqrt{4+y^2} - \log\!\left( 1+e^{-y} \right) - \log\!\left( 1+e^{y} \right) \right] \\ &= \lim_{y\to\infty} \left[ \sqrt{4+y^2} - y \right] + \lim_{y\to\infty} \left[ y - \log\!\left( 1+e^{-y} \right) - \log\!\left( 1+e^{y} \right) \right] \\ &= \lim_{y\to\infty} \left[ y - \log\!\left( 1+e^{y} \right) \right] = \lim_{y\to\infty} \log \frac{e^{y}}{1+e^{y}} = 0. \end{aligned} \]
Thus, f(y) ≥ limy→∞ f(y) = 0 which implies that φ4(μ, x) ≤ φ6(μ, x).
From Parts (1)–(7), the proof of (20) is complete.
Proposition 2.1 Let φ_i : R² → R for i = 1, ..., 8 be defined as in (3)–(5) and (15)–(19), respectively. Then, we have
(a) φ_i is continuously differentiable at (μ, x) ∈ R_{++} × R;
(b) lim_{μ↓0} φ_i(μ, x) = |x|.
Proof The proof is straightforward and we omit it.
Next, we recall some materials about first-order differential equations (ODEs):
\[ \dot{w}(t) = H(w(t)), \quad w(t_0) = w_0 \in \mathbb{R}^n, \tag{22} \]
where H : R^n → R^n is a mapping. We also introduce three kinds of stability that will be discussed later. These materials can be found in usual ODE textbooks. A point w* = w(t*) is called an equilibrium point or a steady state of the dynamic system (22) if H(w*) = 0. If there is a neighborhood Ω* ⊆ R^n of w* such that H(w*) = 0 and H(w) ≠ 0 for all w ∈ Ω*\{w*}, then w* is called an isolated equilibrium point.
Lemma 2.1 Suppose that H : R^n → R^n is a continuous mapping. Then, for any t_0 > 0 and w_0 ∈ R^n, there exists a local solution w(t) to (22) with t ∈ [t_0, τ) for some τ > t_0. If, in addition, H is locally Lipschitz continuous at w_0, then the solution is unique; if H is Lipschitz continuous on R^n, then τ can be extended to ∞.
Let w(t) be a solution to the dynamic system (22). An isolated equilibrium point w* is Lyapunov stable if for any w_0 = w(t_0) and any ε > 0, there exists a δ > 0 such that ‖w(t) − w*‖ < ε for all t ≥ t_0 whenever ‖w(t_0) − w*‖ < δ. An isolated equilibrium point w* is said to be asymptotically stable if, in addition to being Lyapunov stable, it has the property that w(t) → w* as t → ∞ for all ‖w(t_0) − w*‖ < δ. An isolated equilibrium point w* is exponentially stable if there exists a δ > 0 such that every solution w(t) of (22) with initial condition w(t_0) = w_0 and ‖w(t_0) − w*‖ < δ is well defined on [0, +∞) and satisfies
\[ \|w(t) - w^*\| \le c\, e^{-\omega t}\, \|w(t_0) - w^*\| \quad \forall t \ge t_0, \]
where c > 0 and ω > 0 are constants independent of the initial point.
Let Ω ⊆ R^n be an open neighborhood of \bar{w}. A continuously differentiable function V : R^n → R is said to be a Lyapunov function at the state \bar{w} over the set Ω for Eq. (22) if
\[ V(\bar{w}) = 0, \qquad V(w) > 0, \quad \forall w \in \Omega \setminus \{\bar{w}\}, \]
\[ \dot{V}(w) \le 0, \quad \forall w \in \Omega \setminus \{\bar{w}\}. \]
Lyapunov stability and asymptotic stability can be verified by using a Lyapunov function, which is a useful tool for analysis.
Lemma 2.2 (a) An isolated equilibrium point w* is Lyapunov stable if there exists a Lyapunov function over some neighborhood Ω* of w*.
(b) An isolated equilibrium point w* is asymptotically stable if there exists a Lyapunov function over some neighborhood Ω* of w* such that \dot{V}(w) < 0 for all w ∈ Ω*\{w*}.
3 Neural network model for AVE
In order to design a suitable neural network for the absolute value equation (1), the key step is to construct an appropriate energy function E(x) whose global minimizer x* is simultaneously a solution of the AVE (1). One approach to constructing a desired energy function is the merit function method. The basic idea of this approach is to transform the AVE (1) into an unconstrained minimization problem.
To this end, we define H_i : R^{n+1} → R^{n+1} as
\[ H_i(\mu, x) = \begin{bmatrix} \mu \\ Ax + B\,\Phi_i(\mu, x) - b \end{bmatrix}, \quad \text{for } \mu \in \mathbb{R} \text{ and } x \in \mathbb{R}^n, \tag{23} \]
where Φ_i : R^{n+1} → R^n is given by
\[ \Phi_i(\mu, x) := \begin{bmatrix} \phi_i(\mu, x_1) \\ \phi_i(\mu, x_2) \\ \vdots \\ \phi_i(\mu, x_n) \end{bmatrix}, \quad \text{for } \mu \in \mathbb{R} \text{ and } x \in \mathbb{R}^n, \tag{24} \]
with the various smoothing functions φ_i : R² → R introduced in Sect. 2. Then, the AVE (1) can be transformed into an unconstrained optimization problem:
\[ \min\; \Psi(\mu, x) := \frac{1}{2}\, \|H_i(\mu, x)\|^2. \tag{25} \]
Let w = (μ, x). The AVE (1) is equivalent to H_i(μ, x) = 0. It is clear that if w* ∈ R_{++} × R^n solves H_i(w) = 0, then w* solves ∇Ψ(w) = 0. Applying the gradient approach to the minimization of the energy function (25), we obtain the system of differential equations
\[ \frac{du(t)}{dt} = -\rho\, \nabla \Psi(v(t), u(t)) = -\rho\, \nabla H_i(v(t), u(t))^T H_i(v(t), u(t)), \quad u(t_0) = u_0, \tag{26} \]
where u_0 = x_0 ∈ R^n, v(t) = μ_0 e^{−t}, and ρ > 0 is a time scaling factor. In fact, letting τ = ρt leads to \frac{du(t)}{dt} = \rho\, \frac{du(\tau)}{d\tau}. Hence, it follows from (26) that \frac{du(\tau)}{d\tau} = -\nabla\!\left( \frac{1}{2}\, \|H_i(w)\|^2 \right). In view of this, we set ρ = 1 in the subsequent analysis.
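For concreteness, the gradient flow (26) for the AVE (1) (i.e., B = −I) can be sketched as follows. This is a minimal Python/SciPy illustration, not the paper's Mathematica implementation; it uses the single smoothing function φ₅ from (16), and the function and parameter names are ours.

```python
import numpy as np
from scipy.integrate import solve_ivp

def phi5(mu, x):
    """Smoothing function (16), applied componentwise to the vector x."""
    return np.where(x >= mu / 2, x, np.where(x <= -mu / 2, -x, x**2 / mu + mu / 4))

def dphi5_dx(mu, x):
    """Componentwise derivative of phi_5 with respect to x."""
    return np.where(x >= mu / 2, 1.0, np.where(x <= -mu / 2, -1.0, 2.0 * x / mu))

def rhs(t, x, A, b, mu0, rho):
    """Right-hand side of (26): dx/dt = -rho * grad_x ( 0.5 * ||A x - Phi(mu, x) - b||^2 )."""
    mu = mu0 * np.exp(-t)                      # smoothing parameter mu(t) = mu0 * e^{-t}
    R = A @ x - phi5(mu, x) - b                # smoothed residual (B = -I)
    J = A - np.diag(dphi5_dx(mu, x))           # Jacobian of the residual with respect to x
    return -rho * J.T @ R

def solve_ave(A, b, x0, mu0=1.0, rho=1.0, t_final=20.0):
    """Integrate the neural network trajectory and return its terminal state."""
    sol = solve_ivp(rhs, (0.0, t_final), x0, args=(A, b, mu0, rho),
                    rtol=1e-8, atol=1e-10)
    return sol.y[:, -1]
```

Any of the other smoothing functions can be plugged in by replacing φ₅ and its derivative; the trajectory x(t) is expected to approach a solution of (1) as μ(t) → 0, under Assumption 3.1 below.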
Assumption 3.1 The minimal singular value of the matrix A is strictly greater than the maximal singular value of the matrix B.
Proposition 3.1 The AVE (2) is uniquely solvable for any b∈ Rnif Assumption3.1is satisfied.
Proof Please see [9, Proposition 2.3] for a proof.
Proposition 3.2 Let Φ_i(μ, x) for i = 1, ..., 8 be defined as in (24). Then, we have
(a) H_i(μ, x) = 0 if and only if x solves the AVE (2);
(b) H_i is continuously differentiable on R^{n+1}\{0} with the Jacobian matrix given by
\[ \nabla H_i(\mu, x) := \left[ A + B\, \nabla \Phi_i(\mu, x) \right], \tag{27} \]
where
\[ \nabla \Phi_i(\mu, x) := \begin{bmatrix} \dfrac{\partial \phi_i(\mu, x_1)}{\partial x_1} & 0 & \cdots & 0 \\ 0 & \dfrac{\partial \phi_i(\mu, x_2)}{\partial x_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \dfrac{\partial \phi_i(\mu, x_n)}{\partial x_n} \end{bmatrix}. \]
Proof The arguments are straightforward and we omit them.
Proposition 3.3 Let H_i and ∇H_i be given as in (23) and (27), respectively. Suppose that Assumption 3.1 holds. Then ∇H_i(μ, x) is invertible at any x ∈ R^n and μ > 0.
Proof The result follows from Proposition3.2immediately.
Proposition 3.4 Let Ψ : R^{n+1} → R_+ be given by (25). Then, the following results hold.
(a) Ψ(μ, x) ≥ 0 for all (μ, x) ∈ R_{++} × R^n, and Ψ(μ, x) = 0 if and only if x solves the AVE (2).
(b) The function Ψ is continuously differentiable on R^{n+1}\{0} with
\[ \nabla \Psi(\mu, x) = \nabla H(\mu, x)^T H(\mu, x), \]
where ∇H is the Jacobian of H(μ, x).
(c) The function Ψ(w(t)) is nonincreasing with respect to t.
Proof Parts (a)–(b) follow from Proposition3.2immediately.
For part (c), we observe that
\[ \frac{d\, \Psi(w(t))}{dt} = \left\langle \frac{dw}{dt},\, \nabla \Psi(\mu, x) \right\rangle = -\rho \left\langle \nabla \Psi(\mu, x),\, \nabla \Psi(\mu, x) \right\rangle = -\rho\, \|\nabla \Psi(\mu, x)\|^2 < 0, \]
for all x ∈ Ω\{x*}. Then, the desired result follows.
4 Stability and existence
In this section, we first address the relation between the solution of the AVE (1) and the equilibrium point of the neural network (26). Then, we discuss the stability and the solution trajectory of the neural network (26).
Lemma 4.1 Let x* be an equilibrium point of the neural network (26) and suppose that the singular values of A ∈ R^{n×n} exceed 1. Then x* solves the system (1).
Proof Since ∇Ψ(w*) = ∇H_i(w*)^T H_i(w*) and, from Proposition 3.3, ∇H_i is nonsingular, it is clear that
\[ \nabla \Psi(w^*) = 0 \]
if and only if H_i(w*) = 0.
Theorem 4.1 (a) For any initial point w_0 = w(t_0), there exists a unique continuous maximal solution w(t), t ∈ [t_0, τ), of the neural network (26).
(b) If the level set L(w_0) := { w \mid \|H_i(w)\|^2 \le \|H_i(w_0)\|^2 } is bounded, then τ can be extended to ∞.
Proof This proof is exactly the same as the one in [33, Proposition 3.4], so we omit it here.
Now, we are going to analyze the stability of an isolated equilibrium x* of the neural network (26); that is, ∇Ψ(x*) = 0 and ∇Ψ(x) ≠ 0 for x ∈ Ω\{x*}, where Ω is a neighborhood of x*.
Theorem 4.2 If the singular values of A ∈ R^{n×n} exceed 1, then the isolated equilibrium x* of the neural network (26) is asymptotically stable, and hence Lyapunov stable.
Proof We consider the Lyapunov function Ψ(w) : Ω → R defined by (25). First, it is clear that Ψ(w) ≥ 0 and, from Proposition 3.4(a), Ψ(·) is continuously differentiable. Since the singular values of A exceed 1, Proposition 3.3 gives that ∇H(w*) is nonsingular. Then, applying the nonsingularity of ∇H(w*) and Lemma 4.1, we have H(w*) = 0 and Ψ(w*) = 0. Furthermore, if Ψ(w) = 0 for some w ∈ Ω, then H(w) = 0 and hence ∇Ψ(w) = 0; this yields that w = w* since w* is isolated.
Secondly, considering Proposition 3.4(c) and Lemma 2.2, the isolated equilibrium x* is asymptotically stable, and hence Lyapunov stable.
Theorem 4.3 If the singular values of A ∈ R^{n×n} exceed 1, then the isolated equilibrium x* of the neural network (26) is exponentially stable.
Proof The proof is routine and similar to that in the literature. For completeness, we include it again. Let Ω = R_{++} × R^n. It is clear that H(·) is continuously differentiable, which implies
\[ H(w) = H(w^*) + \nabla H(w^*)^T (w - w^*) + o\big( \|w - w^*\| \big), \quad \forall w \in \Omega. \tag{28} \]
Let g(t) := \frac{1}{2}\, \|w(t) - w^*\|^2. We compute the derivative of g(t) as follows:
\[ \begin{aligned} \frac{dg(t)}{dt} &= \left( \frac{dw}{dt} \right)^{T} (w(t) - w^*) = -\frac{\rho}{2}\, \nabla\!\left( \|H(w)\|^2 \right)^{T} (w(t) - w^*) \\ &= -\rho\, \big( \nabla H(w)\, H(w) \big)^{T} (w(t) - w^*) = -\rho\, H(w)^{T} \nabla H(w)^{T} (w(t) - w^*) \\ &= -\rho\, (w(t) - w^*)^{T} \nabla H(w^*)\, \nabla H(w)^{T} (w(t) - w^*) - \rho\, o\big( \|w - w^*\| \big)^{T} \nabla H(w)^{T} (w(t) - w^*), \end{aligned} \]
where the last equality is due to (28). To proceed, we claim two assertions. First, we claim that (w − w*)^T ∇H(w*) ∇H(w)^T (w − w*) ≥ κ‖w − w*‖² for some κ > 0. To see this, from Propositions 3.3 and 3.4, we know that ∇H(w) is nonsingular and H is a continuously differentiable function, which implies that the matrix ∇H(w*) ∇H(w*)^T is symmetric and positive definite. Hence, we have (w − w*)^T ∇H(w*) ∇H(w*)^T (w − w*) ≥ κ₁‖w − w*‖² > 0 over Ω\{w*} for some κ₁ > 0. Then, by the continuity of ∇H(·), we can conclude that
\[ (w - w^*)^{T} \nabla H(w^*)\, \nabla H(w)^{T} (w - w^*) \ge \kappa\, \|w - w^*\|^2 > 0, \quad \text{for some } \kappa > 0. \]
Secondly, we claim that
\[ -\rho\, o\big( \|w - w^*\| \big)^{T} \nabla H(w)^{T} (w(t) - w^*) \le \varepsilon\, \|w - w^*\|^2, \quad \text{for some } \varepsilon > 0. \]
This is because
\[ \frac{\left| -\rho\, o\big( \|w - w^*\| \big)^{T} \nabla H(w)^{T} (w(t) - w^*) \right|}{\|w - w^*\|^2} \le \rho\, \|\nabla H(w)\|\, \frac{\big\| o\big( \|w - w^*\| \big) \big\|}{\|w - w^*\|}, \]
where the right-hand side vanishes as w → w*. Thus, it yields that
\[ -\rho\, o\big( \|w - w^*\| \big)^{T} \nabla H(w)^{T} (w(t) - w^*) \le \varepsilon\, \|w - w^*\|^2, \quad \text{for some } \varepsilon > 0. \]
Now, from the above two assertions and noting that g(t) = \frac{1}{2}\, \|w(t) - w^*\|^2, we have
\[ \frac{dg(t)}{dt} \le 2(-\rho\kappa + \varepsilon)\, g(t), \]
which gives
\[ g(t) \le e^{2(-\rho\kappa + \varepsilon)t}\, g(t_0). \]
Thus, we have
\[ \|w(t) - w^*\| \le e^{(-\rho\kappa + \varepsilon)t}\, \|w(t_0) - w^*\|, \]
which says that w* is exponentially stable, since we can choose ρ large enough such that −ρκ + ε < 0. Then, the proof is complete.
5 Numerical results
In order to demonstrate the effectiveness of the proposed neural network, we test several examples for our neural network (26) in this section. The numerical implementation is coded in Mathematica 11.3, and the ordinary differential equation solver adopted is NDSolve[ ], which uses a Runge–Kutta (2,3) formula. The initial point of each problem is selected randomly, and the same initial point is used for the different smoothing functions. The results are collected together in Tables 1, 2, 3 and 4, where
φ_i      denotes the smoothing function φ_i, i = 1, ..., 8;
N        denotes the number of iterations;
t        denotes the time when the algorithm terminates;
Er       denotes the value of ‖x(t) − x*‖ when the algorithm terminates;
‖H(x_t)‖ denotes the value of ‖H(x(t))‖ = ‖Ax − |x| − b‖ when the algorithm terminates;
CT       denotes the CPU time in seconds.
Table 1  Computing results of Example 5.1 (dt = 0.2)

Function   N    t     Er                ‖H(x_t)‖          CT
φ1         34   6.8   9.3686 × 10^-7    0.0000136037      1.5090863
φ2         36   7.2   8.70587 × 10^-7   0.0000126414      0.7760444
φ3         38   7.6   8.41914 × 10^-7   0.000012225       0.4980285
φ4         2    0.4   2.90785 × 10^-15  1.59872 × 10^-14  0.0340019
φ5         2    0.4   1.11772 × 10^-12  8.41527 × 10^-12  0.0740043
φ6         10   2.0   7.52691 × 10^-7   0.0000109295      0.1150066
φ7         2    0.4   1.29976 × 10^-12  8.61527 × 10^-12  0.0730042
φ8         34   6.8   9.3686 × 10^-7    0.0000136037      0.6880393
Table 2  Computing results of Example 5.2 (dt = 0.2)

Function   N    t      Er                ‖H(x_t)‖          CT
φ1         55   11     9.84216 × 10^-7   2.03995 × 10^-7   3.1531804
φ2         57   11.4   9.14593 × 10^-7   1.89565 × 10^-7   1.3690783
φ3         59   11.8   8.84473 × 10^-7   1.83322 × 10^-7   0.6920396
φ4         2    0.4    2.67859 × 10^-9   4.86096 × 10^-10  0.0370021
φ5         2    0.4    2.68658 × 10^-9   4.87548 × 10^-10  0.1510086
φ6         18   3.6    9.56666 × 10^-7   2.57215 × 10^-7   0.2210127
φ7         2    0.4    2.16413 × 10^-9   3.92736 × 10^-10  0.1600091
φ8         55   11     9.84216 × 10^-7   2.03995 × 10^-7   1.5320877
Example 5.1 Consider the absolute value equation (1) with
\[ A = \begin{pmatrix} 10 & 1 & 2 & 0 \\ 1 & 11 & 3 & 1 \\ 0 & 2 & 12 & 1 \\ 1 & 7 & 0 & 13 \end{pmatrix}, \qquad b = \begin{pmatrix} 12 \\ 15 \\ 14 \\ 20 \end{pmatrix}. \]
We can verify that one solution of this absolute value equation is x* = (1, 1, 1, 1).
The parameter ρ is set to 1, the time step is set to dt = 0.2, and the initial point is generated randomly. Table 1 summarizes the computing results for Example 5.1.
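As an illustration (and only an illustration: the reported results were produced with Mathematica's NDSolve), the following self-contained Python/SciPy sketch applies the gradient flow (26) with the smoothing function φ₅ to the data of Example 5.1 and reports the terminal error; starting from a random point, the trajectory should approach x* = (1, 1, 1, 1).

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[10., 1., 2., 0.],
              [1., 11., 3., 1.],
              [0., 2., 12., 1.],
              [1., 7., 0., 13.]])
b = np.array([12., 15., 14., 20.])
x_star = np.ones(4)                               # known solution of Ax - |x| = b

def phi5(mu, x):                                  # smoothing function (16), componentwise
    return np.where(x >= mu/2, x, np.where(x <= -mu/2, -x, x**2/mu + mu/4))

def dphi5(mu, x):                                 # its componentwise derivative in x
    return np.where(x >= mu/2, 1.0, np.where(x <= -mu/2, -1.0, 2.0*x/mu))

def rhs(t, x, mu0=1.0, rho=1.0):                  # gradient flow (26) with mu(t) = mu0*e^{-t}
    mu = mu0 * np.exp(-t)
    R = A @ x - phi5(mu, x) - b
    return -rho * (A - np.diag(dphi5(mu, x))).T @ R

x0 = np.random.uniform(-2.0, 2.0, size=4)         # random initial point
sol = solve_ivp(rhs, (0.0, 10.0), x0, rtol=1e-9, atol=1e-11)
x_T = sol.y[:, -1]
print("||x(T) - x*||     =", np.linalg.norm(x_T - x_star))
print("||A x - |x| - b|| =", np.linalg.norm(A @ x_T - np.abs(x_T) - b))
```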
From Table 1, we see that, whether measured by the trajectory convergence time, the error, or the computation time, the smoothing functions φ₄, φ₅, φ₇ perform significantly better than the other functions. Figure 3 depicts the norm error ‖x(t) − x*‖ over time. This figure indicates that the smoothing functions φ₄, φ₅, φ₇ also outperform the others (followed by φ₆).