https://doi.org/10.1007/s12190-019-01262-1

ORIGINAL RESEARCH

Neural network based on systematically generated smoothing functions for absolute value equation

B. Saheya¹ · Chieu Thanh Nguyen² · Jein-Shan Chen²

Received: 4 January 2019 / Published online: 20 April 2019

© Korean Society for Informatics and Computational Applied Mathematics 2019

Abstract

In this paper, we summarize several systematic ways of constructing smoothing functions and illustrate eight smoothing functions accordingly. Then, based on these systematically generated smoothing functions, a unified neural network model is proposed for solving the absolute value equation. The issues regarding the equilibrium point, the trajectory, and the stability properties of the neural network are addressed. Moreover, numerical experiments with comparison are presented, which suggest what kind of smoothing functions work well along with the neural network approach.

Keywords Absolute value equations · Neural network · Smoothing function

Mathematics Subject Classification 65K10 · 93B40 · 26D07

1 Introduction

The main problem that we tackle in this paper is the so-called absolute value equation (AVE for short), whose mathematical format is given below. The original standard AVE is described by

B. Saheya: This author's work is supported by the National Key R&D Program of China (Award No. 2017YFC1405605) and the Foundation of Inner Mongolia Normal University (Award No. 2017YJRC003). J.-S. Chen: This author's work is supported by the Ministry of Science and Technology, Taiwan.

Jein-Shan Chen (corresponding author)
jschen@math.ntnu.edu.tw

B. Saheya
saheya@imnu.edu.cn

Chieu Thanh Nguyen
thanhchieu90@gmail.com

1 College of Mathematical Science, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, People’s Republic of China

2 Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

$$Ax - |x| = b, \tag{1}$$

where $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$. Here $|x|$ denotes the componentwise absolute value of the vector $x \in \mathbb{R}^n$. Another, generalized, absolute value equation is of the form

$$Ax + B|x| = b, \tag{2}$$

where $B \in \mathbb{R}^{n\times n}$, $B \neq O$. When $B = -I$, where $I$ is the identity matrix, the AVE (2) reduces to the special form (1).
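As a small illustration (this concrete instance is ours, not taken from the paper), take $n = 2$, $A = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$, and $b = (2, 2)^T$ in (1). Then $x = (1, 1)^T$ solves the AVE, since $Ax - |x| = (3-1,\ 3-1)^T = b$; note that all singular values of $A$ equal $3 > 1$, which is the type of condition under which the unique-solvability and stability results below are established.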

The AVE (1) was first introduced by Rohn in [29] and has recently been investigated by many researchers, for example, Hu and Huang [7], Saheya and Chen [32], Jiang and Zhang [9], Ketabchi and Moosaei [10], Mangasarian [14–18], Mangasarian and Meyer [21], Prokopyev [25], and Rohn [31]. In particular, Mangasarian and Meyer [21] show that the AVE (1) is equivalent to the bilinear program, the generalized LCP (linear complementarity problem), and the standard LCP provided 1 is not an eigenvalue of A. With these equivalent reformulations, they also prove that the AVE (1) is NP-hard in its general form and provide existence results. Prokopyev [25] further obtains an improvement indicating that the AVE (1) can be equivalently recast as a (larger) LCP without any assumption on A and B, and also provides a relationship with mixed integer programming. It is known that, if solvable, the AVE (1) can have either a unique solution or multiple (e.g., exponentially many) solutions. Indeed, various sufficient conditions on solvability and non-solvability of the AVE (1) with unique and multiple solutions are discussed in [21,25,30]. Some variants of the AVEs, including the absolute value equation associated with second-order cone (SOCAVE) and absolute value programs, are investigated in [8] and [38], respectively. Furthermore, another type of absolute value equation, an extension of the AVE (2), is considered in [8,19,20].

Roughly speaking, there are three approaches for dealing with the AVEs (1)–(2). The first one is to reformulate the AVEs (1)–(2) as a complementarity problem and then solve it accordingly. The second one is to recast the AVEs (1)–(2) as a system of nonsmooth equations and then tackle the nonsmooth equations by applying a nonsmooth Newton algorithm [26] or a smoothing Newton algorithm [27]. The third one is to apply the neural network approach. In this paper, we follow the third idea for solving the AVEs (1)–(2). Inspired by another recent work of ours [24], we combine various smoothing functions with the neural network approach. Different from [24,32], the smoothing functions studied in this paper are not constructed in only one way; they are generated by several systematic ways. Accordingly, this paper can be viewed as a follow-up of [24,32].

Now, we quickly go over the neural network approach, which is different from traditional optimization methods. The main motivation for considering this approach is that real-time solutions of optimization problems are sometimes required in practice, and the neural network approach is a very promising way to obtain them. In general, neural networks can be implemented using integrated circuits and were first introduced in the 1980s by Hopfield and Tank [6,34] for optimization problems. Since then, significant research results have been achieved for various optimization problems, including linear programming [39], quadratic programming [1], linear complementarity problems [12], nonlinear complementarity problems [13], and nonlinear programming [5]. In general, the essence of the neural network approach is to construct a nonnegative energy function and establish a dynamic system, represented by a first-order differential equation, that describes an artificial neural network. It is expected that, starting from an initial point, the dynamic system converges to its steady state (or an equilibrium point), which corresponds to the solution of the underlying optimization problem.

Although a similar idea was employed by Wang, Yu and Guo in [36], only one smoothing function was studied therein. In this paper, we present systematic ways to construct smoothing functions for the AVE (1) and illustrate eight smoothing functions accordingly. After that, we design a gradient descent neural network model using these eight different smoothing functions. We not only discuss the stability of the neural networks, but also give a numerical comparison of these smoothing functions.

In fact, the new upshot of this paper lies in the numerical comparison, which suggests what kind of smoothing functions work well along with the neural network approach.

2 Preliminaries

By looking into the mathematical format of the aforementioned AVEs, it is observed that the absolute value function $|x|$ is the key component. Indeed, the absolute value function also plays an important role in many applications, such as machine learning and image processing. In particular, the absolute value function $|x|$ is not differentiable at $x = 0$, which limits its use in analysis and applications. To overcome this hurdle, researchers consider smoothing methods and construct smoothing functions for it. We summarize the main ways to construct smoothing functions for $|x|$ below. For more details, please refer to [2,4,11,23,28,35].

1. Smoothing by the convex conjugate

Let $X$ be a real topological vector space, and let $X^*$ be the dual space to $X$. For any function $f : \operatorname{dom} f \to \mathbb{R}$, its convex conjugate $f^* : (\operatorname{dom} f)^* \to \mathbb{R}$ is defined in terms of the supremum by

$$f^*(y) := \sup_{x \in \operatorname{dom} f} \left\{ x^T y - f(x) \right\}.$$

In light of this, one can build up a smooth approximation of $f$, denoted by $f_\mu$, by adding a strongly convex component to its dual $g := f^*$, namely,

$$f_\mu(x) = \sup_{z \in \operatorname{dom} g} \left\{ z^T x - g(z) - \mu d(z) \right\} = (g + \mu d)^*(x),$$

for some 1-strongly convex and continuous function $d(\cdot)$ (called a proximity function).

Here, $d(\cdot)$ being 1-strongly convex means that it satisfies

$$d((1-t)x + ty) \le (1-t)\,d(x) + t\,d(y) - \tfrac{1}{2}\,t(1-t)\,\|x - y\|^2,$$

for all $x, y$ and $t \in (0, 1)$. Note that $|x| = \sup_{|z| \le 1} zx$. If we take $d(z) := z^2/2$, then the smoothing function constructed via conjugation is

$$\varphi_1(\mu, x) = \sup_{|z| \le 1}\left\{ zx - \frac{\mu}{2}z^2 \right\} = \begin{cases} \dfrac{x^2}{2\mu}, & \text{if } |x| \le \mu, \\[4pt] |x| - \dfrac{\mu}{2}, & \text{otherwise}, \end{cases} \tag{3}$$

which is the traditional Huber function.
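As a quick sanity check (our own sketch, not part of the paper, whose experiments are done in Mathematica), the following Python snippet evaluates the supremum in the conjugate construction by brute force and compares it with the closed form (3); the names `phi1` and `phi1_via_conjugate` are ours.

```python
import numpy as np

def phi1(mu, x):
    """Closed form of the Huber function (3)."""
    return np.where(np.abs(x) <= mu, x**2 / (2.0 * mu), np.abs(x) - mu / 2.0)

def phi1_via_conjugate(mu, x, grid=2001):
    """Evaluate sup_{|z|<=1} { z*x - (mu/2)*z**2 } by a brute-force grid search."""
    z = np.linspace(-1.0, 1.0, grid)
    return np.max(z * x - 0.5 * mu * z**2)

mu = 0.3
for x in (-1.0, -0.2, 0.0, 0.15, 0.7):
    assert abs(phi1(mu, x) - phi1_via_conjugate(mu, x)) < 1e-4
print("closed form matches the conjugate-based supremum on the test points")
```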

It is also possible to consider another expression:

$$|x| = \sup_{\substack{z_1 + z_2 = 1 \\ z_1, z_2 \ge 0}} (z_1 - z_2)\,x.$$

In this case, if we take $d(z) := z_1 \log z_1 + z_2 \log z_2 + \log 2$, the smoothing function constructed by conjugation becomes

$$\varphi_2(\mu, x) = \mu \log\left( \cosh\left( \frac{x}{\mu} \right) \right), \tag{4}$$

where $\cosh(x) := \frac{e^x + e^{-x}}{2}$. Alternatively, choosing $d(y) := 1 - \sqrt{1 - y^2}$ gives another smoothing function:

$$\varphi_3(\mu, x) = \sup_{-1 \le y \le 1}\left\{ xy + \mu\sqrt{1 - y^2} - \mu \right\} = \sqrt{x^2 + \mu^2} - \mu. \tag{5}$$

2. The Moreau proximal smoothing

Suppose that $\mathbb{E}$ is a Euclidean space and $f : \mathbb{E} \to (-\infty, \infty]$ is a closed and proper convex function. One natural tool for generating an approximate smoothing function is the so-called proximal map introduced by Moreau [22].

The Moreau proximal approximation yields a family of approximations $\{ f^{\mathrm{px}}_\mu \}_{\mu > 0}$ as below:

$$f^{\mathrm{px}}_\mu(x) = \inf_{u \in \mathbb{E}} \left\{ f(u) + \frac{1}{2\mu}\|u - x\|^2 \right\}. \tag{6}$$

It is known that the Moreau proximal approximation $f^{\mathrm{px}}_\mu$ is convex, continuous, finite-valued, and differentiable with gradient $\nabla f^{\mathrm{px}}_\mu$ that is Lipschitz continuous with constant $\frac{1}{\mu}$; see [22]. When the Moreau proximal smoothing is applied to construct a smoothing function for the absolute value function $|x|$, it also yields the Huber smoothing function $\varphi_1(\mu, x)$, via the Moreau envelope [2].
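This claim can be checked numerically; the sketch below (ours, with a simple grid search standing in for the exact infimum in (6)) compares the Moreau envelope of $|\cdot|$ with the Huber function.

```python
import numpy as np

def huber(mu, x):
    """Huber function phi_1 from (3)."""
    return np.where(np.abs(x) <= mu, x**2 / (2.0 * mu), np.abs(x) - mu / 2.0)

def moreau_env_abs(mu, x, grid=200001, width=10.0):
    """Approximate the Moreau envelope (6) of f(u)=|u| by minimizing over a dense grid of u."""
    u = np.linspace(-width, width, grid)
    return np.min(np.abs(u) + (u - x) ** 2 / (2.0 * mu))

mu = 0.3
for x in (-2.0, -0.25, 0.0, 0.1, 1.5):
    assert abs(moreau_env_abs(mu, x) - huber(mu, x)) < 1e-6
print("Moreau envelope of |.| agrees with the Huber function phi_1")
```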

3. Nesterov’s smoothing

There is a class of nonsmooth convex functions considered in [23], which is given by

$$q(x) = \max\{ \langle u, Ax \rangle - \phi(u) \mid u \in Q \}, \quad x \in \mathbb{E},$$

where $\mathbb{E}, \mathbb{V}$ are finite-dimensional vector spaces, $Q \subseteq \mathbb{V}^*$ is compact and convex, $\phi$ is a continuous convex function on $Q$, and $A : \mathbb{E} \to \mathbb{V}$ is a linear map. The smooth approximation of $q$ suggested in [23] is described by the convex function

$$q_\mu(x) = \max\{ \langle u, Ax \rangle - \phi(u) - \mu d(u) \mid u \in Q \}, \quad x \in \mathbb{E}, \tag{7}$$

where $d(\cdot)$ is a prox-function for $Q$. It was proved in [23, Theorem 1] that the convex function $q_\mu(x)$ is $C^{1,1}(\mathbb{E})$. More specifically, its gradient mapping is Lipschitz continuous with constant $L_\mu = \frac{\|A\|^2}{\sigma\mu}$, and the gradient is described by $\nabla q_\mu(x) = A^* u_\mu(x)$, where $u_\mu(x)$ is the unique solution of the maximization problem in (7).

For the absolute value function $q(x) = |x|$ with $x \in \mathbb{R}$, let $A = 1$, $b = 0$, $\mathbb{E} = \mathbb{R}$, $Q = \{ u \in \mathbb{R} \mid |u| \le 1 \}$, and take $d(u) := \frac{1}{2}u^2$. Then, we have

$$q_\mu(x) = \max_{u}\left\{ \langle Ax - b, u \rangle - \mu d(u) \mid u \in Q \right\} = \max_{|u| \le 1}\left\{ xu - \frac{\mu}{2}u^2 \right\} = \begin{cases} \dfrac{x^2}{2\mu}, & \text{if } |x| \le \mu, \\[4pt] |x| - \dfrac{\mu}{2}, & \text{otherwise}. \end{cases}$$

As we see, this approximation approach also yields the Huber smoothing function $\varphi_1(\mu, x)$ defined in (3).

4. The infimal-convolution smoothing technique

Suppose that $\mathbb{E}$ is a finite-dimensional vector space and $f, g : \mathbb{E} \to (-\infty, \infty]$. The infimal convolution of $f$ and $g$, denoted $f \,\square\, g : \mathbb{E} \to [-\infty, +\infty]$, is defined by

$$(f \,\square\, g)(x) = \inf_{y \in \mathbb{E}} \{ f(y) + g(x - y) \}.$$

In light of the concept of infimal convolution, one can also construct smoothing approximation functions. More specifically, we consider a closed proper convex function $f : \mathbb{E} \to (-\infty, \infty]$ and let $\omega : \mathbb{E} \to \mathbb{R}$ be a $C^{1,1}$ convex function with Lipschitz gradient constant $1/\sigma$ ($\sigma > 0$). Suppose that for any $\mu > 0$ and any $x \in \mathbb{E}$, the following infimal convolution is finite:

$$f^{\mathrm{ic}}_\mu(x) = \inf_{u \in \mathbb{E}}\left\{ f(u) + \mu\,\omega\!\left( \frac{x - u}{\mu} \right) \right\} = (f \,\square\, \omega_\mu)(x), \tag{8}$$

where $\omega_\mu(\cdot) = \mu\,\omega\!\left( \frac{\cdot}{\mu} \right)$. Then, $f^{\mathrm{ic}}_\mu$ is called the infimal-convolution $\mu$-smooth approximation of $f$. In particular, when $\mu \in \mathbb{R}_{++}$ and $p \in (1, +\infty)$, the infimal convolution of a convex function and a power of the norm function is obtained as below:

$$\left( f \,\square\, \frac{1}{\mu p}\|\cdot\|^p \right)(x) = \inf_{u \in \mathbb{E}}\left\{ f(u) + \frac{1}{\mu p}\|x - u\|^p \right\}. \tag{9}$$

For the absolute value function, it can be verified that $f_\mu(x) = \left( |\cdot| \,\square\, \frac{1}{\mu p}|\cdot|^p \right)(x)$ is the Huber function of order $p$, i.e.,

$$f_\mu(x) = \begin{cases} |x| - \dfrac{p-1}{p}\,\mu^{\frac{1}{p-1}}, & \text{if } |x| > \mu^{\frac{1}{p-1}}, \\[6pt] \dfrac{|x|^p}{\mu p}, & \text{if } |x| \le \mu^{\frac{1}{p-1}}. \end{cases} \tag{10}$$

Note that when $p = 2$ in the above expression (10), the Huber function of order $p$ reduces to the Huber function $\varphi_1(\mu, x)$ shown in (3). Figure 1 depicts the Huber function of order $p$ for various values of $p$ (Fig. 1: $|x|$ and the Huber function of order $p$, $\mu = 0.3$). In contrast, plugging $p = 2$ into the infimal convolution formula (9) yields the Moreau approximation (6). For more details about infimal convolution and its induced approximation functions, please refer to [2,3].
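The following sketch (ours; `huber_p` and the grid-based infimal convolution are illustrative approximations, not the authors' code) evaluates (9) numerically and checks it against the closed form (10), including the reduction to $\varphi_1$ at $p = 2$.

```python
import numpy as np

def huber_p(mu, x, p):
    """Huber function of order p, Eq. (10)."""
    thresh = mu ** (1.0 / (p - 1.0))
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > thresh,
                    np.abs(x) - (p - 1.0) / p * thresh,
                    np.abs(x) ** p / (mu * p))

def huber_p_via_infconv(mu, x, p, grid=400001, width=10.0):
    """Approximate the infimal convolution (9) of |.| with (1/(mu*p))*|.|**p on a grid."""
    u = np.linspace(-width, width, grid)
    return np.min(np.abs(u) + np.abs(x - u) ** p / (mu * p))

mu, p = 0.3, 3.0
for x in (-1.3, -0.2, 0.0, 0.4, 2.0):
    assert abs(huber_p(mu, x, p) - huber_p_via_infconv(mu, x, p)) < 1e-4
# p = 2 recovers the ordinary Huber function phi_1 from (3)
assert abs(huber_p(0.3, 0.7, 2.0) - (0.7 - 0.15)) < 1e-12
print("order-p Huber matches the infimal-convolution definition on the test points")
```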

5. The convolution smoothing technique

The smoothing approximation via convolution for the absolute value function is a popular approach, which can be found in [4,11,28,35]. Its construction idea is described as follows. First, one constructs a smoothing approximation for the plus function $(x)_+ = \max\{0, x\}$. To this end, we consider a piecewise continuous function $d(x)$ with a finite number of pieces which is a density (kernel) function, that is, it satisfies

$$d(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{+\infty} d(x)\,dx = 1.$$

Next, define $\hat{s}(\mu, x) := \frac{1}{\mu}\, d\!\left( \frac{x}{\mu} \right)$, where $\mu$ is a positive parameter. Suppose that $\int_{-\infty}^{+\infty} |x|\, d(x)\,dx < +\infty$; then a smoothing approximation (denoted by $\hat{p}(\mu, x)$) for $(x)_+$ is obtained as below:

$$\hat{p}(\mu, x) = \int_{-\infty}^{+\infty} (x - s)_+\, \hat{s}(\mu, s)\,ds = \int_{-\infty}^{x} (x - s)\, \hat{s}(\mu, s)\,ds.$$

The following are four well-known smoothing functions for the plus function [4,28]:

$$\hat\varphi_1(\mu, x) = x + \mu \log\left( 1 + e^{-\frac{x}{\mu}} \right). \tag{11}$$

$$\hat\varphi_2(\mu, x) = \begin{cases} x, & \text{if } x \ge \frac{\mu}{2}, \\[4pt] \dfrac{1}{2\mu}\left( x + \dfrac{\mu}{2} \right)^2, & \text{if } -\frac{\mu}{2} < x < \frac{\mu}{2}, \\[4pt] 0, & \text{if } x \le -\frac{\mu}{2}. \end{cases} \tag{12}$$

$$\hat\varphi_3(\mu, x) = \frac{\sqrt{4\mu^2 + x^2} + x}{2}. \tag{13}$$

$$\hat\varphi_4(\mu, x) = \begin{cases} x - \dfrac{\mu}{2}, & \text{if } x > \mu, \\[4pt] \dfrac{x^2}{2\mu}, & \text{if } 0 \le x \le \mu, \\[4pt] 0, & \text{if } x < 0, \end{cases} \tag{14}$$

where the corresponding kernel functions are

$$d_1(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \qquad d_2(x) = \begin{cases} 1, & \text{if } -\frac{1}{2} \le x \le \frac{1}{2}, \\ 0, & \text{otherwise}, \end{cases}$$

$$d_3(x) = \frac{2}{(x^2 + 4)^{3/2}}, \qquad d_4(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

Using the fact that $|x| = (x)_+ + (-x)_+$, the smoothing function of $|x|$ via convolution can be written as

$$\hat{p}(\mu, |x|) = \hat{p}(\mu, x) + \hat{p}(\mu, -x) = \int_{-\infty}^{+\infty} |x - s|\, \hat{s}(\mu, s)\,ds.$$

Analogous to (11)–(14), we reach the following smoothing functions for $|x|$:

$$\varphi_4(\mu, x) = \mu\left[ \log\left( 1 + e^{\frac{x}{\mu}} \right) + \log\left( 1 + e^{-\frac{x}{\mu}} \right) \right]. \tag{15}$$

$$\varphi_5(\mu, x) = \begin{cases} x, & \text{if } x \ge \frac{\mu}{2}, \\[4pt] \dfrac{x^2}{\mu} + \dfrac{\mu}{4}, & \text{if } -\frac{\mu}{2} < x < \frac{\mu}{2}, \\[4pt] -x, & \text{if } x \le -\frac{\mu}{2}. \end{cases} \tag{16}$$

$$\varphi_6(\mu, x) = \sqrt{4\mu^2 + x^2}, \tag{17}$$

as well as the Huber function (3). If we take the Epanechnikov kernel function

$$K(x) = \begin{cases} \frac{3}{4}\left( 1 - x^2 \right), & \text{if } |x| \le 1, \\ 0, & \text{otherwise}, \end{cases}$$

we achieve the smoothing function for $|x|$:

$$\varphi_7(\mu, x) = \begin{cases} x, & \text{if } x > \mu, \\[4pt] -\dfrac{x^4}{8\mu^3} + \dfrac{3x^2}{4\mu} + \dfrac{3\mu}{8}, & \text{if } -\mu \le x \le \mu, \\[4pt] -x, & \text{if } x < -\mu. \end{cases} \tag{18}$$

Moreover, if we take the Gaussian kernel function $K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ for all $x \in \mathbb{R}$, then it yields

$$\hat{s}(\mu, x) := \frac{1}{\mu} K\!\left( \frac{x}{\mu} \right) = \frac{1}{\sqrt{2\pi}\,\mu}\, e^{-\frac{x^2}{2\mu^2}}.$$

Hence, we obtain the following smoothing function [35] for $|x|$:

$$\varphi_8(\mu, x) = x\,\mathrm{erf}\!\left( \frac{x}{\sqrt{2}\,\mu} \right) + \sqrt{\frac{2}{\pi}}\,\mu\, e^{-\frac{x^2}{2\mu^2}}, \tag{19}$$

where the error function is defined by

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\,du, \quad \forall x \in \mathbb{R}.$$
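As an illustration of the convolution construction (our own sketch; this quadrature-based check is not part of the paper), the snippet below evaluates $\int |x - s|\,\hat{s}(\mu, s)\,ds$ with the Gaussian kernel numerically and compares it with the closed form (19).

```python
import math
import numpy as np
from scipy import integrate

def phi8(mu, x):
    """Closed form (19): Gaussian-kernel smoothing of |x|."""
    return x * math.erf(x / (math.sqrt(2.0) * mu)) + \
           math.sqrt(2.0 / math.pi) * mu * math.exp(-x**2 / (2.0 * mu**2))

def phi8_by_convolution(mu, x):
    """Numerically evaluate int |x - s| * s_hat(mu, s) ds, splitting at the kink s = x."""
    s_hat = lambda s: math.exp(-s**2 / (2.0 * mu**2)) / (math.sqrt(2.0 * math.pi) * mu)
    left, _ = integrate.quad(lambda s: (x - s) * s_hat(s), -np.inf, x)
    right, _ = integrate.quad(lambda s: (s - x) * s_hat(s), x, np.inf)
    return left + right

mu = 0.3
for x in (-1.0, -0.2, 0.0, 0.15, 2.0):
    assert abs(phi8(mu, x) - phi8_by_convolution(mu, x)) < 1e-6
print("convolution with the Gaussian kernel reproduces phi_8")
```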

To sum up, we have eight smoothing functions in total through the above constructions. Figure 2 depicts the graphs of all the aforementioned smoothing functions $\varphi_i$, $i = 1, \ldots, 8$, together with the absolute value function. Not only is it clear from the geometric view that $\varphi_i$, $i = 1, \ldots, 8$, are smoothing functions of $|x|$; this is also verified theoretically in Proposition 2.1.

For convenience, we collect the eight smoothing functions of $|x|$ below:

$$\varphi_1(\mu, x) = \begin{cases} \dfrac{x^2}{2\mu}, & \text{if } |x| \le \mu, \\ |x| - \dfrac{\mu}{2}, & \text{otherwise}, \end{cases} \qquad \varphi_2(\mu, x) = \mu \log\left( \cosh\left( \frac{x}{\mu} \right) \right),$$

$$\varphi_3(\mu, x) = \sqrt{x^2 + \mu^2} - \mu, \qquad \varphi_4(\mu, x) = \mu\left[ \log\left( 1 + e^{\frac{x}{\mu}} \right) + \log\left( 1 + e^{-\frac{x}{\mu}} \right) \right],$$

$$\varphi_5(\mu, x) = \begin{cases} |x|, & \text{if } |x| \ge \frac{\mu}{2}, \\ \dfrac{x^2}{\mu} + \dfrac{\mu}{4}, & \text{if } |x| < \frac{\mu}{2}, \end{cases} \qquad \varphi_6(\mu, x) = \sqrt{4\mu^2 + x^2},$$

$$\varphi_7(\mu, x) = \begin{cases} |x|, & \text{if } |x| > \mu, \\ -\dfrac{x^4}{8\mu^3} + \dfrac{3x^2}{4\mu} + \dfrac{3\mu}{8}, & \text{if } |x| \le \mu, \end{cases} \qquad \varphi_8(\mu, x) = x\,\mathrm{erf}\!\left( \frac{x}{\sqrt{2}\,\mu} \right) + \sqrt{\frac{2}{\pi}}\,\mu\, e^{-\frac{x^2}{2\mu^2}}.$$

Fig. 2 The graphs of $|x|$ and the smoothing functions $\varphi_i$, $i = 1, \ldots, 8$ ($\mu = 0.3$)

From Fig. 2, we see that the local behavior of all eight smoothing functions can be described as

$$\varphi_3 \le \varphi_2 \le \varphi_1 \le |x| \le \varphi_5 \le \varphi_7 \le \varphi_8 \le \varphi_4 \le \varphi_6. \tag{20}$$

In particular, the three smoothing functions $\varphi_1, \varphi_2, \varphi_3$ approach $|x|$ from below with $\varphi_1 \ge \varphi_2 \ge \varphi_3$. In contrast, the other five smoothing functions $\varphi_4, \varphi_5, \varphi_6, \varphi_7, \varphi_8$ approach $|x|$ from above with $\varphi_5 \le \varphi_7 \le \varphi_8 \le \varphi_4 \le \varphi_6$. Apparently, the smoothing functions $\varphi_1$ and $\varphi_5$ are closest to $|x|$ among these smoothing functions.
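Before turning to the algebraic proof, the ordering (20) can also be spot-checked numerically. The sketch below (ours, not from the paper) implements the eight closed forms for $\mu = 0.3$ and verifies the chain of inequalities on a grid of sample points; it is a numerical check, not a proof.

```python
import math
import numpy as np

mu = 0.3

def phi1(x): return x**2/(2*mu) if abs(x) <= mu else abs(x) - mu/2
def phi2(x): return mu*math.log(math.cosh(x/mu))
def phi3(x): return math.sqrt(x**2 + mu**2) - mu
def phi4(x): return mu*(math.log(1 + math.exp(x/mu)) + math.log(1 + math.exp(-x/mu)))
def phi5(x): return abs(x) if abs(x) >= mu/2 else x**2/mu + mu/4
def phi6(x): return math.sqrt(4*mu**2 + x**2)
def phi7(x): return abs(x) if abs(x) > mu else -x**4/(8*mu**3) + 3*x**2/(4*mu) + 3*mu/8
def phi8(x): return x*math.erf(x/(math.sqrt(2)*mu)) + math.sqrt(2/math.pi)*mu*math.exp(-x**2/(2*mu**2))

# spot-check the chain phi3 <= phi2 <= phi1 <= |x| <= phi5 <= phi7 <= phi8 <= phi4 <= phi6
for x in np.linspace(-1.5, 1.5, 3001):
    vals = [phi3(x), phi2(x), phi1(x), abs(x), phi5(x), phi7(x), phi8(x), phi4(x), phi6(x)]
    assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:])), (x, vals)
print("ordering (20) holds at all sampled points for mu = 0.3")
```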

Besides the geometric observation, we also provide an algebraic analysis of (20). Since each function $\varphi_i(\mu, x)$, $i = 1, 2, \ldots, 8$, is symmetric in $x$, we only need to prove (20) for $x \ge 0$. To proceed, for fixed $\mu > 0$, we let $y = \frac{x}{\mu}$. The verification consists of seven parts.

Part (1): $\varphi_3(\mu, x) \le \varphi_2(\mu, x)$. To verify this inequality, we consider

$$f(y) = \log\left( \frac{e^y + e^{-y}}{2} \right) - \sqrt{y^2 + 1} + 1.$$

Then, we compute the derivative of $f(y)$ as below:

$$\begin{aligned}
f'(y) &= \frac{e^y - e^{-y}}{e^y + e^{-y}} - \frac{y}{\sqrt{y^2+1}} = \frac{e^{2y} - 1}{e^{2y} + 1} - \frac{y}{\sqrt{y^2+1}} \\
&= 1 - \frac{2}{e^{2y}+1} - 1 + \frac{\sqrt{y^2+1} - y}{\sqrt{y^2+1}} \\
&= \frac{1}{\sqrt{y^2+1}\left( \sqrt{y^2+1} + y \right)} - \frac{2}{e^{2y}+1} \\
&= \frac{e^{2y} - 1 - 2y^2 - 2y\sqrt{y^2+1}}{\left( e^{2y}+1 \right)\sqrt{y^2+1}\left( \sqrt{y^2+1} + y \right)}.
\end{aligned}$$

For convenience, we denote $g(y) = e^{2y} - 1 - 2y^2 - 2y\sqrt{y^2+1}$. It is known that the function $e^x$ can be expressed as

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \tag{21}$$

which indicates $e^x - 1 \ge x + \frac{x^2}{2} + \frac{x^3}{6}$ for $x \ge 0$. Then, it follows that

$$\begin{aligned}
g(y) &\ge 2y + \frac{(2y)^2}{2} + \frac{(2y)^3}{6} - 2y^2 - 2y\sqrt{y^2+1} \\
&= \frac{4y^3}{3} + 2y\left( 1 - \sqrt{y^2+1} \right) \\
&= \frac{4y^3}{3} - \frac{2y^3}{1 + \sqrt{y^2+1}} \\
&= 2y^3\left( \frac{2}{3} - \frac{1}{1 + \sqrt{y^2+1}} \right) \\
&\ge 2y^3\left( \frac{2}{3} - \frac{1}{2} \right) \ge 0, \quad \forall y \ge 0.
\end{aligned}$$

This implies that $f'(y) \ge 0$ for all $y \ge 0$. Thus, $f$ is monotonically nondecreasing, which yields $f(y) \ge f(0) = 0$. This verifies the assertion that $\varphi_3(\mu, x) \le \varphi_2(\mu, x)$.

Part (2): φ2(μ, x) ≤ φ1(μ, x). In order to prove this inequality, we discuss two cases.

(i) For $0 \le x \le \mu$, this implies that $0 \le y \le 1$. Considering

$$f(y) = \frac{y^2}{2} - \log\left( \frac{e^y + e^{-y}}{2} \right)$$

yields

$$f'(y) = y - \frac{e^{2y} - 1}{e^{2y} + 1} = \frac{ye^{2y} + y - e^{2y} + 1}{e^{2y} + 1}.$$

Denoting $g(y) := ye^{2y} + y - e^{2y} + 1$ and using (21) leads to

$$\begin{aligned}
g(y) &= ye^{2y} + y - e^{2y} + 1 \\
&= y\sum_{n=0}^{\infty} \frac{(2y)^n}{n!} - \sum_{n=0}^{\infty} \frac{(2y)^n}{n!} + y + 1 \\
&= y\sum_{n=0}^{\infty} \frac{(2y)^n}{n!} - \left( \sum_{n=0}^{\infty} \frac{(2y)^{n+1}}{(n+1)!} + 1 \right) + y + 1 \\
&= y\sum_{n=0}^{\infty} \frac{(2y)^n}{n!}\left( 1 - \frac{2}{n+1} \right) + y \\
&\ge 0, \quad \forall y \in [0, 1].
\end{aligned}$$

Therefore, we obtain that $f'(y) \ge 0$ for all $y \in [0, 1]$.

(ii) For $x > \mu$, this implies that $y > 1$. Considering

$$f(y) = y - \frac{1}{2} - \log\left( \frac{e^y + e^{-y}}{2} \right)$$

gives

$$f'(y) = 1 - \frac{e^{2y} - 1}{e^{2y} + 1} > 0.$$

To sum up, we obtain that $f'(y) \ge 0$ in both cases. Following the same arguments as in Part (1), we conclude that $\varphi_2(\mu, x) \le \varphi_1(\mu, x)$.

Part (3):φ1(μ, x) ≤ |x| and |x| ≤ φ5(μ, x). It is easy to verify these inequalities.

We omit the verification.

Part (4): $\varphi_5(\mu, x) \le \varphi_7(\mu, x)$. We prove this inequality by discussing three cases.

(i) For $x > \mu$, it is easy to see that $\varphi_5(\mu, x) = \varphi_7(\mu, x) = x$.

(ii) For $\frac{\mu}{2} \le x \le \mu$, it means $\frac{1}{2} \le y \le 1$. Considering

$$\begin{aligned}
f(y) &= -\frac{y^4}{8} + \frac{3y^2}{4} + \frac{3}{8} - y \\
&= \frac{-y^4 + 6y^2 - 8y + 3}{8} \\
&= \frac{-(y^2 - 1)^2 + 4(y - 1)^2}{8} \\
&= \frac{(y - 1)^2\left[ 4 - (y + 1)^2 \right]}{8}
\end{aligned}$$

and using the facts that $\frac{1}{2} \le y \le 1$ and $\frac{9}{4} \le (y+1)^2 \le 4$, it follows that $f(y) \ge 0$.

(iii) For $0 \le x < \frac{\mu}{2}$, we have $0 \le y < \frac{1}{2}$. Considering

$$\begin{aligned}
f(y) &= -\frac{y^4}{8} + \frac{3y^2}{4} + \frac{3}{8} - y^2 - \frac{1}{4} \\
&= -\frac{y^4}{8} - \frac{y^2}{4} + \frac{1}{8} \\
&= \frac{-y^4 - 2y^2 + 1}{8} \\
&= \frac{2 - (y^2 + 1)^2}{8}
\end{aligned}$$

and applying the facts that $0 \le y < \frac{1}{2}$ and $1 \le (y^2+1)^2 \le \frac{25}{16}$, it follows that $f(y) > 0$. From all the above, we achieve that $\varphi_5(\mu, x) \le \varphi_7(\mu, x)$.

Part (5): $\varphi_7(\mu, x) \le \varphi_8(\mu, x)$. To prove this assertion, we discuss two cases.

(i) For $0 \le x \le \mu$, this implies that $0 \le y \le 1$. Consider

$$f(y) = y\,\mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} + \frac{y^4}{8} - \frac{3y^2}{4} - \frac{3}{8}.$$

By applying [35, Lemma 2.5], we have $\mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) \ge \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}}$. Then, it implies that

$$f(y) \ge y\left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}} + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} + \frac{y^4}{8} - \frac{3y^2}{4} - \frac{3}{8} := g(y).$$

It is easy to verify that $g(y)$ is monotonically decreasing on $[0, 1]$, which indicates that $g(y) \ge g(1) > 0$. Hence, we obtain $f(y) > 0$, and $\varphi_7(\mu, x) \le \varphi_8(\mu, x)$ is proved.

(ii) For $x > \mu$, it means $y > 1$. Consider

$$f(y) = y\,\mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} - y,$$

which yields $f'(y) = \mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) - 1 < 0$. Hence, $f(y)$ is monotonically decreasing on $[1, +\infty)$. Moreover, using [35, Lemma 2.5] gives

$$f(y) \ge y\left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}} + \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} - y.$$

Taking the limit in this inequality, we obtain $\lim_{y\to\infty} f(y) \ge 0$. Therefore, $f(y) \ge \lim_{y\to\infty} f(y) \ge 0$, which shows the assertion $\varphi_7(\mu, x) \le \varphi_8(\mu, x)$.

Part (6): $\varphi_8(\mu, x) \le \varphi_4(\mu, x)$. Consider

$$f(y) = \log\left( 1 + e^{-y} \right) + \log\left( 1 + e^{y} \right) - y\,\mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) - \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}}.$$

Then, we have

$$f'(y) = \frac{e^y - 1}{e^y + 1} - \mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) = 1 - \left[ \frac{2}{e^y + 1} + \mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) \right] := 1 - g(y).$$

We claim that $g(y) \ge 1$ for all $y \ge 0$. Indeed, by [35, Lemma 2.5], $\mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) \ge \left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}}$, so it suffices to show that

$$\left( 1 - e^{-\frac{y^2}{2}} \right)^{\frac{1}{2}} \ge \frac{e^y - 1}{e^y + 1} = 1 - \frac{2}{e^y + 1},$$

which, after squaring both (nonnegative) sides, is equivalent to $e^{-\frac{y^2}{2}}\left( e^y + 1 \right)^2 \le 4e^y$, that is, $\cosh\!\left( \frac{y}{2} \right) \le e^{\frac{y^2}{4}}$. The latter holds because $\cosh(t) \le e^{\frac{t^2}{2}}$ for all $t \in \mathbb{R}$. Hence $g(y) \ge 1$, which gives $f'(y) \le 0$ for all $y \ge 0$, i.e., $f$ is nonincreasing on $[0, +\infty)$. On the other hand, writing $\log(1 + e^y) = y + \log(1 + e^{-y})$, we have

$$\lim_{y\to\infty} f(y) = \lim_{y\to\infty}\left[ y\left( 1 - \mathrm{erf}\!\left( \frac{y}{\sqrt{2}} \right) \right) + 2\log\left( 1 + e^{-y} \right) - \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} \right] = 0.$$

Therefore, $f(y) \ge \lim_{y\to\infty} f(y) = 0$, and hence $\varphi_8(\mu, x) \le \varphi_4(\mu, x)$ is proved.

Part (7): $\varphi_4(\mu, x) \le \varphi_6(\mu, x)$. Consider

$$f(y) = \sqrt{4 + y^2} - \log\left( 1 + e^{-y} \right) - \log\left( 1 + e^{y} \right),$$

which gives

$$\begin{aligned}
f'(y) &= \frac{y}{\sqrt{4+y^2}} - \frac{e^y - 1}{e^y + 1} = \frac{y}{\sqrt{4+y^2}} - 1 + \frac{2}{e^y + 1} \\
&= \frac{y - \sqrt{4+y^2}}{\sqrt{4+y^2}} + \frac{2}{e^y + 1} \\
&= \frac{-4(1 + e^y) + 2\sqrt{4+y^2}\left( y + \sqrt{4+y^2} \right)}{\sqrt{4+y^2}\left( y + \sqrt{4+y^2} \right)\left( 1 + e^y \right)}.
\end{aligned}$$

For convenience, we denote

$$g(y) := -2(1 + e^y) + \sqrt{4+y^2}\left( y + \sqrt{4+y^2} \right) = -2e^y + 2 + y^2 + y\sqrt{4+y^2}.$$

Because $e^y \ge 1 + y + \frac{y^2}{2} + \frac{y^3}{6}$ for $y \ge 0$ and $\sqrt{4+y^2} \le 2 + \frac{y^2}{4}$, it yields

$$g(y) \le -2y - \frac{y^3}{3} + y\sqrt{4+y^2} = y\left( \sqrt{4+y^2} - 2 - \frac{y^2}{3} \right) \le -\frac{y^3}{12} \le 0, \quad \forall y \ge 0.$$

This means that $f'(y) \le 0$, i.e., $f(y)$ is monotonically decreasing on $[0, +\infty)$. On the other hand, we know that

$$\begin{aligned}
\lim_{y\to\infty} f(y) &= \lim_{y\to\infty}\left[ \sqrt{4+y^2} - \log\left( 1 + e^{-y} \right) - \log\left( 1 + e^{y} \right) \right] \\
&= \lim_{y\to\infty}\left[ \sqrt{4+y^2} - y \right] + \lim_{y\to\infty}\left[ y - \log\left( 1 + e^{-y} \right) - \log\left( 1 + e^{y} \right) \right] \\
&= \lim_{y\to\infty}\left[ y - \log\left( 1 + e^{y} \right) \right] = \lim_{y\to\infty} \log\frac{e^y}{1 + e^y} = 0.
\end{aligned}$$

Thus, $f(y) \ge \lim_{y\to\infty} f(y) = 0$, which implies that $\varphi_4(\mu, x) \le \varphi_6(\mu, x)$.

From Parts (1)–(7), the proof of (20) is complete.

Proposition 2.1 Let $\varphi_i : \mathbb{R}^2 \to \mathbb{R}$ for $i = 1, \ldots, 8$ be defined as in (3)–(5) and (15)–(19), respectively. Then, we have

(a) $\varphi_i$ is continuously differentiable at $(\mu, x) \in \mathbb{R}_{++} \times \mathbb{R}$;

(b) $\lim_{\mu \downarrow 0} \varphi_i(\mu, x) = |x|$.

Proof The proof is straightforward and we omit it. 

Next, we recall some materials about first-order ordinary differential equations (ODEs):

$$\dot{w}(t) = H(w(t)), \quad w(t_0) = w_0 \in \mathbb{R}^n, \tag{22}$$

where $H : \mathbb{R}^n \to \mathbb{R}^n$ is a mapping. We also introduce three kinds of stability that will be discussed later. These materials can be found in usual ODE textbooks. A point $w^* = w(t^*)$ is called an equilibrium point or a steady state of the dynamic system (22) if $H(w^*) = 0$. If there is a neighborhood $\Omega^* \subseteq \mathbb{R}^n$ of $w^*$ such that $H(w^*) = 0$ and $H(w) \neq 0$ for all $w \in \Omega^*\setminus\{w^*\}$, then $w^*$ is called an isolated equilibrium point.

Lemma 2.1 Suppose that $H : \mathbb{R}^n \to \mathbb{R}^n$ is a continuous mapping. Then, for any $t_0 > 0$ and $w_0 \in \mathbb{R}^n$, there exists a local solution $w(t)$ to (22) with $t \in [t_0, \tau)$ for some $\tau > t_0$. If, in addition, $H$ is locally Lipschitz continuous at $w_0$, then the solution is unique; if $H$ is Lipschitz continuous on $\mathbb{R}^n$, then $\tau$ can be extended to $\infty$.

Let $w(t)$ be a solution to the dynamic system (22). An isolated equilibrium point $w^*$ is Lyapunov stable if for any $w_0 = w(t_0)$ and any $\varepsilon > 0$, there exists a $\delta > 0$ such that $\|w(t) - w^*\| < \varepsilon$ for all $t \ge t_0$ whenever $\|w(t_0) - w^*\| < \delta$. An isolated equilibrium point $w^*$ is said to be asymptotically stable if, in addition to being Lyapunov stable, it has the property that $w(t) \to w^*$ as $t \to \infty$ whenever $\|w(t_0) - w^*\| < \delta$. An isolated equilibrium point $w^*$ is exponentially stable if there exists a $\delta > 0$ such that every solution $w(t)$ of (22) with initial condition $w(t_0) = w_0$ and $\|w(t_0) - w^*\| < \delta$ is well defined on $[0, +\infty)$ and satisfies

$$\|w(t) - w^*\| \le c\, e^{-\omega t}\, \|w(t_0) - w^*\| \quad \forall t \ge t_0,$$

where $c > 0$ and $\omega > 0$ are constants independent of the initial point.

Let $\Omega \subseteq \mathbb{R}^n$ be an open neighborhood of $\bar{w}$. A continuously differentiable function $V : \mathbb{R}^n \to \mathbb{R}$ is said to be a Lyapunov function at the state $\bar{w}$ over the set $\Omega$ for Eq. (22) if

$$V(\bar{w}) = 0, \quad V(w) > 0, \quad \forall w \in \Omega\setminus\{\bar{w}\},$$

$$\dot{V}(w) \le 0, \quad \forall w \in \Omega\setminus\{\bar{w}\}.$$

Lyapunov stability and asymptotic stability can be verified by using a Lyapunov function, which is a useful tool for the analysis.

Lemma 2.2 (a) An isolated equilibrium point $w^*$ is Lyapunov stable if there exists a Lyapunov function over some neighborhood $\Omega^*$ of $w^*$.

(b) An isolated equilibrium point $w^*$ is asymptotically stable if there exists a Lyapunov function over some neighborhood $\Omega^*$ of $w^*$ such that $\dot{V}(w) < 0$ for all $w \in \Omega^*\setminus\{w^*\}$.

3 Neural network model for AVE

In order to design a suitable neural network for the absolute value equation (1), the key step is to construct an appropriate energy function $E(x)$ whose global minimizer $x^*$ is simultaneously a solution of the AVE (1). One approach to constructing a desired energy function is the merit function method. The basic idea in this approach is to transform the AVE (1) into an unconstrained minimization problem.

To this end, we define $H_i : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ as

$$H_i(\mu, x) = \begin{bmatrix} \mu \\ Ax + B\,\Phi_i(\mu, x) - b \end{bmatrix}, \quad \text{for } \mu \in \mathbb{R} \text{ and } x \in \mathbb{R}^n, \tag{23}$$

where $\Phi_i : \mathbb{R}^{n+1} \to \mathbb{R}^n$ is given by

$$\Phi_i(\mu, x) := \begin{bmatrix} \varphi_i(\mu, x_1) \\ \varphi_i(\mu, x_2) \\ \vdots \\ \varphi_i(\mu, x_n) \end{bmatrix}, \quad \text{for } \mu \in \mathbb{R} \text{ and } x \in \mathbb{R}^n, \tag{24}$$

with the various smoothing functions $\varphi_i : \mathbb{R}^2 \to \mathbb{R}$ introduced in Sect. 2. Then, the AVE (1) can be transformed into an unconstrained optimization problem:

$$\min\ \Psi(\mu, x) = \frac{1}{2}\,\|H_i(\mu, x)\|^2. \tag{25}$$

Let $w = (\mu, x)$; the AVE (1) is equivalent to $H_i(\mu, x) = 0$. It is clear that if $w^* \in \mathbb{R}_{++} \times \mathbb{R}^n$ solves $H_i(w^*) = 0$, then $w^*$ solves $\nabla \Psi(w^*) = 0$. Applying the gradient approach to the minimization of the energy function (25), we obtain the system of differential equations:

$$\frac{du(t)}{dt} = -\rho\, \nabla \Psi(v(t), u(t)) = -\rho\, \nabla H_i(v(t), u(t))^T H_i(v(t), u(t)), \quad u(t_0) = u_0, \tag{26}$$

where $u_0 = x_0 \in \mathbb{R}^n$, $v(t) = \mu_0 e^{-t}$, and $\rho > 0$ is a time scaling factor. In fact, letting $\tau = \rho t$ leads to $\frac{du(t)}{dt} = \rho\, \frac{du(\tau)}{d\tau}$. Hence, it follows from (26) that $\frac{du(\tau)}{d\tau} = -\nabla\!\left( \frac{1}{2}\|H_i(w)\|^2 \right)$. In view of this, we set $\rho = 1$ in the subsequent analysis.
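To make the dynamics (26) concrete, here is a minimal numerical sketch (ours; the paper's experiments use Mathematica's NDSolve, and all names such as `neural_network_rhs` are ours) that integrates the gradient flow with the Huber smoothing $\varphi_1$ and the schedule $\mu(t) = \mu_0 e^{-t}$ on a small AVE with a known solution.

```python
import numpy as np
from scipy.integrate import solve_ivp

def phi1(mu, x):
    """Componentwise Huber smoothing (3)."""
    return np.where(np.abs(x) <= mu, x**2 / (2.0 * mu), np.abs(x) - mu / 2.0)

def dphi1(mu, x):
    """Derivative of phi1 with respect to x (diagonal entries of grad Phi_1)."""
    return np.where(np.abs(x) <= mu, x / mu, np.sign(x))

def neural_network_rhs(t, x, A, B, b, mu0, rho=1.0):
    """Right-hand side of (26): dx/dt = -rho * grad(0.5*||H||^2) with mu(t) = mu0*exp(-t)."""
    mu = mu0 * np.exp(-t)
    H = A @ x + B @ phi1(mu, x) - b             # residual of the smoothed AVE
    J = A + B @ np.diag(dphi1(mu, x))           # Jacobian (27) with respect to x
    return -rho * (J.T @ H)

if __name__ == "__main__":
    # small AVE (1): B = -I, with the singular values of A exceeding 1
    A = np.array([[4.0, 1.0], [0.0, 5.0]])
    B = -np.eye(2)
    x_star = np.array([1.0, -1.0])
    b = A @ x_star - np.abs(x_star)             # so x_star solves Ax - |x| = b
    sol = solve_ivp(neural_network_rhs, (0.0, 20.0), np.zeros(2),
                    args=(A, B, b, 0.1), rtol=1e-8, atol=1e-10)
    print("x(T) =", sol.y[:, -1], " error =", np.linalg.norm(sol.y[:, -1] - x_star))
```

Since the matrices are chosen so that the singular values of $A$ exceed 1, the printed error should be small, consistent with the stability analysis in Sect. 4.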

Assumption 3.1 The minimal singular value of the matrix $A$ is strictly greater than the maximal singular value of the matrix $B$.

Proposition 3.1 The AVE (2) is uniquely solvable for any $b \in \mathbb{R}^n$ if Assumption 3.1 is satisfied.

Proof Please see [9, Proposition 2.3] for a proof.

Proposition 3.2 Let $\Phi_i(\mu, x)$ for $i = 1, \ldots, 8$ be defined as in (24). Then, we have

(a) $H_i(\mu, x) = 0$ if and only if $x$ solves the AVE (2);

(b) $H_i$ is continuously differentiable on $\mathbb{R}^{n+1}\setminus\{0\}$ with the Jacobian matrix given by

$$\nabla H_i(\mu, x) := \left[ A + B\, \nabla \Phi_i(\mu, x) \right], \tag{27}$$

where

$$\nabla \Phi_i(\mu, x) := \begin{bmatrix} \dfrac{\partial \varphi_i(\mu, x_1)}{\partial x_1} & 0 & \cdots & 0 \\ 0 & \dfrac{\partial \varphi_i(\mu, x_2)}{\partial x_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \dfrac{\partial \varphi_i(\mu, x_n)}{\partial x_n} \end{bmatrix}.$$

Proof The arguments are straightforward and we omit them. 

Proposition 3.3 Let $H_i$ and $\nabla H_i$ be given as in (23) and (27), respectively. Suppose that Assumption 3.1 holds. Then $\nabla H_i(\mu, x)$ is invertible at any $x \in \mathbb{R}^n$ and $\mu > 0$.

Proof The result follows from Proposition 3.2 immediately.

Proposition 3.4 Let $\Psi : \mathbb{R}^{n+1} \to \mathbb{R}_+$ be given by (25). Then, the following results hold.

(a) $\Psi(\mu, x) \ge 0$ for all $(\mu, x) \in \mathbb{R}_{++} \times \mathbb{R}^n$, and $\Psi(\mu, x) = 0$ if and only if $x$ solves the AVE (2).

(b) The function $\Psi$ is continuously differentiable on $\mathbb{R}^{n+1}\setminus\{0\}$ with $\nabla \Psi(\mu, x) = \nabla H(\mu, x)^T H(\mu, x)$, where $\nabla H$ is the Jacobian of $H(\mu, x)$.

(c) The function $\Psi(w(t))$ is nonincreasing with respect to $t$.

Proof Parts (a)–(b) follow from Proposition 3.2 immediately.

For part (c), we observe that

$$\frac{d\,\Psi(w(t))}{dt} = \left\langle \frac{dw}{dt},\, \nabla \Psi(\mu, x) \right\rangle = -\rho\, \langle \nabla \Psi(\mu, x),\, \nabla \Psi(\mu, x) \rangle = -\rho\, \|\nabla \Psi(\mu, x)\|^2 < 0,$$

for all $x \in \Omega\setminus\{x^*\}$. Then, the desired result follows.

4 Stability and existence

In this section, we first address the relation between the solution of the AVE (1) and the equilibrium point of the neural network (26). Then, we discuss the stability and the solution trajectory of the neural network (26).

Lemma 4.1 Let $x^*$ be an equilibrium point of the neural network (26) and suppose that the singular values of $A \in \mathbb{R}^{n\times n}$ exceed 1. Then $x^*$ solves the system (1).

Proof Since $\nabla \Psi(w) = \nabla H_i(w)^T H_i(w)$ and, by Proposition 3.3, $\nabla H_i$ is nonsingular, it is clear that $\nabla \Psi(w^*) = 0$ if and only if $H_i(w^*) = 0$.

Theorem 4.1 (a) For any initial point $w_0 = w(t_0)$, there exists a unique continuous maximal solution $w(t)$ with $t \in [t_0, \tau)$ for the neural network (26).

(b) If the level set $\mathcal{L}(w_0) := \left\{ w \mid \|H_i(w)\|^2 \le \|H_i(w_0)\|^2 \right\}$ is bounded, then $\tau$ can be extended to $\infty$.

Proof This proof is exactly the same as the one in [33, Proposition 3.4], so we omit it here.

Now, we are going to analyze the stability of an isolated equilibrium $x^*$ of the neural network (26); that is, $\nabla \Psi(x^*) = 0$ and $\nabla \Psi(x) \neq 0$ for $x \in \Omega\setminus\{x^*\}$, where $\Omega$ is a neighborhood of $x^*$.

Theorem 4.2 If the singular values of $A \in \mathbb{R}^{n\times n}$ exceed 1, then the isolated equilibrium $x^*$ of the neural network (26) is asymptotically stable, and hence Lyapunov stable.

Proof We consider the Lyapunov function $\Psi(w) : \Omega \to \mathbb{R}$ defined by (25). First, it is clear that $\Psi(w) \ge 0$, and from Proposition 3.4 we have that $\Psi(\cdot)$ is continuously differentiable. Since the singular values of $A$ exceed 1, Proposition 3.3 gives that $\nabla H(w)$ is nonsingular. Then, applying $\nabla H(w^*)$ and Lemma 4.1, we have $H(w^*) = 0$ and $\Psi(w^*) = 0$. Furthermore, if $\Psi(w) = 0$ on $\Omega$, then $H(w) = 0$ and hence $\nabla \Psi = 0$ on $\Omega$. This yields that $w = w^*$ since $w^*$ is isolated.

Secondly, considering part (b) of Proposition 3.4 and Lemma 2.2, it follows that the isolated equilibrium $x^*$ is asymptotically stable, and hence Lyapunov stable.

Theorem 4.3 If the singular values of $A \in \mathbb{R}^{n\times n}$ exceed 1, then the isolated equilibrium $x^*$ of the neural network (26) is exponentially stable.

Proof The proof is routine and similar to that in the literature. For completeness, we include it again. Let $\Omega = \mathbb{R}_{++} \times \mathbb{R}^n$; it is clear that $H(\cdot)$ is continuously differentiable, which implies

$$H(w) = H(w^*) + \nabla H(w^*)^T (w - w^*) + o(\|w - w^*\|), \quad \forall\, w \in \Omega. \tag{28}$$

Let $g(t) := \frac{1}{2}\|w(t) - w^*\|^2$ and compute the derivative of $g(t)$ as below:

$$\begin{aligned}
\frac{dg(t)}{dt} &= \left( \frac{dw}{dt} \right)^T (w(t) - w^*) = -\frac{\rho}{2}\, \nabla\!\left( \|H(w)\|^2 \right)^T (w(t) - w^*) \\
&= -\rho\, \left( \nabla H(w)\, H(w) \right)^T (w(t) - w^*) = -\rho\, H(w)^T \nabla H(w)^T (w(t) - w^*) \\
&= -\rho\, (w(t) - w^*)^T \nabla H(w^*)\, \nabla H(w)^T (w(t) - w^*) - \rho\, o(\|w - w^*\|)^T \nabla H(w)^T (w(t) - w^*),
\end{aligned}$$

where the last equality is due to (28). To proceed, we claim two assertions. First, we claim that $(w - w^*)^T \nabla H(w^*) \nabla H(w)^T (w - w^*) \ge \kappa \|w - w^*\|^2$ for some $\kappa > 0$. To see this, from Propositions 3.3 and 3.4, we know that $\nabla H(w^*)$ is nonsingular and $H$ is continuously differentiable, which implies that the matrix $\nabla H(w^*) \nabla H(w^*)^T$ is symmetric and positive definite. Hence, we have $(w - w^*)^T \nabla H(w^*) \nabla H(w^*)^T (w - w^*) \ge \kappa_1 \|w - w^*\|^2 > 0$ over $\Omega\setminus\{w^*\}$ for some $\kappa_1 > 0$. Then, by the continuity of $\nabla H(\cdot)$, we can conclude that

$$(w - w^*)^T \nabla H(w^*) \nabla H(w)^T (w - w^*) \ge \kappa \|w - w^*\|^2 > 0, \quad \text{for some } \kappa > 0.$$

Secondly, we claim that

$$-\rho\, o(\|w - w^*\|)^T \nabla H(w)^T (w(t) - w^*) \le \varepsilon \|w - w^*\|^2, \quad \text{for some } \varepsilon > 0.$$

This is because

$$\frac{\left| -\rho\, o(\|w - w^*\|)^T \nabla H(w)^T (w(t) - w^*) \right|}{\|w - w^*\|^2} \le \rho\, \|\nabla H(w)\|\, \frac{\|o(\|w - w^*\|)\|}{\|w - w^*\|},$$

where the right-hand side vanishes as $w \to w^*$. Thus, the claimed bound holds for some $\varepsilon > 0$.

Now, from the above two assertions and noting that $g(t) = \frac{1}{2}\|w(t) - w^*\|^2$, we have

$$\frac{dg(t)}{dt} \le 2(-\rho\kappa + \varepsilon)\, g(t),$$

which gives

$$g(t) \le e^{2(-\rho\kappa + \varepsilon)(t - t_0)}\, g(t_0).$$

Thus, we have

$$\|w(t) - w^*\| \le e^{(-\rho\kappa + \varepsilon)(t - t_0)}\, \|w(t_0) - w^*\|,$$

which says that $w^*$ is exponentially stable, as we can set $\rho$ large enough so that $-\rho\kappa + \varepsilon < 0$. Then, the proof is complete.

5 Numerical results

In order to demonstrate the effectiveness of the proposed neural network, we test several examples for our neural network (26) in this section. The numerical implementation is coded in Mathematica 11.3, and the ordinary differential equation solver adopted is NDSolve[ ], which uses a Runge–Kutta (2,3) formula. The initial point of each problem is selected randomly, and the same initial point is used for the different smoothing functions. The results are collected in Tables 1, 2, 3 and 4, where

φi      denotes the smoothing function φi, i = 1, . . . , 8;
N       denotes the number of iterations;
t       denotes the time when the algorithm terminates;
Er      denotes the value of ‖x(t) − x*‖ when the algorithm terminates;
‖H(x(t))‖  denotes the value of ‖H(x(t))‖ = ‖Ax − |x| − b‖ when the algorithm terminates;
CT      denotes the CPU time in seconds.

Table 1 Computing results of Example 5.1 (dt = 0.2)

Function   N    t     Er                ‖H(x(t))‖          CT
φ1         34   6.8   9.3686 × 10⁻⁷     0.0000136037       1.5090863
φ2         36   7.2   8.70587 × 10⁻⁷    0.0000126414       0.7760444
φ3         38   7.6   8.41914 × 10⁻⁷    0.000012225        0.4980285
φ4         2    0.4   2.90785 × 10⁻¹⁵   1.59872 × 10⁻¹⁴    0.0340019
φ5         2    0.4   1.11772 × 10⁻¹²   8.41527 × 10⁻¹²    0.0740043
φ6         10   2.0   7.52691 × 10⁻⁷    0.0000109295       0.1150066
φ7         2    0.4   1.29976 × 10⁻¹²   8.61527 × 10⁻¹²    0.0730042
φ8         34   6.8   9.3686 × 10⁻⁷     0.0000136037       0.6880393

Table 2 Computing results of Example 5.2 (dt = 0.2)

Function   N    t      Er                ‖H(x(t))‖          CT
φ1         55   11     9.84216 × 10⁻⁷    2.03995 × 10⁻⁷     3.1531804
φ2         57   11.4   9.14593 × 10⁻⁷    1.89565 × 10⁻⁷     1.3690783
φ3         59   11.8   8.84473 × 10⁻⁷    1.83322 × 10⁻⁷     0.6920396
φ4         2    0.4    2.67859 × 10⁻⁹    4.86096 × 10⁻¹⁰    0.0370021
φ5         2    0.4    2.68658 × 10⁻⁹    4.87548 × 10⁻¹⁰    0.1510086
φ6         18   3.6    9.56666 × 10⁻⁷    2.57215 × 10⁻⁷     0.2210127
φ7         2    0.4    2.16413 × 10⁻⁹    3.92736 × 10⁻¹⁰    0.1600091
φ8         55   11     9.84216 × 10⁻⁷    2.03995 × 10⁻⁷     1.5320877

Example 5.1 Consider the absolute value equation (1) with

$$A = \begin{pmatrix} 10 & 1 & 2 & 0 \\ 1 & 11 & 3 & 1 \\ 0 & 2 & 12 & 1 \\ 1 & 7 & 0 & 13 \end{pmatrix}, \qquad b = \begin{pmatrix} 12 \\ 15 \\ 14 \\ 20 \end{pmatrix}.$$

We can verify that one solution of this absolute value equation is $x^* = (1, 1, 1, 1)^T$. The parameter $\rho$ is set to 1, the time step is set to $dt = 0.2$, and the initial point is generated randomly. Table 1 summarizes the computing results for Example 5.1.
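For a reproducible check of this example outside Mathematica, the short sketch below (ours; it is a plain RK45 integration of the gradient flow with $\varphi_1$ and is not the implementation behind Tables 1–4) solves Example 5.1 from a random initial point and reports the distance to $x^* = (1, 1, 1, 1)^T$.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[10., 1., 2., 0.],
              [ 1., 11., 3., 1.],
              [ 0., 2., 12., 1.],
              [ 1., 7., 0., 13.]])
b = np.array([12., 15., 14., 20.])
x_star = np.ones(4)                      # known solution of Ax - |x| = b

def rhs(t, x, mu0=0.1):
    """Gradient flow (26) for AVE (1) (B = -I) with the Huber smoothing phi_1."""
    mu = mu0 * np.exp(-t)
    smooth_abs = np.where(np.abs(x) <= mu, x**2 / (2*mu), np.abs(x) - mu/2)
    d_smooth = np.where(np.abs(x) <= mu, x / mu, np.sign(x))
    H = A @ x - smooth_abs - b
    J = A - np.diag(d_smooth)
    return -(J.T @ H)

sol = solve_ivp(rhs, (0.0, 10.0), np.random.default_rng(0).standard_normal(4),
                rtol=1e-9, atol=1e-12)
print("x(T) =", sol.y[:, -1])
print("||x(T) - x*|| =", np.linalg.norm(sol.y[:, -1] - x_star))
```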

From Table 1, we see that, whether we look at the trajectory convergence time, the error, or the computation time, the smoothing functions $\varphi_4, \varphi_5, \varphi_7$ perform significantly better than the other functions. Figure 3 depicts the norm error $\|x(t) - x^*\|$ over time. This figure indicates that the smoothing functions $\varphi_4, \varphi_5, \varphi_7$ also outperform the others (followed by $\varphi_6$).
