to appear in Neurocomputing, 2019

Neural networks based on three classes of NCP-functions for solving nonlinear complementarity problems

Jan Harold Alcantara1 Department of Mathematics National Taiwan Normal University

Taipei 11677, Taiwan

Jein-Shan Chen 2 Department of Mathematics National Taiwan Normal University

Taipei 11677, Taiwan

February 6, 2019 (revised on April 15, 2019)

Abstract. In this paper, we consider a family of neural networks for solving nonlinear complementarity problems (NCP). The neural networks are constructed from the merit functions based on three classes of NCP-functions: the generalized natural residual function and its two symmetrizations. In this paper, we first characterize the stationary points of the induced merit functions. Growth behavior of the complementarity functions is also described, as this will play an important role in describing the level sets of the merit functions. In addition, the stability of the steepest descent-based neural network model for NCP is analyzed. We provide numerical simulations to illustrate the theoretical results, and also compare the proposed neural networks with existing neural networks based on other well-known NCP-functions. Numerical results indicate that the performance of the neural network is better when the parameter p associated with the NCP-function is smaller. The efficiency of the neural networks in solving NCPs is also reported.

Keywords. NCP-function; natural residual function; complementarity problem; neural network; stability.

1E-mail: 80640005s@ntnu.edu.tw

2Corresponding author. E-mail: jschen@math.ntnu.edu.tw. The research is supported by the Ministry of Science and Technology, Taiwan.


1 Introduction and Motivation

Given a function F : IRn → IRn, the nonlinear complementarity problem (NCP) is to find a point x ∈ IRn such that

x ≥ 0,  F(x) ≥ 0,  ⟨x, F(x)⟩ = 0,    (1)

where ⟨·, ·⟩ is the Euclidean inner product and ≥ means the component-wise order on IRn. Throughout this paper, we assume that F is continuously differentiable, and let F = (F_1, . . . , F_n)^T with F_i : IRn → IR for i = 1, . . . , n.

For decades, substantial research efforts have been devoted to the study of nonlinear complementarity problems because of their wide range of applications in many areas such as optimization, operations research, engineering, and economics [8, 9, 12, 48]. Some source problems of NCPs include models of equilibrium problems in the aforementioned fields and complementarity conditions in constrained optimization problems [9, 12].

There are many methods for solving the NCP (1). In general, these solution methods may be categorized into two classes, depending on whether or not they make use of the so-called NCP-function (see Definition 2.1). Some techniques that usually exploit NCP-functions include the merit function approach [11, 19, 26], nonsmooth Newton methods [10, 45], smoothing methods [4, 31], and the regularization approach [17, 37]. On the other hand, the interior-point method [29, 30] and the proximal point algorithm [33] are some well-known approaches to solve (1) which do not utilize NCP-functions in general. The excellent monograph of Facchinei and Pang [9] provides a thorough survey and discussion of solution methods for complementarity problems and variational inequalities.

The above numerical approaches can efficiently solve the NCP; however, it is often desirable in scientific and engineering applications to obtain a real-time solution. One promising approach that can provide real-time solutions is the use of neural networks, which were first introduced in optimization by Hopfield and Tank in the 1980s [13, 38]. Neural networks based on circuit implementation exhibit real-time processing. Furthermore, prior research shows that neural networks can be used efficiently in linear and nonlinear programming, variational inequalities and nonlinear complementarity problems [2, 7, 14, 15, 20, 23, 42, 43, 44, 47, 49], as well as in other fields [25, 28, 36, 34, 39, 40, 46, 50, 51, 55].

Motivated by the preceding discussion, we construct a new family of neural networks based on recently discovered discrete-type NCP-functions to solve NCPs. Neural networks based on the Fischer-Burmeister (FB) function [23] and the generalized Fischer-Burmeister function [2] have already been studied. The latter NCP-functions, which have been extensively used in the different solution methods, are strongly semismooth functions, which often provide efficient performance [9]. In this paper, we explore the use of smooth NCP-functions as building blocks of the proposed neural networks. Moreover, the NCP-functions we consider herein have piecewise-defined formulas, as opposed to the FB and generalized FB functions, which have simple formulations. In turn, the subsequent analysis is more complicated. Nevertheless, we show that the proposed neural networks may offer promising results too. The analysis and numerical reports in this paper, on the other hand, pave the way for the use of piecewise-defined NCP-functions.

This paper is organized as follows: In Section 2, we revisit equivalent reformulations of the NCP (1) using NCP-functions. We also elaborate on the purpose and limitations of the paper. In Section 3, we review some mathematical preliminaries related to nonlinear mappings and stability analysis. We also summarize some important properties of the three classes of NCP-functions we used in constructing the neural networks. In Section 4, we describe the general properties of the neural networks, which include the characterization of stationary points of the induced merit functions. In Section 5, we look at the growth behavior of the three classes of NCP-functions considered. This result will be used to prove the boundedness of the level sets of the induced merit functions. We also prove some stability properties of the neural networks. In Section 6, we present the results of our numerical simulations. Conclusions and some recommendations for future studies are discussed in Section 7.

Throughout the paper, IRn denotes the space of n-dimensional real column vectors, IRm×n denotes the space of m × n real matrices, and AT denotes the transpose of A ∈ IRm×n. For any differentiable function f : IRn → IR, ∇f (x) means the gradient of f at x. For any differentiable mapping F = (F1, . . . , Fm)T : IRn → IRm, ∇F (x) = [∇F1(x) · · · ∇Fm(x)] ∈ IRn×m denotes the transposed Jacobian of F at x. We assume that p is an odd integer greater than 1, unless otherwise specified.

2 Overview and Contributions of the Paper

In this section, we give an overview of this research. We begin by looking at equivalent reformulations of the nonlinear complementarity problem (1) using NCP-functions, which are defined as follows.

Definition 2.1 A function φ : IR × IR → IR is called an NCP-function if it satisfies φ(a, b) = 0 ⇐⇒ a ≥ 0, b ≥ 0, ab = 0.

The well-known natural residual function given by

φ_NR(a, b) = a − (a − b)_+ = min{a, b}

is an example of an NCP-function, which is widely used in solving the NCP. Recently, in [3], the discrete-type generalization of φ_NR was proposed, described by

φ^p_NR(a, b) = a^p − (a − b)^p_+,  where p > 1 is an odd integer.    (2)

It is shown in [3] that φ^p_NR is twice continuously differentiable. However, its surface is not symmetric, which may result in difficulties in designing and analyzing solution methods [16]. To overcome this, two symmetrizations of φ^p_NR are presented in [1]. A natural symmetrization of φ^p_NR is given by

φ^p_S−NR(a, b) = { a^p − (a − b)^p   if a > b,
                   a^p = b^p          if a = b,
                   b^p − (b − a)^p   if a < b.    (3)

The above NCP-function is symmetric, but is only differentiable on {(a, b) | a ≠ b or a = b = 0}. It was, however, shown in [16] that φ^p_S−NR is semismooth and directionally differentiable. The second symmetrization of φ^p_NR is described by

ψ^p_S−NR(a, b) = { a^p b^p − (a − b)^p b^p   if a > b,
                   a^p b^p = a^{2p}           if a = b,
                   a^p b^p − (b − a)^p a^p   if a < b,    (4)

which possesses both differentiability and symmetry. The functions φ^p_NR, φ^p_S−NR and ψ^p_S−NR are three classes of the four recently discovered discrete-type families of NCP-functions, together with the discrete-type generalization of the Fischer-Burmeister function given by

φ^p_D−FB(a, b) = (√(a^2 + b^2))^p − (a + b)^p.    (5)

A comprehensive discussion of their properties is presented in [16].
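As a concrete illustration, the following Python sketch (our own illustration, not code from the paper) evaluates the three NCP-functions (2), (3) and (4) pointwise; it can be used to check numerically that each vanishes exactly when a ≥ 0, b ≥ 0 and ab = 0.

```python
def phi_nr(a, b, p=3):
    """Generalized natural residual function (2): a^p - (a - b)_+^p, with p > 1 odd."""
    return a**p - max(a - b, 0.0)**p

def phi_snr(a, b, p=3):
    """First symmetrization (3) of phi_nr."""
    if a > b:
        return a**p - (a - b)**p
    elif a < b:
        return b**p - (b - a)**p
    return a**p            # a = b case: the value is a^p (= b^p)

def psi_snr(a, b, p=3):
    """Second symmetrization (4), which is both smooth and symmetric."""
    if a > b:
        return a**p * b**p - (a - b)**p * b**p
    elif a < b:
        return a**p * b**p - (b - a)**p * a**p
    return a**(2 * p)      # a = b case: a^p b^p = a^(2p)

# Quick check of the NCP-function property at a few sample points:
for (a, b) in [(2.0, 0.0), (0.0, 3.0), (0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]:
    print((a, b), phi_nr(a, b), phi_snr(a, b), psi_snr(a, b))
# The first three pairs satisfy a, b >= 0 with ab = 0, so all three functions return 0;
# the last two pairs violate complementarity, so the returned values are nonzero.
```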

To see how an NCP-function φ can be useful in solving NCP (1), we define Φ : IRn → IRn by

Φ(x) = [φ(x_1, F_1(x)), . . . , φ(x_n, F_n(x))]^T.    (6)

It is easy to see that x solves NCP (1) if and only if Φ(x) = 0 (see also Proposition 4.1(a)). Thus, the NCP is equivalent to the nonlinear system of equations Φ(x) = 0.

Meanwhile, if φ is an NCP-function, then ψ : IR × IR → IR+ given by

ψ(a, b) := (1/2)|φ(a, b)|^2    (7)

is also an NCP-function. Accordingly, if we define Ψ : IRn → IR+ by

Ψ(x) = Σ_{i=1}^{n} ψ(x_i, F_i(x)) = (1/2)‖Φ(x)‖^2,    (8)

then the NCP can be reformulated as a minimization problem min_{x∈IRn} Ψ(x). Hence, Ψ given by (8) is a merit function for the NCP, that is, its global minimizer coincides with the solution of the NCP. Consequently, it is only natural to consider the steepest descent-based neural network

dx(t)/dt = −ρ∇Ψ(x(t)),  x(t_0) = x_0,    (9)

where ρ > 0 is a time-scaling factor. The above neural network (9) is also motivated by the ones considered in [23] and in [2], where the NCP-functions used are the Fischer-Burmeister (FB) function given by

φ_FB(a, b) = √(a^2 + b^2) − (a + b),    (10)

and the generalized Fischer-Burmeister function given by

φ^p_FB(a, b) = ‖(a, b)‖_p − (a + b),  where p ∈ (1, +∞),    (11)

respectively. We aim to compare the neural networks based on the generalized natural-residual functions (2), (3) and (4) with the well-studied networks based on the FB functions (10) and (11).
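To make the reformulation concrete, here is a minimal Python sketch (our illustration, not the paper's implementation; the helper names make_merit, grad_num and simulate are our own) that builds Φ in (6) and Ψ in (8) from a given NCP-function and integrates the gradient flow (9) with a SciPy stiff solver and a finite-difference gradient.

```python
import numpy as np
from scipy.integrate import solve_ivp

def make_merit(F, phi):
    """Build Phi as in (6) and Psi as in (8) from F: R^n -> R^n and an NCP-function phi."""
    def Phi(x):
        Fx = F(x)
        return np.array([phi(xi, Fi) for xi, Fi in zip(x, Fx)])
    def Psi(x):
        r = Phi(x)
        return 0.5 * (r @ r)
    return Phi, Psi

def grad_num(f, x, h=1e-6):
    """Central-difference gradient; the paper works with the analytic gradient (13) instead."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def simulate(F, phi, x0, rho=1e3, t_end=1e-2):
    """Integrate the steepest-descent neural network (9): dx/dt = -rho * grad Psi(x)."""
    _, Psi = make_merit(F, phi)
    rhs = lambda t, x: -rho * grad_num(Psi, x)
    return solve_ivp(rhs, (0.0, t_end), np.asarray(x0, dtype=float), method="BDF")
```

Any of the NCP-functions sketched above (phi_nr, phi_snr, psi_snr) can be passed as phi; the time-scaling factor rho plays the role described after (9).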

One of the contributions of this paper lies in establishing the theoretical properties of the generalized natural residual functions. These are fundamental in designing NCP-based solution methods, and in this paper, we use the neural network approach. Basic properties of these functions are already presented in [16]. The purpose of this paper is to elaborate some more properties and applications of the newly discovered discrete-type classes of NCP-functions given by (2), (3) and (4). Specifically, we look at the properties of their induced merit functions given by (8). First, it is important for us to determine the correspondence between the solutions of NCP (1) and the stationary points of Ψ.

From the above discussion (also see Proposition 4.1(d)), we already know that an NCP solution is a stationary point. On the other hand, we also want to determine which stationary points of Ψ are solutions to the NCP. For certain NCP functions such as the Mangasarian and Solodov function [19], FB function [11] and generalized FB function [5], a stationary point of the merit function was shown to be a solution to the NCP when F is monotone or a P0-function. It should be pointed out that these NCP-functions possess the following nice properties:

(P1) ∇aψ(a, b) · ∇bψ(a, b) ≥ 0 for all (a, b) ∈ IR2; and

(P2) For all (a, b) ∈ IR2, ∇aψ(a, b) = 0 ⇐⇒ ∇bψ(a, b) = 0 ⇐⇒ φ(a, b) = 0.

However, these properties are not possessed by φ^p_NR, φ^p_S−NR and ψ^p_S−NR, which leads to some difficulties in the subsequent analysis. Hence, we seek other conditions which will guarantee that a stationary point is an NCP solution. Furthermore, we also want to look at the growth behavior of the functions (2), (3) and (4). This will play a key role in characterizing the level sets of the induced merit functions. It must be noted that since the NCP-functions φ^p_S−NR and ψ^p_S−NR are piecewise-defined, the analyses of their growth behavior and of the properties of their induced merit functions are more difficult, as compared with the commonly used FB functions (10) and (11), which have simple formulations.

Another purpose of this paper is to discuss the stability properties of the neural networks based on φ^p_NR, φ^p_S−NR and ψ^p_S−NR. We further look into different examples to see the influence of p on the convergence of trajectories of the neural network to the NCP solution. Finally, we compare the numerical performance of these three types of neural networks with two well-studied neural networks based on the FB function [23] and the generalized FB function [2].

We recall that a solution x is said to be degenerate if {i | xi = Fi(x) = 0} is not empty. Note that if x is degenerate and φ is differentiable at x, then ∇Φ(x) is singular. Consequently, one should not expect locally fast convergence of numerical methods based on smooth NCP-functions if the computed solution is degenerate [9, 18].

Because of the differentiability of φ^p_NR, φ^p_S−NR and ψ^p_S−NR on the feasible region of the NCP problem, it is also expected that the convergence of the trajectories of the neural network (9) to a degenerate solution could be slow. Hence, in this paper, we will give particular attention to nondegenerate NCPs.

3 Preliminaries

In this section, we review some special nonlinear mappings, some properties of φ^p_NR, φ^p_S−NR and ψ^p_S−NR, as well as some tools from stability theory for dynamical systems that will be crucial in our analysis. We begin by recalling concepts related to nonlinear mappings.

Definition 3.1 Let F = (F_1, . . . , F_n)^T : IRn → IRn. Then, the mapping F is said to be

(a) monotone if ⟨x − y, F(x) − F(y)⟩ ≥ 0 for all x, y ∈ IRn;

(b) strictly monotone if ⟨x − y, F(x) − F(y)⟩ > 0 for all x, y ∈ IRn with x ≠ y;

(c) strongly monotone with modulus µ > 0 if ⟨x − y, F(x) − F(y)⟩ ≥ µ‖x − y‖^2 for all x, y ∈ IRn;

(d) a P0-function if max_{1≤i≤n, x_i≠y_i} (x_i − y_i)(F_i(x) − F_i(y)) ≥ 0 for all x, y ∈ IRn with x ≠ y;

(e) a P-function if max_{1≤i≤n} (x_i − y_i)(F_i(x) − F_i(y)) > 0 for all x, y ∈ IRn with x ≠ y;

(f) a uniform P-function with modulus κ > 0 if max_{1≤i≤n} (x_i − y_i)(F_i(x) − F_i(y)) ≥ κ‖x − y‖^2 for all x, y ∈ IRn.

From Definition 3.1, the following one-sided implications can be obtained:

F is strongly monotone =⇒ F is a uniform P -function =⇒ F is a P0-function.


It is known that F is monotone (resp. strictly monotone) if and only if ∇F (x) is positive semidefinite (resp. positive definite) for all x ∈ IRn. In addition, F is a P0-function if and only if ∇F (x) is a P0-matrix for all x ∈ IRn; that is, its principal minors are nonnegative. Further, if ∇F (x) is a P -matrix (that is, its principal minors are positive) for all x ∈ IRn, then F is a P -function. However, we point out that a P -function does not necessarily have a Jacobian which is a P -matrix.
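As a quick sanity check for the matrix conditions used later (for instance in Proposition 4.4 and Proposition 5.1), one can test the P-matrix and P0-matrix properties directly from the principal minors. The following sketch is our own illustration (not from the paper) and is only practical for small n, since it enumerates all principal submatrices.

```python
import numpy as np
from itertools import combinations

def principal_minors(M):
    """Yield every principal minor of the square matrix M (exponentially many in n)."""
    n = M.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            yield np.linalg.det(M[np.ix_(idx, idx)])

def is_P_matrix(M, tol=1e-12):
    """All principal minors strictly positive."""
    return all(m > tol for m in principal_minors(M))

def is_P0_matrix(M, tol=1e-12):
    """All principal minors nonnegative."""
    return all(m > -tol for m in principal_minors(M))

# Example: the identity is a P-matrix; a matrix with a zero principal minor is only P0.
print(is_P_matrix(np.eye(3)), is_P0_matrix(np.array([[0.0, 1.0], [0.0, 1.0]])))
```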

The following characterization of P -matrices and P0-matrices will be useful in our analysis.

Lemma 3.1 A matrix M ∈ IRn×n is a P -matrix (resp. a P0-matrix) if and only if whenever xi(M x)i ≤ 0 (resp. xi(M x)i < 0) for all i, then x = 0.

Proof. Please see [6]. □

The following two lemmas summarize some properties of φ^p_NR, φ^p_S−NR and ψ^p_S−NR that will be useful in our subsequent analysis.

Lemma 3.2 Let p > 1 be an odd integer. Then, the following hold.

(a) The function φ^p_NR is twice continuously differentiable. Its gradient is given by

∇φ^p_NR(a, b) = p [ a^{p−1} − (a − b)^{p−2}(a − b)_+ , (a − b)^{p−2}(a − b)_+ ]^T.

(b) The function φ^p_S−NR is twice continuously differentiable on the set Ω := {(a, b) | a ≠ b}. Its gradient is given by

∇φ^p_S−NR(a, b) = { p [ a^{p−1} − (a − b)^{p−1} , (a − b)^{p−1} ]^T            if a > b,
                    p [ (b − a)^{p−1} , b^{p−1} − (b − a)^{p−1} ]^T            if a < b.

Further, φ^p_S−NR is differentiable at (0, 0) with ∇φ^p_S−NR(0, 0) = [0, 0]^T.

(c) The function ψ^p_S−NR is twice continuously differentiable. Its gradient is given by

∇ψ^p_S−NR(a, b) = { p [ a^{p−1}b^p − (a − b)^{p−1}b^p , a^p b^{p−1} − (a − b)^p b^{p−1} + (a − b)^{p−1}b^p ]^T   if a > b,
                    p [ a^{p−1}b^p , a^p b^{p−1} ]^T = p a^{2p−1} [ 1 , 1 ]^T                                      if a = b,
                    p [ a^{p−1}b^p − (b − a)^p a^{p−1} + (b − a)^{p−1}a^p , a^p b^{p−1} − (b − a)^{p−1}a^p ]^T    if a < b.

Proof. Please see [3, Proposition 2.2], [1, Propositions 2.2 and 3.2], and [16, Proposition 4.3]. □

Lemma 3.3 Let p > 1 be a positive odd integer. Then, the following hold.


(a) If φ ∈ {φ^p_NR, φ^p_S−NR}, then φ(a, b) > 0 ⟺ a > 0, b > 0. On the other hand, ψ^p_S−NR(a, b) ≥ 0 on IR2.

(b) ∇_a φ^p_NR(a, b) · ∇_b φ^p_NR(a, b)
      > 0 on {(a, b) | a > b > 0 or a > b > 2a},
      = 0 on {(a, b) | a ≤ b or a > b = 2a or a > b = 0},
      < 0 otherwise,

∇_a φ^p_S−NR(a, b) · ∇_b φ^p_S−NR(a, b) > 0 on {(a, b) | a > b > 0} ∪ {(a, b) | b > a > 0}, and

∇_a ψ^p_S−NR(a, b) · ∇_b ψ^p_S−NR(a, b) > 0 on the first quadrant IR2_{++}.

(c) If φ ∈ {φ^p_NR, φ^p_S−NR}, then ∇_a φ(a, b) · ∇_b φ(a, b) = 0 provided that φ(a, b) = 0. On the other hand, ψ^p_S−NR(a, b) = 0 ⟺ ∇ψ^p_S−NR(a, b) = 0. In particular, we have ∇_a ψ^p_S−NR(a, b) · ∇_b ψ^p_S−NR(a, b) = 0 provided that ψ^p_S−NR(a, b) = 0.

Proof. Please see [16, Propositions 3.4, 4.5, and 5.4]. □

Next, we recall some material on first-order ordinary differential equations (ODEs):

ẋ(t) = H(x(t)),  x(t_0) = x_0 ∈ IRn,    (12)

where H : IRn → IRn is a mapping. We also introduce three kinds of stability that we will consider later. These materials can be found in ODE textbooks; see [27].

Definition 3.2 A point x̄ is called an equilibrium point or a steady state of the dynamic system (12) if H(x̄) = 0. If there is a neighborhood Ω ⊆ IRn of x̄ such that H(x̄) = 0 and H(x) ≠ 0 for all x ∈ Ω\{x̄}, then x̄ is called an isolated equilibrium point.

Lemma 3.4 Assume that H : IRn → IRn is a continuous mapping. Then, for any t0 ≥ 0 and x0 ∈ IRn, there exists a local solution x(t) for (12) with t ∈ [t0, τ ) for some τ > t0. If, in addition, H is locally Lipschitz continuous at x0, then the solution is unique; if H is Lipschitz continuous in IRn, then τ can be extended to ∞.

Definition 3.3 (Stability in the sense of Lyapunov) Let x(t) be a solution of (12). An isolated equilibrium point x̄ is Lyapunov stable if for any x_0 = x(t_0) and any ε > 0, there exists a δ > 0 such that ‖x(t) − x̄‖ < ε for all t ≥ t_0 whenever ‖x(t_0) − x̄‖ < δ.

Definition 3.4 (Asymptotic stability) An isolated equilibrium point x̄ is said to be asymptotically stable if, in addition to being Lyapunov stable, it has the property that x(t) → x̄ as t → ∞ for all x(t_0) with ‖x(t_0) − x̄‖ < δ.

(9)

Definition 3.5 (Lyapunov function) Let Ω ⊆ IRn be an open neighborhood of x̄. A continuously differentiable function W : IRn → IR is said to be a Lyapunov function at the state x̄ over the set Ω for equation (12) if

W(x̄) = 0,  W(x) > 0  ∀x ∈ Ω\{x̄},

dW(x(t))/dt = ∇W(x(t))^T H(x(t)) ≤ 0,  ∀x ∈ Ω.

Lemma 3.5 (a) An isolated equilibrium point x̄ is Lyapunov stable if there exists a Lyapunov function over some neighborhood Ω of x̄.

(b) An isolated equilibrium point x̄ is asymptotically stable if there is a Lyapunov function over some neighborhood Ω of x̄ such that dW(x(t))/dt < 0 for all x ∈ Ω\{x̄}.

Definition 3.6 (Exponential stability) An isolated equilibrium point x̄ is exponentially stable if there exists a δ > 0 such that every solution x(t) of (12) with the initial condition x(t_0) = x_0 and ‖x(t_0) − x̄‖ < δ is well-defined on [0, +∞) and satisfies

‖x(t) − x̄‖ ≤ c e^{−ωt} ‖x(t_0) − x̄‖  ∀t ≥ t_0,

where c > 0 and ω > 0 are constants independent of the initial point.

The following result will also be helpful in our stability analysis.

Lemma 3.6 Let F be locally Lipschitzian. If all V ∈ ∂F(x) are nonsingular, then there is a neighborhood N(x) of x and a constant C such that for any y ∈ N(x) and any V ∈ ∂F(y), V is nonsingular and ‖V^{−1}‖ ≤ C.

Proof. Please see [32, Proposition 3.1]. □

4 Neural Network Model

In this section, we describe the properties of the neural network (9) based on the functions φ^p_NR, φ^p_S−NR and ψ^p_S−NR. Before this, we first summarize some important properties of Ψ as defined in (8) for general NCP-functions. Proposition 4.1(a) is in fact Lemma 2.2 in [19]. On the other hand, Proposition 4.1(b) and (e) are true for all gradient systems (9).

Proposition 4.1 Let Ψ : IRn → IR+ be defined as in (8), with φ being any NCP-function, and let ψ be as in (7). Suppose that F is continuously differentiable. Then,


(a) Ψ(x) ≥ 0 for all x ∈ IRn. If the NCP (1) has a solution, x is a global minimizer of Ψ(x) if and only if x solves the NCP.

(b) Ψ(x(t)) is a nonincreasing function of t, where x(t) is a solution of (9).

(c) Let x ∈ IRn, and suppose that φ is differentiable at (x_i, F_i(x)) for each i = 1, . . . , n. Then

∇Ψ(x) = ∇_a ψ(x, F(x)) + ∇F(x) ∇_b ψ(x, F(x)),    (13)

where

∇_a ψ(x, F(x)) := [∇_a ψ(x_1, F_1(x)), . . . , ∇_a ψ(x_n, F_n(x))]^T,
∇_b ψ(x, F(x)) := [∇_b ψ(x_1, F_1(x)), . . . , ∇_b ψ(x_n, F_n(x))]^T.

(d) Let x be a solution to the NCP such that φ is differentiable at (xi, Fi(x)) for each i = 1, . . . , n. Then, x is a stationary point of Ψ.

(e) Every accumulation point of a solution x(t) of neural network (9) is an equilibrium point.

Proof. (a) It is clear that Ψ ≥ 0. Notice that Ψ(x) = 0 if and only if Φ(x) = 0, which occurs if and only if φ(x_i, F_i(x)) = 0 for all i. Since φ is an NCP-function, this is equivalent to having x_i ≥ 0, F_i(x) ≥ 0 and x_iF_i(x) = 0. Thus, Ψ(x) = 0 if and only if x ≥ 0, F(x) ≥ 0 and ⟨x, F(x)⟩ = 0. This proves part (a).

(b) The desired result follows from

dΨ(x(t))/dt = ∇Ψ(x(t))^T dx/dt = ∇Ψ(x(t))^T(−ρ∇Ψ(x(t))) = −ρ‖∇Ψ(x(t))‖^2 ≤ 0

for all solutions x(t).

(c) The formula is clear from the chain rule.

(d) First, note that from equation (7), we have ∇ψ(a, b) = φ(a, b) · ∇φ(a, b). Thus, if x is a solution to the NCP, it gives ∇ψ(x_i, F_i(x)) = 0 for all i = 1, . . . , n. Then, it follows from formula (13) in part (c) that ∇Ψ(x) = 0. That is, x is a stationary point of Ψ.

(e) Please see page 232 of [41]. □
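For the smooth case φ = φ^p_NR, formula (13) can be implemented directly. The sketch below is our own illustration (the function names are assumptions, not code from the paper); it uses the gradient of φ^p_NR from Lemma 3.2(a), the relation ∇ψ = φ · ∇φ from (7), and a user-supplied Jacobian of F.

```python
import numpy as np

def grad_phi_nr(a, b, p=3):
    """Gradient of phi^p_NR as in Lemma 3.2(a)."""
    t = max(a - b, 0.0) ** (p - 1)           # equals (a-b)^(p-2) * (a-b)_+
    return p * np.array([a ** (p - 1) - t, t])

def grad_Psi_nr(x, F, JF, p=3):
    """Analytic gradient (13) of Psi^p_NR; JF(x) is the usual Jacobian (rows = grad F_i^T)."""
    x = np.asarray(x, dtype=float)
    Fx, J = F(x), JF(x)
    ga = np.empty_like(x)                    # components of grad_a psi(x, F(x))
    gb = np.empty_like(x)                    # components of grad_b psi(x, F(x))
    for i, (ai, bi) in enumerate(zip(x, Fx)):
        phi = ai ** p - max(ai - bi, 0.0) ** p
        da, db = grad_phi_nr(ai, bi, p)
        ga[i], gb[i] = phi * da, phi * db    # grad psi = phi * grad phi, by (7)
    return ga + J.T @ gb                     # formula (13); J.T is the transposed Jacobian
```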

We adopt the neural network (9) with Ψ(x) = (1/2)‖Φ(x)‖^2, where Φ is given by (6) with φ ∈ {φ^p_NR, φ^p_S−NR, ψ^p_S−NR}. The function Φ corresponding to φ^p_NR, φ^p_S−NR and ψ^p_S−NR is denoted, respectively, by Φ^p_NR, Φ^p_S1−NR and Φ^p_S2−NR. Their corresponding merit functions will be denoted by Ψ^p_NR, Ψ^p_S1−NR and Ψ^p_S2−NR, respectively. We note that by formula (13) and the differentiability of Ψ ∈ {Ψ^p_NR, Ψ^p_S1−NR, Ψ^p_S2−NR} (see Proposition 4.2), the neural network (9) can be implemented on hardware as in Figure 1.

We first establish the existence and uniqueness of the solutions of neural network (9).


Figure 1: Simplified block diagram for neural network (9). This figure is lifted from Chen et al. [2]

Proposition 4.2 Let p > 1 be an odd integer. Then, the following hold.

(a) Ψ^p_NR and Ψ^p_S2−NR are both continuously differentiable on IRn.

(b) Ψ^p_S1−NR is continuously differentiable on the open set Ω = {x ∈ IRn | x_i ≠ F_i(x), ∀i = 1, 2, . . . , n}.

Consequently, the neural network (9) with Ψ^p_NR or Ψ^p_S2−NR has a unique solution for all x_0 ∈ IRn. The neural network (9) with Ψ^p_S1−NR has a unique solution for all x_0 ∈ Ω.

Proof. Parts (a) and (b) directly follow from Proposition 4.1(c) and Lemma 3.2. The existence and uniqueness of the solutions follow from Lemma 3.4, noting the continuous differentiability of F and of Ψ^p_NR, Ψ^p_S1−NR (on Ω), and Ψ^p_S2−NR. □

We note that because of Proposition 4.2(b), we only consider the neural network (9) with Ψ = Ψ^p_S1−NR as a dynamical system defined on the set Ω. Our next goal is to determine conditions such that equilibrium points of (9) are global minimizers of Ψ.

When an NCP-function has properties (P1) and (P2) (see the Introduction), an equilibrium point is a global minimizer when F is a P0-function. However, these properties only hold on a proper subset of IRn for the functions φ^p_NR, φ^p_S−NR and ψ^p_S−NR. Thus, we seek other conditions to achieve this goal. We start with the merit function Ψ^p_NR.


Proposition 4.3 If F is strongly monotone with modulus µ > 1, then every stationary point of Ψ^p_NR is a global minimizer.

Proof. Let x be a stationary point of Ψ^p_NR, that is, ∇Ψ^p_NR(x) = 0. For convenience, we denote by A(x) and B(x) the diagonal matrices such that for each i = 1, . . . , n,

A_ii(x) = (x_i)^{p−1}  and  B_ii(x) = (x_i − F_i(x))^{p−2}(x_i − F_i(x))_+.

Then, by formula (13) and Lemma 3.2(a), we have

p[A(x) − B(x)]Φ^p_NR(x) + p∇F(x)B(x)Φ^p_NR(x) = 0,

which yields

A(x)Φ^p_NR(x) + (∇F(x) − I)B(x)Φ^p_NR(x) = 0.    (14)

Analogous to the technique in [11], pre-multiplying both sides of (14) by (B(x)Φ^p_NR(x))^T leads to

Φ^p_NR(x)^T[B(x)A(x)]Φ^p_NR(x) + (B(x)Φ^p_NR(x))^T(∇F(x) − I)B(x)Φ^p_NR(x) = 0.    (15)

Since p is an odd integer, we have A(x) ≥ 0 and B(x) ≥ 0; and hence,

Φ^p_NR(x)^T[B(x)A(x)]Φ^p_NR(x) ≥ 0.

On the other hand, since F is strongly monotone with modulus µ > 1, defining G(x) := F(x) − x gives

⟨x − y, G(x) − G(y)⟩ = ⟨x − y, F(x) − x − F(y) + y⟩
                     = ⟨x − y, F(x) − F(y)⟩ − ‖x − y‖^2
                     ≥ (µ − 1)‖x − y‖^2
                     > 0,

for all x, y ∈ IRn with x ≠ y. Note then that ∇G(x) = ∇F(x) − I is positive definite. Consequently, each term on the left-hand side of (15) is nonnegative. With (∇F(x) − I) being positive definite, this yields B(x)Φ^p_NR(x) = 0. In addition, from (14), we have A(x)Φ^p_NR(x) = 0. To sum up, we have proved that A_ii(x)φ^p_NR(x_i, F_i(x)) = 0 and B_ii(x)φ^p_NR(x_i, F_i(x)) = 0 for all i.

Now, if φ^p_NR(x_i, F_i(x)) ≠ 0 for some i, then we must have A_ii(x) = B_ii(x) = 0. Thus, (x_i)^{p−1} = 0 (i.e., x_i = 0) and x_i ≤ F_i(x). Since φ^p_NR is an NCP-function, the latter implies that φ^p_NR(x_i, F_i(x)) = 0, a contradiction. Hence, φ^p_NR(x_i, F_i(x)) = 0 for all i, that is, x is a global minimizer of Ψ^p_NR. This completes the proof. □

The following proposition provides a weaker condition on F to guarantee that a stationary point of Ψ^p_NR is a global minimizer.


Proposition 4.4 If (∇F − I) is a P-matrix, then every stationary point of Ψ^p_NR is a global minimizer.

Proof. Suppose that ∇Ψ^p_NR(x) = 0. If B(x)Φ^p_NR(x) = 0, then A(x)Φ^p_NR(x) = 0 by equation (14). As in the preceding proof, we obtain Φ^p_NR(x) = 0, and hence we are done.

It remains to consider the case that B(x)Φ^p_NR(x) ≠ 0. Note that

(B(x)Φ^p_NR(x))_i = (x_i − F_i(x))^{p−2}(x_i − F_i(x))_+ φ^p_NR(x_i, F_i(x))
                  = { 0                                           if x_i ≤ F_i(x) or x_i > F_i(x) = 0,
                      (x_i − F_i(x))^{p−1} φ^p_NR(x_i, F_i(x))    if x_i > F_i(x) and F_i(x) ≠ 0.

Thus, the nonzero entries of B(x)Φ^p_NR(x) appear at indices i where x_i > F_i(x) and F_i(x) ≠ 0. To proceed, we denote

I_1 = {i | x_i ≠ 0 and (B(x)Φ^p_NR(x))_i ≠ 0},  I_2 = {i | x_i = 0 and (B(x)Φ^p_NR(x))_i ≠ 0}.

With these notations, we observe the following facts.

(i) For i ∈ I_1, since p is odd, it is clear that the i-th entries of A(x)Φ^p_NR(x) and B(x)Φ^p_NR(x) are both nonzero and have the same sign.

(ii) For i ∈ I_2, we have (B(x)Φ^p_NR(x))_i ≠ 0 and (A(x)Φ^p_NR(x))_i = 0.

Because (∇F − I) is a P-matrix, it follows from Lemma 3.1 that there exists an index j such that

(B(x)Φ^p_NR(x))_j [(∇F(x) − I)(B(x)Φ^p_NR(x))]_j > 0.

This says that (B(x)Φ^p_NR(x))_j ≠ 0 and therefore j ∈ I_1 ∪ I_2. If j ∈ I_1, then by (i) above, (A(x)Φ^p_NR(x))_j and (B(x)Φ^p_NR(x))_j have the same sign; since [(∇F(x) − I)(B(x)Φ^p_NR(x))]_j also has this sign, the j-th component of the left-hand side of (14) cannot vanish, a contradiction. On the other hand, if j ∈ I_2, we have from fact (ii) that (A(x)Φ^p_NR(x))_j = 0. However, we also have [(∇F(x) − I)(B(x)Φ^p_NR(x))]_j ≠ 0. This again violates equation (14). Thus, we conclude that B(x)Φ^p_NR(x) = 0, and hence Φ^p_NR(x) = 0. The proof is complete. □

Remark 4.1 In fact, if the function F is nonnegative (or if we at least have F(x) ≥ 0 for an equilibrium point x), then case (ii) in the above proof cannot happen. Thus, the above proposition is valid even when (∇F − I) is a P0-matrix, by Lemma 3.1.

From Lemma 3.2(b) and Lemma 3.2(c), we see that the structures of ∇Φ^p_S1−NR and ∇Φ^p_S2−NR corresponding to the NCP-functions φ^p_S−NR and ψ^p_S−NR are complex because of the piecewise nature of φ^p_S−NR and ψ^p_S−NR. This makes it difficult to find conditions on F so that a stationary point of Ψ^p_S1−NR or Ψ^p_S2−NR is also a global minimizer. However, if F is a nonnegative function, we have the following proposition.


Proposition 4.5 Suppose that F is a nonnegative P0-function and x ≥ 0. If x is a stationary point of Ψ^p_S1−NR or Ψ^p_S2−NR, then it is a global minimizer.

Proof. If we can show that properties (P1) and (P2) mentioned in the Introduction hold for φ^p_S−NR and ψ^p_S−NR on the nonnegative quadrant IR2_+, then we can proceed as in the proof of [5, Proposition 3.4]. Thus, it is enough to show that (P1) and (P2) hold on IR2_+. To simplify notation, we denote φ_1 = φ^p_S−NR, φ_2 = ψ^p_S−NR, and ψ_i = (1/2)|φ_i|^2 (i = 1, 2). Note that the domain of ∇Ψ^p_S1−NR is {x | x_i ≠ F_i(x) or x_i = F_i(x) = 0}. Thus, for ψ_1, it suffices to check that it has properties (P1) and (P2) only on the set {(a, b) ∈ IR2_+ | a ≠ b or a = b = 0}.

To proceed, we observe that

∇_a ψ_i(a, b) = φ_i(a, b)∇_a φ_i(a, b)  and  ∇_b ψ_i(a, b) = φ_i(a, b)∇_b φ_i(a, b),

which imply

∇_a ψ_i(a, b) · ∇_b ψ_i(a, b) = (φ_i(a, b))^2 · ∇_a φ_i(a, b) · ∇_b φ_i(a, b),  i = 1, 2.

If a ≥ b = 0 or b ≥ a = 0, then φ_i(a, b) = 0, and thus the above product is zero. Otherwise, the above product is positive by Lemma 3.3(b). This asserts (P1).

To show (P2), note that it is obvious that ∇_a ψ_i(a, b) = ∇_b ψ_i(a, b) = 0 if φ_i(a, b) = 0 for i = 1, 2. To show the converse, it is enough to argue that if ∇_a φ_i(a, b) = 0 or ∇_b φ_i(a, b) = 0, then φ_i(a, b) = 0. First, we analyze the case of φ_1. Suppose that ∇_a φ_1(a, b) = 0. From Lemma 3.2(b),

(1/p)∇_a φ_1(a, b) = { a^{p−1} − (a − b)^{p−1}   if a > b,
                       0                          if a = b = 0,
                       (b − a)^{p−1}              if a < b.

For a = b = 0, we have φ_1(a, b) = 0. For a > b, we get a^{p−1} = (a − b)^{p−1}, so |a| = |a − b| = a − b since p − 1 is even; as a ≥ 0, this gives b = 0, and because a > b, we obtain φ_1(a, b) = 0. For a < b, the above formula gives (b − a)^{p−1} = 0, which is impossible. This proves that ∇_a φ_1(a, b) = 0 implies φ_1(a, b) = 0. Similarly, we can show that ∇_b φ_1(a, b) = 0 implies φ_1(a, b) = 0. This asserts (P2) for the function ψ_1.

Analogously, for ψ_2, assume that ∇_a φ_2(a, b) = 0. From Lemma 3.2(c), we have

(1/p)∇_a φ_2(a, b) = { a^{p−1}b^p − (a − b)^{p−1}b^p                       if a > b,
                       a^{2p−1}                                            if a = b,
                       a^{p−1}b^p − (b − a)^p a^{p−1} + (b − a)^{p−1}a^p   if a < b.

For a = b, we get a^{2p−1} = 0, and hence a = 0 and φ_2(a, b) = 0. For a > b, we get a^{p−1}b^p − (a − b)^{p−1}b^p = 0. If b = 0, we obtain φ_2(a, b) = 0 by using a > b. Otherwise, a^{p−1} − (a − b)^{p−1} = 0. Because p − 1 is even and a ≥ 0, we have a = |a − b| = a − b. Consequently, b = 0 and φ_2(a, b) = 0. For a < b, we have from the above formula for ∇_a φ_2 that a^{p−1}b^p − (b − a)^p a^{p−1} + (b − a)^{p−1}a^p = 0. If a = 0, then φ_2(a, b) = 0 due to a < b. Otherwise, a > 0 and

0 = b^p − (b − a)^p + (b − a)^{p−1}a
  = b^p − (b − a)^{p−1}(b − 2a)
  = (a + k)^p − k^{p−1}(k − a),  where k = b − a > 0,
  = Σ_{i=0}^{p−1} (p choose i) a^{p−i}k^i + a k^{p−1}
  > 0,

which is a contradiction. To sum up, we have shown that ∇_a φ_2(a, b) = 0 implies φ_2(a, b) = 0. Similarly, it can be verified that φ_2(a, b) = 0 provided ∇_b φ_2(a, b) = 0. Thus, ψ_2 possesses property (P2). This completes the proof. □

5 Stability Analysis

We now look at the properties of the neural network (9) related to the behavior of its solutions. We have the following consequences, which easily follow from Proposition 4.1(a), Proposition 4.1(d), Proposition 4.4, and Proposition 4.5.

Proposition 5.1 Consider the neural network (9) with Ψ ∈ {Ψ^p_NR, Ψ^p_S1−NR, Ψ^p_S2−NR}.

(a) Every solution of the NCP is an equilibrium point.

(b) If (∇F − I) is a P-matrix, then every equilibrium point of (9) with Ψ = Ψ^p_NR solves the NCP.

(c) If F is a nonnegative P0-function, then every equilibrium point x ≥ 0 of (9) with Ψ ∈ {Ψ^p_S1−NR, Ψ^p_S2−NR} solves the NCP.

Theorem 5.1 below addresses the boundedness of the level sets of Ψ and convergence of the trajectories of the neural network. Before we state this theorem, we need the following lemma.

Lemma 5.1 Let {(a_k, b_k)}_{k=1}^{∞} ⊆ IR2 be such that |a_k| → ∞ and |b_k| → ∞ as k → ∞. Then, |φ^p_NR(a_k, b_k)| → ∞, |φ^p_S−NR(a_k, b_k)| → ∞, and |ψ^p_S−NR(a_k, b_k)| → ∞.


Proof. (a) First, we verify that |φ^p_S−NR(a_k, b_k)| → ∞. To proceed, we consider three cases.

(i) Suppose a_k → ∞ and b_k → ∞. Note that for all x ∈ [−1, 0] and n ∈ N, there holds the useful inequality (1 + x)^n ≤ (1 − nx)^{−1}. Thus, when a > b > 0, we have

φ^p_S−NR(a, b) = a^p − (a − b)^p = a^p − a^p(1 − b/a)^p
              ≥ a^p − a^p(1 + pb/a)^{−1}
              = a^p − a^p · a/(a + pb)
              = p a^p b/(a + pb)
              ≥ p a^{p−1} b/(1 + p)
              ≥ p b^p/(1 + p).

Similarly, φ^p_S−NR(a, b) ≥ p a^p/(1 + p) for b > a > 0. Thus, φ^p_S−NR(a_k, b_k) → ∞ as k → ∞.

(ii) Suppose a_k → −∞ and b_k → −∞. Observe that φ^p_S−NR(a, b) ≤ a^p when a > b, and φ^p_S−NR(a, b) ≤ b^p when a < b. Thus, φ^p_S−NR(a_k, b_k) → −∞ as k → ∞.

(iii) Suppose a_k → ∞ and b_k → −∞. For a > 0 and b < 0, we have (a − b)^p ≥ a^p + (−b)^p = a^p − b^p. Thus, φ^p_S−NR(a, b) = a^p − (a − b)^p ≤ b^p and we conclude that φ^p_S−NR(a_k, b_k) → −∞ as k → ∞. In the case that a_k → −∞ and b_k → ∞, we also have φ^p_S−NR(a_k, b_k) → −∞ as k → ∞ by the symmetry of φ^p_S−NR.

(b) Next, we show that |φ^p_NR(a_k, b_k)| → ∞. Again, we consider three cases.

(i) Suppose that a_k → −∞. Since φ^p_NR(a, b) = a^p − (a − b)^p_+ ≤ a^p for all (a, b) ∈ IR2, it is trivial to see that φ^p_NR(a_k, b_k) → −∞.

(ii) Suppose that a_k → ∞ and b_k → ∞. For a > b > 0, we have φ^p_NR(a, b) = φ^p_S−NR(a, b) ≥ p b^p/(1 + p). For 0 ≤ a < b, it is clear that φ^p_NR(a, b) = a^p. Then, we conclude that φ^p_NR(a_k, b_k) → ∞.

(iii) Suppose that a_k → ∞ and b_k → −∞. For a > 0 and b < 0, we have φ^p_NR(a, b) = φ^p_S−NR(a, b) ≤ b^p, and so φ^p_NR(a_k, b_k) → −∞. Thus, we have proved that |φ^p_NR(a_k, b_k)| → ∞.

(c) The last limit, |ψ^p_S−NR(a_k, b_k)| → ∞, follows from the fact that

ψ^p_S−NR(a, b) = { φ^p_S−NR(a, b) b^p   if a > b,
                   a^p b^p = a^{2p}     if a = b,
                   φ^p_S−NR(b, a) a^p   if a < b,

and the inequalities obtained above for φ^p_S−NR. □

Theorem 5.1 Let F be a uniform P-function and let Ψ ∈ {Ψ^p_NR, Ψ^p_S1−NR, Ψ^p_S2−NR}.

(a) The level sets L(Ψ, γ) := {x ∈ IRn | Ψ(x) ≤ γ} of Ψ are bounded for any γ ≥ 0. Consequently, the trajectory x(t) through any initial condition x_0 ∈ IRn is defined for all t ≥ 0.

(b) The trajectory x(t) of (9) through any x_0 ∈ IRn converges to an equilibrium point.

Proof. (a) Suppose otherwise. Then, there exists a sequence {x^k}_{k=1}^{∞} ⊆ L(Ψ, γ) such that ‖x^k‖ → ∞ as k → ∞. An argument similar to that in [10] shows that there exists an index i such that |x^k_i| → ∞ and |F_i(x^k)| → ∞ as k → ∞. By Lemma 5.1, we have |φ(x^k_i, F_i(x^k))| → ∞, where φ ∈ {φ^p_NR, φ^p_S−NR, ψ^p_S−NR}. But this is impossible since Ψ(x^k) ≤ γ for all k. Thus, the level set L(Ψ, γ) is bounded. The remaining part of the theorem can be proved similarly to Proposition 4.2(b) in [2].

(b) From part (a), the level sets of Ψ are compact, and so by LaSalle's Invariance Principle [22], we reach the desired conclusion. □

Theorem 5.2 Suppose x̄ is an isolated equilibrium point of (9). Then, x̄ is asymptotically stable provided that either

(i) Ψ = Ψ^p_NR and (∇F − I) is a P-matrix; or

(ii) Ψ ∈ {Ψ^p_S1−NR, Ψ^p_S2−NR}, F is a nonnegative P0-function, and the equilibrium point is nonnegative.

Proof. Let x̄ be an isolated equilibrium point of (9). Then, it has a neighborhood O such that

∇Ψ(x̄) = 0  and  ∇Ψ(x) ≠ 0 for all x ∈ O\{x̄}.

We claim that Ψ is a Lyapunov function at x̄ over O. To proceed, we note first that Ψ(x) ≥ 0. By Proposition 5.1(b) and (c), Ψ(x̄) = 0. Further, if Ψ(x) = 0 for some x ∈ O\{x̄}, then x solves the NCP and, by Proposition 5.1(a), it is an equilibrium point. This contradicts the isolation of x̄. Thus, Ψ(x) > 0 for all x ∈ O\{x̄}. Finally, it is clear that

dΨ(x(t))/dt = −ρ‖∇Ψ(x(t))‖^2 < 0

over the set O\{x̄}. Then, applying Lemma 3.5 yields that x̄ is asymptotically stable. □

We now look at the exponential stability of the neural network.

Theorem 5.3 Consider the neural network (9) with Ψ ∈ {Ψ^p_NR, Ψ^p_S1−NR, Ψ^p_S2−NR}. If ∇Φ(x̄) is nonsingular for some isolated equilibrium point x̄, then x̄ solves the NCP and x̄ is exponentially stable.

Proof. Let x̄ be an equilibrium point such that ∇Φ(x̄) is nonsingular. Note that ∇Ψ(x̄) = ∇Φ(x̄)Φ(x̄), and so ∇Ψ(x̄) = 0 implies that Φ(x̄) = 0. This proves the first claim of the theorem. Further, using Ψ as a Lyapunov function as in the preceding theorem, x̄ is asymptotically stable.

Note that since Φ is differentiable at x̄, we have

Φ(x) = ∇Φ(x̄)^T(x − x̄) + o(‖x − x̄‖)  as x → x̄.    (16)

By Lemma 3.6, there exist δ > 0 and a constant C such that ∇Φ(x) is nonsingular for all x with ‖x − x̄‖ < δ, and ‖∇Φ(x)^{−1}‖ ≤ C. This gives

κ‖y‖^2 ≤ ‖∇Φ(x)y‖^2    (17)

for any x in the δ-neighborhood (call it N_δ) and any y ∈ IRn, where κ = 1/C^2.

Let ε < 2ρκ. Since x̄ is asymptotically stable, we may choose δ small enough so that o(‖x − x̄‖^2) < ε‖x − x̄‖^2 and x(t) → x̄ as t → ∞ for any initial condition x(0) ∈ N_δ. Now, define g : [0, ∞) → IR by

g(t) := ‖x(t) − x̄‖^2,

where x(t) is the unique solution through x(0) ∈ N_δ. Using equations (16) and (17), we obtain

dg(t)/dt = 2(x(t) − x̄)^T dx(t)/dt
         = −2ρ(x(t) − x̄)^T ∇Ψ(x(t))
         = −2ρ(x(t) − x̄)^T ∇Φ(x(t))Φ(x(t))
         = −2ρ(x(t) − x̄)^T ∇Φ(x(t))∇Φ(x̄)^T(x(t) − x̄) + o(‖x(t) − x̄‖^2)
         ≤ (−2ρκ + ε)‖x(t) − x̄‖^2
         = (−2ρκ + ε)g(t).

Then, it follows that g(t) ≤ e^{(−2ρκ+ε)t} g(0), which says

‖x(t) − x̄‖ ≤ e^{(−ρκ+ε/2)t} ‖x(0) − x̄‖,

where −ρκ + ε/2 < 0. This proves that x̄ is exponentially stable. □

6 Simulation Results

In this section, we look at some nonlinear complementarity problems and test them using the neural network (9) with Ψ ∈ {Ψ^p_NR, Ψ^p_S1−NR, Ψ^p_S2−NR}. We also compare the rate of convergence of each network for different values of p. Further, we compare the numerical performance of these networks with the neural network based on the Fischer-Burmeister (FB) function [23] given by (10) and the neural network based on the generalized Fischer-Burmeister function [2] given by (11).

In the following simulations, we use the Matlab ordinary differential equation solver ode23s. Recall that ρ is a time-scaling parameter. In particular, if we wish to achieve faster convergence, a higher value of ρ can be used. In our simulations, the values of ρ used are 10^3, 10^6 or 10^9, as indicated in the figures. The stopping criterion in simulating the trajectories is ‖∇Ψ(x(t))‖ ≤ 10^{−5}.
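In a Python reimplementation (our assumption; the paper itself uses Matlab's ode23s), the same stopping rule can be expressed as a terminal event of the ODE solver, for example:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_until_stationary(grad_Psi, x0, rho=1e3, t_end=1.0, tol=1e-5):
    """Integrate (9) and stop once ||grad Psi(x(t))|| <= tol, mirroring the paper's criterion."""
    def rhs(t, x):
        return -rho * grad_Psi(x)
    def stationary(t, x):
        # Crosses zero from above when the gradient norm reaches the tolerance.
        return np.linalg.norm(grad_Psi(x)) - tol
    stationary.terminal = True
    return solve_ivp(rhs, (0.0, t_end), np.asarray(x0, dtype=float),
                     method="BDF", events=stationary)
```

Here BDF is used as a stiff solver in the same spirit as ode23s; any analytic or numerical gradient of Ψ (such as grad_Psi_nr sketched in Section 4) can be passed as grad_Psi.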

Example 6.1 [21, Kojima-Shindo] Consider the NCP, where F : IR4 → IR4 is given by

F(x) = [ 3x_1^2 + 2x_1x_2 + 2x_2^2 + x_3 + 3x_4 − 6,
         2x_1^2 + x_1 + x_2^2 + 3x_3 + 2x_4 − 2,
         3x_1^2 + x_1x_2 + 2x_2^2 + 2x_3 + 3x_4 − 1,
         x_1^2 + 3x_2^2 + 2x_3 + 3x_4 − 3 ]^T.

This is a non-degenerate NCP and the solution is x̄ = (√6/2, 0, 0, 1/2).
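For reference, this mapping can be coded directly as below (our sketch; simulate and phi_nr refer to the hypothetical helpers sketched in Section 2):

```python
import numpy as np

def F_kojima_shindo(x):
    """The mapping F of Example 6.1 (Kojima-Shindo)."""
    x1, x2, x3, x4 = x
    return np.array([
        3*x1**2 + 2*x1*x2 + 2*x2**2 + x3 + 3*x4 - 6,
        2*x1**2 + x1 + x2**2 + 3*x3 + 2*x4 - 2,
        3*x1**2 + x1*x2 + 2*x2**2 + 2*x3 + 3*x4 - 1,
        x1**2 + 3*x2**2 + 2*x3 + 3*x4 - 3,
    ])

x0 = np.array([2.0, 0.5, 0.5, 1.5])                 # initial condition used in Figures 2-4
x_star = np.array([np.sqrt(6)/2, 0.0, 0.0, 0.5])    # known solution of the problem
# e.g. sol = simulate(F_kojima_shindo, phi_nr, x0, rho=1e3)
# If the trajectory converges, the final state sol.y[:, -1] should approach x_star.
```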

We simulate the network (9) with different Ψ ∈ {Ψ^p_NR, Ψ^p_S1−NR, Ψ^p_S2−NR} for various values of p to see the influence of p on the convergence of trajectories to the NCP solution. From Figures 2-4, we see that a smaller value of p yields faster convergence when the initial condition is x_0 = (2, 0.5, 0.5, 1.5)^T. Figure 5 depicts the comparison of the different NCP-functions with p = 3, together with the FB and generalized FB functions. Among these five classes of NCP-functions, we see that the neural network based on φ^p_S−NR has the best numerical performance. In Figure 6, we simulate the neural network based on φ^p_S−NR using 6 random initial points, and the trajectories converge to x̄ at around t = 5.5 ms. One can also observe from Figure 6 that the convergence of x_2(t) and x_3(t) is very fast. We note that ∇Φ^p_S1−NR(x̄) is nonsingular, which leads to the exponential stability of x̄ by Theorem 5.3. This particular problem was also simulated using neural networks
