
Applied Numerical Mathematics, vol. 135, pp. 206-227, January 2019

Unified smoothing functions for absolute value equation associated with second-order cone

Chieu Thanh Nguyen 1
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

B. Saheya 2
College of Mathematical Science, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, P. R. China

Yu-Lin Chang 3
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

Jein-Shan Chen 4
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

February 19, 2018 (revised on July 12, 2018)

Abstract. In this paper, we explore a unified way to construct smoothing functions for solving the absolute value equation associated with second-order cone (SOCAVE). Numerical comparisons are presented, which illustrate what kinds of smoothing functions work well along with the smoothing Newton algorithm. In particular, the numerical experiments show that the well-known loss function widely used in the engineering community is the worst one among the constructed smoothing functions, which indicates that the other proposed smoothing functions can be employed for solving engineering problems.

1 E-mail: thanhchieu90@gmail.com.

2 E-mail: saheya@imnu.edu.cn. The author's work is supported by Natural Science Foundation of Inner Mongolia (Award Number: 2017MS0125).

3 E-mail: ylchang@math.ntnu.edu.tw.

4 Corresponding author. E-mail: jschen@math.ntnu.edu.tw. The author's work is supported by Ministry of Science and Technology, Taiwan.

Keywords. Second-order cone, absolute value equations, smoothing Newton algorithm.

1 Introduction

Recently, the paper [36] investigates a family of smoothing functions along with a smoothing-type algorithm to tackle the absolute value equation associated with second-order cone (SOCAVE) and shows the efficiency of such an approach. Motivated by this article, we continue to ask two natural questions: (i) Are there other suitable smoothing functions that can be employed for solving the SOCAVE? (ii) Is there a unified way to construct smoothing functions for solving the SOCAVE? In this paper, we provide affirmative answers to these two queries. In order to smoothly convey how we figure out the answers, we begin by recalling where the SOCAVE comes from.

The standard absolute value equation (AVE) is of the form

$$Ax + B|x| = b, \qquad (1)$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times n}$, $B \neq 0$, and $b \in \mathbb{R}^n$. Here $|x|$ means the componentwise absolute value of the vector $x \in \mathbb{R}^n$. When $B = -I$, where $I$ is the identity matrix, the AVE (1) reduces to the special form

$$Ax - |x| = b.$$

It is known that the AVE (1) was first introduced by Rohn in [41], while the term AVE was coined by Mangasarian [34]. During the past decade, many researchers have paid attention to this equation, for example, Caccetta, Qu and Zhou [2], Hu and Huang [12], Jiang and Zhang [20], Ketabchi and Moosaei [21], Mangasarian [26, 27, 28, 29, 30, 31, 32, 33], Mangasarian and Meyer [34], Prokopyev [37], and Rohn [43].

We elaborate more on the developments of the AVE. Mangasarian and Meyer [34] show that the AVE (1) is equivalent to the bilinear program, the generalized LCP (linear complementarity problem), and the standard LCP provided 1 is not an eigenvalue of $A$. With these equivalent reformulations, they also show that the AVE (1) is NP-hard in its general form and provide existence results. Prokopyev [37] further improves the above equivalence, showing that the AVE (1) can be equivalently recast as an LCP without any assumption on $A$ and $B$, and also provides a relationship with mixed integer programming. In general, if solvable, the AVE (1) can have either a unique solution or multiple (e.g., exponentially many) solutions. Indeed, various sufficient conditions on solvability and non-solvability of the AVE (1) with unique and multiple solutions are discussed in [34, 37, 42]. Some variants of the AVE, like the absolute value equation associated with second-order cone and the absolute value programs, are investigated in [14] and [46], respectively.

Recently, another type of absolute value equation, a natural extension of the standard AVE (1), has been considered in [14, 35, 36]. More specifically, the following absolute value equation associated with second-order cones, abbreviated as SOCAVE, is studied:

$$Ax + B|x| = b, \qquad (2)$$

where $A, B \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$ are the same as in (1), and $|x|$ denotes the absolute value of $x$ coming from the square root of the Jordan product "$\circ$" of $x$ with itself. What is the difference between the standard AVE (1) and the SOCAVE (2)? Their mathematical formats look the same. In fact, the main difference is that $|x|$ in the standard AVE (1) means the componentwise absolute value $|x_i|$ of each $x_i \in \mathbb{R}$, i.e., $|x| = (|x_1|, |x_2|, \cdots, |x_n|)^T \in \mathbb{R}^n$; however, $|x|$ in the SOCAVE (2) denotes the vector $\sqrt{x^2} := \sqrt{x \circ x}$ associated with second-order cone under the Jordan product. To understand its meaning, we need to introduce the definition of the second-order cone (SOC). The second-order cone in $\mathbb{R}^n$ ($n \ge 1$), also called the Lorentz cone, is defined as

$$\mathcal{K}^n := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \;\middle|\; \|x_2\| \le x_1 \right\},$$

where $\|\cdot\|$ denotes the Euclidean norm. If $n = 1$, then $\mathcal{K}^n$ is the set of nonnegative reals $\mathbb{R}_+$. In general, a general second-order cone $\mathcal{K}$ could be the Cartesian product of SOCs, i.e.,

$$\mathcal{K} := \mathcal{K}^{n_1} \times \cdots \times \mathcal{K}^{n_r}.$$

For simplicity, we focus on the single SOC $\mathcal{K}^n$ because all the analysis can be carried over to the setting of Cartesian products. The SOC is a special case of symmetric cones and can be analyzed under the Jordan product; see [8]. In particular, for any two vectors $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ and $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, the Jordan product of $x$ and $y$ associated with $\mathcal{K}^n$ is defined as

$$x \circ y := \begin{bmatrix} x^T y \\ y_1 x_2 + x_1 y_2 \end{bmatrix}.$$

The Jordan product, unlike scalar or matrix multiplication, is not associative, which is a main source of complication in the analysis of optimization problems involving SOC; see [4, 6, 9] and references therein for more details. The identity element under this Jordan product is $e = (1, 0, \ldots, 0)^T \in \mathbb{R}^n$. With these definitions, $x^2$ means the Jordan product of $x$ with itself, i.e., $x^2 := x \circ x$, and $\sqrt{x}$ with $x \in \mathcal{K}^n$ denotes the unique vector such that $\sqrt{x} \circ \sqrt{x} = x$. In other words, the vector $|x|$ in the SOCAVE (2) is computed by

$$|x| := \sqrt{x \circ x}.$$
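As a concrete illustration of these definitions, the following is a small numerical sketch (ours, not from the paper; it assumes NumPy and $n \ge 2$) of the Jordan product, its identity element, and its non-associativity.

```python
# Sketch: Jordan product on R^n associated with K^n (helper names are ours).
import numpy as np

def jordan(x, y):
    """x o y = (x^T y, y1*x2 + x1*y2) for x = (x1, x2), y = (y1, y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

e = np.array([1.0, 0.0, 0.0])                 # identity element
x = np.array([1.0, 2.0, -1.0])
y = np.array([0.5, -1.0, 3.0])
z = np.array([2.0, 0.0, 1.0])

print(np.allclose(jordan(e, x), x))           # e o x = x  -> True
lhs = jordan(jordan(x, y), z)                 # (x o y) o z
rhs = jordan(x, jordan(y, z))                 # x o (y o z)
print(np.allclose(lhs, rhs))                  # False: generally not associative
```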

As remarked in the literature, the significance of the AVE (1) arises from the fact that the AVE is capable of formulating many optimization problems, such as linear programs, quadratic programs, bimatrix games, and so on. Likewise, the SOCAVE (2) plays a similar role in various optimization problems involving second-order cones. Many numerical methods have been proposed for solving the standard AVE (1) and the SOCAVE (2); please refer to [36] for a quick review. Basically, we follow the smoothing Newton algorithm employed in [36] to deal with the SOCAVE (2). This kind of algorithm has been a powerful tool for solving many other optimization problems, including symmetric cone complementarity problems [11, 23, 24], the system of inequalities under the order induced by symmetric cone [18, 25, 47], and so on. It is also employed for the standard AVE (1) in [19, 44]. The new upshot of this paper lies in discovering more suitable smoothing functions and exploring a unified way to construct smoothing functions. Of course, the numerical performance of the different smoothing functions is compared. These results are totally new to the literature and are the main contribution of this paper.

To close this section, we recall some basic concepts and background materials regarding the second-order cone, which will be used in the subsequent analysis. More details can be found in [4, 6, 8, 9, 14]. First, we recall the spectral decomposition of $x$ with respect to the SOC. For $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, the spectral decomposition of $x$ with respect to the SOC is given by

$$x = \lambda_1(x) u_x^{(1)} + \lambda_2(x) u_x^{(2)}, \qquad (3)$$

where $\lambda_i(x) = x_1 + (-1)^i \|x_2\|$ for $i = 1, 2$ and

$$u_x^{(i)} = \begin{cases} \dfrac{1}{2}\left(1, \; (-1)^i \dfrac{x_2^T}{\|x_2\|}\right)^T & \text{if } \|x_2\| \neq 0, \\[6pt] \dfrac{1}{2}\left(1, \; (-1)^i \omega^T\right)^T & \text{if } \|x_2\| = 0, \end{cases} \qquad (4)$$

with $\omega \in \mathbb{R}^{n-1}$ being any vector satisfying $\|\omega\| = 1$. The two scalars $\lambda_1(x)$ and $\lambda_2(x)$ are called the spectral values of $x$, while the two vectors $u_x^{(1)}$ and $u_x^{(2)}$ are called the spectral vectors of $x$. Moreover, it is obvious that the spectral decomposition of $x \in \mathbb{R}^n$ is unique if $x_2 \neq 0$. It is known that the spectral values and spectral vectors possess the following properties:

(i) $u_x^{(1)} \circ u_x^{(2)} = 0$ and $u_x^{(i)} \circ u_x^{(i)} = u_x^{(i)}$ for $i = 1, 2$;

(ii) $\|u_x^{(1)}\|^2 = \|u_x^{(2)}\|^2 = \frac{1}{2}$ and $\|x\|^2 = \frac{1}{2}\left(\lambda_1^2(x) + \lambda_2^2(x)\right)$.
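The following quick check (our sketch; NumPy assumed, with $x_2 \neq 0$ so that $\omega$ is not needed) verifies the decomposition (3)-(4) and properties (i)-(ii) numerically.

```python
# Sketch: spectral decomposition of x w.r.t. K^n and properties (i)-(ii).
import numpy as np

def jordan(x, y):
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

x = np.array([1.0, 2.0, -1.0])
nx2 = np.linalg.norm(x[1:])
w = x[1:] / nx2
lam1, lam2 = x[0] - nx2, x[0] + nx2
u1 = 0.5 * np.concatenate(([1.0], -w))
u2 = 0.5 * np.concatenate(([1.0], w))

print(np.allclose(lam1 * u1 + lam2 * u2, x))                            # (3)
print(np.allclose(jordan(u1, u2), 0), np.allclose(jordan(u1, u1), u1))  # (i)
print(np.isclose(u1 @ u1, 0.5),
      np.isclose(x @ x, 0.5 * (lam1**2 + lam2**2)))                     # (ii)
```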

Next is the concept of the projection onto the second-order cone. Let $x_+$ denote the projection of $x$ onto $\mathcal{K}^n$, and $x_-$ the projection of $-x$ onto the dual cone $(\mathcal{K}^n)^*$ of $\mathcal{K}^n$, where the dual cone is defined by $(\mathcal{K}^n)^* := \{ y \in \mathbb{R}^n \mid \langle x, y \rangle \ge 0, \ \forall x \in \mathcal{K}^n \}$. In fact, the dual cone of $\mathcal{K}^n$ is itself, i.e., $(\mathcal{K}^n)^* = \mathcal{K}^n$. Due to the special structure of $\mathcal{K}^n$, the explicit formula of the projection of $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ onto $\mathcal{K}^n$ is obtained in [4, 6, 8, 9, 10] as below:

$$x_+ = \begin{cases} x & \text{if } x \in \mathcal{K}^n, \\ 0 & \text{if } x \in -\mathcal{K}^n, \\ u & \text{otherwise}, \end{cases} \quad \text{where } u = \begin{bmatrix} \dfrac{x_1 + \|x_2\|}{2} \\[4pt] \left(\dfrac{x_1 + \|x_2\|}{2}\right) \dfrac{x_2}{\|x_2\|} \end{bmatrix}.$$

Similarly, the expression of $x_-$ can be written out as

$$x_- = \begin{cases} 0 & \text{if } x \in \mathcal{K}^n, \\ -x & \text{if } x \in -\mathcal{K}^n, \\ w & \text{otherwise}, \end{cases} \quad \text{where } w = \begin{bmatrix} -\dfrac{x_1 - \|x_2\|}{2} \\[4pt] \left(\dfrac{x_1 - \|x_2\|}{2}\right) \dfrac{x_2}{\|x_2\|} \end{bmatrix}.$$

It is easy to verify that $x = x_+ - x_-$ and

$$x_+ = (\lambda_1(x))_+ u_x^{(1)} + (\lambda_2(x))_+ u_x^{(2)}, \qquad x_- = (-\lambda_1(x))_+ u_x^{(1)} + (-\lambda_2(x))_+ u_x^{(2)},$$

where $(\alpha)_+ = \max\{0, \alpha\}$ for $\alpha \in \mathbb{R}$. As for the expression of $|x|$ associated with SOC, there is an alternative way via the so-called SOC-function to obtain it, which can be found in [3, 5]. In any case, it comes out that

$$|x| = \left[(\lambda_1(x))_+ + (-\lambda_1(x))_+\right] u_x^{(1)} + \left[(\lambda_2(x))_+ + (-\lambda_2(x))_+\right] u_x^{(2)} = |\lambda_1(x)|\, u_x^{(1)} + |\lambda_2(x)|\, u_x^{(2)}.$$
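A compact numerical sketch of these formulas (our code; NumPy assumed) computes $x_+$ and $x_-$ from the spectral data and checks the identities $x = x_+ - x_-$ and $|x| = x_+ + x_-$, together with the complementarity $\langle x_+, x_- \rangle = 0$.

```python
# Sketch: SOC projections x_+ and x_- via spectral values/vectors.
import numpy as np

def soc_parts(x):
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.zeros_like(x2)
    lam1, lam2 = x1 - nx2, x1 + nx2
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    x_plus = max(lam1, 0.0) * u1 + max(lam2, 0.0) * u2
    x_minus = max(-lam1, 0.0) * u1 + max(-lam2, 0.0) * u2
    soc_abs = abs(lam1) * u1 + abs(lam2) * u2
    return x_plus, x_minus, soc_abs

x = np.array([0.3, -1.2, 0.7])
xp, xm, xabs = soc_parts(x)
print(np.allclose(xp - xm, x))        # x = x_+ - x_-
print(np.allclose(xp + xm, xabs))     # |x| = x_+ + x_-
print(np.isclose(xp @ xm, 0.0))       # <x_+, x_-> = 0
```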

2 Unified smoothing functions for SOCAVE

As mentioned in Section 1, we employ the smoothing Newton method for solving the SOCAVE (2), which needs a smoothing function to work with. Indeed, a family of smoothing functions was already considered in [36]. In this section, we look into what kinds of smoothing functions can be employed to work with the smoothing Newton algorithm for solving the SOCAVE (2).

Definition 2.1. A function $\phi : \mathbb{R}_{++} \times \mathbb{R} \to \mathbb{R}$ is called a smoothing function of $|t|$ if it satisfies the following:

(i) $\phi$ is continuously differentiable at $(\mu, t) \in \mathbb{R}_{++} \times \mathbb{R}$;

(ii) $\lim_{\mu \downarrow 0} \phi(\mu, t) = |t|$ for any $t \in \mathbb{R}$.

Given a smoothing function $\phi$, we further define a vector-valued function $\Phi : \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R}^n$ as

$$\Phi(\mu, x) = \phi(\mu, \lambda_1(x))\, u_x^{(1)} + \phi(\mu, \lambda_2(x))\, u_x^{(2)}, \qquad (5)$$

where $\mu \in \mathbb{R}_{++}$ is a parameter, $\lambda_1(x), \lambda_2(x)$ are the spectral values of $x$, and $u_x^{(1)}, u_x^{(2)}$ are the spectral vectors of $x$. Consequently, $\Phi$ is also smooth on $\mathbb{R}_{++} \times \mathbb{R}^n$. Moreover, it is easy to verify that

$$\lim_{\mu \to 0^+} \Phi(\mu, x) = |\lambda_1(x)|\, u_x^{(1)} + |\lambda_2(x)|\, u_x^{(2)} = |x|,$$

which means each function $\Phi(\mu, x)$ serves as a smoothing function of $|x|$ associated with SOC. With this observation, for the SOCAVE (2), we further define the function $H : \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R} \times \mathbb{R}^n$ by

$$H(\mu, x) = \begin{bmatrix} \mu \\ Ax + B\Phi(\mu, x) - b \end{bmatrix}, \qquad \forall \mu \in \mathbb{R}_{++} \text{ and } x \in \mathbb{R}^n. \qquad (6)$$
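For later reference, the maps (5)-(6) can be sketched in a few lines of NumPy (helper names Phi and H are ours; phi is any scalar smoothing function in the sense of Definition 2.1, e.g. $\phi(\mu, t) = \sqrt{4\mu^2 + t^2}$, which appears as (17) below).

```python
# Sketch of (5)-(6): lift a scalar smoothing function phi to Phi and H.
import numpy as np

def Phi(phi, mu, x):
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.zeros_like(x2)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return phi(mu, x1 - nx2) * u1 + phi(mu, x1 + nx2) * u2

def H(phi, mu, x, A, B, b):
    """H(mu, x) = (mu, Ax + B*Phi(mu, x) - b) as in (6)."""
    return np.concatenate(([mu], A @ x + B @ Phi(phi, mu, x) - b))

phi3 = lambda mu, t: np.sqrt(4.0 * mu**2 + t**2)
A, B = 3.0 * np.eye(3), np.eye(3)
x = np.array([0.4, -0.8, 0.6])
b = A @ x + B @ Phi(phi3, 1e-10, x)    # makes x (nearly) a solution of (2)
print(H(phi3, 1e-10, x, A, B, b))      # ~ (1e-10, 0, 0, 0)
```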

Proposition 2.1. Suppose that $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ has the spectral decomposition as in (3)-(4). Let $H : \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R} \times \mathbb{R}^n$ be defined as in (6). Then,

(a) $H(\mu, x) = 0$ if and only if $x$ solves the SOCAVE (2);

(b) $H$ is continuously differentiable at $(\mu, x) \in \mathbb{R}_{++} \times \mathbb{R}^n$ with the Jacobian matrix given by

$$H'(\mu, x) = \begin{bmatrix} 1 & 0 \\ B \dfrac{\partial \Phi(\mu,x)}{\partial \mu} & A + B \dfrac{\partial \Phi(\mu,x)}{\partial x} \end{bmatrix}, \qquad (7)$$

where

$$\frac{\partial \Phi(\mu, x)}{\partial \mu} = \frac{\partial \phi(\mu, \lambda_1(x))}{\partial \mu}\, u_x^{(1)} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial \mu}\, u_x^{(2)},$$

$$\frac{\partial \Phi(\mu, x)}{\partial x} = \begin{cases} \dfrac{\partial \phi(\mu, x_1)}{\partial x_1}\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b & c \dfrac{x_2^T}{\|x_2\|} \\[4pt] c \dfrac{x_2}{\|x_2\|} & a I + (b - a) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0, \end{cases}$$

with

$$a = \frac{\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \quad b = \frac{1}{2}\left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} + \frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1}\right), \quad c = \frac{1}{2}\left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} - \frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1}\right). \qquad (8)$$

Proof. (a) First, we observe that

$$H(\mu, x) = 0 \iff \mu = 0 \text{ and } Ax + B\Phi(\mu, x) - b = 0 \iff \mu = 0 \text{ and } Ax + B|x| - b = 0.$$

This indicates that $x$ is a solution to the SOCAVE (2) if and only if $(\mu, x)$ is a solution to $H(\mu, x) = 0$.

(b) Since $\Phi(\mu, x)$ is continuously differentiable on $\mathbb{R}_{++} \times \mathbb{R}^n$, it is clear that $H(\mu, x)$ is continuously differentiable on $\mathbb{R}_{++} \times \mathbb{R}^n$. Thus, it remains to compute the Jacobian matrix of $H(\mu, x)$. Note that

$$\Phi(\mu, x) = \phi(\mu, \lambda_1(x))\, u_x^{(1)} + \phi(\mu, \lambda_2(x))\, u_x^{(2)} = \frac{1}{2} \begin{cases} \begin{bmatrix} \phi(\mu, \lambda_1(x)) + \phi(\mu, \lambda_2(x)) \\ \left(-\phi(\mu, \lambda_1(x)) + \phi(\mu, \lambda_2(x))\right) \frac{\bar{x}_2}{\|x_2\|} \\ \vdots \\ \left(-\phi(\mu, \lambda_1(x)) + \phi(\mu, \lambda_2(x))\right) \frac{\bar{x}_n}{\|x_2\|} \end{bmatrix} & \text{if } x_2 \neq 0, \\[6pt] \begin{bmatrix} \phi(\mu, \lambda_1(x)) + \phi(\mu, \lambda_2(x)) \\ 0 \\ \vdots \\ 0 \end{bmatrix} & \text{if } x_2 = 0, \end{cases}$$

where $x_2 := (\bar{x}_2, \cdots, \bar{x}_n)^T \in \mathbb{R}^{n-1}$ and $\omega = (\omega_2, \cdots, \omega_n)^T \in \mathbb{R}^{n-1}$; for $x_2 = 0$ the second block $\left(-\phi(\mu, \lambda_1(x)) + \phi(\mu, \lambda_2(x))\right)\omega$ vanishes because $\lambda_1(x) = \lambda_2(x)$, regardless of the choice of $\omega$. From the chain rule, it follows that

$$\frac{\partial \Phi(\mu, x)}{\partial \mu} = \frac{\partial \phi(\mu, \lambda_1(x))}{\partial \mu}\, u_x^{(1)} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial \mu}\, u_x^{(2)}.$$

In order to compute $\frac{\partial \Phi(\mu, x)}{\partial x}$, for simplicity, we denote

$$\Phi(\mu, x) := \frac{1}{2} \begin{bmatrix} \tau_1(\mu, x) \\ \tau_2(\mu, x) \\ \vdots \\ \tau_n(\mu, x) \end{bmatrix}.$$

To proceed, we discuss two cases.

(i) For $x_2 \neq 0$, we compute

$$\frac{\partial \tau_1(\mu, x)}{\partial x_1} = \frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} = \frac{\partial \phi(\mu, \lambda_1(x))}{\partial \lambda_1(x)} \frac{\partial \lambda_1(x)}{\partial x_1} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial \lambda_2(x)} \frac{\partial \lambda_2(x)}{\partial x_1} = \frac{\partial \phi(\mu, \lambda_1(x))}{\partial \lambda_1(x)} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial \lambda_2(x)} := 2b$$

and

$$\frac{\partial \tau_1(\mu, x)}{\partial \bar{x}_i} = \frac{\partial \phi(\mu, \lambda_1(x))}{\partial \bar{x}_i} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial \bar{x}_i} = -\frac{\partial \phi(\mu, \lambda_1(x))}{\partial \lambda_1(x)} \frac{\bar{x}_i}{\|x_2\|} + \frac{\partial \phi(\mu, \lambda_2(x))}{\partial \lambda_2(x)} \frac{\bar{x}_i}{\|x_2\|} = \left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} - \frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1}\right) \frac{\bar{x}_i}{\|x_2\|} := 2c\, \frac{\bar{x}_i}{\|x_2\|}, \quad i = 2, \cdots, n.$$

Moreover,

$$\frac{\partial \tau_i(\mu, x)}{\partial x_1} = \left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} - \frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1}\right) \frac{\bar{x}_i}{\|x_2\|} = 2c\, \frac{\bar{x}_i}{\|x_2\|}, \quad i = 2, \cdots, n.$$

Similarly, we have

$$\frac{\partial \tau_2(\mu, x)}{\partial \bar{x}_2} = \left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial \bar{x}_2} - \frac{\partial \phi(\mu, \lambda_1(x))}{\partial \bar{x}_2}\right) \frac{\bar{x}_2}{\|x_2\|} + \left(\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x))\right) \frac{\partial}{\partial \bar{x}_2}\left(\frac{\bar{x}_2}{\|x_2\|}\right) = 2b\, \frac{\bar{x}_2 \cdot \bar{x}_2}{\|x_2\|^2} + \left(\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x))\right) \left(\frac{1}{\|x_2\|} - \frac{\bar{x}_2 \cdot \bar{x}_2}{\|x_2\|^3}\right) = 2a + 2(b - a)\, \frac{\bar{x}_2 \cdot \bar{x}_2}{\|x_2\|^2},$$

where $a := \dfrac{\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}$. In general, mimicking the same derivation yields

$$\frac{\partial \tau_i(\mu, x)}{\partial \bar{x}_j} = \begin{cases} 2a + 2(b - a)\, \dfrac{\bar{x}_i \cdot \bar{x}_i}{\|x_2\|^2} & \text{if } i = j, \\[4pt] 2(b - a)\, \dfrac{\bar{x}_i \cdot \bar{x}_j}{\|x_2\|^2} & \text{if } i \neq j. \end{cases}$$

To sum up, we obtain

$$\frac{\partial \Phi(\mu, x)}{\partial x} = \begin{bmatrix} b & c \dfrac{x_2^T}{\|x_2\|} \\[4pt] c \dfrac{x_2}{\|x_2\|} & a I + (b - a) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix},$$

which is the desired result.

(ii) For $x_2 = 0$, it is clear that

$$\frac{\partial \tau_1(\mu, x)}{\partial x_1} = 2\, \frac{\partial \phi(\mu, x_1)}{\partial x_1} \quad \text{and} \quad \frac{\partial \tau_1(\mu, x)}{\partial \bar{x}_i} = 0 \quad \text{for } i = 2, \cdots, n.$$

Since $\tau_i(\mu, x) = 0$ for $i = 2, \cdots, n$, it gives $\frac{\partial \tau_i(\mu, x)}{\partial x_1} = 0$. Moreover,

$$\begin{aligned} \frac{\partial \tau_2(\mu, x)}{\partial \bar{x}_2} &= \lim_{\bar{x}_2 \to 0} \frac{\tau_2(\mu, x_1, \bar{x}_2, 0, \cdots, 0) - \tau_2(\mu, x_1, 0, \cdots, 0)}{\bar{x}_2} \\ &= \lim_{\bar{x}_2 \to 0} \frac{\phi(\mu, x_1 + |\bar{x}_2|) - \phi(\mu, x_1 - |\bar{x}_2|)}{\bar{x}_2} \cdot \frac{\bar{x}_2}{|\bar{x}_2|} \\ &= \lim_{\bar{x}_2 \to 0} \frac{\phi(\mu, x_1 + |\bar{x}_2|) - \phi(\mu, x_1 - |\bar{x}_2|)}{|\bar{x}_2|} \\ &= \lim_{\bar{x}_2 \to 0} \left(\frac{\partial \phi(\mu, x_1 + |\bar{x}_2|)}{\partial (|\bar{x}_2|)} - \frac{\partial \phi(\mu, x_1 - |\bar{x}_2|)}{\partial (|\bar{x}_2|)}\right) \quad (\text{by L'H\^{o}pital's rule}) \\ &= \lim_{\bar{x}_2 \to 0} \left(\frac{\partial \phi(\mu, x_1 + |\bar{x}_2|)}{\partial (x_1 + |\bar{x}_2|)} + \frac{\partial \phi(\mu, x_1 - |\bar{x}_2|)}{\partial (x_1 - |\bar{x}_2|)}\right) = 2\, \frac{\partial \phi(\mu, x_1)}{\partial x_1}. \end{aligned}$$

Thus, we obtain

$$\frac{\partial \tau_i(\mu, x)}{\partial \bar{x}_j} = \begin{cases} 2\, \dfrac{\partial \phi(\mu, x_1)}{\partial x_1} & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$

which is equivalent to saying

$$\frac{\partial \Phi(\mu, x)}{\partial x} = \frac{\partial \phi(\mu, x_1)}{\partial x_1}\, I.$$

From all the above, we conclude that

$$\frac{\partial \Phi(\mu, x)}{\partial x} = \begin{cases} \dfrac{\partial \phi(\mu, x_1)}{\partial x_1}\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b & c \dfrac{x_2^T}{\|x_2\|} \\[4pt] c \dfrac{x_2}{\|x_2\|} & a I + (b - a) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0. \end{cases}$$

Thus, the proof is complete. $\Box$
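Proposition 2.1 translates directly into code. The following sketch (ours, NumPy assumed) assembles $H'(\mu, x)$ exactly as in (7)-(8); the caller supplies phi together with its partial derivatives phi_mu and phi_t, so any smoothing function from this section can be plugged in.

```python
# Sketch: assemble H'(mu, x) of Proposition 2.1(b) from phi and its partials.
import numpy as np

def jacobian_H(phi, phi_mu, phi_t, mu, x, A, B):
    n = x.size
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    if nx2 == 0.0:
        dPhi_dmu = phi_mu(mu, x1) * np.concatenate(([1.0], np.zeros(n - 1)))
        dPhi_dx = phi_t(mu, x1) * np.eye(n)
    else:
        lam1, lam2 = x1 - nx2, x1 + nx2
        w = x2 / nx2
        u1 = 0.5 * np.concatenate(([1.0], -w))
        u2 = 0.5 * np.concatenate(([1.0], w))
        dPhi_dmu = phi_mu(mu, lam1) * u1 + phi_mu(mu, lam2) * u2
        a = (phi(mu, lam2) - phi(mu, lam1)) / (lam2 - lam1)
        b = 0.5 * (phi_t(mu, lam2) + phi_t(mu, lam1))
        c = 0.5 * (phi_t(mu, lam2) - phi_t(mu, lam1))
        dPhi_dx = np.block([
            [np.array([[b]]),            c * w[None, :]],
            [c * w[:, None], a * np.eye(n - 1) + (b - a) * np.outer(w, w)]])
    J = np.zeros((n + 1, n + 1))
    J[0, 0] = 1.0           # d(mu)/d(mu); top-right block is zero
    J[1:, 0] = B @ dPhi_dmu
    J[1:, 1:] = A + B @ dPhi_dx
    return J
```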

Now, we are ready to answer the question about what kinds of smoothing functions can be adopted in the smoothing-type algorithm. Two technical lemmas are needed toward the answer.

Lemma 2.1. Suppose that $M, N \in \mathbb{R}^{n \times n}$. Let $\sigma_{\min}(M)$ denote the minimum singular value of $M$, and $\sigma_{\max}(N)$ the maximum singular value of $N$. Then, the following hold.

(a) $\sigma_{\min}(M) > \sigma_{\max}(N)$ if and only if $\sigma_{\min}(M^T M) > \sigma_{\max}(N^T N)$.

(b) If $\sigma_{\min}(M^T M) > \sigma_{\max}(N^T N)$, then $M^T M - N^T N$ is positive definite.

Proof. The proof is straightforward and can be found in standard textbooks on matrix analysis, so we omit it here. $\Box$

Lemma 2.2. Let $A, S \in \mathbb{R}^{n \times n}$ and let $A$ be symmetric. Suppose that the eigenvalues of $A$ and $S S^T$ are arranged in non-increasing order. Then, for each $k = 1, 2, \cdots, n$, there exists a nonnegative real number $\theta_k$ such that

$$\lambda_{\min}(S S^T) \le \theta_k \le \lambda_{\max}(S S^T) \quad \text{and} \quad \lambda_k(S A S^T) = \theta_k \lambda_k(A).$$

Proof. Please see [15, Corollary 4.5.11] for a proof. $\Box$

We point out that the crucial key, which guarantees that a smoothing function can be employed in the smoothing-type algorithm, is the nonsingularity of the Jacobian matrix $H'(\mu, x)$ given in (7). Below, we provide a condition under which the Jacobian matrix $H'(\mu, x)$ is nonsingular.

Theorem 2.1. Consider the SOCAVE (2) with $\sigma_{\min}(A) > \sigma_{\max}(B)$. Let $H$ be defined as in (6). Suppose that $\phi : \mathbb{R}_{++} \times \mathbb{R} \to \mathbb{R}$ is a smoothing function of $|t|$. If $-1 \le \frac{\partial \phi(\mu, t)}{\partial t} \le 1$ is satisfied, then the Jacobian matrix $H'(\mu, x)$ is nonsingular for any $\mu > 0$.

Proof. From the expression of $H'(\mu, x)$ given in (7), we know that $H'(\mu, x)$ is nonsingular if and only if the matrix $A + B \frac{\partial \Phi(\mu,x)}{\partial x}$ is nonsingular. Thus, it suffices to show that the matrix $A + B \frac{\partial \Phi(\mu,x)}{\partial x}$ is nonsingular under the stated conditions.

Suppose not; that is, there exists a vector $0 \neq v \in \mathbb{R}^n$ such that

$$\left(A + B\, \frac{\partial \Phi(\mu, x)}{\partial x}\right) v = 0,$$

which implies that

$$v^T A^T A v = v^T \left(\frac{\partial \Phi(\mu, x)}{\partial x}\right)^T B^T B\, \frac{\partial \Phi(\mu, x)}{\partial x}\, v. \qquad (9)$$

For convenience, we denote $C := \frac{\partial \Phi(\mu,x)}{\partial x}$. Then, it follows that $v^T A^T A v = v^T C^T B^T B C v$. Applying Lemma 2.2, there exists a constant $\hat{\theta}$ such that

$$\lambda_{\min}(C^T C) \le \hat{\theta} \le \lambda_{\max}(C^T C) \quad \text{and} \quad \lambda_{\max}(C^T B^T B C) = \hat{\theta}\, \lambda_{\max}(B^T B).$$

Note that if we can prove that

$$0 \le \lambda_{\min}(C^T C) \le \lambda_{\max}(C^T C) \le 1,$$

we will have $\lambda_{\max}(C^T B^T B C) \le \lambda_{\max}(B^T B)$. Then, by the assumption that the minimum singular value of $A$ strictly exceeds the maximum singular value of $B$ (i.e., $\sigma_{\min}(A) > \sigma_{\max}(B)$) and applying Lemma 2.1, we obtain $v^T A^T A v > v^T C^T B^T B C v$. This contradicts the identity (9), which shows that the Jacobian matrix $H'(\mu, x)$ is nonsingular for $\mu > 0$.

Thus, in light of the above discussion, it suffices to claim $0 \le \lambda_{\min}(C^T C) \le \lambda_{\max}(C^T C) \le 1$. To this end, we discuss two cases.

Case 1: For $x_2 = 0$, we have $C = \frac{\partial \phi(\mu, x_1)}{\partial x_1} I$. Since $-1 \le \frac{\partial \phi(\mu, x_1)}{\partial x_1} \le 1$, it is clear that $0 \le \lambda(C^T C) \le 1$ for $\mu > 0$. Then, the claim is done.

Case 2: For $x_2 \neq 0$, using the fact that the matrix $M^T M$ is always positive semidefinite for any matrix $M \in \mathbb{R}^{m \times n}$, we see that the inequality $\lambda_{\min}(C^T C) \ge 0$ always holds. In order to prove $\lambda_{\max}(C^T C) \le 1$, we need to further argue that the matrix $I - C^T C$ is positive semidefinite. First, we write out

$$I - C^T C = \begin{bmatrix} 1 - b^2 - c^2 & -2bc\, \dfrac{x_2^T}{\|x_2\|} \\[4pt] -2bc\, \dfrac{x_2}{\|x_2\|} & (1 - a^2) I + (a^2 - b^2 - c^2)\, \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix}.$$

If $-1 < \frac{\partial \phi(\mu, \lambda_i(x))}{\partial x_1} < 1$, then we obtain

$$b^2 + c^2 = \frac{1}{2}\left[\left(\frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1}\right)^2 + \left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1}\right)^2\right] < 1.$$

This indicates that $1 - b^2 - c^2 > 0$. Considering $[1 - b^2 - c^2]$ as a $1 \times 1$ matrix, this says $[1 - b^2 - c^2]$ is positive definite. Hence, its Schur complement can be computed as below:

$$(1 - a^2) I + (a^2 - b^2 - c^2)\, \frac{x_2 x_2^T}{\|x_2\|^2} - \frac{4 b^2 c^2}{1 - b^2 - c^2}\, \frac{x_2 x_2^T}{\|x_2\|^2} = (1 - a^2) \left(I - \frac{x_2 x_2^T}{\|x_2\|^2}\right) + \left(1 - b^2 - c^2 - \frac{4 b^2 c^2}{1 - b^2 - c^2}\right) \frac{x_2 x_2^T}{\|x_2\|^2}. \qquad (10)$$

On the other hand, by the Mean Value Theorem, we have

$$\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x)) = \frac{\partial \phi(\mu, \xi)}{\partial \xi}\, (\lambda_2(x) - \lambda_1(x)),$$

where $\xi \in (\lambda_1(x), \lambda_2(x))$. To proceed, we need to further discuss two subcases.

(1) When $-1 < \frac{\partial \phi(\mu,\xi)}{\partial \xi} < 1$, we know $|\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x))| < |\lambda_2(x) - \lambda_1(x)|$. This together with (8) implies that $1 - a^2 > 0$ for any $\mu > 0$. In addition, for any $\mu > 0$, we observe that

$$(1 - b^2 - c^2)^2 - 4 b^2 c^2 = \left(1 - (b - c)^2\right)\left(1 - (b + c)^2\right) = \left[1 - \left(\frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1}\right)^2\right] \cdot \left[1 - \left(\frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1}\right)^2\right] > 0.$$

With all of these, we verify that the Schur complement (10) of $[1 - b^2 - c^2]$ is a nonnegative linear combination of the matrices $I - \frac{x_2 x_2^T}{\|x_2\|^2}$ and $\frac{x_2 x_2^T}{\|x_2\|^2}$, which yields that it is positive semidefinite. Hence, the matrix $I - C^T C$ is also positive semidefinite, which is equivalent to saying $0 \le \lambda_{\min}(C^T C) \le \lambda_{\max}(C^T C) \le 1$.

(2) When $\frac{\partial \phi(\mu,\xi)}{\partial \xi} = \pm 1$, we have

$$1 - a^2 = 0 \quad \text{and} \quad (1 - b^2 - c^2)^2 - 4 b^2 c^2 > 0.$$

Since the matrix $\frac{x_2 x_2^T}{\|x_2\|^2}$ is positive semidefinite, the matrix $I - C^T C$ is positive semidefinite. Hence, $0 \le \lambda_{\min}(C^T C) \le \lambda_{\max}(C^T C) \le 1$.

If either

$$\frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1} = \pm 1 \ \text{ and } \ \frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} = \pm 1, \quad \text{or} \quad \frac{\partial \phi(\mu, \lambda_1(x))}{\partial x_1} = \pm 1 \ \text{ and } \ \frac{\partial \phi(\mu, \lambda_2(x))}{\partial x_1} = \mp 1,$$

then we have $b = \pm 1, c = 0$ or $b = 0, c = \mp 1$, which yields $b^2 + c^2 = 1$. Again, two subcases are needed.

(1) When $-1 < \frac{\partial \phi(\mu,\xi)}{\partial \xi} < 1$, we have $|\phi(\mu, \lambda_2(x)) - \phi(\mu, \lambda_1(x))| < |\lambda_2(x) - \lambda_1(x)|$. This implies that $1 - a^2 > 0$ for any $\mu > 0$. Therefore

$$I - C^T C = \begin{bmatrix} 0 & 0 \\ 0 & (1 - a^2)\left(I - \dfrac{x_2 x_2^T}{\|x_2\|^2}\right) \end{bmatrix}.$$

Since the matrix $I - \frac{x_2 x_2^T}{\|x_2\|^2}$ is positive semidefinite, the matrix $I - C^T C$ is positive semidefinite. Hence, $0 \le \lambda_{\min}(C^T C) \le \lambda_{\max}(C^T C) \le 1$.

(2) When $\frac{\partial \phi(\mu,\xi)}{\partial \xi} = \pm 1$, we have $I - C^T C = 0$, which leads to $\lambda(C^T C) = 1$.

From all the above, the proof is complete. $\Box$

We point out that the condition $\sigma_{\min}(A) > \sigma_{\max}(B)$ in Theorem 2.1 guarantees that the SOCAVE (2) has a unique solution, according to [35, Theorem 4.1]. From Theorem 2.1, we realize that for a SOCAVE (2) with $\sigma_{\min}(A) > \sigma_{\max}(B)$, any smoothing function of $|t|$ with $-1 \le \frac{\partial \phi(\mu, t)}{\partial t} \le 1$ is suitable for the smoothing Newton algorithm when solving the SOCAVE. With this, it is easy to find or construct smoothing functions of $|t|$ satisfying the above condition. One popular approach is smoothing approximation via convolution for the absolute value function [1, 22, 38, 45], which is described below.
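In practice, the hypothesis of Theorem 2.1 is cheap to verify numerically; a sketch (our helper name, assuming NumPy):

```python
# Sketch: verify sigma_min(A) > sigma_max(B) for a SOCAVE instance.
import numpy as np

def theorem_2_1_condition(A, B):
    smin_A = np.linalg.svd(A, compute_uv=False).min()
    smax_B = np.linalg.svd(B, compute_uv=False).max()
    return smin_A > smax_B

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (np.linalg.svd(B, compute_uv=False).max() + 1.0) * np.eye(5)
print(theorem_2_1_condition(A, B))   # True: unique solution, nonsingular H'
```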

First, we construct a smoothing approximation for the plus function $(t)_+ = \max\{0, t\}$. Consider a piecewise continuous function $d(t)$ with a finite number of pieces which is a density (kernel) function; in other words, it satisfies

$$d(t) \ge 0 \quad \text{and} \quad \int_{-\infty}^{+\infty} d(t)\, dt = 1.$$

With this $d(t)$, we further define $\hat{s}(t, \mu) := \frac{1}{\mu}\, d\!\left(\frac{t}{\mu}\right)$, where $\mu$ is a positive parameter. If $\int_{-\infty}^{+\infty} |t|\, d(t)\, dt < +\infty$, then a smoothing approximation for $(t)_+$ is formed. In particular,

$$\hat{p}(t, \mu) = \int_{-\infty}^{+\infty} (t - s)_+\, \hat{s}(s, \mu)\, ds = \int_{-\infty}^{t} (t - s)\, \hat{s}(s, \mu)\, ds \approx (t)_+.$$
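As a sanity check of this construction (our sketch, not the paper's code; NumPy assumed), one can approximate the convolution integral with a Riemann sum for the uniform kernel $d_2$ defined below and compare with the closed form (12) that it generates:

```python
# Sketch: p_hat(t, mu) = integral of (t - s)_+ * (1/mu) d(s/mu) ds,
# evaluated by a Riemann sum over the kernel's support, uniform kernel d_2.
import numpy as np

def p_hat(t, mu, d, lo, hi, m=200001):
    s = np.linspace(lo * mu, hi * mu, m)      # support of s -> d(s/mu)
    ds = s[1] - s[0]
    return float(np.sum(np.maximum(t - s, 0.0) * d(s / mu) / mu) * ds)

d2 = lambda u: np.where((u >= -0.5) & (u <= 0.5), 1.0, 0.0)
mu, t = 0.4, 0.1
closed_form = (t + mu / 2.0)**2 / (2.0 * mu)  # middle branch of (12) below
print(p_hat(t, mu, d2, -0.5, 0.5), closed_form)   # both ~ 0.1125
```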

The following are four well-known smoothing functions for the plus function [1, 38]:

$$\hat{\phi}_1(\mu, t) = t + \mu \ln\left(1 + e^{-t/\mu}\right). \qquad (11)$$

$$\hat{\phi}_2(\mu, t) = \begin{cases} t & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \dfrac{1}{2\mu}\left(t + \dfrac{\mu}{2}\right)^2 & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] 0 & \text{if } t \le -\frac{\mu}{2}. \end{cases} \qquad (12)$$

$$\hat{\phi}_3(\mu, t) = \frac{\sqrt{4\mu^2 + t^2} + t}{2}. \qquad (13)$$

$$\hat{\phi}_4(\mu, t) = \begin{cases} t - \dfrac{\mu}{2} & \text{if } t > \mu, \\[2pt] \dfrac{t^2}{2\mu} & \text{if } 0 \le t \le \mu, \\[2pt] 0 & \text{if } t < 0, \end{cases} \qquad (14)$$

where the corresponding kernel functions are

$$d_1(t) = \frac{e^{-t}}{(1 + e^{-t})^2}, \qquad d_2(t) = \begin{cases} 1 & \text{if } -\frac{1}{2} \le t \le \frac{1}{2}, \\ 0 & \text{otherwise}, \end{cases} \qquad d_3(t) = \frac{2}{(t^2 + 4)^{3/2}}, \qquad d_4(t) = \begin{cases} 1 & \text{if } 0 \le t \le 1, \\ 0 & \text{otherwise}. \end{cases}$$

Next, in light of $|t| = (t)_+ + (-t)_+$, the smoothing function of $|t|$ via convolution can be written as

$$\hat{p}(|t|, \mu) = \hat{p}(t, \mu) + \hat{p}(-t, \mu) = \int_{-\infty}^{+\infty} |t - s|\, \hat{s}(s, \mu)\, ds.$$

Analogous to (11)-(14), we achieve the following smoothing functions for $|t|$:

$$\phi_1(\mu, t) = \mu\left[\ln\left(1 + e^{t/\mu}\right) + \ln\left(1 + e^{-t/\mu}\right)\right]. \qquad (15)$$

$$\phi_2(\mu, t) = \begin{cases} t & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \dfrac{t^2}{\mu} + \dfrac{\mu}{4} & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] -t & \text{if } t \le -\frac{\mu}{2}. \end{cases} \qquad (16)$$

$$\phi_3(\mu, t) = \sqrt{4\mu^2 + t^2}. \qquad (17)$$

$$\phi_4(\mu, t) = \begin{cases} \dfrac{t^2}{2\mu} & \text{if } |t| \le \mu, \\[2pt] |t| - \dfrac{\mu}{2} & \text{if } |t| > \mu. \end{cases} \qquad (18)$$
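A short verification sketch (ours; NumPy assumed) implements (15)-(18) and checks numerically both the defining limit $\phi(\mu, t) \to |t|$ and the derivative bound $|\partial \phi / \partial t| \le 1$ required by Theorem 2.1; $\phi_1$ is evaluated in the overflow-safe form $|t| + 2\mu \ln(1 + e^{-|t|/\mu})$, which is algebraically equal to (15).

```python
# Sketch: smoothing functions (15)-(18) of |t| and the bound |dphi/dt| <= 1.
import numpy as np

def phi1(mu, t):
    # stable rewriting of mu*[ln(1 + e^{t/mu}) + ln(1 + e^{-t/mu})]
    return np.abs(t) + 2.0 * mu * np.log1p(np.exp(-np.abs(t) / mu))

def phi2(mu, t):
    return np.where(np.abs(t) >= mu / 2, np.abs(t), t**2 / mu + mu / 4)

def phi3(mu, t):
    return np.sqrt(4.0 * mu**2 + t**2)

def phi4(mu, t):
    return np.where(np.abs(t) > mu, np.abs(t) - mu / 2, t**2 / (2.0 * mu))

t = np.linspace(-3.0, 3.0, 2001)
for phi in (phi1, phi2, phi3, phi4):
    gap = np.max(np.abs(phi(1e-6, t) - np.abs(t)))      # phi(mu, t) -> |t|
    slope = np.max(np.abs(np.gradient(phi(0.5, t), t))) # |dphi/dt| <= 1
    print(f"{phi.__name__}: gap={gap:.1e}, max |dphi/dt|={slope:.4f}")
```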

If we take the Epanechnikov kernel function

$$K(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \le 1, \\ 0 & \text{otherwise}, \end{cases}$$

then we obtain the following smoothing function for $|t|$:

$$\phi_5(\mu, t) = \begin{cases} t & \text{if } t > \mu, \\[2pt] -\dfrac{t^4}{8\mu^3} + \dfrac{3t^2}{4\mu} + \dfrac{3\mu}{8} & \text{if } -\mu \le t \le \mu, \\[2pt] -t & \text{if } t < -\mu. \end{cases} \qquad (19)$$

Moreover, taking the Gaussian kernel function $K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$ for all $t \in \mathbb{R}$ yields

$$\hat{s}(t, \mu) := \frac{1}{\mu} K\!\left(\frac{t}{\mu}\right) = \frac{1}{\sqrt{2\pi\mu^2}}\, e^{-\frac{t^2}{2\mu^2}},$$

and it leads to the smoothing function [45] for $|t|$:

$$\phi_6(\mu, t) = t\, \mathrm{erf}\!\left(\frac{t}{\sqrt{2}\,\mu}\right) + \sqrt{\frac{2}{\pi}}\, \mu\, e^{-\frac{t^2}{2\mu^2}}, \qquad (20)$$

where the error function is defined by

$$\mathrm{erf}(t) = \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\, du, \quad \forall t \in \mathbb{R}.$$
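Similarly, (19)-(20) can be sketched as follows (our code; SciPy's scipy.special.erf supplies the error function), with the printed gaps shrinking as $\mu \downarrow 0$, consistent with Definition 2.1(ii):

```python
# Sketch: Epanechnikov (19) and Gaussian (20) smoothings of |t|.
import numpy as np
from scipy.special import erf

def phi5(mu, t):
    mid = -t**4 / (8.0 * mu**3) + 3.0 * t**2 / (4.0 * mu) + 3.0 * mu / 8.0
    return np.where(np.abs(t) <= mu, mid, np.abs(t))

def phi6(mu, t):
    return t * erf(t / (np.sqrt(2.0) * mu)) \
           + np.sqrt(2.0 / np.pi) * mu * np.exp(-t**2 / (2.0 * mu**2))

t = np.linspace(-2.0, 2.0, 1001)
for mu in (0.5, 0.1, 0.01):
    print(mu, np.max(np.abs(phi5(mu, t) - np.abs(t))),
              np.max(np.abs(phi6(mu, t) - np.abs(t))))
# both maximal gaps are O(mu): 3*mu/8 for phi5 and sqrt(2/pi)*mu for phi6
```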

In summary, we have constructed six smoothing functions from the above discussions. Can all the above functions serve as smoothing functions for solving the SOCAVE? The answer is affirmative because it is not hard to verify that each $\phi_i$ satisfies $-1 \le \frac{\partial \phi_i(\mu, t)}{\partial t} \le 1$. Thus, these six functions will be adopted for our numerical implementations. Accordingly, we need to define $\Phi_i(\mu, x)$ and $H_i(\mu, x)$ based on each $\phi_i$. For subsequent needs, we only present the expression of each Jacobian matrix $H_i'(\mu, x)$ without detailed derivations.

Based on each $\phi_i$, let $\Phi_i : \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R}^n$ for $i = 1, 2, \cdots, 6$ be defined as in (5), i.e.,

$$\Phi_i(\mu, x) = \phi_i(\mu, \lambda_1(x))\, u_x^{(1)} + \phi_i(\mu, \lambda_2(x))\, u_x^{(2)}, \qquad (21)$$

and let $H_i : \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R} \times \mathbb{R}^n$ for $i = 1, 2, \cdots, 6$ be defined as in (6), i.e.,

$$H_i(\mu, x) = \begin{bmatrix} \mu \\ Ax + B\Phi_i(\mu, x) - b \end{bmatrix}, \qquad \forall \mu \in \mathbb{R}_{++} \text{ and } x \in \mathbb{R}^n. \qquad (22)$$

Then, each $H_i$ is continuously differentiable on $\mathbb{R}_{++} \times \mathbb{R}^n$ with the Jacobian matrix given by

$$H_i'(\mu, x) = \begin{bmatrix} 1 & 0 \\ B \dfrac{\partial \Phi_i(\mu,x)}{\partial \mu} & A + B \dfrac{\partial \Phi_i(\mu,x)}{\partial x} \end{bmatrix} \qquad (23)$$

for all $(\mu, x) \in \mathbb{R}_{++} \times \mathbb{R}^n$ with $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$. Moreover, the differentiation of each $\Phi_i$ is expressed as below.

(1) The Jacobian of $\Phi_1$ is characterized as below.

$$\frac{\partial \Phi_1(\mu, x)}{\partial \mu} = \frac{\partial \phi_1(\mu, \lambda_1(x))}{\partial \mu}\, u_x^{(1)} + \frac{\partial \phi_1(\mu, \lambda_2(x))}{\partial \mu}\, u_x^{(2)} = \left[\frac{\phi_1(\mu, \lambda_1(x))}{\mu} + \frac{\lambda_1(x)}{\mu} \cdot \frac{1 - e^{\lambda_1(x)/\mu}}{1 + e^{\lambda_1(x)/\mu}}\right] u_x^{(1)} + \left[\frac{\phi_1(\mu, \lambda_2(x))}{\mu} + \frac{\lambda_2(x)}{\mu} \cdot \frac{1 - e^{\lambda_2(x)/\mu}}{1 + e^{\lambda_2(x)/\mu}}\right] u_x^{(2)}.$$

$$\frac{\partial \Phi_1(\mu, x)}{\partial x} = \begin{cases} \dfrac{e^{x_1/\mu} - 1}{e^{x_1/\mu} + 1}\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b_1 & c_1 \dfrac{x_2^T}{\|x_2\|} \\[4pt] c_1 \dfrac{x_2}{\|x_2\|} & a_1 I + (b_1 - a_1) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0, \end{cases}$$

with

$$a_1 = \frac{\phi_1(\mu, \lambda_2(x)) - \phi_1(\mu, \lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \quad b_1 = \frac{1}{2}\left(\frac{e^{\lambda_1(x)/\mu} - 1}{e^{\lambda_1(x)/\mu} + 1} + \frac{e^{\lambda_2(x)/\mu} - 1}{e^{\lambda_2(x)/\mu} + 1}\right), \quad c_1 = \frac{1}{2}\left(\frac{1 - e^{\lambda_1(x)/\mu}}{e^{\lambda_1(x)/\mu} + 1} + \frac{e^{\lambda_2(x)/\mu} - 1}{e^{\lambda_2(x)/\mu} + 1}\right).$$

(2) The Jacobian of $\Phi_2$ is characterized as below.

$$\frac{\partial \Phi_2(\mu, x)}{\partial \mu} = \frac{\partial \phi_2(\mu, \lambda_1(x))}{\partial \mu}\, u_x^{(1)} + \frac{\partial \phi_2(\mu, \lambda_2(x))}{\partial \mu}\, u_x^{(2)}$$

with

$$\frac{\partial \phi_2(\mu, \lambda_i(x))}{\partial \mu} = \begin{cases} 0 & \text{if } \lambda_i(x) \ge \frac{\mu}{2}, \\[2pt] -\left(\dfrac{\lambda_i(x)}{\mu}\right)^2 + \dfrac{1}{4} & \text{if } -\frac{\mu}{2} < \lambda_i(x) < \frac{\mu}{2}, \\[2pt] 0 & \text{if } \lambda_i(x) \le -\frac{\mu}{2}. \end{cases}$$

$$\frac{\partial \Phi_2(\mu, x)}{\partial x} = \begin{cases} d\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b_2 & c_2 \dfrac{x_2^T}{\|x_2\|} \\[4pt] c_2 \dfrac{x_2}{\|x_2\|} & a_2 I + (b_2 - a_2) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0, \end{cases}$$

with

$$a_2 = \frac{\phi_2(\mu, \lambda_2(x)) - \phi_2(\mu, \lambda_1(x))}{\lambda_2(x) - \lambda_1(x)},$$

$$b_2 = \begin{cases} 0 & \text{if } \lambda_2(x) \ge \frac{\mu}{2} > -\frac{\mu}{2} \ge \lambda_1(x), \\ 1 & \text{if } \lambda_2(x) > \lambda_1(x) \ge \frac{\mu}{2}, \\ \frac{\lambda_1(x)}{\mu} + \frac{1}{2} & \text{if } \lambda_2(x) \ge \frac{\mu}{2} > \lambda_1(x) > -\frac{\mu}{2}, \\ \frac{\lambda_1(x) + \lambda_2(x)}{\mu} & \text{if } \frac{\mu}{2} > \lambda_2(x) > \lambda_1(x) > -\frac{\mu}{2}, \\ \frac{\lambda_2(x)}{\mu} - \frac{1}{2} & \text{if } \frac{\mu}{2} > \lambda_2(x) > -\frac{\mu}{2} \ge \lambda_1(x), \\ -1 & \text{if } \lambda_1(x) < \lambda_2(x) \le -\frac{\mu}{2}, \end{cases}$$

$$c_2 = \begin{cases} 1 & \text{if } \lambda_2(x) \ge \frac{\mu}{2} > -\frac{\mu}{2} \ge \lambda_1(x), \\ 0 & \text{if } \lambda_2(x) > \lambda_1(x) \ge \frac{\mu}{2}, \\ \frac{1}{2} - \frac{\lambda_1(x)}{\mu} & \text{if } \lambda_2(x) \ge \frac{\mu}{2} > \lambda_1(x) > -\frac{\mu}{2}, \\ \frac{\lambda_2(x) - \lambda_1(x)}{\mu} & \text{if } \frac{\mu}{2} > \lambda_2(x) > \lambda_1(x) > -\frac{\mu}{2}, \\ \frac{\lambda_2(x)}{\mu} + \frac{1}{2} & \text{if } \frac{\mu}{2} > \lambda_2(x) > -\frac{\mu}{2} \ge \lambda_1(x), \\ 0 & \text{if } \lambda_1(x) < \lambda_2(x) \le -\frac{\mu}{2}, \end{cases}$$

$$d = \begin{cases} 1 & \text{if } x_1 \ge \frac{\mu}{2}, \\ \frac{2x_1}{\mu} & \text{if } -\frac{\mu}{2} < x_1 < \frac{\mu}{2}, \\ -1 & \text{if } x_1 \le -\frac{\mu}{2}. \end{cases}$$

(3) The Jacobian of $\Phi_3$ is characterized as below.

$$\frac{\partial \Phi_3(\mu, x)}{\partial \mu} = \frac{4\mu}{\sqrt{4\mu^2 + \lambda_1^2(x)}}\, u_x^{(1)} + \frac{4\mu}{\sqrt{4\mu^2 + \lambda_2^2(x)}}\, u_x^{(2)},$$

$$\frac{\partial \Phi_3(\mu, x)}{\partial x} = \begin{cases} \dfrac{x_1}{\sqrt{4\mu^2 + x_1^2}}\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b_3 & c_3 \dfrac{x_2^T}{\|x_2\|} \\[4pt] c_3 \dfrac{x_2}{\|x_2\|} & a_3 I + (b_3 - a_3) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0, \end{cases}$$

with

$$a_3 = \frac{\phi_3(\mu, \lambda_2(x)) - \phi_3(\mu, \lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \quad b_3 = \frac{1}{2}\left(\frac{\lambda_1(x)}{\sqrt{4\mu^2 + \lambda_1^2(x)}} + \frac{\lambda_2(x)}{\sqrt{4\mu^2 + \lambda_2^2(x)}}\right), \quad c_3 = \frac{1}{2}\left(\frac{-\lambda_1(x)}{\sqrt{4\mu^2 + \lambda_1^2(x)}} + \frac{\lambda_2(x)}{\sqrt{4\mu^2 + \lambda_2^2(x)}}\right).$$
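These closed forms are easy to validate; the sketch below (our code, NumPy assumed) compares the matrix $\frac{\partial \Phi_3}{\partial x}$ above with a central finite-difference Jacobian at a point with $x_2 \neq 0$.

```python
# Sketch: finite-difference check of dPhi3/dx at a point with x2 != 0.
import numpy as np

def Phi3(mu, x):
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2                                   # assumes x2 != 0 here
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    phi = lambda t: np.sqrt(4.0 * mu**2 + t**2)
    return phi(x1 - nx2) * u1 + phi(x1 + nx2) * u2

mu, x = 0.3, np.array([0.4, -0.8, 0.6])
nx2 = np.linalg.norm(x[1:]); lam1, lam2 = x[0] - nx2, x[0] + nx2
dphi = lambda t: t / np.sqrt(4.0 * mu**2 + t**2)
a3 = (np.sqrt(4*mu**2 + lam2**2) - np.sqrt(4*mu**2 + lam1**2)) / (lam2 - lam1)
b3 = 0.5 * (dphi(lam2) + dphi(lam1))
c3 = 0.5 * (dphi(lam2) - dphi(lam1))
w = x[1:] / nx2
J_closed = np.block([[np.array([[b3]]), c3 * w[None, :]],
                     [c3 * w[:, None],
                      a3 * np.eye(2) + (b3 - a3) * np.outer(w, w)]])
eps = 1e-6
J_fd = np.column_stack([(Phi3(mu, x + eps * e) - Phi3(mu, x - eps * e))
                        / (2 * eps) for e in np.eye(3)])
print(np.max(np.abs(J_closed - J_fd)))   # tiny (~1e-10): formulas agree
```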

(4) The Jacobian of $\Phi_4$ is characterized as below.

$$\frac{\partial \Phi_4(\mu, x)}{\partial \mu} = \frac{\partial \phi_4(\mu, \lambda_1(x))}{\partial \mu}\, u_x^{(1)} + \frac{\partial \phi_4(\mu, \lambda_2(x))}{\partial \mu}\, u_x^{(2)}$$

with

$$\frac{\partial \phi_4(\mu, \lambda_i(x))}{\partial \mu} = \begin{cases} -\frac{1}{2} & \text{if } \lambda_i(x) > \mu, \\[2pt] -\frac{1}{2}\left(\dfrac{\lambda_i(x)}{\mu}\right)^2 & \text{if } -\mu \le \lambda_i(x) \le \mu, \\[2pt] -\frac{1}{2} & \text{if } \lambda_i(x) < -\mu. \end{cases}$$

$$\frac{\partial \Phi_4(\mu, x)}{\partial x} = \begin{cases} e\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b_4 & c_4 \dfrac{x_2^T}{\|x_2\|} \\[4pt] c_4 \dfrac{x_2}{\|x_2\|} & a_4 I + (b_4 - a_4) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0, \end{cases}$$

with

$$a_4 = \frac{\phi_4(\mu, \lambda_2(x)) - \phi_4(\mu, \lambda_1(x))}{\lambda_2(x) - \lambda_1(x)},$$

$$b_4 = \begin{cases} 0 & \text{if } \lambda_2(x) > \mu > -\mu > \lambda_1(x), \\ 1 & \text{if } \lambda_2(x) > \lambda_1(x) > \mu, \\ \frac{\lambda_1(x)}{2\mu} + \frac{1}{2} & \text{if } \lambda_2(x) > \mu \ge \lambda_1(x) \ge -\mu, \\ \frac{\lambda_1(x) + \lambda_2(x)}{2\mu} & \text{if } \mu \ge \lambda_2(x) > \lambda_1(x) \ge -\mu, \\ \frac{\lambda_2(x)}{2\mu} - \frac{1}{2} & \text{if } \mu \ge \lambda_2(x) \ge -\mu > \lambda_1(x), \\ -1 & \text{if } \lambda_1(x) < \lambda_2(x) < -\mu, \end{cases}$$

$$c_4 = \begin{cases} 1 & \text{if } \lambda_2(x) > \mu > -\mu > \lambda_1(x), \\ 0 & \text{if } \lambda_2(x) > \lambda_1(x) > \mu, \\ \frac{1}{2} - \frac{\lambda_1(x)}{2\mu} & \text{if } \lambda_2(x) > \mu \ge \lambda_1(x) \ge -\mu, \\ \frac{\lambda_2(x) - \lambda_1(x)}{2\mu} & \text{if } \mu \ge \lambda_2(x) > \lambda_1(x) \ge -\mu, \\ \frac{\lambda_2(x)}{2\mu} + \frac{1}{2} & \text{if } \mu \ge \lambda_2(x) \ge -\mu > \lambda_1(x), \\ 0 & \text{if } \lambda_1(x) < \lambda_2(x) < -\mu, \end{cases}$$

$$e = \begin{cases} 1 & \text{if } x_1 > \mu, \\ \frac{x_1}{\mu} & \text{if } -\mu \le x_1 \le \mu, \\ -1 & \text{if } x_1 < -\mu. \end{cases}$$

(5) The Jacobian of $\Phi_5$ is characterized as below.

$$\frac{\partial \Phi_5(\mu, x)}{\partial \mu} = \frac{\partial \phi_5(\mu, \lambda_1(x))}{\partial \mu}\, u_x^{(1)} + \frac{\partial \phi_5(\mu, \lambda_2(x))}{\partial \mu}\, u_x^{(2)}$$

with

$$\frac{\partial \phi_5(\mu, \lambda_i(x))}{\partial \mu} = \begin{cases} 0 & \text{if } \lambda_i(x) > \mu, \\[2pt] \frac{3}{8}\left(\left(\dfrac{\lambda_i(x)}{\mu}\right)^2 - 1\right)^2 & \text{if } -\mu \le \lambda_i(x) \le \mu, \\[2pt] 0 & \text{if } \lambda_i(x) < -\mu. \end{cases}$$

$$\frac{\partial \Phi_5(\mu, x)}{\partial x} = \begin{cases} \dfrac{\partial \phi_5(\mu, x_1)}{\partial x_1}\, I & \text{if } x_2 = 0, \\[8pt] \begin{bmatrix} b_5 & c_5 \dfrac{x_2^T}{\|x_2\|} \\[4pt] c_5 \dfrac{x_2}{\|x_2\|} & a_5 I + (b_5 - a_5) \dfrac{x_2 x_2^T}{\|x_2\|^2} \end{bmatrix} & \text{if } x_2 \neq 0, \end{cases}$$

where $a_5$, $b_5$, and $c_5$ are defined as in (8) with $\phi = \phi_5$.
