
Computational Optimization and Applications, vol. 45, pp. 581-606, 2010

A one-parametric class of merit functions for the second-order cone complementarity problem

Jein-Shan Chen ¹ Department of Mathematics National Taiwan Normal University

Taipei, Taiwan 11677 E-mail: jschen@math.ntnu.edu.tw

Shaohua Pan

School of Mathematical Sciences South China University of Technology

Guangzhou 510640, China E-mail: shhpan@scut.edu.cn

January 10, 2007

(first revised September 3, 2007) (final revised April 21, 2008)

Abstract. We investigate a one-parametric class of merit functions for the second-order cone complementarity problem (SOCCP), which is closely related to the popular Fischer-Burmeister (FB) merit function and the natural residual merit function. In fact, it reduces to the FB merit function if the involved parameter τ equals 2, whereas as τ tends to zero, its limit becomes a multiple of the natural residual merit function. In this paper, we show that this class of merit functions shares several favorable properties of the FB merit function, for example, smoothness. These properties play an important role in reformulating the SOCCP as an unconstrained minimization problem or as a nonsmooth system of equations. Numerical results are reported for some convex second-order cone programs (SOCPs), obtained by solving the unconstrained minimization reformulation of the KKT optimality conditions; they indicate that the FB merit function is not the best. For the sparse linear SOCPs, the merit function corresponding to τ = 2.5 or 3 works better than the FB merit function, whereas for the dense convex SOCPs, the merit function with τ = 0.1, 0.5 or 1.0 seems to have better numerical performance.

Key words. Second-order cone, complementarity, merit function, Jordan product.

¹Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is partially supported by National Science Council of Taiwan.

AMS subject classifications. 26B05, 26B35, 90C33, 65K05

1 Introduction

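We consider the conic complementarity problem of finding a vector $\zeta \in \mathbb{R}^n$ such that

$$F(\zeta) \in \mathcal{K}, \quad G(\zeta) \in \mathcal{K}, \quad \langle F(\zeta), G(\zeta) \rangle = 0, \tag{1}$$

where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product, $F : \mathbb{R}^n \to \mathbb{R}^n$ and $G : \mathbb{R}^n \to \mathbb{R}^n$ are mappings assumed to be continuously differentiable throughout this paper, and $\mathcal{K}$ is the Cartesian product of second-order cones (SOCs). In other words,

$$\mathcal{K} = \mathcal{K}^{n_1} \times \mathcal{K}^{n_2} \times \cdots \times \mathcal{K}^{n_N}, \tag{2}$$

where $N, n_1, \ldots, n_N \ge 1$, $n_1 + \cdots + n_N = n$, and

$$\mathcal{K}^{n_i} := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n_i - 1} \mid \|x_2\| \le x_1 \right\},$$

with $\|\cdot\|$ denoting the Euclidean norm and $\mathcal{K}^1$ denoting the set of nonnegative reals $\mathbb{R}_+$. We will refer to (1)–(2) as the second-order cone complementarity problem (SOCCP).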

An important special case of the SOCCP corresponds to $G(\zeta) = \zeta$ for all $\zeta \in \mathbb{R}^n$. Then (1) reduces to

$$F(\zeta) \in \mathcal{K}, \quad \zeta \in \mathcal{K}, \quad \langle F(\zeta), \zeta \rangle = 0, \tag{3}$$

which is a natural extension of the nonlinear complementarity problem (NCP) [9, 10] with $\mathcal{K} = \mathbb{R}^n_+$, the nonnegative orthant cone of $\mathbb{R}^n$. Another important special case corresponds to the KKT optimality conditions of the convex second-order cone program (CSOCP):

$$\begin{array}{ll} \text{minimize} & g(x) \\ \text{subject to} & Ax = b, \quad x \in \mathcal{K}, \end{array} \tag{4}$$

where $g : \mathbb{R}^n \to \mathbb{R}$ is a convex twice continuously differentiable function, $A \in \mathbb{R}^{m \times n}$ has full row rank and $b \in \mathbb{R}^m$. From [6], we know that the KKT conditions of (4), which are sufficient but not necessary for optimality, can be reformulated as (1) with

$$F(\zeta) := \bar{x} + (I - A^T(AA^T)^{-1}A)\zeta, \quad G(\zeta) := \nabla g(F(\zeta)) - A^T(AA^T)^{-1}A\zeta, \tag{5}$$

where $\bar{x} \in \mathbb{R}^n$ is any point such that $A\bar{x} = b$. When $g$ is linear, the CSOCP reduces to the linear SOCP, which arises in numerous applications in engineering design, finance, and robust optimization, and includes as special cases convex quadratically constrained quadratic programs and linear programs; see [1, 15] and references therein.
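As an illustration of (5), the following minimal NumPy sketch builds the two mappings for given problem data. The function name `make_soccp_maps`, the random test data, and the choice of $\bar{x}$ as the minimum-norm solution of $Ax = b$ are assumptions made for this example only; any point with $A\bar{x} = b$ would do.

```python
import numpy as np

def make_soccp_maps(A, b, grad_g):
    """Build F and G of (5) for: minimize g(x) subject to Ax = b, x in K.

    A is assumed to have full row rank; grad_g returns the gradient of g.
    """
    P = A.T @ np.linalg.solve(A @ A.T, A)        # A^T (A A^T)^{-1} A
    x_bar = A.T @ np.linalg.solve(A @ A.T, b)    # one point with A x_bar = b (assumed choice)
    n = A.shape[1]

    def F(zeta):
        return x_bar + (np.eye(n) - P) @ zeta

    def G(zeta):
        return grad_g(F(zeta)) - P @ zeta

    return F, G

# Hypothetical data: linear objective g(x) = c^T x, so the gradient is constant.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)
c = rng.standard_normal(5)
F, G = make_soccp_maps(A, b, lambda x: c)

zeta = rng.standard_normal(5)
residual = np.linalg.norm(A @ F(zeta) - b)       # F(zeta) satisfies Ax = b for every zeta
```

The residual is zero up to rounding because $I - A^T(AA^T)^{-1}A$ projects onto the null space of $A$, so the equality constraint is built into $F$ and only the cone conditions remain in (1).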


There have been various methods proposed for solving SOCPs and SOCCPs. They include the interior-point methods [2, 3, 16, 17, 21], the non-interior smoothing Newton methods [5, 8], and the smoothing-regularization method [12]. Recently, an alternative method [6] was proposed, based on reformulating the SOCCP as an unconstrained minimization problem. That approach aims to find a function $\psi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ satisfying

$$\psi(x, y) = 0 \iff x \in \mathcal{K}, \ y \in \mathcal{K}, \ \langle x, y \rangle = 0, \tag{6}$$

so that the SOCCP can be reformulated as an unconstrained minimization problem

$$\min_{\zeta \in \mathbb{R}^n} f(\zeta) := \psi(F(\zeta), G(\zeta)).$$

We call such $\psi$ a merit function associated with the cone $\mathcal{K}$.

A popular choice of $\psi$ is the Fischer-Burmeister (FB) merit function

$$\psi_{\rm FB}(x, y) := \frac{1}{2}\|\phi_{\rm FB}(x, y)\|^2, \tag{7}$$

where $\phi_{\rm FB} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is the vector-valued FB function defined by

$$\phi_{\rm FB}(x, y) := (x^2 + y^2)^{1/2} - (x + y), \tag{8}$$

with $x^2 = x \circ x$ denoting the Jordan product of $x$ with itself, $x^{1/2}$ being a vector such that $(x^{1/2})^2 = x$, and $x + y$ meaning the usual componentwise addition of vectors. The function $\psi_{\rm FB}$ was studied in [6] and, in particular, shown to be continuously differentiable (smooth). Another popular choice of $\psi$ is the natural residual merit function

$$\psi_{\rm NR}(x, y) := \frac{1}{2}\|\phi_{\rm NR}(x, y)\|^2,$$

where $\phi_{\rm NR} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is the vector-valued natural residual function given by

$$\phi_{\rm NR}(x, y) := x - (x - y)_+,$$

with $(\cdot)_+$ meaning the projection in the Euclidean norm onto $\mathcal{K}$. The function $\phi_{\rm NR}$ was studied in [8, 12], where it is used in smoothing methods for the SOCCP. Compared with the FB merit function $\psi_{\rm FB}$, the function $\psi_{\rm NR}$ has a drawback: it is not differentiable.

In this paper, we will investigate the following one-parametric class of functions

$$\psi_\tau(x, y) := \frac{1}{2}\|\phi_\tau(x, y)\|^2, \tag{9}$$

where $\tau$ is a fixed parameter from $(0, 4)$ and $\phi_\tau : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is defined by

$$\phi_\tau(x, y) := \left[(x - y)^2 + \tau (x \circ y)\right]^{1/2} - (x + y). \tag{10}$$


Specifically, we prove that $\psi_\tau$ is a merit function associated with $\mathcal{K}$ which is continuously differentiable everywhere with computable gradient formulas (see Propositions 3.1–3.3), and hence the SOCCP can be reformulated as an unconstrained smooth minimization

$$\min_{\zeta \in \mathbb{R}^n} f_\tau(\zeta) := \psi_\tau(F(\zeta), G(\zeta)). \tag{11}$$

Also, we show that every stationary point of $f_\tau$ solves the SOCCP under the condition that $\nabla F$ and $-\nabla G$ are column monotone (see Proposition 4.1). Observe that $\phi_\tau$ reduces to $\phi_{\rm FB}$ when $\tau = 2$, whereas its limit as $\tau \to 0$ becomes a multiple of $\phi_{\rm NR}$. Thus, this class of merit functions is closely related to two of the most important merit functions, which makes a closer study of it worthwhile. In addition, this study is motivated by the work [13], where $\phi_\tau$ was used to develop a nonsmooth Newton method for the NCP. This paper is mainly concerned with the merit function approach based on the unconstrained minimization problem (11). Numerical results are also reported for some convex SOCPs, which indicate that $\psi_\tau$ can be an alternative to $\psi_{\rm FB}$ if a suitable $\tau$ is selected.
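The two limiting cases can be checked numerically. The sketch below, written for a single cone $\mathcal{K}^n$ with $n > 1$, compares $\phi_\tau$ at $\tau = 2$ with $\phi_{\rm FB}$ and at a very small $\tau$ with $-2\phi_{\rm NR}$. The factor $-2$ is our own computation, not a statement from the text: at $\tau = 0$ one has $[(x-y)^2]^{1/2} = 2(x-y)_+ - (x-y)$, hence $\phi_0 = -2\phi_{\rm NR}$. The helper names and test vectors are illustrative assumptions.

```python
import numpy as np

def jordan(x, y):
    """Jordan product x o y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def spectral_apply(x, f):
    """Apply a scalar function f to the spectral values of x (assumes n > 1)."""
    nx2 = np.linalg.norm(x[1:])
    # If x2 = 0, any unit vector works; the first basis vector is an arbitrary choice.
    x2bar = x[1:] / nx2 if nx2 > 0 else np.eye(len(x) - 1)[0]
    u1 = 0.5 * np.concatenate(([1.0], -x2bar))
    u2 = 0.5 * np.concatenate(([1.0], x2bar))
    return f(x[0] - nx2) * u1 + f(x[0] + nx2) * u2

def soc_sqrt(w):
    # Clip tiny negative spectral values caused by rounding.
    return spectral_apply(w, lambda t: np.sqrt(max(t, 0.0)))

def phi_tau(x, y, tau):
    return soc_sqrt(jordan(x - y, x - y) + tau * jordan(x, y)) - (x + y)

def phi_fb(x, y):
    return soc_sqrt(jordan(x, x) + jordan(y, y)) - (x + y)

def phi_nr(x, y):
    # (.)_+ is the projection onto K^n, computed from the spectral factorization.
    return x - spectral_apply(x - y, lambda t: max(t, 0.0))

x = np.array([1.0, 0.5, -0.2])
y = np.array([0.3, -0.4, 0.7])
gap_fb = np.linalg.norm(phi_tau(x, y, 2.0) - phi_fb(x, y))
gap_nr = np.linalg.norm(phi_tau(x, y, 1e-10) + 2.0 * phi_nr(x, y))
```

Both gaps should be at the level of rounding error (respectively of order $\tau$), which is consistent with the relation between $\psi_\tau$, $\psi_{\rm FB}$ and $\psi_{\rm NR}$ described above.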

Throughout this paper, $\mathbb{R}^n$ denotes the space of $n$-dimensional real column vectors, and $\mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_m}$ is identified with $\mathbb{R}^{n_1 + \cdots + n_m}$. Thus, $(x_1, \ldots, x_m) \in \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_m}$ is viewed as a column vector in $\mathbb{R}^{n_1 + \cdots + n_m}$. The notation $I$ denotes an identity matrix of suitable dimension, and $\operatorname{int}(\mathcal{K}^n)$ denotes the interior of $\mathcal{K}^n$. For any differentiable mapping $F : \mathbb{R}^n \to \mathbb{R}^m$, $\nabla F(x) \in \mathbb{R}^{n \times m}$ denotes the transposed Jacobian of $F$ at $x$.

For a symmetric matrix $A$, we write $A \succeq O$ (respectively, $A \succ O$) to mean $A$ is positive semidefinite (respectively, positive definite). For nonnegative $\alpha$ and $\beta$, we write $\alpha = O(\beta)$ to mean $\alpha \le C\beta$, with $C > 0$ independent of $\alpha$ and $\beta$. Without loss of generality, in the rest of this paper we assume that $\mathcal{K} = \mathcal{K}^n$ ($n > 1$). All analysis can be carried over to the general case where $\mathcal{K}$ has the structure (2). In addition, we always assume that $\tau$ satisfies $0 < \tau < 4$.

2 Preliminaries

It is known that $\mathcal{K}^n$ ($n > 1$) is a closed convex self-dual cone with nonempty interior

$$\operatorname{int}(\mathcal{K}^n) := \left\{ x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \mid x_1 > \|x_2\| \right\}.$$

For any $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, the Jordan product of $x$ and $y$ is defined by

$$x \circ y := (\langle x, y \rangle, \ y_1 x_2 + x_1 y_2). \tag{12}$$

The Jordan product, unlike scalar or matrix multiplication, is not associative, which is a main source of complication in the analysis of the SOCCP. The identity element under this product is $e := (1, 0, \ldots, 0)^T \in \mathbb{R}^n$. For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, the determinant of $x$ is defined by $\det(x) := x_1^2 - \|x_2\|^2$. If $\det(x) \ne 0$, then $x$ is said to be invertible. If $x$ is invertible, there exists a unique $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ satisfying $x \circ y = y \circ x = e$. We call this $y$ the inverse of $x$ and denote it by $x^{-1}$. For each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, let

$$L_x := \begin{bmatrix} x_1 & x_2^T \\ x_2 & x_1 I \end{bmatrix}. \tag{13}$$

It is easily verified that $L_x y = x \circ y$ and $L_{x+y} = L_x + L_y$ for any $x, y \in \mathbb{R}^n$, but generally $L_x^2 = L_x L_x \ne L_{x^2}$ and $L_x^{-1} \ne L_{x^{-1}}$. If $L_x$ is invertible, then the inverse of $L_x$ is given by

$$L_x^{-1} = \frac{1}{\det(x)} \begin{bmatrix} x_1 & -x_2^T \\ -x_2 & \dfrac{\det(x)}{x_1} I + \dfrac{1}{x_1} x_2 x_2^T \end{bmatrix}. \tag{14}$$

We next recall from [8] that each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ admits a spectral factorization, associated with $\mathcal{K}^n$, of the form

$$x = \lambda_1(x) \cdot u_x^{(1)} + \lambda_2(x) \cdot u_x^{(2)},$$

where $\lambda_1(x), \lambda_2(x)$ and $u_x^{(1)}, u_x^{(2)}$ are the spectral values and the associated spectral vectors of $x$ given by

$$\lambda_i(x) = x_1 + (-1)^i \|x_2\|, \quad u_x^{(i)} = \frac{1}{2}\left(1, \ (-1)^i \bar{x}_2\right) \quad \text{for } i = 1, 2,$$

with $\bar{x}_2 = x_2 / \|x_2\|$ if $x_2 \ne 0$, and otherwise $\bar{x}_2$ being any vector in $\mathbb{R}^{n-1}$ such that $\|\bar{x}_2\| = 1$. If $x_2 \ne 0$, the factorization is unique. The spectral factorization of $x$ has various interesting properties; see [8]. We list three properties that will be used later.

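Property 2.1 (a) $x^2 = \lambda_1^2(x) \cdot u_x^{(1)} + \lambda_2^2(x) \cdot u_x^{(2)} \in \mathcal{K}^n$ for any $x \in \mathbb{R}^n$.

(b) If $x \in \mathcal{K}^n$, then $x^{1/2} = \sqrt{\lambda_1(x)} \cdot u_x^{(1)} + \sqrt{\lambda_2(x)} \cdot u_x^{(2)} \in \mathcal{K}^n$.

(c) $x \in \mathcal{K}^n \iff \lambda_1(x) \ge 0 \iff L_x \succeq O$, and $x \in \operatorname{int}(\mathcal{K}^n) \iff \lambda_1(x) > 0 \iff L_x \succ O$.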
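A minimal NumPy sketch of the spectral factorization and of Property 2.1 (b) follows. The function names are illustrative assumptions, and when $x_2 = 0$ the code simply picks the first basis vector as $\bar{x}_2$, which is one admissible choice among many.

```python
import numpy as np

def jordan(x, y):
    """Jordan product x o y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def spectral(x):
    """Spectral values and vectors of x associated with K^n (assumes n > 1)."""
    nx2 = np.linalg.norm(x[1:])
    # Arbitrary unit vector when x2 = 0 (the factorization is then not unique).
    x2bar = x[1:] / nx2 if nx2 > 0 else np.eye(len(x) - 1)[0]
    lam = (x[0] - nx2, x[0] + nx2)
    u = (0.5 * np.concatenate(([1.0], -x2bar)),
         0.5 * np.concatenate(([1.0], x2bar)))
    return lam, u

def soc_sqrt(x):
    """x^{1/2} for x in K^n, following Property 2.1 (b)."""
    (l1, l2), (u1, u2) = spectral(x)
    return np.sqrt(l1) * u1 + np.sqrt(l2) * u2

x = np.array([3.0, 1.0, 2.0])            # lambda_1(x) = 3 - sqrt(5) > 0, so x is in int(K^3)
(l1, l2), (u1, u2) = spectral(x)
recon_err = np.linalg.norm(l1 * u1 + l2 * u2 - x)   # x = lambda_1 u^(1) + lambda_2 u^(2)
r = soc_sqrt(x)
sqrt_err = np.linalg.norm(jordan(r, r) - x)         # (x^{1/2})^2 = x
```

The square root works because the spectral vectors satisfy $u^{(i)} \circ u^{(i)} = u^{(i)}$ and $u^{(1)} \circ u^{(2)} = 0$, so squaring acts on the spectral values only.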

3 Smoothness of the function ψ_τ

In this section we will show that $\psi_\tau$ defined by (9) is a smooth merit function. First, by Property 2.1 (a) and (b), $\phi_\tau$ and $\psi_\tau$ are well-defined since, for any $x, y \in \mathbb{R}^n$, we have

$$(x - y)^2 + \tau (x \circ y) = \left(x + \frac{\tau - 2}{2} y\right)^2 + \frac{\tau(4 - \tau)}{4} y^2 = \left(y + \frac{\tau - 2}{2} x\right)^2 + \frac{\tau(4 - \tau)}{4} x^2 \in \mathcal{K}^n. \tag{15}$$

The following proposition shows that $\psi_\tau$ is indeed a merit function associated with $\mathcal{K}^n$.

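Proposition 3.1 Let $\psi_\tau$ and $\phi_\tau$ be given as in (9) and (10), respectively. Then,

$$\psi_\tau(x, y) = 0 \iff \phi_\tau(x, y) = 0 \iff x \in \mathcal{K}^n, \ y \in \mathcal{K}^n, \ \langle x, y \rangle = 0.$$

Proof. The first equivalence is clear by the definition of $\psi_\tau$. We consider the second one.

“⇐”. Since $x \in \mathcal{K}^n$, $y \in \mathcal{K}^n$ and $\langle x, y \rangle = 0$, we have $x \circ y = 0$. Substituting this into the expression of $\phi_\tau(x, y)$ yields $\phi_\tau(x, y) = (x^2 + y^2)^{1/2} - (x + y) = \phi_{\rm FB}(x, y)$. From Proposition 2.1 of [8], we immediately obtain $\phi_\tau(x, y) = 0$.

“⇒”. Suppose that $\phi_\tau(x, y) = 0$. Then $x + y = [(x - y)^2 + \tau (x \circ y)]^{1/2}$. Squaring both sides gives $x^2 + 2(x \circ y) + y^2 = x^2 - 2(x \circ y) + y^2 + \tau (x \circ y)$, that is, $(4 - \tau)(x \circ y) = 0$, and hence $x \circ y = 0$ because $\tau < 4$. This implies that $x + y = (x^2 + y^2)^{1/2}$, i.e., $\phi_{\rm FB}(x, y) = 0$. From Proposition 2.1 of [8], it then follows that $x \in \mathcal{K}^n$, $y \in \mathcal{K}^n$ and $\langle x, y \rangle = 0$. □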
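Proposition 3.1 can be illustrated on a small example in $\mathbb{R}^3$. The sketch below evaluates $\psi_\tau$ for several values of $\tau$ at a complementary pair on the boundary of $\mathcal{K}^3$ and at a pair that violates complementarity. The helper names, the sample points, and the list of $\tau$ values are assumptions chosen for this illustration.

```python
import numpy as np

def jordan(x, y):
    """Jordan product x o y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(w):
    """Square root of w in K^n via its spectral values (assumes n > 1)."""
    nw2 = np.linalg.norm(w[1:])
    w2bar = w[1:] / nw2 if nw2 > 0 else np.eye(len(w) - 1)[0]
    r1 = np.sqrt(max(w[0] - nw2, 0.0))   # clip rounding noise below zero
    r2 = np.sqrt(max(w[0] + nw2, 0.0))
    return np.concatenate(([(r2 + r1) / 2], (r2 - r1) / 2 * w2bar))

def phi_tau(x, y, tau):
    """phi_tau from (10)."""
    return soc_sqrt(jordan(x - y, x - y) + tau * jordan(x, y)) - (x + y)

def psi_tau(x, y, tau):
    """psi_tau from (9)."""
    return 0.5 * np.sum(phi_tau(x, y, tau) ** 2)

taus = [0.1, 1.0, 2.0, 3.5]

# x, y in K^3 with <x, y> = 0: a solution of the complementarity conditions.
x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 0.0])
at_solution = [psi_tau(x, y, t) for t in taus]

# e = (1, 0, 0) lies in K^3, but <e, e> = 1, so complementarity fails.
e = np.array([1.0, 0.0, 0.0])
off_solution = [psi_tau(e, e, t) for t in taus]
```

For the second pair one can check by hand that $\phi_\tau(e, e) = (\sqrt{\tau} - 2)\,e$, which is nonzero for every $\tau \in (0, 4)$, so $\psi_\tau(e, e) > 0$ as the proposition requires.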

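In what follows, we focus on the proof of the smoothness of $\psi_\tau$. We first introduce some notation that will be used in the sequel. For any $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, let

$$\begin{aligned} w &= (w_1, w_2) = w(x, y) := (x - y)^2 + \tau (x \circ y), \\ z &= (z_1, z_2) = z(x, y) := \left[(x - y)^2 + \tau (x \circ y)\right]^{1/2}. \end{aligned} \tag{16}$$

Then, $w \in \mathcal{K}^n$ and $z \in \mathcal{K}^n$. Moreover, by the definition of the Jordan product,

$$\begin{aligned} w_1 &= w_1(x, y) = \|x\|^2 + \|y\|^2 + (\tau - 2) x^T y, \\ w_2 &= w_2(x, y) = 2(x_1 x_2 + y_1 y_2) + (\tau - 2)(x_1 y_2 + y_1 x_2). \end{aligned} \tag{17}$$

Let $\lambda_1(w)$ and $\lambda_2(w)$ be the spectral values of $w$. By Property 2.1 (b), we have that

$$z_1 = z_1(x, y) = \frac{\sqrt{\lambda_2(w)} + \sqrt{\lambda_1(w)}}{2}, \quad z_2 = z_2(x, y) = \frac{\sqrt{\lambda_2(w)} - \sqrt{\lambda_1(w)}}{2} \, \bar{w}_2, \tag{18}$$

where $\bar{w}_2 := w_2 / \|w_2\|$ if $w_2 \ne 0$ and otherwise $\bar{w}_2$ is any vector in $\mathbb{R}^{n-1}$ satisfying $\|\bar{w}_2\| = 1$.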

The following technical lemma describes the behavior of x, y when w = (x− y)²+ τ (x◦y) is on the boundary of Kⁿ. In fact, it may be viewed as an extension of [6, Lemma 3.2].

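Lemma 3.1 For any $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, if $w \notin \operatorname{int}(\mathcal{K}^n)$, then

$$x_1^2 = \|x_2\|^2, \quad y_1^2 = \|y_2\|^2, \quad x_1 y_1 = x_2^T y_2, \quad x_1 y_2 = y_1 x_2; \tag{19}$$

$$\begin{aligned} x_1^2 + y_1^2 + (\tau - 2) x_1 y_1 &= \|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\| \\ &= \|x_2\|^2 + \|y_2\|^2 + (\tau - 2) x_2^T y_2. \end{aligned} \tag{20}$$

If, in addition, $(x, y) \ne (0, 0)$, then $w_2 \ne 0$, and furthermore,

$$x_2^T \frac{w_2}{\|w_2\|} = x_1, \quad x_1 \frac{w_2}{\|w_2\|} = x_2, \quad y_2^T \frac{w_2}{\|w_2\|} = y_1, \quad y_1 \frac{w_2}{\|w_2\|} = y_2. \tag{21}$$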
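To make Lemma 3.1 concrete, the following NumPy sketch checks (19)–(21) at one hand-picked pair for which $w$ lies on the boundary of $\mathcal{K}^3$. The point $x = (1, 1, 0)$, $y = (2, 2, 0)$ and the value $\tau = 1$ are our own illustrative choices, not taken from the paper.

```python
import numpy as np

tau = 1.0
x = np.array([1.0, 1.0, 0.0])
y = np.array([2.0, 2.0, 0.0])
x1, x2, y1, y2 = x[0], x[1:], y[0], y[1:]

# w = (x - y)^2 + tau * (x o y), written out componentwise as in (17).
w1 = x @ x + y @ y + (tau - 2) * (x @ y)
w2 = 2 * (x1 * x2 + y1 * y2) + (tau - 2) * (x1 * y2 + y1 * x2)
lam1 = w1 - np.linalg.norm(w2)          # lambda_1(w) = 0 means w is on the boundary
w2bar = w2 / np.linalg.norm(w2)

s = x1 ** 2 + y1 ** 2 + (tau - 2) * x1 * y1
checks = {
    "(19)": bool(np.isclose(x1 ** 2, x2 @ x2) and np.isclose(y1 ** 2, y2 @ y2)
                 and np.isclose(x1 * y1, x2 @ y2) and np.allclose(x1 * y2, y1 * x2)),
    "(20)": bool(np.isclose(s, np.linalg.norm(x1 * x2 + y1 * y2 + (tau - 2) * x1 * y2))
                 and np.isclose(s, x2 @ x2 + y2 @ y2 + (tau - 2) * (x2 @ y2))),
    "(21)": bool(np.isclose(x2 @ w2bar, x1) and np.allclose(x1 * w2bar, x2)
                 and np.isclose(y2 @ w2bar, y1) and np.allclose(y1 * w2bar, y2)),
}
```

In this example $w = (6, 6, 0)$, so $\lambda_1(w) = 0$ and $\lambda_2(w) = 12 = 4\,(x_1^2 + y_1^2 + (\tau - 2)x_1 y_1)$.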

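Proof. Since $w = (x - y)^2 + \tau (x \circ y) \notin \operatorname{int}(\mathcal{K}^n)$, using (15) and [6, Lemma 3.2] yields

$$\begin{aligned} &\left(x_1 + \frac{\tau - 2}{2} y_1\right)^2 = \left\|x_2 + \frac{\tau - 2}{2} y_2\right\|^2, \quad y_1^2 = \|y_2\|^2, \\ &\left(x_1 + \frac{\tau - 2}{2} y_1\right) y_2 = \left(x_2 + \frac{\tau - 2}{2} y_2\right) y_1, \quad \left(x_1 + \frac{\tau - 2}{2} y_1\right) y_1 = \left(x_2 + \frac{\tau - 2}{2} y_2\right)^T y_2; \\ &\left(y_1 + \frac{\tau - 2}{2} x_1\right)^2 = \left\|y_2 + \frac{\tau - 2}{2} x_2\right\|^2, \quad x_1^2 = \|x_2\|^2, \\ &\left(y_1 + \frac{\tau - 2}{2} x_1\right) x_2 = \left(y_2 + \frac{\tau - 2}{2} x_2\right) x_1, \quad \left(y_1 + \frac{\tau - 2}{2} x_1\right) x_1 = \left(y_2 + \frac{\tau - 2}{2} x_2\right)^T x_2. \end{aligned}$$

From these equalities, we readily get the results in (19). Since $w \in \mathcal{K}^n$ but $w \notin \operatorname{int}(\mathcal{K}^n)$, we have $\|x\|^2 + \|y\|^2 + (\tau - 2) x^T y = \|2 x_1 x_2 + 2 y_1 y_2 + (\tau - 2)(x_1 y_2 + y_1 x_2)\|$ by $\lambda_1(w) = 0$. Applying the relations in (19) then gives the equalities in (20). If, in addition, $(x, y) \ne (0, 0)$, then it is clear that $\|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\| = x_1^2 + y_1^2 + (\tau - 2) x_1 y_1 \ne 0$. To prove the equalities in (21), it suffices to verify that $x_2^T \frac{w_2}{\|w_2\|} = x_1$ and $x_1 \frac{w_2}{\|w_2\|} = x_2$, by the symmetry of $x$ and $y$ in $w$. The verifications are straightforward by (20) and $x_1 y_2 = y_1 x_2$. □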

By Lemma 3.1, when $w \notin \operatorname{int}(\mathcal{K}^n)$, the spectral values of $w$ are calculated as follows:

$$\lambda_1(w) = 0, \quad \lambda_2(w) = 4\left(x_1^2 + y_1^2 + (\tau - 2) x_1 y_1\right). \tag{22}$$

If $(x, y) \ne (0, 0)$ also holds, then using equations (18), (20) and (22) yields that

$$z_1(x, y) = \sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}, \quad z_2(x, y) = \frac{x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}}.$$

Thus, if $(x, y) \ne (0, 0)$ and $(x - y)^2 + \tau (x \circ y) \notin \operatorname{int}(\mathcal{K}^n)$, $\phi_\tau(x, y)$ can be rewritten as

$$\phi_\tau(x, y) = z(x, y) - (x + y) = \begin{bmatrix} \sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1} - (x_1 + y_1) \\[4pt] \dfrac{x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}} - (x_2 + y_2) \end{bmatrix}. \tag{23}$$

This specific expression will be employed in the proof of the following main result.

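Proposition 3.2 The function $\psi_\tau$ given by (9) is differentiable at every $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$. Moreover, $\nabla_x \psi_\tau(0, 0) = \nabla_y \psi_\tau(0, 0) = 0$; if $(x - y)^2 + \tau (x \circ y) \in \operatorname{int}(\mathcal{K}^n)$, then

$$\begin{aligned} \nabla_x \psi_\tau(x, y) &= \left[L_{x + \frac{\tau - 2}{2} y} L_z^{-1} - I\right] \phi_\tau(x, y), \\ \nabla_y \psi_\tau(x, y) &= \left[L_{y + \frac{\tau - 2}{2} x} L_z^{-1} - I\right] \phi_\tau(x, y); \end{aligned} \tag{24}$$

if $(x, y) \ne (0, 0)$ and $(x - y)^2 + \tau (x \circ y) \notin \operatorname{int}(\mathcal{K}^n)$, then $x_1^2 + y_1^2 + (\tau - 2) x_1 y_1 \ne 0$ and

$$\begin{aligned} \nabla_x \psi_\tau(x, y) &= \left[\frac{x_1 + \frac{\tau - 2}{2} y_1}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}} - 1\right] \phi_\tau(x, y), \\ \nabla_y \psi_\tau(x, y) &= \left[\frac{y_1 + \frac{\tau - 2}{2} x_1}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}} - 1\right] \phi_\tau(x, y). \end{aligned} \tag{25}$$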

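Proof. Case (1): $(x, y) = (0, 0)$. For any $u = (u_1, u_2)$, $v = (v_1, v_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, let $\mu_1, \mu_2$ be the spectral values of $(u - v)^2 + \tau (u \circ v)$ and $\xi^{(1)}, \xi^{(2)}$ be the corresponding spectral vectors. Then,

$$\begin{aligned} 2\left[\psi_\tau(u, v) - \psi_\tau(0, 0)\right] &= \left\| \left[u^2 + v^2 + (\tau - 2)(u \circ v)\right]^{1/2} - u - v \right\|^2 \\ &= \left\| \sqrt{\mu_1}\, \xi^{(1)} + \sqrt{\mu_2}\, \xi^{(2)} - u - v \right\|^2 \\ &\le \left(\sqrt{2\mu_2} + \|u\| + \|v\|\right)^2. \end{aligned}$$

In addition, from the definition of spectral value and (17), it follows that

$$\begin{aligned} \mu_2 &= \|u\|^2 + \|v\|^2 + (\tau - 2) u^T v + \|2(u_1 u_2 + v_1 v_2) + (\tau - 2)(u_1 v_2 + v_1 u_2)\| \\ &\le 2\|u\|^2 + 2\|v\|^2 + 3|\tau - 2| \|u\| \|v\| \le 5(\|u\|^2 + \|v\|^2). \end{aligned}$$

Now combining the last two relations, we have $\psi_\tau(u, v) - \psi_\tau(0, 0) = O(\|u\|^2 + \|v\|^2)$. This shows that $\psi_\tau$ is differentiable at $(0, 0)$ with $\nabla_x \psi_\tau(0, 0) = \nabla_y \psi_\tau(0, 0) = 0$.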

Case (2): $(x - y)^2 + \tau (x \circ y) \in \operatorname{int}(\mathcal{K}^n)$. By [4, Proposition 5], $z(x, y)$ defined by (18) is continuously differentiable at such $(x, y)$, and consequently so is $\phi_\tau(x, y) = z(x, y) - (x + y)$. Notice that

$$z^2(x, y) = \left(x + \frac{\tau - 2}{2} y\right)^2 + \frac{\tau(4 - \tau)}{4} y^2,$$

which leads to $\nabla_x z(x, y) L_z = L_{x + \frac{\tau - 2}{2} y}$ by differentiating both sides with respect to $x$. Since $L_z \succ O$ by Property 2.1 (c), it follows that $\nabla_x z(x, y) = L_{x + \frac{\tau - 2}{2} y} L_z^{-1}$. Consequently,

$$\nabla_x \phi_\tau(x, y) = \nabla_x z(x, y) - I = L_{x + \frac{\tau - 2}{2} y} L_z^{-1} - I.$$

This together with $\nabla_x \psi_\tau(x, y) = \nabla_x \phi_\tau(x, y) \phi_\tau(x, y)$ proves the first formula of (24). By the symmetry of $x$ and $y$ in $\psi_\tau$, the second formula also holds.

Case (3): $(x, y) \ne (0, 0)$ and $(x - y)^2 + \tau (x \circ y) \notin \operatorname{int}(\mathcal{K}^n)$. For any $x' = (x_1', x_2')$, $y' = (y_1', y_2') \in \mathbb{R} \times \mathbb{R}^{n-1}$, it is easy to verify that

$$\begin{aligned} 2\psi_\tau(x', y') &= \left\| \left[(x')^2 + (y')^2 + (\tau - 2)(x' \circ y')\right]^{1/2} \right\|^2 + \|x' + y'\|^2 - 2\left\langle \left[(x')^2 + (y')^2 + (\tau - 2)(x' \circ y')\right]^{1/2}, x' + y' \right\rangle \\ &= \|x'\|^2 + \|y'\|^2 + (\tau - 2)\langle x', y' \rangle + \|x' + y'\|^2 - 2\left\langle \left[(x')^2 + (y')^2 + (\tau - 2)(x' \circ y')\right]^{1/2}, x' + y' \right\rangle, \end{aligned}$$

where the second equality uses the fact that $\|z\|^2 = \langle z^2, e \rangle$ for any $z \in \mathbb{R}^n$. Since $\|x'\|^2 + \|y'\|^2 + (\tau - 2)\langle x', y' \rangle + \|x' + y'\|^2$ is clearly differentiable in $(x', y')$, it suffices to show that $\left\langle [(x')^2 + (y')^2 + (\tau - 2)(x' \circ y')]^{1/2}, x' + y' \right\rangle$ is differentiable at $(x', y') = (x, y)$. By Lemma 3.1, $w_2 = w_2(x, y) \ne 0$, which implies $w_2' = w_2(x', y') = 2 x_1' x_2' + 2 y_1' y_2' + (\tau - 2)(x_1' y_2' + y_1' x_2') \ne 0$ for all $(x', y') \in \mathbb{R}^n \times \mathbb{R}^n$ sufficiently near to $(x, y)$. Let $\mu_1, \mu_2$ be the spectral values of $(x')^2 + (y')^2 + (\tau - 2)(x' \circ y')$. Then we can compute that

$$\begin{aligned} &2\left\langle \left[(x')^2 + (y')^2 + (\tau - 2)(x' \circ y')\right]^{1/2}, x' + y' \right\rangle \\ &\quad = \sqrt{\mu_2} \left[ x_1' + y_1' + \frac{\left[2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\right]^T (x_2' + y_2')}{\|2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\|} \right] \\ &\qquad + \sqrt{\mu_1} \left[ x_1' + y_1' - \frac{\left[2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\right]^T (x_2' + y_2')}{\|2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\|} \right]. \end{aligned} \tag{26}$$

Since $\lambda_2(w) > 0$ and $w_2(x, y) \ne 0$, the first term on the right-hand side of (26) is differentiable at $(x', y') = (x, y)$. Now, we claim that the second term is $o(\|x' - x\| + \|y' - y\|)$, i.e., it is differentiable at $(x, y)$ with zero gradient. To see this, notice that $w_2(x, y) \ne 0$, and hence

$$\mu_1 = \|x'\|^2 + \|y'\|^2 + (\tau - 2)\langle x', y' \rangle - \|2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\|,$$

viewed as a function of $(x', y')$, is differentiable at $(x', y') = (x, y)$. Moreover, $\mu_1 = \lambda_1(w) = 0$ when $(x', y') = (x, y)$. Thus, the first-order Taylor expansion of $\mu_1$ at $(x, y)$ yields

$$\mu_1 = O(\|x' - x\| + \|y' - y\|).$$

Also, since $w_2(x, y) \ne 0$, by the product and quotient rules for differentiation, the function

$$x_1' + y_1' - \frac{\left[2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\right]^T (x_2' + y_2')}{\|2(x_1' x_2' + y_1' y_2') + (\tau - 2)(x_1' y_2' + y_1' x_2')\|} \tag{27}$$

is differentiable at $(x', y') = (x, y)$, and it has value 0 at $(x', y') = (x, y)$ due to

$$x_1 + y_1 - \frac{\left[x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\right]^T (x_2 + y_2)}{\|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\|} = x_1 - x_2^T \frac{w_2}{\|w_2\|} + y_1 - y_2^T \frac{w_2}{\|w_2\|} = 0.$$

Hence, the function in (27) is $O(\|x' - x\| + \|y' - y\|)$ in magnitude, which together with $\mu_1 = O(\|x' - x\| + \|y' - y\|)$ shows that the second term on the right-hand side of (26) is

$$O\left((\|x' - x\| + \|y' - y\|)^{3/2}\right) = o(\|x' - x\| + \|y' - y\|).$$

Thus, we have shown that $\psi_\tau$ is differentiable at $(x, y)$. Moreover, we see that $2\nabla \psi_\tau(x, y)$ is the gradient of $\|x'\|^2 + \|y'\|^2 + (\tau - 2)\langle x', y' \rangle + \|x' + y'\|^2$ minus the gradient of the first term on the right-hand side of (26), both evaluated at $(x', y') = (x, y)$.

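The gradient of $\|x'\|^2 + \|y'\|^2 + (\tau - 2)\langle x', y' \rangle + \|x' + y'\|^2$ with respect to $x'$, evaluated at $(x', y') = (x, y)$, is $2x + (\tau - 2) y + 2(x + y)$. The derivative of the first term on the right-hand side of (26) with respect to $x_1'$, evaluated at $(x', y') = (x, y)$, works out to be

$$\begin{aligned} &\frac{1}{\sqrt{\lambda_2(w)}} \left[ \left(x_1 + \frac{\tau - 2}{2} y_1\right) + \left(x_2 + \frac{\tau - 2}{2} y_2\right)^T \frac{w_2}{\|w_2\|} \right] \left( x_1 + y_1 + (x_2 + y_2)^T \frac{w_2}{\|w_2\|} \right) \\ &\quad + \sqrt{\lambda_2(w)} \left[ 1 + \frac{\left(x_2 + \frac{\tau - 2}{2} y_2\right)^T (x_2 + y_2)}{\|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\|} - \frac{w_2^T (x_2 + y_2) \cdot w_2^T \left(x_2 + \frac{\tau - 2}{2} y_2\right)}{\|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\| \cdot \|w_2\|^2} \right] \\ &= \frac{2\left(x_1 + \frac{\tau - 2}{2} y_1\right)(x_1 + y_1)}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}} + 2\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}, \end{aligned}$$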

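where the equality follows from Lemma 3.1. Similarly, the gradient of the first term on the right-hand side of (26) with respect to $x_2'$, evaluated at $(x', y') = (x, y)$, works out to be

$$\begin{aligned} &\frac{1}{\sqrt{\lambda_2(w)}} \left[ \left(x_2 + \frac{\tau - 2}{2} y_2\right) + \left(x_1 + \frac{\tau - 2}{2} y_1\right) \frac{w_2}{\|w_2\|} \right] \left( x_1 + y_1 + (x_2 + y_2)^T \frac{w_2}{\|w_2\|} \right) \\ &\quad + \sqrt{\lambda_2(w)} \left[ \frac{(2 x_1 + (\tau - 2) y_1) x_2 + \frac{\tau}{2}(x_1 + y_1) y_2}{\|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\|} - \frac{w_2^T (x_2 + y_2) \cdot \left(x_1 + \frac{\tau - 2}{2} y_1\right) w_2}{\|x_1 x_2 + y_1 y_2 + (\tau - 2) x_1 y_2\| \cdot \|w_2\|^2} \right] \\ &= \frac{2\left[(2 x_1 + (\tau - 2) y_1) x_2 + \frac{\tau}{2}(x_1 + y_1) y_2\right]}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}}. \end{aligned}$$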

Then, combining the last two gradient expressions yields that

$$2\nabla_x \psi_\tau(x, y) = 2x + (\tau - 2) y + 2(x + y) - \begin{bmatrix} 2\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1} \\ 0 \end{bmatrix} - \frac{2}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}} \begin{bmatrix} \left(x_1 + \frac{\tau - 2}{2} y_1\right)(x_1 + y_1) \\ (2 x_1 + (\tau - 2) y_1) x_2 + \frac{\tau}{2}(x_1 + y_1) y_2 \end{bmatrix}.$$

Using the fact that $x_1 y_2 = y_1 x_2$ and noting that $\phi_\tau$ can be simplified as in (23) in this case, we readily rewrite the above expression for $\nabla_x \psi_\tau(x, y)$ in the form of (25). By symmetry, $\nabla_y \psi_\tau(x, y)$ also takes the form given in (25). □
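The gradient formulas of Proposition 3.2 can be compared with central finite differences. The sketch below is a plausibility check under our own assumptions, not the authors' implementation: the function names, the interior/boundary threshold `tol`, the step size `h`, and both test points are illustrative choices.

```python
import numpy as np

def jordan(x, y):
    """Jordan product (12)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def arrow(x):
    """Matrix L_x from (13)."""
    L = x[0] * np.eye(len(x))
    L[0, 1:] = x[1:]
    L[1:, 0] = x[1:]
    return L

def soc_sqrt(w):
    """z = w^{1/2} via (18); tiny negative spectral values are clipped to 0."""
    nw2 = np.linalg.norm(w[1:])
    w2bar = w[1:] / nw2 if nw2 > 0 else np.eye(len(w) - 1)[0]
    r1 = np.sqrt(max(w[0] - nw2, 0.0))
    r2 = np.sqrt(max(w[0] + nw2, 0.0))
    return np.concatenate(([(r2 + r1) / 2], (r2 - r1) / 2 * w2bar))

def phi_tau(x, y, tau):
    return soc_sqrt(jordan(x - y, x - y) + tau * jordan(x, y)) - (x + y)

def psi_tau(x, y, tau):
    return 0.5 * np.sum(phi_tau(x, y, tau) ** 2)

def grad_x_psi(x, y, tau, tol=1e-12):
    """Gradient of psi_tau with respect to x, following Proposition 3.2."""
    if not x.any() and not y.any():
        return np.zeros_like(x)                       # case (x, y) = (0, 0)
    w = jordan(x - y, x - y) + tau * jordan(x, y)
    phi = phi_tau(x, y, tau)
    a = x + 0.5 * (tau - 2) * y
    if w[0] - np.linalg.norm(w[1:]) > tol * w[0]:     # w in int(K^n): formula (24)
        return arrow(a) @ np.linalg.solve(arrow(soc_sqrt(w)), phi) - phi
    s = np.sqrt(x[0] ** 2 + y[0] ** 2 + (tau - 2) * x[0] * y[0])
    return (a[0] / s - 1.0) * phi                     # boundary case: formula (25)

def fd_grad_x(x, y, tau, h=1e-6):
    """Central finite-difference approximation of the gradient in x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (psi_tau(x + e, y, tau) - psi_tau(x - e, y, tau)) / (2 * h)
    return g

# Interior point: w lies in int(K^3), so (24) applies.
xi, yi = np.array([1.0, 0.5, -0.2]), np.array([0.3, -0.4, 0.7])
err_interior = np.linalg.norm(grad_x_psi(xi, yi, 1.5) - fd_grad_x(xi, yi, 1.5))

# Boundary point: w = (6, 6, 0) lies on the boundary of K^3, so (25) applies.
xb, yb = np.array([1.0, 1.0, 0.0]), np.array([2.0, 2.0, 0.0])
err_boundary = np.linalg.norm(grad_x_psi(xb, yb, 1.0) - fd_grad_x(xb, yb, 1.0))
```

The matrix $L_z^{-1}$ is applied through a linear solve rather than formed explicitly. At the boundary point the closed form (25) gives $\nabla_x \psi_\tau = (3 - \sqrt{3},\ 3 - \sqrt{3},\ 0)$, which the finite-difference estimate should reproduce up to an error of the order of the step size.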

Proposition 3.2 shows that $\psi_\tau$ is differentiable with a computable gradient. To establish the continuity of the gradient of $\psi_\tau$, that is, the smoothness of $\psi_\tau$, we need the following two crucial technical lemmas, whose proofs are provided in the appendix.

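Lemma 3.2 For any $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, if $w_2 \ne 0$, then

$$\left[ \left(x_1 + \frac{\tau - 2}{2} y_1\right) + (-1)^i \left(x_2 + \frac{\tau - 2}{2} y_2\right)^T \frac{w_2}{\|w_2\|} \right]^2 \le \left\| \left(x_2 + \frac{\tau - 2}{2} y_2\right) + (-1)^i \left(x_1 + \frac{\tau - 2}{2} y_1\right) \frac{w_2}{\|w_2\|} \right\|^2 \le \lambda_i(w)$$

for $i = 1, 2$. Furthermore, these relations also hold when interchanging $x$ and $y$.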

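Lemma 3.3 For all $(x, y)$ satisfying $(x - y)^2 + \tau (x \circ y) \in \operatorname{int}(\mathcal{K}^n)$, we have that

$$\left\| L_{x + \frac{\tau - 2}{2} y} L_z^{-1} \right\|_F \le C, \quad \left\| L_{y + \frac{\tau - 2}{2} x} L_z^{-1} \right\|_F \le C, \tag{28}$$

where $C > 0$ is a constant independent of $x$, $y$ and $\tau$, and $\|\cdot\|_F$ denotes the Frobenius norm.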

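Proposition 3.3 The function $\psi_\tau$ defined by (9) is smooth everywhere on $\mathbb{R}^n \times \mathbb{R}^n$.

Proof. By Proposition 3.2 and the symmetry of $x$ and $y$ in $\nabla \psi_\tau$, it suffices to show that $\nabla_x \psi_\tau$ is continuous at every $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$. If $(a - b)^2 + \tau (a \circ b) \in \operatorname{int}(\mathcal{K}^n)$, the conclusion has been shown in Proposition 3.2. We next consider the other two cases.

Case (1): $(a, b) = (0, 0)$. By Proposition 3.2, we need to show that $\nabla_x \psi_\tau(x, y) \to 0$ as $(x, y) \to (0, 0)$. If $(x - y)^2 + \tau (x \circ y) \in \operatorname{int}(\mathcal{K}^n)$, then $\nabla_x \psi_\tau(x, y)$ is given by (24), whereas if $(x, y) \ne (0, 0)$ and $(x - y)^2 + \tau (x \circ y) \notin \operatorname{int}(\mathcal{K}^n)$, then $\nabla_x \psi_\tau(x, y)$ is given by (25). Since $L_{x + \frac{\tau - 2}{2} y} L_z^{-1}$ and $\dfrac{x_1 + \frac{\tau - 2}{2} y_1}{\sqrt{x_1^2 + y_1^2 + (\tau - 2) x_1 y_1}}$ are bounded with a bound independent of $x$, $y$ and $\tau$ (the former by Lemma 3.3), and $\phi_\tau(x, y) \to \phi_\tau(0, 0) = 0$ by the continuity of $\phi_\tau$, the desired result follows immediately.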

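Case (2): $(a, b) \ne (0, 0)$ and $(a - b)^2 + \tau (a \circ b) \notin \operatorname{int}(\mathcal{K}^n)$. We will show that $\nabla_x \psi_\tau(x, y) \to \nabla_x \psi_\tau(a, b)$ by considering two subcases: (2a) $(x, y) \ne (0, 0)$ and $(x - y)^2 + \tau (x \circ y) \notin \operatorname{int}(\mathcal{K}^n)$, and (2b) $(x - y)^2 + \tau (x \circ y) \in \operatorname{int}(\mathcal{K}^n)$. In subcase (2a), $\nabla_x \psi_\tau(x, y)$ is given by (25). Noting that the right-hand side of (25) is continuous at $(a, b)$, the desired result follows.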

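Next, we prove that $\nabla_x \psi_\tau(x, y) \to \nabla_x \psi_\tau(a, b)$ in subcase (2b). From (24), we have that

$$\nabla_x \psi_\tau(x, y) = \left(x + \frac{\tau - 2}{2} y\right) - L_{x + \frac{\tau - 2}{2} y} L_z^{-1} (x + y) - \phi_\tau(x, y). \tag{29}$$

On the other hand, since $(a, b) \ne (0, 0)$ and $(a - b)^2 + \tau (a \circ b) \notin \operatorname{int}(\mathcal{K}^n)$,

$$\|a\|^2 + \|b\|^2 + (\tau - 2) a^T b = \|2(a_1 a_2 + b_1 b_2) + (\tau - 2)(a_1 b_2 + b_1 a_2)\| \ne 0, \tag{30}$$

and moreover from (20) it follows that

$$\begin{aligned} \|a\|^2 + \|b\|^2 + (\tau - 2) a^T b &= 2\left(a_1^2 + b_1^2 + (\tau - 2) a_1 b_1\right) \\ &= 2\left(\|a_2\|^2 + \|b_2\|^2 + (\tau - 2) a_2^T b_2\right) \\ &= 2\|(a_1 a_2 + b_1 b_2) + (\tau - 2) a_1 b_2\|. \end{aligned} \tag{31}$$

Using the equalities in (31), it is not hard to verify that

$$\frac{a_1 + \frac{\tau - 2}{2} b_1}{\sqrt{a_1^2 + b_1^2 + (\tau - 2) a_1 b_1}} \left((a - b)^2 + \tau (a \circ b)\right)^{1/2} = a + \frac{\tau - 2}{2} b.$$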