to appear in Computational Optimization and Applications, 2018

### Two smooth support vector machines for ε-insensitive regression

Weizhe Gu
Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, P.R. China
E-mail: weizhegu@yahoo.com.cn

Wei-Po Chen
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
E-mail: weaper@gmail.com

Chun-Hsu Ko
Department of Electrical Engineering, I-Shou University, Kaohsiung 840, Taiwan
E-mail: chko@isu.edu.tw

Yuh-Jye Lee
Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan
E-mail: yuhjye@math.nctu.edu.tw

Jein-Shan Chen ^{1}
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
E-mail: jschen@math.ntnu.edu.tw

1The author’s work is supported by Ministry of Science and Technology, Taiwan.

June 7, 2017

(1st revised on September 21, 2017) (2nd revised on November 6, 2017)

Abstract. In this paper, we propose two new smooth support vector machines for ε-insensitive regression. For these two smooth support vector machines, we construct two systems of smooth equations based on two novel families of smoothing functions, from which we seek the solution to ε-support vector regression (ε-SVR). More specifically, using the proposed smoothing functions, we employ the smoothing Newton method to solve the systems of smooth equations. The algorithm is shown to be globally and quadratically convergent without any additional conditions. Numerical comparisons among different values of the smoothing parameter are also reported.

Key words. support vector machine, ε-insensitive loss function, ε-smooth support vector regression, smoothing Newton algorithm

### 1 Introduction

Support vector machine (SVM) is a popular and important statistical learning tech-
nology [1, 9, 10, 16, 17, 18, 19]. Generally speaking, there are two main categories
for support vector machines (SVMs): support vector classification (SVC) and support
vector regression (SVR). The model produced by SVR depends on a training data set
S = {(A_{1}, y_{1}), . . . , (A_{m}, y_{m})} ⊆ IR^{n}× IR, where A_{i} ∈ IR^{n} is the input data and y_{i} ∈ IR is
called the observation. The main goal of ε-insensitive regression with the idea of SVMs
is to find a linear or nonlinear regression function f that has at most ε deviation from
the actually obtained targets y_{i} for all the training data, and at the same time is as flat
as possible. This problem is called ε-support vector regression (ε-SVR).

For pedagogical reasons, we begin with the linear case, in which the regression function f is defined as

f(A) = A^{T}x + b with x ∈ IR^{n}, b ∈ IR.        (1)
Flatness in the case of (1) means that one seeks a small x. One way to ensure this is
to minimize the norm of x, then the problem ε-SVR can be formulated as a constrained
minimization problem:

min  (1/2) x^{T}x + C Σ_{i=1}^{m} (ξ_{i} + ξ^{∗}_{i})
s.t.  y_{i} − A^{T}_{i}x − b ≤ ε + ξ_{i},
      A^{T}_{i}x + b − y_{i} ≤ ε + ξ^{∗}_{i},
      ξ_{i}, ξ^{∗}_{i} ≥ 0,  i = 1, · · · , m.        (2)

The constant C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than ε are tolerated. This corresponds to dealing with the so-called ε-insensitive loss function |ξ|_{ε} described by

|ξ|_{ε} = max{0, |ξ| − ε}.

The formulation (2) is a convex quadratic minimization problem with n+1 free variables, 2m nonnegative variables, and 2m inequality constraints, which enlarges the problem size and could increase computational complexity.

In fact, the problem (2) can be reformulated as an unconstrained optimization problem:

min_{(x,b)∈IR^{n+1}}  (1/2)(x^{T}x + b^{2}) + (C/2) Σ_{i=1}^{m} |A^{T}_{i}x + b − y_{i}|^{2}_{ε}.        (3)

This formulation has been proposed in active set support vector regression [8] and solved in its dual form. The objective function is strongly convex; hence, the problem has a unique global optimal solution. However, since the objective function is not twice continuously differentiable, Newton-type algorithms cannot be applied to solve (3) directly.

Lee, Hsieh and Huang [9] apply a smoothing technique to (3). The smooth function

f_{ε}(x, α) = x + (1/α) log(1 + e^{−αx}),        (4)

which is the integral of the sigmoid function 1/(1 + e^{−αx}), is used to smooth the plus function [x]_{+}. More specifically, the smooth function f_{ε}(x, α) approaches [x]_{+} as α goes to infinity. Then, the problem (3) is recast as a strongly convex unconstrained minimization problem with the smooth function f_{ε}(x, α), and a Newton-Armijo algorithm is proposed to solve it. It is proved that when the smoothing parameter α approaches infinity, the unique solution of the reformulated problem converges to the unique solution of the original problem [9, Theorem 2.2]. However, the smoothing parameter α is fixed in the proposed algorithm, and in the implementation of this algorithm, α cannot be set large enough.
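As a quick numerical illustration of this smoothing idea (our own sketch, not code from the paper), the following Python snippet implements the function in (4) in an overflow-safe way and checks that it dominates [x]_{+} with a uniform gap of log(2)/α, which shrinks only as α grows — the practical limitation mentioned above.

```python
import math

def p_smooth(x, alpha):
    """Smooth approximation of the plus function [x]_+ from (4):
    p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x))."""
    # Stable evaluation: log(1 + e^t) = max(t, 0) + log(1 + e^{-|t|}).
    t = -alpha * x
    return x + (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / alpha

plus = lambda x: max(x, 0.0)

# The approximation always lies above [x]_+ and tightens as alpha grows;
# the worst-case gap log(2)/alpha is attained at x = 0.
for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
    for alpha in (1.0, 10.0, 100.0):
        assert p_smooth(x, alpha) >= plus(x)
        assert p_smooth(x, alpha) - plus(x) <= math.log(2.0) / alpha + 1e-12
```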

In this paper, we introduce two smooth support vector machines for ε-insensitive regression. For the first smooth support vector machine, we reformulate ε-SVR as a strongly convex unconstrained optimization problem with one type of smoothing functions φ_{ε}(x, α). Then, we define a new function H_{φ}, which corresponds to the optimality condition of the unconstrained optimization problem. From the solution of H_{φ}(z) = 0, we can obtain the solution of ε-SVR. For the second smooth support vector machine, we smooth the optimality condition of the strongly convex unconstrained optimization problem (3) with another type of smoothing functions ψ_{ε}(x, α). Accordingly, we define the function H_{ψ}, which possesses the same properties as H_{φ} does. For either H_{φ} = 0 or H_{ψ} = 0, we apply the smoothing Newton method to solve it. The algorithm is shown to be globally convergent; specifically, the iterative sequence converges to the unique solution of (3). Furthermore, the algorithm is shown to be locally quadratically convergent without any additional assumptions.

The paper is organized as follows. In Section 2 and Section 3, we introduce two smooth support vector machine reformulations by two types of smoothing functions. In Section 4, we propose a smoothing Newton algorithm and study its global and local quadratic convergence. Numerical results and comparisons are reported in Section 5.

Throughout this paper, K := {1, 2, · · · } and all vectors are column vectors. For a given vector x = (x_{1}, . . . , x_{n})^{T} ∈ IR^{n}, the plus function [x]_{+} is defined as

([x]_{+})_{i} = max{0, x_{i}}, i = 1, · · · , n.

For a differentiable function f, we denote by ∇f(x) and ∇^{2}f(x) the gradient and the Hessian matrix of f at x, respectively. For a differentiable mapping G : IR^{n} → IR^{m}, we denote by G′(x) ∈ IR^{m×n} the Jacobian of G at x. For a matrix A ∈ IR^{m×n}, A^{T}_{i} is the i-th row of A. A column vector of ones and an identity matrix of arbitrary dimension are denoted by 1 and I, respectively. We denote the sign function by

sgn(x) =
  1        if x > 0,
  [−1, 1]  if x = 0,
  −1       if x < 0.

### 2 The first smooth support vector machine

As mentioned in [9], it is known that ε-SVR can be reformulated as the strongly convex unconstrained optimization problem (3). Denote ω := (x, b) ∈ IR^{n+1} and ¯A := (A, 1), and let ¯A^{T}_{i} be the i-th row of ¯A. Then the smooth support vector regression (3) can be rewritten as

min_{ω}  (1/2) ω^{T}ω + (C/2) Σ_{i=1}^{m} |¯A^{T}_{i}ω − y_{i}|^{2}_{ε}.        (5)

Note that | · |^{2}_{ε} is smooth but not twice differentiable, which means the objective function is not twice continuously differentiable. Hence, Newton-type methods cannot be applied to solve (5) directly.

In view of this fact, we propose a family of twice continuously differentiable functions φ_{ε}(x, α) to replace |x|^{2}_{ε}. The family of functions φ_{ε}(x, α) : IR × IR_{+} → IR_{+} is given by

φ_{ε}(x, α) =
  (|x| − ε)^{2} + (1/3)α^{2}   if |x| − ε ≥ α,
  (1/(6α))(|x| − ε + α)^{3}    if ||x| − ε| < α,
  0                            if |x| − ε ≤ −α,        (6)

where 0 < α < ε is a smoothing parameter. The graphs of φ_{ε}(x, α) are depicted in Figure 1. From this geometric view, it is clear that φ_{ε}(x, α) is a class of smoothing functions for |x|^{2}_{ε}.


Figure 1: Graphs of φ_{ε}(x, α) with ε = 0.1 and α = 0.03, 0.06, 0.09.
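Complementing Figure 1, here is a small Python sketch (the names `phi` and `eps_loss_sq` are ours) that evaluates φ_{ε} on a grid around the kinks and numerically verifies the bound 0 ≤ φ_{ε}(x, α) − |x|^{2}_{ε} ≤ α^{2}/3 established in Lemma 2.1(a) below.

```python
def phi(x, eps, alpha):
    """The smoothing function phi_eps(x, alpha) of (6), assuming 0 < alpha < eps."""
    s = abs(x) - eps
    if s >= alpha:
        return s * s + alpha * alpha / 3.0
    if abs(s) < alpha:   # middle piece: -alpha < |x| - eps < alpha
        return (s + alpha) ** 3 / (6.0 * alpha)
    return 0.0           # s <= -alpha

def eps_loss_sq(x, eps):
    """The squared eps-insensitive loss |x|_eps^2."""
    return max(0.0, abs(x) - eps) ** 2

eps, alpha = 0.1, 0.06
xs = [i * 1e-3 for i in range(-300, 301)]      # grid covering all three pieces
for x in xs:
    gap = phi(x, eps, alpha) - eps_loss_sq(x, eps)
    # Lemma 2.1(a): 0 <= phi - |x|_eps^2 <= alpha^2 / 3
    assert -1e-12 <= gap <= alpha * alpha / 3.0 + 1e-12
```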

Besides the geometric approach, we now show by algebraic verification that φ_{ε}(x, α) is a class of smoothing functions for |x|^{2}_{ε}. To this end, we compute the partial derivatives of φ_{ε}(x, α) as below:

∇_{x}φ_{ε}(x, α) =
  2(|x| − ε) sgn(x)                   if |x| − ε ≥ α,
  (1/(2α))(|x| − ε + α)^{2} sgn(x)    if ||x| − ε| < α,
  0                                   if |x| − ε ≤ −α.        (7)

∇^{2}_{xx}φ_{ε}(x, α) =
  2                    if |x| − ε ≥ α,
  (|x| − ε + α)/α      if ||x| − ε| < α,
  0                    if |x| − ε ≤ −α.        (8)

∇^{2}_{xα}φ_{ε}(x, α) =
  0                                                 if |x| − ε ≥ α,
  ((|x| − ε + α)(α − |x| + ε)/(2α^{2})) sgn(x)      if ||x| − ε| < α,
  0                                                 if |x| − ε ≤ −α.        (9)

With the above, the following lemma shows some basic properties of φ_{ε}(x, α).

Lemma 2.1. Let φ_{ε}(x, α) be defined as in (6). Then, the following hold.

(a) For 0 < α < ε, there holds 0 ≤ φ_{ε}(x, α) − |x|^{2}_{ε} ≤ (1/3)α^{2}.

(b) The function φ_{ε}(x, α) is twice continuously differentiable with respect to x for 0 < α < ε.

(c) lim_{α→0} φ_{ε}(x, α) = |x|^{2}_{ε} and lim_{α→0} ∇_{x}φ_{ε}(x, α) = ∇(|x|^{2}_{ε}).

Proof. (a) To complete the arguments, we need to discuss four cases.

(i) For |x| − ε ≥ α, it is clear that φ_{ε}(x, α) − |x|^{2}_{ε} = (1/3)α^{2}.

(ii) For 0 < |x| − ε < α, i.e., 0 < x − ε < α or 0 < −x − ε < α, there are two subcases. If 0 < x − ε < α, letting f(x) := φ_{ε}(x, α) − |x|^{2}_{ε} = (1/(6α))(x − ε + α)^{3} − (x − ε)^{2} gives

f′(x) = (x − ε + α)^{2}/(2α) − 2(x − ε),  ∀x ∈ (ε, ε + α),
f″(x) = (x − ε + α)/α − 2 < 0,  ∀x ∈ (ε, ε + α).

This indicates that f′(x) is monotone decreasing on (ε, ε + α), which further implies

f′(x) ≥ f′(ε + α) = 0,  ∀x ∈ (ε, ε + α).

Thus, we obtain that f(x) is monotone increasing on (ε, ε + α). With this, we have f(x) ≤ f(ε + α) = (1/3)α^{2}, which yields

φ_{ε}(x, α) − |x|^{2}_{ε} ≤ (1/3)α^{2},  ∀x ∈ (ε, ε + α).

If 0 < −x − ε < α, the arguments are similar as above, and we omit them.

(iii) For −α < |x| − ε ≤ 0, it is clear that φ_{ε}(x, α) − |x|^{2}_{ε} = (1/(6α))(|x| − ε + α)^{3} ≤ α^{3}/(6α) ≤ α^{2}/3.

(iv) For |x| − ε ≤ −α, we have φ_{ε}(x, α) − |x|^{2}_{ε} = 0. Then, the desired result follows.

(b) To prove the twice continuous differentiability of φ_{ε}(x, α), we need to check that φ_{ε}(·, α), ∇_{x}φ_{ε}(·, α) and ∇^{2}_{xx}φ_{ε}(·, α) are all continuous. Since they are piecewise functions, it suffices to check the junction points.

First, we check that φ_{ε}(·, α) is continuous.
(i) If |x| − ε = α, then both adjacent pieces give φ_{ε}(x, α) = (4/3)α^{2}, which implies φ_{ε}(·, α) is continuous.
(ii) If |x| − ε = −α, then φ_{ε}(x, α) = 0. Hence, φ_{ε}(·, α) is continuous.

Next, we check that ∇_{x}φ_{ε}(·, α) is continuous.
(i) If |x| − ε = α, then ∇_{x}φ_{ε}(x, α) = 2α sgn(x).
(ii) If |x| − ε = −α, then ∇_{x}φ_{ε}(x, α) = 0.
From the above, it is clear that ∇_{x}φ_{ε}(·, α) is continuous.

Now we show that ∇^{2}_{xx}φ_{ε}(·, α) is continuous.
(i) If |x| − ε = α, then ∇^{2}_{xx}φ_{ε}(x, α) = 2.
(ii) If |x| − ε = −α, then ∇^{2}_{xx}φ_{ε}(x, α) = 0.
Hence, ∇^{2}_{xx}φ_{ε}(·, α) is continuous.

(c) It is clear that lim_{α→0} φ_{ε}(x, α) = |x|^{2}_{ε} holds by part (a). It remains to verify that lim_{α→0} ∇_{x}φ_{ε}(x, α) = ∇(|x|^{2}_{ε}). First, we compute that

∇(|x|^{2}_{ε}) =
  2(|x| − ε) sgn(x)  if |x| − ε ≥ 0,
  0                  if |x| − ε < 0.        (10)

In light of (10), we proceed by discussing four cases.

(i) For |x| − ε ≥ α, we have ∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε}) = 0. Then, the desired result follows.

(ii) For 0 < |x| − ε < α, we have

∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε}) = (1/(2α))(|x| − ε + α)^{2} sgn(x) − 2(|x| − ε) sgn(x),

which yields

lim_{α→0} (∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε})) = lim_{α→0} [((|x| − ε + α)^{2} − 4α(|x| − ε))/(2α)] sgn(x).

We notice that |x| → ε when α → 0, and hence ((|x| − ε + α)^{2} − 4α(|x| − ε))/(2α) → 0/0. Then, applying L'Hôpital's rule yields

lim_{α→0} ((|x| − ε + α)^{2} − 4α(|x| − ε))/(2α) = lim_{α→0} (α − (|x| − ε)) = 0.

This implies lim_{α→0} (∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε})) = 0, which is the desired result.

(iii) For −α < |x| − ε ≤ 0, we have ∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε}) = (1/(2α))(|x| − ε + α)^{2} sgn(x). Then, applying L'Hôpital's rule gives

lim_{α→0} ((|x| − ε + α)^{2})/(2α) = lim_{α→0} (|x| − ε + α) = 0.

Thus, we prove that lim_{α→0} (∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε})) = 0 under this case.

(iv) For |x| − ε ≤ −α, we have ∇_{x}φ_{ε}(x, α) − ∇(|x|^{2}_{ε}) = 0. Then, the desired result follows clearly. □

Now, we use the family of smoothing functions φ_{ε} to replace the square of the ε-insensitive loss function in (5) and obtain the first smooth support vector regression. In other words, we consider

min_{ω}  F_{ε,α}(ω) := (1/2) ω^{T}ω + (C/2) 1^{T}Φ_{ε}(¯Aω − y, α),        (11)

where ω := (x, b) ∈ IR^{n+1} and Φ_{ε}(Ax + 1b − y, α) ∈ IR^{m} is defined componentwise by

Φ_{ε}(Ax + 1b − y, α)_{i} = φ_{ε}(A^{T}_{i}x + b − y_{i}, α).

This is a strongly convex unconstrained optimization problem with a twice continuously differentiable objective function. Noting that lim_{α→0} φ_{ε}(x, α) = |x|^{2}_{ε}, we see that

min_{ω}  F_{ε,0}(ω) := lim_{α→0} F_{ε,α}(ω) = (1/2) ω^{T}ω + (C/2) Σ_{i=1}^{m} |¯A^{T}_{i}ω − y_{i}|^{2}_{ε},        (12)

which is exactly the problem (5).
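The relation between (11) and (12) can be checked numerically. The sketch below (toy data and names are ours) evaluates both objectives at a random point and confirms 0 ≤ F_{ε,α}(ω) − F_{ε,0}(ω) ≤ Cmα^{2}/6, i.e., Lemma 2.1(a) summed over the m data terms — the same constant that appears later in Theorem 2.1(b).

```python
import random

def phi(x, eps, alpha):
    """Smoothing function (6) for the squared eps-insensitive loss."""
    s = abs(x) - eps
    if s >= alpha:
        return s * s + alpha * alpha / 3.0
    if abs(s) < alpha:
        return (s + alpha) ** 3 / (6.0 * alpha)
    return 0.0

def F(omega, A, y, C, eps, alpha):
    """F_{eps,alpha}(omega) of (11); alpha = 0 gives F_{eps,0} of (12)."""
    x, b = omega[:-1], omega[-1]
    obj = 0.5 * sum(w * w for w in omega)          # regularization term
    for Ai, yi in zip(A, y):
        r = sum(a * w for a, w in zip(Ai, x)) + b - yi
        loss = phi(r, eps, alpha) if alpha > 0 else max(0.0, abs(r) - eps) ** 2
        obj += 0.5 * C * loss
    return obj

random.seed(0)
m, n, C, eps, alpha = 20, 3, 10.0, 0.1, 0.05
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
y = [sum(Ai) + random.uniform(-0.05, 0.05) for Ai in A]
omega = [random.uniform(-1, 1) for _ in range(n + 1)]

# Summing Lemma 2.1(a) over the m terms: 0 <= F_alpha - F_0 <= C*m*alpha^2/6.
gap = F(omega, A, y, C, eps, alpha) - F(omega, A, y, C, eps, 0.0)
assert 0.0 <= gap <= C * m * alpha ** 2 / 6.0 + 1e-12
```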

The following theorem shows that the unique solution of the smooth problem (11) approaches the unique solution of the problem (12) as α → 0. Indeed, it plays the same role as [9, Theorem 2.2].

Theorem 2.1. Let F_{ε,α}(ω) and F_{ε,0}(ω) be defined as in (11) and (12), respectively. Then, the following hold.

(a) There exists a unique solution ¯ω_{α} to min_{ω∈IR^{n+1}} F_{ε,α}(ω) and a unique solution ¯ω to min_{ω∈IR^{n+1}} F_{ε,0}(ω).

(b) For all 0 < α < ε, we have the following inequality:

‖¯ω_{α} − ¯ω‖^{2} ≤ (1/6) C m α^{2}.        (13)

Moreover, ¯ω_{α} converges to ¯ω as α → 0 with an upper bound given by (13).

Proof. (a) In view of φ_{ε}(x, α) − |x|^{2}_{ε} ≥ 0 in Lemma 2.1(a), we see that the level sets

L_{v}(F_{ε,α}) := {ω ∈ IR^{n+1} | F_{ε,α}(ω) ≤ v},
L_{v}(F_{ε,0}) := {ω ∈ IR^{n+1} | F_{ε,0}(ω) ≤ v}

satisfy

L_{v}(F_{ε,α}) ⊆ L_{v}(F_{ε,0}) ⊆ {ω ∈ IR^{n+1} | ω^{T}ω ≤ 2v}        (14)

for any v ≥ 0. Hence, we obtain that L_{v}(F_{ε,α}) and L_{v}(F_{ε,0}) are compact (closed and bounded) subsets of IR^{n+1}. Then, by the strong convexity of F_{ε,0}(ω) and F_{ε,α}(ω) with α > 0, each of the problems min_{ω∈IR^{n+1}} F_{ε,α}(ω) and min_{ω∈IR^{n+1}} F_{ε,0}(ω) has a unique solution.

(b) From the optimality conditions and the strong convexity of F_{ε,0}(ω) and F_{ε,α}(ω) with α > 0, we know that

F_{ε,0}(¯ω_{α}) − F_{ε,0}(¯ω) ≥ ∇F_{ε,0}(¯ω)^{T}(¯ω_{α} − ¯ω) + (1/2)‖¯ω_{α} − ¯ω‖^{2} ≥ (1/2)‖¯ω_{α} − ¯ω‖^{2},        (15)

F_{ε,α}(¯ω) − F_{ε,α}(¯ω_{α}) ≥ ∇F_{ε,α}(¯ω_{α})^{T}(¯ω − ¯ω_{α}) + (1/2)‖¯ω − ¯ω_{α}‖^{2} ≥ (1/2)‖¯ω − ¯ω_{α}‖^{2}.        (16)

Note that F_{ε,α}(ω) ≥ F_{ε,0}(ω) because φ_{ε}(x, α) − |x|^{2}_{ε} ≥ 0. Then, adding up (15) and (16) along with this fact yields

‖¯ω_{α} − ¯ω‖^{2} ≤ (F_{ε,α}(¯ω) − F_{ε,0}(¯ω)) − (F_{ε,α}(¯ω_{α}) − F_{ε,0}(¯ω_{α}))
  ≤ F_{ε,α}(¯ω) − F_{ε,0}(¯ω)
  = (C/2) 1^{T}Φ_{ε}(¯A¯ω − y, α) − (C/2) Σ_{i=1}^{m} |¯A^{T}_{i}¯ω − y_{i}|^{2}_{ε}
  = (C/2) Σ_{i=1}^{m} φ_{ε}(¯A^{T}_{i}¯ω − y_{i}, α) − (C/2) Σ_{i=1}^{m} |¯A^{T}_{i}¯ω − y_{i}|^{2}_{ε}
  ≤ (1/6) C m α^{2},

where the last inequality is due to Lemma 2.1(a). It is clear that ¯ω_{α} converges to ¯ω as α → 0 with an upper bound given by the above. Then, the proof is complete. □

Next, we focus on the optimality condition of the minimization problem (11), which is indeed necessary and sufficient for (11) and has the form of

∇_{ω}F_{ε,α}(ω) = 0.

With this, we define a function H_{φ} : IR^{n+2} → IR^{n+2} by

H_{φ}(z) = [ α ; ∇_{ω}F_{ε,α}(ω) ] = [ α ; ω + C Σ_{i=1}^{m} ∇_{x}φ_{ε}(¯A^{T}_{i}ω − y_{i}, α) ¯A_{i} ],        (17)

where z := (α, ω) ∈ IR^{n+2}. From Lemma 2.1 and the strong convexity of F_{ε,α}(ω), it is easy to see that if H_{φ}(z) = 0, then α = 0 and ω solves (11); and for any z ∈ IR_{++} × IR^{n+1}, the function H_{φ} is continuously differentiable. In addition, the Jacobian of H_{φ} can be calculated as below:

H′_{φ}(z) = [ 1  0 ; ∇^{2}_{ωα}F_{ε,α}(ω)  ∇^{2}_{ωω}F_{ε,α}(ω) ],        (18)

where

∇^{2}_{ωα}F_{ε,α}(ω) = C Σ_{i=1}^{m} ∇^{2}_{xα}φ_{ε}(¯A^{T}_{i}ω − y_{i}, α) ¯A_{i},
∇^{2}_{ωω}F_{ε,α}(ω) = I + C Σ_{i=1}^{m} ∇^{2}_{xx}φ_{ε}(¯A^{T}_{i}ω − y_{i}, α) ¯A_{i}¯A^{T}_{i}.

From (8), we can see ∇^{2}_{xx}φ_{ε}(x, α) ≥ 0, which implies C Σ_{i=1}^{m} ∇^{2}_{xx}φ_{ε}(¯A^{T}_{i}ω − y_{i}, α) ¯A_{i}¯A^{T}_{i} is positive semidefinite. Hence, ∇^{2}_{ωω}F_{ε,α}(ω) is positive definite. This helps us to prove that H′_{φ}(z) is invertible at any z ∈ IR_{++} × IR^{n+1}. In fact, if there exists a vector d := (d_{1}, d_{2}) ∈ IR × IR^{n+1} such that H′_{φ}(z)d = 0, then we have

[ d_{1} ; d_{1}∇^{2}_{ωα}F_{ε,α}(ω) + ∇^{2}_{ωω}F_{ε,α}(ω)d_{2} ] = 0.

This implies that d = 0, and hence H′_{φ}(z) is invertible at any z ∈ IR_{++} × IR^{n+1}.
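A small numerical sketch of H_{φ} (using numpy; the data and naming are ours) illustrates this invertibility argument: the Hessian block of (18) is an identity plus a positive semidefinite sum, so its eigenvalues are bounded below by 1.

```python
import numpy as np

EPS, ALPHA, C = 0.1, 0.05, 5.0

def grad_phi(r, eps, alpha):
    """nabla_x phi_eps from (7)."""
    s = abs(r) - eps
    if s >= alpha:
        return 2.0 * s * np.sign(r)
    if abs(s) < alpha:
        return (s + alpha) ** 2 / (2.0 * alpha) * np.sign(r)
    return 0.0

def hess_phi(r, eps, alpha):
    """nabla^2_xx phi_eps from (8)."""
    s = abs(r) - eps
    if s >= alpha:
        return 2.0
    if abs(s) < alpha:
        return (s + alpha) / alpha
    return 0.0

def H_phi(alpha, omega, Abar, y):
    """H_phi(z) of (17): first component alpha, the rest the gradient of (11)."""
    r = Abar @ omega - y
    g = omega + C * sum(grad_phi(ri, EPS, alpha) * Ai
                        for ri, Ai in zip(r, Abar))
    return np.concatenate(([alpha], g))

rng = np.random.default_rng(1)
m, n = 15, 2
Abar = np.hstack([rng.uniform(-1, 1, (m, n)), np.ones((m, 1))])  # rows (A_i, 1)
y = Abar[:, :n].sum(axis=1) + rng.uniform(-0.05, 0.05, m)
omega = rng.uniform(-1, 1, n + 1)

# Hessian block of (18): identity plus a PSD sum, hence eigenvalues >= 1,
# which is what makes H_phi'(z) invertible for alpha > 0.
Hess = np.eye(n + 1) + C * sum(
    hess_phi(ri, EPS, ALPHA) * np.outer(Ai, Ai)
    for ri, Ai in zip(Abar @ omega - y, Abar))
eigs = np.linalg.eigvalsh(Hess)
assert eigs.min() >= 1.0 - 1e-10
```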

### 3 The second smooth support vector machine

In this section, we consider another type of smoothing functions ψ_{ε,p}(x, α) : IR × IR_{+} → IR_{+}, which is defined by

ψ_{ε,p}(x, α) =
  0                                              if 0 ≤ |x| ≤ ε − α,
  (α/(p−1)) [ (p−1)(|x| − ε + α)/(pα) ]^{p}      if ε − α < |x| < ε + α/(p−1),
  |x| − ε                                        if |x| ≥ ε + α/(p−1),        (19)

where p ≥ 2. The graphs of ψ_{ε,p}(x, α) are depicted in Figure 2, which clearly indicate that ψ_{ε,p}(x, α) is a family of smoothing functions for |x|_{ε}.


Figure 2: Graphs of ψ_{ε,p}(x, α) with ε = 0.1, α = 0.03, 0.06, 0.09 and p = 2.
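For concreteness, the following Python sketch (names ours) implements ψ_{ε,p} from (19) and checks numerically that it dominates |x|_{ε} (as Proposition 3.1(a) below asserts) and approaches it as α → 0, in line with Lemma 3.1(b).

```python
def psi(x, eps, alpha, p=2):
    """Smoothing function psi_{eps,p}(x, alpha) of (19) for |x|_eps, p >= 2."""
    a = abs(x)
    if a <= eps - alpha:
        return 0.0
    if a < eps + alpha / (p - 1):
        return alpha / (p - 1) * ((p - 1) * (a - eps + alpha) / (p * alpha)) ** p
    return a - eps

def eps_loss(x, eps):
    """The eps-insensitive loss |x|_eps."""
    return max(0.0, abs(x) - eps)

eps = 0.1
xs = [i * 1e-3 for i in range(-300, 301)]
for x in xs:
    for p in (2, 3):
        # psi dominates |x|_eps ...
        assert psi(x, eps, 0.06, p) >= eps_loss(x, eps) - 1e-12
        # ... and tends to it as alpha -> 0 (Lemma 3.1(b)).
        assert abs(psi(x, eps, 1e-6, p) - eps_loss(x, eps)) < 1e-5
```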

In Lemma 3.1 below, we verify that ψ_{ε,p}(x, α) is a family of smoothing functions for |x|_{ε}; hence, ψ^{2}_{ε,p}(x, α) is also a family of smoothing functions for |x|^{2}_{ε}. Then, we can employ ψ^{2}_{ε,p} to replace the square of the ε-insensitive loss function in (5) in the same way as done in Section 2. The graphs of ψ^{2}_{ε,p}(x, α), with comparison to φ_{ε}(x, α), are depicted in Figure 3. In fact, there is a relation between ψ^{2}_{ε,p}(x, α) and φ_{ε}(x, α), shown in Proposition 3.1.

Figure 3: Graphs of |x|^{2}_{ε}, φ_{ε}(x, α) and ψ^{2}_{ε,p}(x, α) with ε = 0.1, α = 0.06, 0.09 and p = 2.

In other words, we obtain an alternative strongly convex unconstrained optimization problem for (5):

min_{ω}  (1/2) ω^{T}ω + (C/2) Σ_{i=1}^{m} ψ^{2}_{ε,p}(¯A^{T}_{i}ω − y_{i}, α).        (20)

However, the smooth function ψ^{2}_{ε,p}(x, α) is not twice differentiable with respect to x, and hence the objective function of (20) is not twice differentiable although it is smooth. Then, we still cannot apply Newton-type methods to solve (20). To overcome this, we adopt another smoothing technique. Before presenting the idea of this smoothing technique, we need the following two lemmas regarding properties of ψ_{ε,p}(x, α). To this end, we also compute the partial derivatives of ψ_{ε,p}(x, α) as below:

∇_{x}ψ_{ε,p}(x, α) =
  0                                                  if 0 ≤ |x| ≤ ε − α,
  sgn(x) [ (p−1)(|x| − ε + α)/(pα) ]^{p−1}           if ε − α < |x| < ε + α/(p−1),
  sgn(x)                                             if |x| ≥ ε + α/(p−1).

∇_{α}ψ_{ε,p}(x, α) =
  0                                                                   if 0 ≤ |x| ≤ ε − α,
  (((ε − |x|)(p−1) + α)/(pα)) [ (p−1)(|x| − ε + α)/(pα) ]^{p−1}       if ε − α < |x| < ε + α/(p−1),
  0                                                                   if |x| ≥ ε + α/(p−1).

Lemma 3.1. Let ψ_{ε,p}(x, α) be defined as in (19). Then, we have

(a) ψ_{ε,p}(x, α) is smooth with respect to x for any p ≥ 2;

(b) lim_{α→0} ψ_{ε,p}(x, α) = |x|_{ε} for any p ≥ 2.

Proof. (a) To prove the result, we need to check that both ψ_{ε,p}(·, α) and ∇_{x}ψ_{ε,p}(·, α) are continuous. Again, it suffices to check the junction points.

(i) If |x| = ε − α, then ψ_{ε,p}(x, α) = 0.
(ii) If |x| = ε + α/(p−1), then ψ_{ε,p}(x, α) = α/(p−1).
From (i) and (ii), it is clear that ψ_{ε,p}(·, α) is continuous.

Moreover,
(i) if |x| = ε − α, then ∇_{x}ψ_{ε,p}(x, α) = 0;
(ii) if |x| = ε + α/(p−1), then ∇_{x}ψ_{ε,p}(x, α) = sgn(x).
In view of (i) and (ii), we see that ∇_{x}ψ_{ε,p}(·, α) is continuous.

(b) To proceed, we discuss four cases.

(1) If 0 ≤ |x| ≤ ε − α, then ψ_{ε,p}(x, α) − |x|_{ε} = 0. Then, the desired result follows.

(2) If ε − α ≤ |x| ≤ ε, then ψ_{ε,p}(x, α) − |x|_{ε} = (α/(p−1)) [ (p−1)(|x| − ε + α)/(pα) ]^{p}. Hence,

lim_{α→0} (ψ_{ε,p}(x, α) − |x|_{ε}) = lim_{α→0} (α/(p−1)) [ (p−1)(|x| − ε + α)/(pα) ]^{p}
  = lim_{α→0} (α/(p−1)) · lim_{α→0} [ (p−1)(|x| − ε + α)/(pα) ]^{p}.

It is clear that the first limit is zero, so we only need to show that the second limit is bounded. To this end, we rewrite it as

lim_{α→0} [ (p−1)(|x| − ε + α)/(pα) ]^{p} = lim_{α→0} ((p−1)/p)^{p} ((|x| − ε + α)/α)^{p}.

We notice that |x| → ε when α → 0, so that (|x| − ε + α)/α → 0/0. Therefore, by applying L'Hôpital's rule, we obtain

lim_{α→0} (|x| − ε + α)/α = 1,

which implies that lim_{α→0} (ψ_{ε,p}(x, α) − |x|_{ε}) = 0 under this case.

(3) If ε ≤ |x| ≤ ε + α/(p−1), then

ψ_{ε,p}(x, α) − |x|_{ε} = (α/(p−1)) [ (p−1)(|x| − ε + α)/(pα) ]^{p} − (|x| − ε).

We have shown in case (2) that

lim_{α→0} (α/(p−1)) [ (p−1)(|x| − ε + α)/(pα) ]^{p} = 0.

It is also obvious that lim_{α→0} (|x| − ε) = 0. Hence, we obtain lim_{α→0} (ψ_{ε,p}(x, α) − |x|_{ε}) = 0 under this case.

(4) If |x| ≥ ε + α/(p−1), the desired result follows since it is clear that ψ_{ε,p}(x, α) − |x|_{ε} = 0.

From all the above, the proof is complete. □

Lemma 3.2. Let ψ_{ε,p}(x, α) be defined as in (19). Then, we have

(a) ψ_{ε,p}(x, α) sgn(x) is smooth with respect to x for any p ≥ 2;

(b) lim_{α→0} ψ_{ε,p}(x, α) sgn(x) = |x|_{ε} sgn(x) for any p ≥ 2.

Proof. (a) First, we observe that ψ_{ε,p}(x, α) sgn(x) can be written as

ψ_{ε,p}(x, α) sgn(x) =
  0                                                     if 0 ≤ |x| ≤ ε − α,
  (α/(p−1)) [ (p−1)(|x| − ε + α)/(pα) ]^{p} sgn(x)      if ε − α < |x| < ε + α/(p−1),
  (|x| − ε) sgn(x)                                      if |x| ≥ ε + α/(p−1).

Note that sgn(x) is continuous at x ≠ 0 and ψ_{ε,p}(x, α) = 0 at x = 0; then applying Lemma 3.1(a) yields that ψ_{ε,p}(x, α) sgn(x) is continuous. Furthermore, by simple calculations, we have

∇_{x}(ψ_{ε,p}(x, α) sgn(x)) = ∇_{x}ψ_{ε,p}(x, α) sgn(x) =
  0                                        if 0 ≤ |x| ≤ ε − α,
  [ (p−1)(|x| − ε + α)/(pα) ]^{p−1}        if ε − α < |x| < ε + α/(p−1),
  1                                        if |x| ≥ ε + α/(p−1).        (21)

Mimicking the arguments as in Lemma 3.1(a), we can verify that ∇_{x}(ψ_{ε,p}(x, α) sgn(x)) is continuous. Thus, the desired result follows.

(b) By Lemma 3.1(b), it is easy to see that lim_{α→0} ψ_{ε,p}(x, α) sgn(x) = |x|_{ε} sgn(x). Then, the desired result follows. □

Note that |x|^{2}_{ε} is smooth with

∇(|x|^{2}_{ε}) = 2|x|_{ε} sgn(x) =
  2(|x| − ε) sgn(x)  if |x| > ε,
  0                  if |x| ≤ ε,

being continuous (but not differentiable). Then, we consider the optimality condition of (12), that is,

∇_{ω}F_{ε,0}(ω) = ω + C Σ_{i=1}^{m} |¯A^{T}_{i}ω − y_{i}|_{ε} sgn(¯A^{T}_{i}ω − y_{i}) ¯A_{i} = 0,        (22)

which is indeed necessary and sufficient for (5). Hence, solving (22) is equivalent to solving (5).

Using the family of smoothing functions ψ_{ε,p} to replace the ε-insensitive loss function in (22) leads to a system of smooth equations. More specifically, we define a function H_{ψ} : IR^{n+2} → IR^{n+2} by

H_{ψ}(z) = H_{ψ}(α, ω) = [ α ; ω + C Σ_{i=1}^{m} ψ_{ε,p}(¯A^{T}_{i}ω − y_{i}, α) sgn(¯A^{T}_{i}ω − y_{i}) ¯A_{i} ],

where z := (α, ω) ∈ IR^{n+2}. From Lemma 3.1, it is easy to see that if H_{ψ}(z) = 0, then α = 0 and ω is the solution of the equations (22), i.e., the solution of (12). Moreover, for any z ∈ IR_{++} × IR^{n+1}, the function H_{ψ} is continuously differentiable with

H′_{ψ}(z) = [ 1  0 ; E(ω)  I + D(ω) ],        (23)

where

E(ω) = C Σ_{i=1}^{m} ∇_{α}ψ_{ε,p}(¯A^{T}_{i}ω − y_{i}, α) sgn(¯A^{T}_{i}ω − y_{i}) ¯A_{i},
D(ω) = C Σ_{i=1}^{m} ∇_{x}ψ_{ε,p}(¯A^{T}_{i}ω − y_{i}, α) sgn(¯A^{T}_{i}ω − y_{i}) ¯A_{i}¯A^{T}_{i}.

Because ∇_{x}ψ_{ε,p}(¯A^{T}_{i}ω − y_{i}, α) sgn(¯A^{T}_{i}ω − y_{i}) is nonnegative for any α > 0 by (21), we see that I + D(ω) is positive definite at any z ∈ IR_{++} × IR^{n+1}. Following similar arguments as in Section 2, we obtain that H′_{ψ}(z) is invertible at any z ∈ IR_{++} × IR^{n+1}.
Proposition 3.1. Let φ_{ε}(x, α) be defined as in (6) and ψ_{ε,p}(x, α) be defined as in (19). Then, the following hold.

(a) For p ≥ 2, we have φ_{ε}(x, α) ≥ ψ^{2}_{ε,p}(x, α) ≥ |x|^{2}_{ε}.

(b) For p ≥ q ≥ 2, we have ψ_{ε,q}(x, α) ≥ ψ_{ε,p}(x, α).

Proof. (a) First, we show that φ_{ε}(x, α) ≥ ψ^{2}_{ε,p}(x, α) holds. To proceed, we discuss four cases.

(i) If |x| ≤ ε − α, then φ_{ε}(x, α) = 0 = ψ^{2}_{ε,p}(x, α).

(ii) If ε − α < |x| < ε + α/(p−1), then |x| ≤ ε + α/(p−1), which is equivalent to 1/(|x| − ε + α) ≥ (p−1)/(αp). Thus, we have

φ_{ε}(x, α)/ψ^{2}_{ε,p}(x, α) = α^{2p−3} p^{2p} / (6 (p−1)^{2p−2} (|x| − ε + α)^{2p−3}) ≥ p^{3}/(6(p−1)) ≥ 1,

which implies φ_{ε}(x, α) ≥ ψ^{2}_{ε,p}(x, α).

(iii) For ε + α/(p−1) ≤ |x| < ε + α, letting t := |x| − ε ∈ [α/(p−1), α) yields

φ_{ε}(x, α) − ψ^{2}_{ε,p}(x, α) = (1/(6α))(t + α)^{3} − t^{2} = t [ (1/(6α)) t^{2} − (1/2) t + (1/2) α ] + (1/6) α^{2} ≥ 0.

Here the last inequality follows from the fact that the discriminant of (1/(6α)) t^{2} − (1/2) t + (1/2) α is less than 0 and 1/(6α) > 0. Then, φ_{ε}(x, α) − ψ^{2}_{ε,p}(x, α) > 0.

(iv) If |x| ≥ ε + α, then it is clear that φ_{ε}(x, α) = (|x| − ε)^{2} + (1/3) α^{2} ≥ (|x| − ε)^{2} = ψ^{2}_{ε,p}(x, α).

Now we show the other part, ψ_{ε,p}(x, α) ≥ |x|_{ε}, which is equivalent to verifying ψ^{2}_{ε,p}(x, α) ≥ |x|^{2}_{ε}. Again, we discuss four cases.

(i) If |x| ≤ ε − α, then ψ_{ε,p}(x, α) = 0 = |x|_{ε}.

(ii) If ε − α < |x| ≤ ε, then ε − α < |x|, which says |x| − ε + α > 0. Thus, we have ψ_{ε,p}(x, α) ≥ 0 = |x|_{ε}.

(iii) For ε < |x| < ε + α/(p−1), we let t := |x| − ε ∈ (0, α/(p−1)) and define a function

f(t) = (α/(p−1)) [ (p−1)(t + α)/(pα) ]^{p} − t

on [0, α/(p−1)]. Note that f(|x| − ε) = ψ_{ε,p}(x, α) − |x|_{ε} for |x| ∈ (ε, ε + α/(p−1)), and observe that

f′(t) = [ (p−1)(t + α)/(pα) ]^{p−1} − 1 ≤ [ (p−1)(α/(p−1) + α)/(pα) ]^{p−1} − 1 = 0.

This means f(t) is monotone decreasing on (0, α/(p−1)). Since f(α/(p−1)) = 0, we have f(t) ≥ 0 for t ∈ (0, α/(p−1)), which implies ψ_{ε,p}(x, α) ≥ |x|_{ε} for |x| ∈ (ε, ε + α/(p−1)).

(iv) If |x| ≥ ε + α/(p−1), then it is clear that ψ_{ε,p}(x, α) = |x| − ε = |x|_{ε}.

(b) For p ≥ q ≥ 2, it is obvious to see that

ψ_{ε,q}(x, α) = ψ_{ε,p}(x, α) for |x| ∈ [0, ε − α] ∪ [ε + α/(q−1), +∞).

If |x| ∈ [ε + α/(p−1), ε + α/(q−1)), then ψ_{ε,p}(x, α) = |x|_{ε} ≤ ψ_{ε,q}(x, α) from the above. Thus, we only need to prove the case of |x| ∈ (ε − α, ε + α/(p−1)).

Consider |x| ∈ (ε − α, ε + α/(p−1)) and t := |x| − ε + α; we observe that α/t ≥ (p−1)/p. Then, we verify that

ψ_{ε,q}(x, α)/ψ_{ε,p}(x, α) = ((q−1)^{q−1} p^{p})/((p−1)^{p−1} q^{q}) · (α/t)^{p−q}
  ≥ ((q−1)^{q−1} p^{p})/((p−1)^{p−1} q^{q}) · ((p−1)/p)^{p−q}
  = (p/q)^{q} · ((q−1)/(p−1))^{q−1}
  = (1 + (p−q)/q)^{q} / (1 + (p−q)/(q−1))^{q−1}
  ≥ 1,

where the last inequality is due to the fact that (1 + (p−q)/x)^{x} is increasing for x > 0. Thus, the proof is complete. □
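Both parts of Proposition 3.1 are easy to check numerically. The sketch below (our own, reusing the function names from the earlier snippets) samples a grid around the kinks and verifies φ_{ε} ≥ ψ^{2}_{ε,p} ≥ |x|^{2}_{ε} together with the monotonicity in p.

```python
def phi(x, eps, alpha):
    """Smoothing function (6) for |x|_eps^2."""
    s = abs(x) - eps
    if s >= alpha:
        return s * s + alpha * alpha / 3.0
    if abs(s) < alpha:
        return (s + alpha) ** 3 / (6.0 * alpha)
    return 0.0

def psi(x, eps, alpha, p):
    """Smoothing function (19) for |x|_eps."""
    a = abs(x)
    if a <= eps - alpha:
        return 0.0
    if a < eps + alpha / (p - 1):
        return alpha / (p - 1) * ((p - 1) * (a - eps + alpha) / (p * alpha)) ** p
    return a - eps

eps, alpha = 0.1, 0.06
xs = [i * 1e-3 for i in range(-300, 301)]
for x in xs:
    le = max(0.0, abs(x) - eps)
    for p in (2, 3, 5):
        # Proposition 3.1(a): phi >= psi^2 >= |x|_eps^2
        assert phi(x, eps, alpha) >= psi(x, eps, alpha, p) ** 2 - 1e-12
        assert psi(x, eps, alpha, p) ** 2 >= le * le - 1e-12
    # Proposition 3.1(b): psi_{eps,q} >= psi_{eps,p} for p >= q
    assert psi(x, eps, alpha, 2) >= psi(x, eps, alpha, 3) - 1e-12
    assert psi(x, eps, alpha, 3) >= psi(x, eps, alpha, 5) - 1e-12
```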

### 4 A smoothing Newton algorithm

In Section 2 and Section 3, we construct two systems of smooth equations: H_{φ}(z) = 0
and Hψ(z) = 0. We briefly describe the difference between Hφ(z) = 0 and Hψ(z) = 0.

In general, the ways we arrive at H_{φ}(z) = 0 and H_{ψ}(z) = 0 are slightly different. To obtain H_{φ}(z) = 0, we first use the twice continuously differentiable functions φ_{ε}(x, α) to replace |x|^{2}_{ε} in problem (5), and then write out its optimality condition. In contrast, to obtain H_{ψ}(z) = 0, we write out the optimality condition of problem (5) first, and then use the smoothing functions ψ_{ε,p}(x, α) to replace the ε-insensitive loss function in (22). For convenience, we denote ˜H(z) ∈ {H_{φ}(z), H_{ψ}(z)}. In other words, ˜H(z) possesses the property that if ˜H(z) = 0, then α = 0 and ω solves (12). In view of this, we apply a Newton-type method to the system of smooth equations ˜H(z) = 0 at each iteration and let α → 0 so that the solution to the problem (12) can be found.

Algorithm 4.1. (A smoothing Newton method)

Step 0 Choose δ ∈ (0, 1), σ ∈ (0, 1/2), and α_{0} > 0. Take τ ∈ (0, 1) such that τα_{0} < 1. Let ω^{0} ∈ IR^{n+1} be an arbitrary vector. Set z^{0} := (α_{0}, ω^{0}) and e^{0} := (1, 0, . . . , 0)^{T} ∈ IR^{n+2}.

Step 1 If ‖ ˜H(z^{k})‖ = 0, stop.

Step 2 Define the functions Γ, β by

Γ(z) := ‖ ˜H(z)‖^{2} and β(z) := τ min{1, Γ(z)}.        (24)

Compute Δz^{k} := (Δα_{k}, Δω^{k}) by

˜H(z^{k}) + ˜H′(z^{k})Δz^{k} = α_{0}β(z^{k})e^{0}.

Step 3 Let θ_{k} be the maximum of the values 1, δ, δ^{2}, · · · such that

Γ(z^{k} + θ_{k}Δz^{k}) ≤ [1 − 2σ(1 − τα_{0})θ_{k}]Γ(z^{k}).        (25)

Step 4 Set z^{k+1} := z^{k} + θ_{k}Δz^{k} and k := k + 1. Go to Step 1.
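To illustrate Algorithm 4.1, here is a self-contained Python sketch of the smoothing Newton iteration for the first model (˜H = H_{φ}) on a one-feature toy data set; all data, constants, and names are ours, and the tolerances are illustrative rather than tuned.

```python
import numpy as np

EPS, C = 0.1, 10.0

def dphi(r, a):        # nabla_x phi_eps, eq (7)
    s = abs(r) - EPS
    if s >= a:
        return 2.0 * s * np.sign(r)
    if s > -a:
        return (s + a) ** 2 / (2.0 * a) * np.sign(r)
    return 0.0

def d2xx(r, a):        # nabla^2_xx phi_eps, eq (8)
    s = abs(r) - EPS
    if s >= a:
        return 2.0
    if s > -a:
        return (s + a) / a
    return 0.0

def d2xa(r, a):        # nabla^2_{x,alpha} phi_eps, eq (9)
    s = abs(r) - EPS
    if -a < s < a:
        return (s + a) * (a - s) / (2.0 * a * a) * np.sign(r)
    return 0.0

def H(z, Abar, y):     # H_phi(z) of (17)
    a, w = z[0], z[1:]
    r = Abar @ w - y
    g = w + C * sum(dphi(ri, a) * Ai for ri, Ai in zip(r, Abar))
    return np.concatenate(([a], g))

def Hjac(z, Abar, y):  # Jacobian (18)
    a, w = z[0], z[1:]
    r = Abar @ w - y
    k = len(w) + 1
    J = np.zeros((k, k))
    J[0, 0] = 1.0
    J[1:, 0] = C * sum(d2xa(ri, a) * Ai for ri, Ai in zip(r, Abar))
    J[1:, 1:] = np.eye(k - 1) + C * sum(d2xx(ri, a) * np.outer(Ai, Ai)
                                        for ri, Ai in zip(r, Abar))
    return J

# toy regression data: y roughly 2*t + 1 with small noise
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 30)
y = 2.0 * t + 1.0 + rng.uniform(-0.05, 0.05, 30)
Abar = np.column_stack([t, np.ones_like(t)])

alpha0, tau, delta, sigma = 0.05, 0.5, 0.5, 0.25   # Step 0 (tau * alpha0 < 1)
z = np.concatenate(([alpha0], np.zeros(2)))
e0 = np.array([1.0, 0.0, 0.0])

for _ in range(100):
    Hz = H(z, Abar, y)
    gamma = float(Hz @ Hz)                          # Gamma(z) of (24)
    if gamma < 1e-20:                               # Step 1
        break
    beta = tau * min(1.0, gamma)
    dz = np.linalg.solve(Hjac(z, Abar, y), alpha0 * beta * e0 - Hz)  # Step 2
    theta = 1.0
    while theta > 1e-12:                            # line search (25), Step 3
        Hn = H(z + theta * dz, Abar, y)
        if float(Hn @ Hn) <= (1 - 2 * sigma * (1 - tau * alpha0) * theta) * gamma:
            break
        theta *= delta
    z = z + theta * dz                              # Step 4

x_sol, b_sol = z[1], z[2]
assert float(np.linalg.norm(H(z, Abar, y))) < 1e-8
assert abs(x_sol - 2.0) < 0.3 and abs(b_sol - 1.0) < 0.3
```

The recovered slope and intercept come out slightly shrunk toward zero relative to (2, 1), which is the expected effect of the regularization term (1/2)ω^{T}ω in (12).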

Proposition 4.1. Suppose that the sequence {z^{k}} is generated by Algorithm 4.1. Then, the following results hold.

(a) {Γ(z^{k})} is monotonically decreasing.

(b) {‖ ˜H(z^{k})‖} and {β(z^{k})} are monotonically decreasing.

(c) Let N(τ) := {z = (α, ω) ∈ IR_{+} × IR^{n+1} : α_{0}β(z) ≤ α}. Then z^{k} ∈ N(τ) for any k ∈ K and 0 < α_{k+1} ≤ α_{k}.

(d) The algorithm is well defined.

Proof. Since the proof is similar to that of [6, Remark 2.1], we omit it here. □

Lemma 4.1. Let ¯λ := max{λ_{i}(Σ_{i=1}^{m} ¯A_{i}¯A^{T}_{i})}. Then, for any z ∈ IR_{++} × IR^{n+1}, we have

(a) 1 ≤ λ_{i}(H′_{φ}(z)) ≤ 1 + 2¯λ, i = 1, · · · , n + 2;

(b) 1 ≤ λ_{i}(H′_{ψ}(z)) ≤ 1 + ¯λ, i = 1, · · · , n + 2.

Proof. (a) H_{φ} is continuously differentiable at any z ∈ IR_{++} × IR^{n+1}, and by (18) it is easy to see that {1, λ_{1}(∇^{2}_{ωω}F_{ε,α}(ω)), · · · , λ_{n+1}(∇^{2}_{ωω}F_{ε,α}(ω))} are the eigenvalues of H′_{φ}(z). From the representation of ∇^{2}_{xx}φ_{ε} in (8), we have 0 ≤ ∇^{2}_{xx}φ_{ε}(¯A^{T}_{i}ω − y_{i}, α) ≤ 2. As ∇^{2}_{ωω}F_{ε,α}(ω) = I + Σ_{i=1}^{m} ∇^{2}_{xx}φ_{ε}(¯A^{T}_{i}ω − y_{i}, α) ¯A_{i}¯A^{T}_{i}, we obtain

1 ≤ λ_{i}(∇^{2}_{ωω}F_{ε,α}(ω)) ≤ 1 + 2¯λ, i = 1, · · · , n + 1.        (26)

Thus, part (a) holds.

(b) Note that

∇_{x}ψ_{ε,p}(x, α) sgn(x) =
  0                                        if 0 ≤ |x| ≤ ε − α,
  [ (p−1)(|x| − ε + α)/(pα) ]^{p−1}        if ε − α < |x| < ε + α/(p−1),
  1                                        if |x| ≥ ε + α/(p−1),

which says 0 ≤ ∇_{x}ψ_{ε,p}(x, α) sgn(x) ≤ 1. Then, following similar arguments as in part (a), the result of part (b) can be proved. □

Proposition 4.2. ˜H(α, ω) is coercive with respect to ω for any fixed α > 0, i.e., lim_{‖ω‖→+∞} ‖ ˜H(α, ω)‖ = +∞.

Proof. We first claim that H_{φ}(α, ω) is coercive for any fixed α > 0. By the definition of H_{φ}(α, ω) in (17), ‖H_{φ}(α, ω)‖^{2} = α^{2} + ‖∇_{ω}F_{ε,α}(ω)‖^{2}. Then, for any fixed α > 0,

lim_{‖ω‖→+∞} ‖H_{φ}(α, ω)‖ = +∞ ⇐⇒ lim_{‖ω‖→+∞} ‖∇_{ω}F_{ε,α}(ω)‖ = +∞.

By (26), all eigenvalues of ∇^{2}_{ωω}F_{ε,α}(ω) are bounded below by 1. For any ω_{0} ∈ IR^{n+1},

‖∇_{ω}F_{ε,α}(ω)‖ + ‖∇_{ω}F_{ε,α}(ω_{0})‖ ≥ ‖∇_{ω}F_{ε,α}(ω) − ∇_{ω}F_{ε,α}(ω_{0})‖ = ‖∇^{2}_{ωω}F_{ε,α}(ˆω)(ω − ω_{0})‖ ≥ ‖ω − ω_{0}‖,

where ˆω lies between ω_{0} and ω. Hence, lim_{‖ω‖→+∞} ‖∇_{ω}F_{ε,α}(ω)‖ = +∞.

By a similar argument, H_{ψ}(α, ω) is coercive for any fixed α > 0. From the above, ˜H(α, ω) ∈ {H_{φ}(α, ω), H_{ψ}(α, ω)} is coercive for any fixed α > 0. □

Lemma 4.2. Let Ω ⊆ IR^{n+1} be a compact set and Γ(α, ω) be defined as in (24). Then, for every ς > 0, there exists an ¯α > 0 such that

|Γ(α, ω) − Γ(0, ω)| ≤ ς for all ω ∈ Ω and all α ∈ [0, ¯α].

Proof. The function Γ(α, ω) defined as in (24) is continuous on the compact set [0, ¯α]×Ω.

The lemma is then an immediate consequence of the fact that every continuous function on a compact set is uniformly continuous there. 2

Lemma 4.3. (Mountain Pass Theorem [12, Theorem 9.2.7]) Suppose that g : IR^{m} → IR is a continuously differentiable and coercive function. Let Ω ⊂ IR^{m} be a nonempty and compact set, and let ξ be the minimum value of g on the boundary of Ω, i.e., ξ := min_{y∈∂Ω} g(y). Assume that there exist points a ∈ Ω and b ∉ Ω such that g(a) < ξ and g(b) < ξ. Then, there exists a point c ∈ IR^{m} such that ∇g(c) = 0 and g(c) ≥ ξ.

Theorem 4.1. Suppose the sequence {z^{k}} is generated by Algorithm 4.1. Then, the sequence {z^{k}} is bounded, and ω^{k} = (x^{k}, b^{k}) converges to the unique solution ω^{sol} = (x^{sol}, b^{sol}) of problem (12).

Proof. (a) We first show that the sequence {z^{k}} is bounded. It is clear from Proposition 4.1(c) that the sequence {α_{k}} is bounded. Suppose, to the contrary, that {ω^{k}} is unbounded. Passing to a subsequence if necessary, we may assume ‖ω^{k}‖ → +∞ as k → +∞. We derive a contradiction in each of the following two cases.

(i) If α_{∗} := lim_{k→+∞} α_{k} > 0, applying Proposition 4.1(b) yields that { ˜H(z^{k})} is bounded. On the other hand, by Proposition 4.2, we have

lim_{k→+∞} ‖ ˜H(α_{∗}, ω^{k})‖ = lim_{‖ω^{k}‖→+∞} ‖ ˜H(α_{∗}, ω^{k})‖ = +∞. (27)

Hence, a contradiction is reached.

(ii) If α_{∗} := lim_{k→+∞} α_{k} = 0, since {ω^{k}} is unbounded, there exists a compact set Ω ⊂ IR^{n+1} with ω^{sol} ∈ int Ω such that

ω^{k} ∉ Ω (28)

for all k sufficiently large. Since m̄ := min_{ω∈∂Ω} Γ(0, ω) > 0, we can apply Lemma 4.2 with ς := m̄/4 and conclude that

Γ(α_{k}, ω^{sol}) ≤ m̄/4 (29)

and

min_{ω∈∂Ω} Γ(α_{k}, ω) ≥ 3m̄/4

for all k sufficiently large. Since α_{k} → 0 in this case, combining (24) and Proposition 4.1(c) gives

Γ(α_{k}, ω^{k}) = β(α_{k}, ω^{k}) ≤ α_{k}/α_{0}.

Hence,

Γ(α_{k}, ω^{k}) ≤ m̄/4 (30)

for all k sufficiently large. Now let us fix an index k such that (28), (29) and (30) hold. Applying the Mountain Pass Theorem (Lemma 4.3) with a := ω^{sol} and b := ω^{k}, we obtain the existence of a vector c ∈ IR^{n+1} such that

∇_{ω}Γ(α_{k}, c) = 0 and Γ(α_{k}, c) ≥ 3m̄/4 > 0. (31)

To derive a contradiction, we show that c is a global minimizer of Γ(α_{k}, ·). Since Γ(α_{k}, ω) ≥ α_{k}^{2} for all ω, it is sufficient to show Γ(α_{k}, c) = α_{k}^{2}. We discuss this in two cases:

• If ˜H = H_{φ}, then

∇_{ω}Γ(α_{k}, c) = 2∇^{2}_{ωω}F_{ε,α_{k}}(c) H̄_{φ}(α_{k}, c),

where H̄_{φ} denotes the last n + 1 components of H_{φ}, i.e., H̄_{φ} = H_{φ}(2 : n + 2). Then, using (31) and the fact that ∇^{2}_{ωω}F_{ε,α_{k}}(c) is invertible for α_{k} > 0, we have H̄_{φ}(α_{k}, c) = 0. Furthermore,

Γ(α_{k}, c) = ‖H_{φ}(α_{k}, c)‖^{2} = α_{k}^{2}.

• If ˜H = H_{ψ}, then

∇_{ω}Γ(α_{k}, c) = 2(I + D(ω)) H̄_{ψ}(α_{k}, c),

where I + D(ω) is given by (23) and H̄_{ψ} denotes the last n + 1 components of H_{ψ}. Since I + D(ω) is invertible for α_{k} > 0, we obtain Γ(α_{k}, c) = α_{k}^{2} in the same way as in the previous case.

In both cases, Γ(α_{k}, c) = α_{k}^{2}, so c is a global minimizer of Γ(α_{k}, ·); in particular, Γ(α_{k}, c) ≤ Γ(α_{k}, ω^{k}) ≤ m̄/4, which contradicts (31). Therefore {ω^{k}} is bounded.

(b) From Proposition 4.1, we know that the sequences {‖ ˜H(z^{k})‖} and {Γ(z^{k})} are non-negative and monotonically decreasing, and hence convergent. In addition, by part (a), the sequence {z^{k}} is bounded. Passing to a subsequence if necessary, we may assume that there exists a point z^{∗} = (α_{∗}, ω^{∗}) ∈ IR_{+} × IR^{n+1} such that lim_{k→+∞} z^{k} = z^{∗}, and hence

lim_{k→+∞} ‖ ˜H(z^{k})‖ = ‖ ˜H(z^{∗})‖ and lim_{k→+∞} Γ(z^{k}) = Γ(z^{∗}).

If ˜H(z^{∗}) = 0, then a simple continuity argument shows that ω^{∗} is a solution to problem (12). For the case ‖ ˜H(z^{∗})‖ > 0, and hence α_{∗} > 0, we will derive a contradiction.

First, by the assumption that ‖ ˜H(z^{∗})‖ > 0, we have lim_{k→+∞} θ_{k} = 0. Thus, for any sufficiently large k, the stepsize ˆθ_{k} := θ_{k}/δ does not satisfy the line search criterion (25), i.e.,

Γ(z^{k} + ˆθ_{k}Δz^{k}) > [1 − 2σ(1 − γα_{0})ˆθ_{k}] Γ(z^{k}),

which implies that

[Γ(z^{k} + ˆθ_{k}Δz^{k}) − Γ(z^{k})] / ˆθ_{k} > −2σ(1 − γα_{0}) Γ(z^{k}).

Since α_{∗} > 0, Γ is continuously differentiable at z^{∗}. Letting k → +∞ in the above inequality gives

−2σ(1 − γα_{0}) Γ(z^{∗}) ≤ 2 ˜H(z^{∗})^{T} ˜H′(z^{∗})Δz^{∗} = 2 ˜H(z^{∗})^{T}(− ˜H(z^{∗}) + α_{0}β(z^{∗})e^{0})
= −2 ˜H(z^{∗})^{T} ˜H(z^{∗}) + 2α_{0}β(z^{∗}) ˜H(z^{∗})^{T}e^{0}
≤ 2(−1 + γα_{0}) Γ(z^{∗}).

This indicates that −1 + γα_{0} + σ(1 − γα_{0}) ≥ 0, which contradicts the facts that σ < 1 and γα_{0} < 1. Thus, there should be ˜H(z^{∗}) = 0.

Because ω^{sol} is the unique solution to problem (12), every accumulation point of {z^{k}} equals (0, ω^{sol}), and hence the whole sequence {z^{k}} converges to z^{∗} = (0, ω^{sol}), that is,

lim_{k→+∞} z^{k} = (0, ω^{sol}).

Then, the proof is complete. 2
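The global argument above can be made concrete on a small example. The sketch below is an illustration only, not the paper's Algorithm 4.1 for ε-SVR: it applies a smoothing Newton iteration with the Armijo-type line search criterion (25) to the toy problem min_{x} 0.5x^{2} + |x − 1|, smoothing |t| by sqrt(t^{2} + α^{2}). The perturbation β(z) = γ min{1, Γ(z)} and all parameter values are assumptions patterned after standard smoothing Newton methods; the paper's exact choices are defined earlier in the text.

```python
import numpy as np

# Assumed parameters: gamma * alpha0 < 1 and sigma in (0, 1/2), as the
# convergence analysis requires; delta is the backtracking factor.
alpha0, gamma, sigma, delta = 1.0, 0.2, 0.25, 0.5
e0 = np.array([1.0, 0.0])          # e^0 = (1, 0): perturbation acts on alpha only

def H(z):
    """H(z) = (alpha, gradient in x of the smoothed objective)."""
    a, x = z
    t = x - 1.0
    return np.array([a, x + t / np.sqrt(t * t + a * a)])

def Hprime(z):
    """Jacobian of H; invertible whenever alpha > 0."""
    a, x = z
    t = x - 1.0
    r3 = (t * t + a * a) ** 1.5
    return np.array([[1.0, 0.0],
                     [-t * a / r3, 1.0 + a * a / r3]])

def Gamma(z):
    """Merit function Gamma(z) = ||H(z)||^2."""
    return float(H(z) @ H(z))

z = np.array([alpha0, 0.0])        # start from alpha = alpha0 > 0
for _ in range(200):
    G = Gamma(z)
    if G < 1e-12:
        break
    beta = gamma * min(1.0, G)     # assumed form of beta(z)
    # Newton equation: H'(z) dz = -H(z) + beta(z) * alpha0 * e^0
    dz = np.linalg.solve(Hprime(z), -H(z) + beta * alpha0 * e0)
    # Backtracking line search with criterion (25)
    theta = 1.0
    while Gamma(z + theta * dz) > (1.0 - 2.0 * sigma * (1.0 - gamma * alpha0) * theta) * G:
        theta *= delta
    z = z + theta * dz

alpha_k, x_k = z
print(alpha_k, x_k)                # alpha_k -> 0 and x_k -> 1
```

For this toy problem the unique nonsmooth solution is x = 1 (where 0 ∈ 1 + ∂|x − 1|), and the iterates drive α_{k} → 0 while x_{k} approaches 1, mirroring the qualitative behavior established in Theorem 4.1.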

In the following, we discuss the local convergence of Algorithm 4.1. To this end, we need the concept of semismoothness, which was originally introduced by Mifflin [7] for functionals and was further extended to the setting of vector-valued functions by Qi and Sun [14]. A locally Lipschitz continuous function F : IR^{n} → IR^{m}, which has the generalized Jacobian ∂F(x) in the sense of Clarke [2], is said to be semismooth (respectively, strongly semismooth) at x ∈ IR^{n} if F is directionally differentiable at x and

F(x + h) − F(x) − V h = o(‖h‖) (respectively, = O(‖h‖^{2}))

holds for any V ∈ ∂F(x + h).
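As a simple illustration of this definition (an example added here, not taken from the paper), the absolute value function is strongly semismooth at the origin:

```latex
% F(x) = |x| on IR:  \partial F(0) = [-1, 1], and for h \neq 0 the
% generalized Jacobian is the singleton \partial F(h) = \{\mathrm{sgn}(h)\}.
% Hence, for V \in \partial F(0 + h) with h \neq 0,
F(0 + h) - F(0) - V h \;=\; |h| - \mathrm{sgn}(h)\,h \;=\; 0 \;=\; O(\|h\|^{2}).
```

The same reasoning shows that every piecewise linear function is strongly semismooth, which is precisely the fact invoked below for ∇_{x}φ_{ε}(x, 0) in the proof of Lemma 4.4(b).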

Lemma 4.4. (a) Suppose that the sequence {z^{k}} is generated by Algorithm 4.1. Then,

‖ ˜H′(z^{k})^{−1}‖ ≤ 1.

(b) ˜H(z) is strongly semismooth at any z = (α, ω) ∈ IR^{n+2}.

Proof. (a) By Proposition 4.1(c), we know that α_{k} > 0 for every k. This together with Lemma 4.1 leads to the desired result.

(b) We only provide the proof for the case ˜H(α, ω) = H_{φ}(α, ω); the proof for the other case ˜H(α, ω) = H_{ψ}(α, ω) is similar and is omitted. First, we observe that H_{φ}(z) is continuously differentiable with H′_{φ} locally Lipschitz continuous at any z ∈ IR_{++} × IR^{n+1} by Lemma 4.1(a). Thus, H_{φ}(z) is strongly semismooth at any z ∈ IR_{++} × IR^{n+1}. It remains to verify that H_{φ}(z) is strongly semismooth at z ∈ {0} × IR^{n+1}. To see this, we recall that

∇_{x}φ_{ε}(x, 0) = 2(|x| − ε)sgn(x) if |x| − ε ≥ 0, and ∇_{x}φ_{ε}(x, 0) = 0 if |x| − ε ≤ 0.

This is a piecewise linear function of x, and hence ∇_{x}φ_{ε}(x, 0) is strongly semismooth. In summary, H_{φ}(z) is strongly semismooth at any z ∈ {0} × IR^{n+1}. 2

Theorem 4.2. Suppose that z^{∗} = (α_{∗}, ω^{∗}) is an accumulation point of {z^{k}} generated by Algorithm 4.1. Then, we have