Two smooth support vector machines for ε-insensitive regression
Weizhe Gu1 · Wei-Po Chen2 · Chun-Hsu Ko3 · Yuh-Jye Lee4 · Jein-Shan Chen2
Received: 6 June 2017 / Published online: 18 December 2017
© Springer Science+Business Media, LLC, part of Springer Nature 2017
Abstract In this paper, we propose two new smooth support vector machines for ε-insensitive regression. Based on these two smooth support vector machines, we construct two systems of smooth equations from two novel families of smoothing functions, from which we seek the solution to ε-support vector regression (ε-SVR). More specifically, using the proposed smoothing functions, we employ the smoothing Newton method to solve the systems of smooth equations. The algorithm is shown to be globally and quadratically convergent without any additional conditions. Numerical comparisons among different values of the smoothing parameter are also reported.
J.-S. Chen's work is supported by the Ministry of Science and Technology, Taiwan.
Jein-Shan Chen (corresponding author) jschen@math.ntnu.edu.tw
Weizhe Gu weizhegu@yahoo.com.cn
Wei-Po Chen weaper@gmail.com
Chun-Hsu Ko chko@isu.edu.tw
Yuh-Jye Lee yuhjye@math.nctu.edu.tw
1 Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, People’s Republic of China
2 Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
3 Department of Electrical Engineering, I-Shou University, Kaohsiung 840, Taiwan
4 Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan
Keywords Support vector machine · ε-insensitive loss function · ε-smooth support vector regression · Smoothing Newton algorithm
1 Introduction
Support vector machine (SVM) is a popular and important statistical learning technology [1,7,8,16–19]. Generally speaking, there are two main categories of support vector machines (SVMs): support vector classification (SVC) and support vector regression (SVR). The model produced by SVR depends on a training data set S = {(A_1, y_1), . . . , (A_m, y_m)} ⊆ IR^n × IR, where A_i ∈ IR^n is the input data and y_i ∈ IR is called the observation. The main goal of ε-insensitive regression with the idea of SVMs is to find a linear or nonlinear regression function f that has at most ε deviation from the actually obtained targets y_i for all the training data, and at the same time is as flat as possible. This problem is called ε-support vector regression (ε-SVR).
For pedagogical reasons, we begin with the linear case, in which the regression function f is defined as

f(A) = A^T x + b with x ∈ IR^n, b ∈ IR.    (1)

Flatness in the case of (1) means that one seeks a small x. One way to ensure this is to minimize the norm of x; then the problem ε-SVR can be formulated as a constrained minimization problem:
min  (1/2) x^T x + C Σ_{i=1}^m (ξ_i + ξ_i^*)
s.t.  y_i − A_i^T x − b ≤ ε + ξ_i,
      A_i^T x + b − y_i ≤ ε + ξ_i^*,
      ξ_i, ξ_i^* ≥ 0,  i = 1, . . . , m.    (2)
The constant C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than ε are tolerated. This corresponds to dealing with the so-called ε-insensitive loss function |ξ|_ε described by

|ξ|_ε = max{0, |ξ| − ε}.
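To make the loss concrete, here is a minimal sketch of |ξ|_ε in Python (the function name `eps_insensitive` is ours, not the paper's):

```python
def eps_insensitive(xi, eps):
    """The eps-insensitive loss |xi|_eps = max{0, |xi| - eps}."""
    return max(0.0, abs(xi) - eps)

# Deviations inside the eps-tube cost nothing; larger ones grow linearly.
inside = eps_insensitive(0.05, 0.1)    # 0.0 (inside the tube)
outside = eps_insensitive(0.25, 0.1)   # about 0.15
```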
The formulation (2) is a convex quadratic minimization problem with n + 1 free variables, 2m nonnegative variables, and 2m inequality constraints, which enlarges the problem size and could increase computational complexity.
In fact, the problem (2) can be reformulated as an unconstrained optimization problem:

min_{(x,b)∈IR^{n+1}}  (1/2)(x^T x + b^2) + (C/2) Σ_{i=1}^m |A_i^T x + b − y_i|_ε^2.    (3)
This formulation has been proposed in active set support vector regression [11] and solved in its dual form. The objective function is strongly convex; hence, the problem has a unique global optimal solution. However, since the objective function is not twice continuously differentiable, Newton-type algorithms cannot be applied to solve (3) directly.
Lee, Hsieh and Huang [7] apply a smoothing technique to (3). The smooth function

f_ε(x, α) = x + (1/α) log(1 + e^{−αx}),    (4)

which is the integral of the sigmoid function 1/(1 + e^{−αx}), is used to smooth the plus function [x]_+. More specifically, the smooth function f_ε(x, α) approaches [x]_+ as α goes to infinity. Then, the problem (3) is recast as a strongly convex unconstrained minimization problem with the smooth function f_ε(x, α), and a Newton–Armijo algorithm is proposed to solve it. It is proved that when the smoothing parameter α approaches infinity, the unique solution of the reformulated problem converges to the unique solution of the original problem [7, Theorem 2.2]. However, the smoothing parameter α is fixed in the proposed algorithm, and in the implementation of this algorithm, α cannot be set large enough.
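As a quick illustration of the smoothing idea in [7] (our own script, not the authors' code), the integral-of-sigmoid function (4) approaches the plus function as α grows; its largest gap is log(2)/α, attained at x = 0:

```python
import numpy as np

def plus_smooth(x, alpha):
    # f_eps(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)),
    # written with logaddexp for numerical stability at large alpha.
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2.0, 2.0, 401)
gaps = {a: np.max(np.abs(plus_smooth(x, a) - np.maximum(x, 0.0)))
        for a in (5.0, 50.0, 500.0)}
# The maximal gap log(2)/alpha shrinks as alpha increases but is never zero,
# which is why a fixed finite alpha always leaves an approximation error.
```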
In this paper, we introduce two smooth support vector machines for ε-insensitive regression. For the first smooth support vector machine, we reformulate ε-SVR as a strongly convex unconstrained optimization problem with one type of smoothing functions φ_ε(x, α). Then, we define a new function H_φ, which corresponds to the optimality condition of the unconstrained optimization problem. From the solution of H_φ(z) = 0, we can obtain the solution of ε-SVR. For the second smooth support vector machine, we smooth the optimality condition of the strongly convex unconstrained optimization problem (3) with another type of smoothing functions ψ_ε(x, α). Accordingly, we define the function H_ψ, which possesses the same properties as H_φ does. For either H_φ = 0 or H_ψ = 0, we consider the smoothing Newton method to solve it. The algorithm is shown to be globally convergent; specifically, the iterative sequence converges to the unique solution of (3). Furthermore, the algorithm is shown to be locally quadratically convergent without any additional assumptions.
The paper is organized as follows. In Sects. 2 and 3, we introduce two smooth support vector machine reformulations by two types of smoothing functions. In Sect. 4, we propose a smoothing Newton algorithm and study its global and local quadratic convergence. Numerical results and comparisons are reported in Sect. 5. Throughout this paper, K := {1, 2, . . .} and all vectors are column vectors. For a given vector x = (x_1, . . . , x_n)^T ∈ IR^n, the plus function [x]_+ is defined componentwise by

([x]_+)_i = max{0, x_i},  i = 1, . . . , n.

For a differentiable function f, we denote by ∇f(x) and ∇²f(x) the gradient and the Hessian matrix of f at x, respectively. For a differentiable mapping G : IR^n → IR^m, we denote by G'(x) ∈ IR^{m×n} the Jacobian of G at x. For a matrix A ∈ IR^{m×n}, A_i^T is the i-th row of A. A column vector of ones and the identity matrix of arbitrary dimension are denoted by 1 and I, respectively. We denote the sign function by
sgn(x) =
  1        if x > 0,
  [−1, 1]  if x = 0,
  −1       if x < 0.
2 The first smooth support vector machine
As mentioned in [7], it is known that ε-SVR can be reformulated as a strongly convex unconstrained optimization problem (3). Denote ω := (x, b) ∈ IR^{n+1}, Ā := (A, 1), and let Ā_i^T be the i-th row of Ā; then the smooth support vector regression (3) can be rewritten as

min_ω  (1/2) ω^T ω + (C/2) Σ_{i=1}^m |Ā_i^T ω − y_i|_ε^2.    (5)
Note that | · |_ε^2 is smooth but not twice differentiable, which means the objective function is not twice continuously differentiable. Hence, Newton-type methods cannot be applied to solve (5) directly.
In view of this fact, we propose a family of twice continuously differentiable functions φ_ε(x, α) to replace |x|_ε^2. The family of functions φ_ε(x, α) : IR × IR_+ → IR_+ is given by

φ_ε(x, α) =
  (|x| − ε)^2 + (1/3)α^2     if |x| − ε ≥ α,
  (1/(6α))(|x| − ε + α)^3    if ||x| − ε| < α,
  0                          if |x| − ε ≤ −α,    (6)
where 0 < α < ε is a smoothing parameter. The graphs of φ_ε(x, α) are depicted in Fig. 1. From this geometric view, it is clear that φ_ε(x, α) is a class of smoothing functions for |x|_ε^2.
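For readers who want to experiment, the following sketch transcribes (6) directly (variable names are ours) and spot-checks on a grid the bound that Lemma 2.1(a) below establishes:

```python
import numpy as np

def tube_loss_sq(x, eps):
    """|x|_eps^2 = (max{0, |x| - eps})^2."""
    return max(0.0, abs(x) - eps) ** 2

def phi(x, eps, alpha):
    """The smoothing function phi_eps(x, alpha) of (6), for 0 < alpha < eps."""
    t = abs(x) - eps
    if t >= alpha:
        return t * t + alpha ** 2 / 3.0
    if t <= -alpha:
        return 0.0
    return (t + alpha) ** 3 / (6.0 * alpha)

# phi dominates |x|_eps^2, and the gap never exceeds alpha^2 / 3.
eps, alpha = 0.1, 0.03
gaps = [phi(x, eps, alpha) - tube_loss_sq(x, eps)
        for x in np.linspace(-0.3, 0.3, 1201)]
assert 0.0 <= min(gaps) and max(gaps) <= alpha ** 2 / 3.0 + 1e-15
```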
Besides the geometric approach, we now show that φ_ε(x, α) is a class of smoothing functions for |x|_ε^2 by algebraic verification. To this end, we compute the partial derivatives of φ_ε(x, α) as below:
∇_x φ_ε(x, α) =
  2(|x| − ε) sgn(x)                  if |x| − ε ≥ α,
  (1/(2α))(|x| − ε + α)^2 sgn(x)     if ||x| − ε| < α,
  0                                  if |x| − ε ≤ −α.    (7)

∇²_{xx} φ_ε(x, α) =
  2                    if |x| − ε ≥ α,
  (|x| − ε + α)/α      if ||x| − ε| < α,
  0                    if |x| − ε ≤ −α.    (8)

∇²_{xα} φ_ε(x, α) =
  0                                             if |x| − ε ≥ α,
  ((|x| − ε + α)(α − |x| + ε)/(2α^2)) sgn(x)    if ||x| − ε| < α,
  0                                             if |x| − ε ≤ −α.    (9)
Fig. 1 Graphs of φ_ε(x, α) with ε = 0.1 and α = 0.03, 0.06, 0.09, compared with |x|_ε^2
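As an informal sanity check (our script, with made-up sample points), the closed-form derivative (7) agrees with central finite differences of (6) inside each branch:

```python
import numpy as np

def phi(x, eps, alpha):
    # The smoothing function (6).
    t = abs(x) - eps
    if t >= alpha:
        return t * t + alpha ** 2 / 3.0
    if t <= -alpha:
        return 0.0
    return (t + alpha) ** 3 / (6.0 * alpha)

def grad_phi(x, eps, alpha):
    # Formula (7): the x-derivative of phi_eps.
    t, s = abs(x) - eps, np.sign(x)
    if t >= alpha:
        return 2.0 * t * s
    if t <= -alpha:
        return 0.0
    return (t + alpha) ** 2 * s / (2.0 * alpha)

eps, alpha, h = 0.1, 0.03, 1e-6
# One probe point per branch, on both sides of the origin.
for x in (-0.2, -0.115, 0.02, 0.115, 0.2):
    fd = (phi(x + h, eps, alpha) - phi(x - h, eps, alpha)) / (2.0 * h)
    assert abs(fd - grad_phi(x, eps, alpha)) < 1e-6
```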
With the above, the following lemma shows some basic properties of φ_ε(x, α).

Lemma 2.1 Let φ_ε(x, α) be defined as in (6). Then, the following hold.
(a) For 0 < α < ε, there holds 0 ≤ φ_ε(x, α) − |x|_ε^2 ≤ (1/3)α^2.
(b) The function φ_ε(x, α) is twice continuously differentiable with respect to x for 0 < α < ε.
(c) lim_{α→0} φ_ε(x, α) = |x|_ε^2 and lim_{α→0} ∇_x φ_ε(x, α) = ∇(|x|_ε^2).
Proof (a) To complete the arguments, we need to discuss four cases.
(i) For |x| − ε ≥ α, it is clear that φ_ε(x, α) − |x|_ε^2 = (1/3)α^2.
(ii) For 0 < |x| − ε < α, i.e., 0 < x − ε < α or 0 < −x − ε < α, there are two subcases.
If 0 < x − ε < α, letting f(x) := φ_ε(x, α) − |x|_ε^2 = (1/(6α))(x − ε + α)^3 − (x − ε)^2 gives

f'(x) = (x − ε + α)^2/(2α) − 2(x − ε),  ∀x ∈ (ε, ε + α),
f''(x) = (x − ε + α)/α − 2 < 0,  ∀x ∈ (ε, ε + α).

This indicates that f'(x) is monotone decreasing on (ε, ε + α), which further implies f'(x) ≥ f'(ε + α) = 0, ∀x ∈ (ε, ε + α). Thus, we obtain that f(x) is monotone increasing on (ε, ε + α). With this, we have f(x) ≤ f(ε + α) = (1/3)α^2, which yields

φ_ε(x, α) − |x|_ε^2 ≤ (1/3)α^2,  ∀x ∈ (ε, ε + α).

If 0 < −x − ε < α, the arguments are similar to the above, and we omit them.
(iii) For −α < |x| − ε ≤ 0, it is clear that φ_ε(x, α) − |x|_ε^2 = (1/(6α))(|x| − ε + α)^3 ≤ α^3/(6α) = α^2/6 ≤ α^2/3.
(iv) For |x| − ε ≤ −α, we have φ_ε(x, α) − |x|_ε^2 = 0. Then, the desired result follows.
(b) To prove the twice continuous differentiability of φ_ε(x, α), we need to check that φ_ε(·, α), ∇_x φ_ε(·, α), and ∇²_{xx} φ_ε(·, α) are all continuous. Since they are piecewise functions, it suffices to check the junction points.
First, we check that φ_ε(·, α) is continuous.
(i) If |x| − ε = α, then both branches give φ_ε(x, α) = (4/3)α^2, which implies φ_ε(·, α) is continuous.
(ii) If |x| − ε = −α, then φ_ε(x, α) = 0. Hence, φ_ε(·, α) is continuous.
Next, we check that ∇_x φ_ε(·, α) is continuous.
(i) If |x| − ε = α, then ∇_x φ_ε(x, α) = 2α sgn(x).
(ii) If |x| − ε = −α, then ∇_x φ_ε(x, α) = 0. From the above, it is clear that ∇_x φ_ε(·, α) is continuous.
Now we show that ∇²_{xx} φ_ε(·, α) is continuous.
(i) If |x| − ε = α, then ∇²_{xx} φ_ε(x, α) = 2.
(ii) If |x| − ε = −α, then ∇²_{xx} φ_ε(x, α) = 0. Hence, ∇²_{xx} φ_ε(·, α) is continuous.
(c) It is clear that lim_{α→0} φ_ε(x, α) = |x|_ε^2 holds by part (a). It remains to verify lim_{α→0} ∇_x φ_ε(x, α) = ∇(|x|_ε^2). First, we compute that

∇(|x|_ε^2) =
  2(|x| − ε) sgn(x)   if |x| − ε ≥ 0,
  0                   if |x| − ε < 0.    (10)
In light of (10), we proceed with the arguments by discussing four cases.
(i) For |x| − ε ≥ α, we have ∇_x φ_ε(x, α) − ∇(|x|_ε^2) = 0. Then, the desired result follows.
(ii) For 0 < |x| − ε < α, we have

∇_x φ_ε(x, α) − ∇(|x|_ε^2) = (1/(2α))(|x| − ε + α)^2 sgn(x) − 2(|x| − ε) sgn(x),

which yields

lim_{α→0} (∇_x φ_ε(x, α) − ∇(|x|_ε^2)) = lim_{α→0} [((|x| − ε + α)^2 − 4α(|x| − ε))/(2α)] sgn(x).

We notice that |x| → ε when α → 0, and hence ((|x| − ε + α)^2 − 4α(|x| − ε))/(2α) → 0/0. Then, applying L'Hôpital's rule yields

lim_{α→0} ((|x| − ε + α)^2 − 4α(|x| − ε))/(2α) = lim_{α→0} (α − (|x| − ε)) = 0.

This implies lim_{α→0} (∇_x φ_ε(x, α) − ∇(|x|_ε^2)) = 0, which is the desired result.
(iii) For −α < |x| − ε ≤ 0, we have ∇_x φ_ε(x, α) − ∇(|x|_ε^2) = (1/(2α))(|x| − ε + α)^2 sgn(x). Then, applying L'Hôpital's rule gives

lim_{α→0} (|x| − ε + α)^2/(2α) = lim_{α→0} (|x| − ε + α) = 0.

Thus, we prove that lim_{α→0} (∇_x φ_ε(x, α) − ∇(|x|_ε^2)) = 0 under this case.
(iv) For |x| − ε ≤ −α, we have ∇_x φ_ε(x, α) − ∇(|x|_ε^2) = 0. Then, the desired result follows clearly.
Now, we use the family of smoothing functions φ_ε to replace the square of the ε-insensitive loss function in (5) to obtain the first smooth support vector regression. In other words, we consider

min_ω  F_{ε,α}(ω) := (1/2) ω^T ω + (C/2) 1^T Φ_ε(Āω − y, α),    (11)

where ω := (x, b) ∈ IR^{n+1} and Φ_ε(Āω − y, α) ∈ IR^m is defined componentwise by Φ_ε(Āω − y, α)_i = φ_ε(Ā_i^T ω − y_i, α).
This is a strongly convex unconstrained optimization problem with a twice continuously differentiable objective function. Noting lim_{α→0} φ_ε(x, α) = |x|_ε^2, we see that

min_ω  F_{ε,0}(ω) := lim_{α→0} F_{ε,α}(ω) = (1/2) ω^T ω + (C/2) Σ_{i=1}^m |Ā_i^T ω − y_i|_ε^2,    (12)

which is exactly the problem (5).
The following theorem shows that the unique solution of the smooth problem (11) approaches the unique solution of the problem (12) as α → 0. Indeed, it plays the same role as [7, Theorem 2.2].
Theorem 2.1 Let Fε,α(ω) and Fε,0(ω) be defined as in (11) and (12), respectively.
Then, the following hold.
(a) There exists a unique solution ω̄_α to min_{ω∈IR^{n+1}} F_{ε,α}(ω) and a unique solution ω̄ to min_{ω∈IR^{n+1}} F_{ε,0}(ω).
(b) For all 0 < α < ε, we have the following inequality:

‖ω̄_α − ω̄‖^2 ≤ (1/6) C m α^2.    (13)

Moreover, ω̄_α converges to ω̄ as α → 0 with an upper bound given by (13).
Proof (a) In view of φ_ε(x, α) − |x|_ε^2 ≥ 0 in Lemma 2.1(a), we see that the level sets

L_v(F_{ε,α}) := {ω ∈ IR^{n+1} | F_{ε,α}(ω) ≤ v},
L_v(F_{ε,0}) := {ω ∈ IR^{n+1} | F_{ε,0}(ω) ≤ v}

satisfy

L_v(F_{ε,α}) ⊆ L_v(F_{ε,0}) ⊆ {ω ∈ IR^{n+1} | ω^T ω ≤ 2v}    (14)

for any v ≥ 0. Hence, we obtain that L_v(F_{ε,α}) and L_v(F_{ε,0}) are compact (closed and bounded) subsets of IR^{n+1}. Then, by the strong convexity of F_{ε,0}(ω) and F_{ε,α}(ω) with α > 0, each of the problems min_{ω∈IR^{n+1}} F_{ε,α}(ω) and min_{ω∈IR^{n+1}} F_{ε,0}(ω) has a unique solution.
(b) From the optimality conditions and the strong convexity of F_{ε,0}(ω) and F_{ε,α}(ω) with α > 0, we know that

F_{ε,0}(ω̄_α) − F_{ε,0}(ω̄) ≥ ∇F_{ε,0}(ω̄)^T (ω̄_α − ω̄) + (1/2)‖ω̄_α − ω̄‖^2 ≥ (1/2)‖ω̄_α − ω̄‖^2,    (15)
F_{ε,α}(ω̄) − F_{ε,α}(ω̄_α) ≥ ∇F_{ε,α}(ω̄_α)^T (ω̄ − ω̄_α) + (1/2)‖ω̄ − ω̄_α‖^2 ≥ (1/2)‖ω̄ − ω̄_α‖^2.    (16)

Note that F_{ε,α}(ω) ≥ F_{ε,0}(ω) because φ_ε(x, α) − |x|_ε^2 ≥ 0. Then, adding up (15) and (16) along with this fact yields

‖ω̄_α − ω̄‖^2 ≤ (F_{ε,α}(ω̄) − F_{ε,0}(ω̄)) − (F_{ε,α}(ω̄_α) − F_{ε,0}(ω̄_α))
  ≤ F_{ε,α}(ω̄) − F_{ε,0}(ω̄)
  = (C/2) 1^T Φ_ε(Āω̄ − y, α) − (C/2) Σ_{i=1}^m |Ā_i^T ω̄ − y_i|_ε^2
  = (C/2) Σ_{i=1}^m [φ_ε(Ā_i^T ω̄ − y_i, α) − |Ā_i^T ω̄ − y_i|_ε^2]
  ≤ (1/6) C m α^2,

where the last inequality is due to Lemma 2.1(a). It is clear that ω̄_α converges to ω̄ as α → 0 with an upper bound given by the above. Then, the proof is complete.
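The bound (13) can be probed numerically. The sketch below is entirely our construction (random data, plain gradient descent as the inner solver, not the paper's algorithm); it minimizes F_{ε,α} and F_{ε,0} on a tiny problem and compares ‖ω̄_α − ω̄‖² against (1/6)Cmα²:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 2
A = rng.normal(size=(m, n))
y = A @ np.array([1.0, -0.5]) + 0.2 * rng.normal(size=m)
Abar = np.hstack([A, np.ones((m, 1))])          # rows (A_i, 1)
C, eps, alpha = 1.0, 0.1, 0.05

def grad_phi(t):
    # x-derivative of phi_eps, formula (7).
    u, s = abs(t) - eps, np.sign(t)
    if u >= alpha:
        return 2.0 * u * s
    if u <= -alpha:
        return 0.0
    return (u + alpha) ** 2 * s / (2.0 * alpha)

def grad_smoothed(w):                            # gradient of F_{eps,alpha}
    r = Abar @ w - y
    return w + C * sum(grad_phi(ri) * ai for ri, ai in zip(r, Abar))

def grad_original(w):                            # gradient of F_{eps,0}
    r = Abar @ w - y
    return w + C * sum(2.0 * max(0.0, abs(ri) - eps) * np.sign(ri) * ai
                       for ri, ai in zip(r, Abar))

# Both objectives are 1-strongly convex with L-Lipschitz gradients, so plain
# gradient descent with step 1/L converges to the unique minimizers.
L = 1.0 + 2.0 * C * np.linalg.eigvalsh(Abar.T @ Abar).max()

def minimize(grad, iters=4000):
    w = np.zeros(n + 1)
    for _ in range(iters):
        w = w - grad(w) / L
    return w

w_alpha, w_bar = minimize(grad_smoothed), minimize(grad_original)
gap_sq = float(np.sum((w_alpha - w_bar) ** 2))
assert gap_sq <= C * m * alpha ** 2 / 6.0 + 1e-8   # the bound (13)
```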
Next, we focus on the optimality condition of the minimization problem (11), which is indeed sufficient and necessary for (11) and has the form

∇_ω F_{ε,α}(ω) = 0.
With this, we define a function H_φ : IR^{n+2} → IR^{n+2} by

H_φ(z) = [ α ; ∇_ω F_{ε,α}(ω) ] = [ α ; ω + C Σ_{i=1}^m ∇_x φ_ε(Ā_i^T ω − y_i, α) Ā_i ],    (17)

where z := (α, ω) ∈ IR^{n+2}. From Lemma 2.1 and the strong convexity of F_{ε,α}(ω), it is easy to see that if H_φ(z) = 0, then α = 0 and ω solves (11); and for any z ∈ IR_{++} × IR^{n+1}, the function H_φ is continuously differentiable. In addition, the Jacobian of H_φ can be calculated as below:
H_φ'(z) = [ 1  0 ; ∇²_{ωα} F_{ε,α}(ω)  ∇²_{ωω} F_{ε,α}(ω) ],    (18)

where

∇²_{ωα} F_{ε,α}(ω) = C Σ_{i=1}^m ∇²_{xα} φ_ε(Ā_i^T ω − y_i, α) Ā_i,
∇²_{ωω} F_{ε,α}(ω) = I + C Σ_{i=1}^m ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) Ā_i Ā_i^T.

From (8), we can see ∇²_{xx} φ_ε(x, α) ≥ 0, which implies C Σ_{i=1}^m ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) Ā_i Ā_i^T is positive semidefinite. Hence, ∇²_{ωω} F_{ε,α}(ω) is positive definite. This helps us to prove that H_φ'(z) is invertible at any z ∈ IR_{++} × IR^{n+1}. In fact, if there exists a vector d := (d_1, d_2) ∈ IR × IR^{n+1} such that H_φ'(z) d = 0, then we have

d_1 = 0,
d_1 ∇²_{ωα} F_{ε,α}(ω) + ∇²_{ωω} F_{ε,α}(ω) d_2 = 0.

This implies that d = 0, and hence H_φ'(z) is invertible at any z ∈ IR_{++} × IR^{n+1}.
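To illustrate the structure of (17)–(18), the following sketch (synthetic data and variable names of our own choosing) assembles H_φ'(z) for a toy problem and confirms that its (2,2) block is positive definite, hence the Jacobian is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
Abar = np.hstack([A, np.ones((m, 1))])   # rows (A_i, 1)
C, eps, alpha = 10.0, 0.1, 0.03
omega = rng.normal(size=n + 1)

def d2xx_phi(x):
    # Second derivative of phi_eps, formula (8).
    t = abs(x) - eps
    if t >= alpha:
        return 2.0
    if t <= -alpha:
        return 0.0
    return (t + alpha) / alpha

def d2xa_phi(x):
    # Mixed derivative of phi_eps, formula (9).
    t, s = abs(x) - eps, np.sign(x)
    if abs(t) >= alpha:
        return 0.0
    return (t + alpha) * (alpha - t) * s / (2.0 * alpha ** 2)

r = Abar @ omega - y
H_ww = np.eye(n + 1) + C * sum(d2xx_phi(ri) * np.outer(ai, ai)
                               for ri, ai in zip(r, Abar))
H_wa = C * sum(d2xa_phi(ri) * ai for ri, ai in zip(r, Abar))

J = np.zeros((n + 2, n + 2))   # block lower-triangular Jacobian (18)
J[0, 0] = 1.0
J[1:, 0] = H_wa
J[1:, 1:] = H_ww

# Eigenvalues of H_ww are at least 1, so J is invertible for any alpha > 0.
assert np.linalg.eigvalsh(H_ww).min() >= 1.0 - 1e-10
assert abs(np.linalg.det(J)) >= 1.0 - 1e-8
```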
3 The second smooth support vector machine
In this section, we consider another type of smoothing functions ψ_{ε,p}(x, α) : IR × IR_+ → IR_+, defined by

ψ_{ε,p}(x, α) =
  0                                             if 0 ≤ |x| ≤ ε − α,
  (α/(p − 1)) ((p − 1)(|x| − ε + α)/(pα))^p     if ε − α < |x| < ε + α/(p − 1),
  |x| − ε                                       if |x| ≥ ε + α/(p − 1),    (19)

where p ≥ 2. The graphs of ψ_{ε,p}(x, α) are depicted in Fig. 2, which clearly verify that ψ_{ε,p}(x, α) is a family of smoothing functions for |x|_ε.
In Lemma 3.1 below, we verify that ψ_{ε,p}(x, α) is a family of smoothing functions for |x|_ε; hence, ψ_{ε,p}^2(x, α) is also a family of smoothing functions for |x|_ε^2. Then, we can employ ψ_{ε,p}^2 to replace the square of the ε-insensitive loss function in (5) in the same way as done in Sect. 2. The graphs of ψ_{ε,p}^2(x, α) in comparison with φ_ε(x, α) are depicted in Fig. 3. In fact, there is a relation between ψ_{ε,p}^2(x, α) and φ_ε(x, α), shown in Proposition 3.1.

Fig. 2 Graphs of ψ_{ε,p}(x, α) with ε = 0.1, α = 0.03, 0.06, 0.09 and p = 2

Fig. 3 Graphs of |x|_ε^2, φ_ε(x, α) and ψ_{ε,p}^2(x, α) with ε = 0.1, α = 0.06, 0.09 and p = 2
In other words, we obtain an alternative strongly convex unconstrained optimization problem for (5):

min_ω  (1/2) ω^T ω + (C/2) Σ_{i=1}^m ψ_{ε,p}^2(Ā_i^T ω − y_i, α).    (20)
However, the smooth function ψ_{ε,p}^2(x, α) is not twice differentiable with respect to x, and hence the objective function of (20) is not twice differentiable although it is smooth. Then, we still cannot apply a Newton-type method to solve (20). To overcome this, we take another smoothing technique. Before presenting the idea of this smoothing technique, the following two lemmas regarding properties of ψ_{ε,p}(x, α) are needed. To this end, we also compute the partial derivatives of ψ_{ε,p}(x, α) as below:
∇_x ψ_{ε,p}(x, α) =
  0                                             if 0 ≤ |x| ≤ ε − α,
  sgn(x) ((p − 1)(|x| − ε + α)/(pα))^{p−1}      if ε − α < |x| < ε + α/(p − 1),
  sgn(x)                                        if |x| ≥ ε + α/(p − 1).

∇_α ψ_{ε,p}(x, α) =
  0                                                                   if 0 ≤ |x| ≤ ε − α,
  (((ε − |x|)(p − 1) + α)/(pα)) ((p − 1)(|x| − ε + α)/(pα))^{p−1}     if ε − α < |x| < ε + α/(p − 1),
  0                                                                   if |x| ≥ ε + α/(p − 1).
Lemma 3.1 Let ψ_{ε,p}(x, α) be defined as in (19). Then, we have
(a) ψ_{ε,p}(x, α) is smooth with respect to x for any p ≥ 2;
(b) lim_{α→0} ψ_{ε,p}(x, α) = |x|_ε for any p ≥ 2.

Proof (a) To prove the result, we need to check that both ψ_{ε,p}(·, α) and ∇_x ψ_{ε,p}(·, α) are continuous.
(i) If |x| = ε − α, then ψ_{ε,p}(x, α) = 0.
(ii) If |x| = ε + α/(p − 1), then ψ_{ε,p}(x, α) = α/(p − 1). From (i) and (ii), it is clear that ψ_{ε,p}(·, α) is continuous.
Moreover, (i) if |x| = ε − α, then ∇_x ψ_{ε,p}(x, α) = 0;
(ii) if |x| = ε + α/(p − 1), then ∇_x ψ_{ε,p}(x, α) = sgn(x). In view of (i) and (ii), we see that ∇_x ψ_{ε,p}(·, α) is continuous.
(b) To proceed, we discuss four cases.
(1) If 0 ≤ |x| ≤ ε − α, then ψ_{ε,p}(x, α) − |x|_ε = 0. Then, the desired result follows.
(2) If ε − α ≤ |x| ≤ ε, then ψ_{ε,p}(x, α) − |x|_ε = (α/(p − 1)) ((p − 1)(|x| − ε + α)/(pα))^p. Hence,

lim_{α→0} (ψ_{ε,p}(x, α) − |x|_ε) = lim_{α→0} (α/(p − 1)) ((p − 1)(|x| − ε + α)/(pα))^p
  = [lim_{α→0} α/(p − 1)] [lim_{α→0} ((p − 1)(|x| − ε + α)/(pα))^p].

It is clear that the first limit is zero, so we only need to show that the second limit is bounded. To this end, we rewrite it as

lim_{α→0} ((p − 1)(|x| − ε + α)/(pα))^p = lim_{α→0} ((p − 1)/p)^p ((|x| − ε + α)/α)^p.

We notice that |x| → ε when α → 0, so that (|x| − ε + α)/α → 0/0. Therefore, by applying L'Hôpital's rule, we obtain

lim_{α→0} (|x| − ε + α)/α = 1,

which implies that lim_{α→0} (ψ_{ε,p}(x, α) − |x|_ε) = 0 under this case.
(3) If ε ≤ |x| ≤ ε + α/(p − 1), then

ψ_{ε,p}(x, α) − |x|_ε = (α/(p − 1)) ((p − 1)(|x| − ε + α)/(pα))^p − (|x| − ε).

We have shown in case (2) that

lim_{α→0} (α/(p − 1)) ((p − 1)(|x| − ε + α)/(pα))^p = 0.

It is also obvious that lim_{α→0} (|x| − ε) = 0. Hence, we obtain lim_{α→0} (ψ_{ε,p}(x, α) − |x|_ε) = 0 under this case.
(4) If |x| ≥ ε + α/(p − 1), the desired result follows since it is clear that ψ_{ε,p}(x, α) − |x|_ε = 0. From all the above, the proof is complete.
Lemma 3.2 Let ψ_{ε,p}(x, α) be defined as in (19). Then, we have
(a) ψ_{ε,p}(x, α) sgn(x) is smooth with respect to x for any p ≥ 2;
(b) lim_{α→0} ψ_{ε,p}(x, α) sgn(x) = |x|_ε sgn(x) for any p ≥ 2.

Proof (a) First, we observe that ψ_{ε,p}(x, α) sgn(x) can be written as

ψ_{ε,p}(x, α) sgn(x) =
  0                                                    if 0 ≤ |x| ≤ ε − α,
  (α/(p − 1)) ((p − 1)(|x| − ε + α)/(pα))^p sgn(x)     if ε − α < |x| < ε + α/(p − 1),
  (|x| − ε) sgn(x)                                     if |x| ≥ ε + α/(p − 1).

Note that ψ_{ε,p}(·, α) vanishes in a neighborhood of x = 0, where sgn is discontinuous; then applying Lemma 3.1(a) yields that ψ_{ε,p}(x, α) sgn(x) is continuous. Furthermore, by simple calculations, we have

∇_x (ψ_{ε,p}(x, α) sgn(x)) = ∇_x ψ_{ε,p}(x, α) sgn(x) =
  0                                         if 0 ≤ |x| ≤ ε − α,
  ((p − 1)(|x| − ε + α)/(pα))^{p−1}         if ε − α < |x| < ε + α/(p − 1),
  1                                         if |x| ≥ ε + α/(p − 1).    (21)

Mimicking the arguments as in Lemma 3.1(a), we can verify that ∇_x (ψ_{ε,p}(x, α) sgn(x)) is continuous. Thus, the desired result follows.
(b) By Lemma 3.1(b), it is easy to see that lim_{α→0} ψ_{ε,p}(x, α) sgn(x) = |x|_ε sgn(x). Then, the desired result follows.
Note that |x|_ε^2 is smooth with

∇(|x|_ε^2) = 2|x|_ε sgn(x) =
  2(|x| − ε) sgn(x)   if |x| > ε,
  0                   if |x| ≤ ε,

being continuous (but not differentiable). Then, we consider the optimality condition of (12), that is,

∇_ω F_{ε,0}(ω) = ω + C Σ_{i=1}^m |Ā_i^T ω − y_i|_ε sgn(Ā_i^T ω − y_i) Ā_i = 0,    (22)

which is indeed sufficient and necessary for (5). Hence, solving (22) is equivalent to solving (5).
Using the family of smoothing functions ψ_{ε,p} to replace the ε-loss function in (22) leads to a system of smooth equations. More specifically, we define a function H_ψ : IR^{n+2} → IR^{n+2} by

H_ψ(z) = H_ψ(α, ω) = [ α ; ω + C Σ_{i=1}^m ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) Ā_i ],

where z := (α, ω) ∈ IR^{n+2}. From Lemma 3.1, it is easy to see that if H_ψ(z) = 0, then α = 0 and ω is the solution of the equations (22), i.e., the solution of (12). Moreover, for any z ∈ IR_{++} × IR^{n+1}, the function H_ψ is continuously differentiable with
H_ψ'(z) = [ 1  0 ; E(ω)  I + D(ω) ],    (23)

where

E(ω) = C Σ_{i=1}^m ∇_α ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) Ā_i,
D(ω) = C Σ_{i=1}^m ∇_x ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) Ā_i Ā_i^T.

Because ∇_x ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) is nonnegative for any α > 0 from (21), we see that I + D(ω) is positive definite at any z ∈ IR_{++} × IR^{n+1}. Following similar arguments as in Sect. 2, we obtain that H_ψ'(z) is invertible at any z ∈ IR_{++} × IR^{n+1}.

Proposition 3.1 Let φ_ε(x, α) be defined as in (6) and ψ_{ε,p}(x, α) be defined as in (19). Then, the following hold.
(a) For p ≥ 2, we have φ_ε(x, α) ≥ ψ_{ε,p}^2(x, α) ≥ |x|_ε^2.
(b) For p ≥ q ≥ 2, we have ψ_{ε,q}(x, α) ≥ ψ_{ε,p}(x, α).
Proof (a) First, we show that φ_ε(x, α) ≥ ψ_{ε,p}^2(x, α) holds. To proceed, we discuss four cases.
(i) If |x| ≤ ε − α, then φ_ε(x, α) = 0 = ψ_{ε,p}^2(x, α).
(ii) If ε − α < |x| < ε + α/(p − 1), then |x| ≤ ε + α/(p − 1), which is equivalent to 1/(|x| − ε + α) ≥ (p − 1)/(pα). Thus, we have

φ_ε(x, α)/ψ_{ε,p}^2(x, α) = α^{2p−3} p^{2p} / (6 (p − 1)^{2p−2} (|x| − ε + α)^{2p−3}) ≥ p^3/(6(p − 1)) ≥ 1,

which implies φ_ε(x, α) ≥ ψ_{ε,p}^2(x, α).
(iii) For ε + α/(p − 1) ≤ |x| < ε + α, letting t := |x| − ε ∈ [α/(p − 1), α) yields

φ_ε(x, α) − ψ_{ε,p}^2(x, α) = (1/(6α))(t + α)^3 − t^2 = t ((1/(6α)) t^2 − (1/2) t + (1/2) α) + (1/6) α^2 ≥ 0,

where the inequality follows from the fact that the discriminant of (1/(6α)) t^2 − (1/2) t + (1/2) α is less than 0 and 1/(6α) > 0. Then, φ_ε(x, α) − ψ_{ε,p}^2(x, α) > 0.
(iv) If |x| ≥ ε + α, then it is clear that φ_ε(x, α) = (|x| − ε)^2 + (1/3) α^2 ≥ (|x| − ε)^2 = ψ_{ε,p}^2(x, α).

Now we show the other part, ψ_{ε,p}(x, α) ≥ |x|_ε, which is equivalent to verifying ψ_{ε,p}^2(x, α) ≥ |x|_ε^2. Again, we discuss four cases.
(i) If |x| ≤ ε − α, then ψ_{ε,p}(x, α) = 0 = |x|_ε.
(ii) If ε − α < |x| ≤ ε, then ε − α < |x|, which says |x| − ε + α > 0. Thus, we have ψ_{ε,p}(x, α) ≥ 0 = |x|_ε.
(iii) For ε < |x| < ε + α/(p − 1), we let t := |x| − ε ∈ (0, α/(p − 1)) and define a function

f(t) = (α/(p − 1)) ((p − 1)(t + α)/(pα))^p − t

on (0, α/(p − 1)). Note that f(|x| − ε) = ψ_{ε,p}(x, α) − |x|_ε for |x| ∈ (ε, ε + α/(p − 1)), and observe that

f'(t) = ((p − 1)(t + α)/(pα))^{p−1} − 1 ≤ ((p − 1)(α/(p − 1) + α)/(pα))^{p−1} − 1 = 0.

This means f(t) is monotone decreasing on (0, α/(p − 1)). Since f(α/(p − 1)) = 0, we have f(t) ≥ 0 for t ∈ (0, α/(p − 1)), which implies ψ_{ε,p}(x, α) ≥ |x|_ε for |x| ∈ (ε, ε + α/(p − 1)).
(iv) If |x| ≥ ε + α/(p − 1), then it is clear that ψ_{ε,p}(x, α) = |x| − ε = |x|_ε.
(b) For p ≥ q ≥ 2, it is obvious that

ψ_{ε,q}(x, α) = ψ_{ε,p}(x, α) for |x| ∈ [0, ε − α] ∪ [ε + α/(q − 1), +∞).

If |x| ∈ [ε + α/(p − 1), ε + α/(q − 1)), then ψ_{ε,p}(x, α) = |x|_ε ≤ ψ_{ε,q}(x, α) from the above. Thus, we only need to prove the case of |x| ∈ (ε − α, ε + α/(p − 1)).
Consider |x| ∈ (ε − α, ε + α/(p − 1)) and t := |x| − ε + α; we observe that α/t ≥ (p − 1)/p. Then, we verify that

ψ_{ε,q}(x, α)/ψ_{ε,p}(x, α) = [(q − 1)^{q−1} p^p / ((p − 1)^{p−1} q^q)] (α/t)^{p−q}
  ≥ [(q − 1)^{q−1} p^p / ((p − 1)^{p−1} q^q)] ((p − 1)/p)^{p−q}
  = (p/q)^q ((q − 1)/(p − 1))^{q−1}
  = (1 + (p − q)/q)^q / (1 + (p − q)/(q − 1))^{q−1}
  ≥ 1,

where the last inequality is due to (1 + (p − q)/x)^x being increasing for x > 0. Thus, the proof is complete.
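Proposition 3.1 can also be spot-checked numerically. The grid and function names below are our own; the definitions are transcribed from (6) and (19):

```python
import numpy as np

def phi(x, eps, alpha):
    # The smoothing function (6).
    t = abs(x) - eps
    if t >= alpha:
        return t * t + alpha ** 2 / 3.0
    if t <= -alpha:
        return 0.0
    return (t + alpha) ** 3 / (6.0 * alpha)

def psi(x, eps, alpha, p):
    # The smoothing function (19).
    a = abs(x)
    if a <= eps - alpha:
        return 0.0
    if a >= eps + alpha / (p - 1):
        return a - eps
    return (alpha / (p - 1)) * ((p - 1) * (a - eps + alpha) / (p * alpha)) ** p

eps, alpha = 0.1, 0.03
for x in np.linspace(-0.3, 0.3, 601):
    tube_sq = max(0.0, abs(x) - eps) ** 2
    # (a): phi >= psi_p^2 >= |x|_eps^2 for p >= 2
    assert phi(x, eps, alpha) + 1e-15 >= psi(x, eps, alpha, 2) ** 2 >= tube_sq
    # (b): psi_q >= psi_p for p >= q >= 2
    assert psi(x, eps, alpha, 2) + 1e-15 >= psi(x, eps, alpha, 4)
```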
4 A smoothing Newton algorithm
In Sects. 2 and 3, we constructed two systems of smooth equations: H_φ(z) = 0 and H_ψ(z) = 0. We briefly describe the difference between them. In general, the ways we arrive at H_φ(z) = 0 and H_ψ(z) = 0 are slightly different. To obtain H_φ(z) = 0, we first use the twice continuously differentiable functions φ_ε(x, α) to replace |x|_ε^2 in problem (5), and then write out its optimality condition. In contrast, to obtain H_ψ(z) = 0, we write out the optimality condition of problem (5) first, and then use the smoothing functions ψ_{ε,p}(x, α) to replace the ε-loss function in (22). For convenience, we denote H̃(z) ∈ {H_φ(z), H_ψ(z)}. In other words, H̃(z) possesses the property that if H̃(z) = 0, then α = 0 and ω solves (12). In view of this, we apply a Newton-type method to the system of smooth equations H̃(z) = 0 at each iteration and let α → 0, so that the solution to the problem (12) can be found.
Algorithm 4.1 (A smoothing Newton method)

Step 0 Choose δ ∈ (0, 1), σ ∈ (0, 1/2), and α_0 > 0. Take τ ∈ (0, 1) such that τα_0 < 1. Let ω^0 ∈ IR^{n+1} be an arbitrary vector. Set z^0 := (α_0, ω^0) and e^0 := (1, 0, . . . , 0) ∈ IR^{n+2}.
Step 1 If ‖H̃(z^k)‖ = 0, stop.
Step 2 Define the functions Γ, β by

Γ(z) := ‖H̃(z)‖^2 and β(z) := τ min{1, Γ(z)}.    (24)

Compute Δz^k := (Δα_k, Δω^k) by

H̃(z^k) + H̃'(z^k) Δz^k = α_0 β(z^k) e^0.

Step 3 Let θ_k be the maximum of the values 1, δ, δ^2, · · · such that

Γ(z^k + θ_k Δz^k) ≤ [1 − 2σ(1 − τα_0) θ_k] Γ(z^k).    (25)

Step 4 Set z^{k+1} := z^k + θ_k Δz^k and k := k + 1. Go to Step 1.
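To make the algorithm concrete, here is a compact sketch applying Algorithm 4.1 to H_φ on a small synthetic ε-SVR problem. The data, parameter values, iteration cap, and stopping tolerance are our choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 2
A = rng.normal(size=(m, n))
w_true = np.array([1.5, -0.7])
y = A @ w_true + 0.3 + 0.05 * rng.normal(size=m)
Abar = np.hstack([A, np.ones((m, 1))])       # rows (A_i, 1)
C, eps = 10.0, 0.1

def grad_phi(t, alpha):                       # formula (7)
    u, s = abs(t) - eps, np.sign(t)
    if u >= alpha:
        return 2.0 * u * s
    if u <= -alpha:
        return 0.0
    return (u + alpha) ** 2 * s / (2.0 * alpha)

def d2xx_phi(t, alpha):                       # formula (8)
    u = abs(t) - eps
    if u >= alpha:
        return 2.0
    if u <= -alpha:
        return 0.0
    return (u + alpha) / alpha

def d2xa_phi(t, alpha):                       # formula (9)
    u, s = abs(t) - eps, np.sign(t)
    if abs(u) >= alpha:
        return 0.0
    return (u + alpha) * (alpha - u) * s / (2.0 * alpha ** 2)

def H(z):                                     # H_phi(z) in (17)
    alpha, w = z[0], z[1:]
    r = Abar @ w - y
    g = w + C * sum(grad_phi(ri, alpha) * ai for ri, ai in zip(r, Abar))
    return np.concatenate([[alpha], g])

def Hjac(z):                                  # Jacobian (18)
    alpha, w = z[0], z[1:]
    r = Abar @ w - y
    J = np.zeros((n + 2, n + 2))
    J[0, 0] = 1.0
    J[1:, 0] = C * sum(d2xa_phi(ri, alpha) * ai for ri, ai in zip(r, Abar))
    J[1:, 1:] = np.eye(n + 1) + C * sum(d2xx_phi(ri, alpha) * np.outer(ai, ai)
                                        for ri, ai in zip(r, Abar))
    return J

delta, sigma, alpha0, tau = 0.5, 0.25, 0.05, 0.9   # tau * alpha0 < 1
e0 = np.zeros(n + 2); e0[0] = 1.0
z = np.concatenate([[alpha0], np.zeros(n + 1)])    # z^0 = (alpha_0, omega^0)
for _ in range(100):
    Hz = H(z); Gamma = float(Hz @ Hz)
    if Gamma < 1e-16:
        break
    beta = tau * min(1.0, Gamma)
    dz = np.linalg.solve(Hjac(z), alpha0 * beta * e0 - Hz)   # perturbed Newton step
    theta = 1.0                                              # line search (25)
    while (float(H(z + theta * dz) @ H(z + theta * dz))
           > (1.0 - 2.0 * sigma * (1.0 - tau * alpha0) * theta) * Gamma
           and theta > 1e-12):
        theta *= delta
    z = z + theta * dz

final_residual = float(H(z) @ H(z))   # should be tiny: alpha -> 0, omega solves (12)
```

Note that both the smoothing parameter α and the residual are driven to zero by the same iteration, in contrast to the fixed-α scheme of [7].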
Proposition 4.1 Suppose that the sequence {z^k} is generated by Algorithm 4.1. Then, the following results hold.
(a) {Γ(z^k)} is monotonically decreasing.
(b) {‖H̃(z^k)‖} and {β(z^k)} are monotonically decreasing.
(c) Let N(τ) := {z ∈ IR_+ × IR^{n+1} : α_0 β(z) ≤ α}. Then z^k ∈ N(τ) for any k ∈ K, and 0 < α_{k+1} ≤ α_k.
(d) The algorithm is well defined.

Proof Since the proof is very similar to that of [6, Remark 2.1], we omit it here.
Lemma 4.1 Let λ̄ := max_i λ_i(C Σ_{i=1}^m Ā_i Ā_i^T). Then, for any z ∈ IR_{++} × IR^{n+1}, we have
(a) 1 ≤ λ_i(H_φ'(z)) ≤ 1 + 2λ̄, i = 1, · · · , n + 2;
(b) 1 ≤ λ_i(H_ψ'(z)) ≤ 1 + λ̄, i = 1, · · · , n + 2.

Proof (a) H_φ is continuously differentiable at any z ∈ IR_{++} × IR^{n+1}, and by (18), it is easy to see that {1, λ_1(∇²_{ωω} F_{ε,α}(ω)), · · · , λ_{n+1}(∇²_{ωω} F_{ε,α}(ω))} are the eigenvalues of H_φ'(z). From the representation of ∇²_{xx} φ_ε in (8), we have 0 ≤ ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) ≤ 2. As ∇²_{ωω} F_{ε,α}(ω) = I + C Σ_{i=1}^m ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) Ā_i Ā_i^T, then

1 ≤ λ_i(∇²_{ωω} F_{ε,α}(ω)) ≤ 1 + 2λ̄  (i = 1, · · · , n + 1).    (26)

Thus the result (a) holds.