
Two smooth support vector machines for ε-insensitive regression

Weizhe Gu1 · Wei-Po Chen2 · Chun-Hsu Ko3 · Yuh-Jye Lee4 · Jein-Shan Chen2

Received: 6 June 2017 / Published online: 18 December 2017

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Abstract In this paper, we propose two new smooth support vector machines for ε-insensitive regression. According to these two smooth support vector machines, we construct two systems of smooth equations based on two novel families of smoothing functions, from which we seek the solution to ε-support vector regression (ε-SVR).

More specifically, using the proposed smoothing functions, we employ the smoothing Newton method to solve the systems of smooth equations. The algorithm is shown to be globally and quadratically convergent without any additional conditions. Numerical comparisons among different values of the parameter are also reported.

J.-S. Chen's work is supported by the Ministry of Science and Technology, Taiwan.


Jein-Shan Chen jschen@math.ntnu.edu.tw
Weizhe Gu weizhegu@yahoo.com.cn
Wei-Po Chen weaper@gmail.com
Chun-Hsu Ko chko@isu.edu.tw
Yuh-Jye Lee yuhjye@math.nctu.edu.tw

1 Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, People's Republic of China
2 Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
3 Department of Electrical Engineering, I-Shou University, Kaohsiung 840, Taiwan
4 Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan


Keywords Support vector machine · ε-insensitive loss function · ε-smooth support vector regression · Smoothing Newton algorithm

1 Introduction

Support vector machine (SVM) is a popular and important statistical learning technology [1,7,8,16–19]. Generally speaking, there are two main categories of support vector machines (SVMs): support vector classification (SVC) and support vector regression (SVR). The model produced by SVR depends on a training data set S = {(A_1, y_1), . . . , (A_m, y_m)} ⊆ IR^n × IR, where A_i ∈ IR^n is the input data and y_i ∈ IR is called the observation. The main goal of ε-insensitive regression with the idea of SVMs is to find a linear or nonlinear regression function f that has at most ε deviation from the actually obtained targets y_i for all the training data, and at the same time is as flat as possible. This problem is called ε-support vector regression (ε-SVR).

For pedagogical reasons, we begin with the linear case, in which the regression function f is defined as

f(A) = A^T x + b with x ∈ IR^n, b ∈ IR.        (1)

Flatness in the case of (1) means that one seeks a small x. One way to ensure this is to minimize the norm of x; then the problem ε-SVR can be formulated as a constrained minimization problem:

min_{x,b,ξ,ξ*}  (1/2) x^T x + C Σ_{i=1}^m (ξ_i + ξ_i^*)
s.t.  y_i − A_i^T x − b ≤ ε + ξ_i,
      A_i^T x + b − y_i ≤ ε + ξ_i^*,
      ξ_i, ξ_i^* ≥ 0,  i = 1, . . . , m.        (2)

The constant C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than ε are tolerated. This corresponds to dealing with the so-called ε-insensitive loss function |ξ|_ε described by

|ξ|_ε = max{0, |ξ| − ε}.

The formulation (2) is a convex quadratic minimization problem with n + 1 free variables, 2m nonnegative variables, and 2m inequality constraints, which enlarges the problem size and could increase computational complexity.
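To make the loss concrete, the following is a minimal NumPy sketch of the ε-insensitive loss; the function and variable names are ours, not from the paper:

import numpy as np

def eps_insensitive_loss(xi, eps):
    """|xi|_eps = max{0, |xi| - eps}, applied elementwise."""
    return np.maximum(0.0, np.abs(xi) - eps)

# Residuals inside the eps-tube contribute zero loss.
r = np.array([-0.3, -0.05, 0.0, 0.08, 0.5])
print(eps_insensitive_loss(r, eps=0.1))   # [0.2  0.   0.   0.   0.4]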

In fact, the problem (2) can be reformulated as an unconstrained optimization problem:

min_{(x,b)∈IR^{n+1}}  (1/2)(x^T x + b^2) + (C/2) Σ_{i=1}^m |A_i^T x + b − y_i|_ε^2.        (3)

This formulation has been proposed in active set support vector regression [11] and solved in its dual form. The objective function is strongly convex; hence, the problem (3) has a unique global optimal solution. However, since the objective function is not twice continuously differentiable, Newton-type algorithms cannot be applied to solve (3) directly.
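For illustration, here is a NumPy sketch of the unconstrained objective in (3) (our own naming, not the authors' code); the squared ε-insensitive term is differentiable, but its second derivative jumps where |A_i^T x + b − y_i| = ε, which is why Newton-type methods do not apply directly:

import numpy as np

def objective_eq3(x, b, A, y, C, eps):
    """0.5*(x'x + b^2) + (C/2) * sum_i |A_i' x + b - y_i|_eps^2, cf. (3)."""
    r = A @ x + b - y
    loss = np.maximum(0.0, np.abs(r) - eps)   # elementwise |.|_eps
    return 0.5 * (x @ x + b * b) + 0.5 * C * np.sum(loss ** 2)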

Lee, Hsieh and Huang [7] apply a smoothing technique to (3). The smooth function

f_ε(x, α) = x + (1/α) log(1 + e^{−αx}),        (4)

which is the integral of the sigmoid function 1/(1 + e^{−αx}), is used to smooth the plus function [x]_+. More specifically, the smooth function f_ε(x, α) approaches [x]_+ as α goes to infinity. Then, the problem (3) is recast as a strongly convex unconstrained minimization problem with the smooth function f_ε(x, α), and a Newton–Armijo algorithm is proposed to solve it. It is proved that when the smoothing parameter α approaches infinity, the unique solution of the reformulated problem converges to the unique solution of the original problem [7, Theorem 2.2]. However, the smoothing parameter α is fixed in the proposed algorithm, and in the implementation of this algorithm, α cannot be set large enough.
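As a reference point, a small sketch of the smoothing function (4) used in [7]; np.logaddexp evaluates log(1 + e^{−αx}) stably (our own code, not the implementation of [7]):

import numpy as np

def smooth_plus(x, alpha):
    """f(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha*x)):
    the integral of the sigmoid, a smooth approximation of [x]_+."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-1.0, 1.0, 5)
print(smooth_plus(x, alpha=100.0))    # close to np.maximum(x, 0.0)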

In this paper, we introduce two smooth support vector machines for ε-insensitive regression. For the first smooth support vector machine, we reformulate ε-SVR as a strongly convex unconstrained optimization problem with one type of smoothing functions φ_ε(x, α). Then, we define a new function H_φ, which corresponds to the optimality condition of the unconstrained optimization problem. From the solution of H_φ(z) = 0, we can obtain the solution of ε-SVR. For the second smooth support vector machine, we smooth the optimality condition of the strongly convex unconstrained optimization problem (3) with another type of smoothing functions ψ_ε(x, α).

Accordingly, we define the function H_ψ, which possesses the same properties as H_φ does. For either H_φ(z) = 0 or H_ψ(z) = 0, we consider the smoothing Newton method to solve it. The algorithm is shown to be globally convergent; specifically, the iterative sequence converges to the unique solution of (3). Furthermore, the algorithm is shown to be locally quadratically convergent without any assumptions.

The paper is organized as follows. In Sects. 2 and 3, we introduce two smooth support vector machine reformulations based on two types of smoothing functions. In Sect. 4, we propose a smoothing Newton algorithm and study its global and local quadratic convergence. Numerical results and comparisons are reported in Sect. 5. Throughout this paper, K := {1, 2, . . .} and all vectors are column vectors. For a given vector x = (x_1, . . . , x_n)^T ∈ IR^n, the plus function [x]_+ is defined as

([x]_+)_i = max{0, x_i},  i = 1, . . . , n.

For a differentiable function f, we denote by ∇f(x) and ∇²f(x) the gradient and the Hessian matrix of f at x, respectively. For a differentiable mapping G : IR^n → IR^m, we denote by G'(x) ∈ IR^{m×n} the Jacobian of G at x. For a matrix A ∈ IR^{m×n}, A_i^T is the i-th row of A. A column vector of ones and an identity matrix of arbitrary dimension will be denoted by 1 and I, respectively. We denote the sign function by


sgn(x) = { 1        if x > 0,
           [−1, 1]  if x = 0,
           −1       if x < 0.

2 The first smooth support vector machine

As mentioned in [7], it is known that ε-SVR can be reformulated as the strongly convex unconstrained optimization problem (3). Denote ω := (x, b) ∈ IR^{n+1} and Ā := (A, 1), and let Ā_i^T be the i-th row of Ā; then the smooth support vector regression (3) can be rewritten as

min_ω  (1/2) ω^T ω + (C/2) Σ_{i=1}^m |Ā_i^T ω − y_i|_ε^2.        (5)

Note that | · |_ε^2 is smooth but not twice differentiable, which means the objective function is not twice continuously differentiable. Hence, Newton-type methods cannot be applied to solve (5) directly.

In view of this fact, we propose a family of twice continuously differentiable functions φ_ε(x, α) to replace |x|_ε^2. The family of functions φ_ε(x, α) : IR × IR_+ → IR_+ is given by

φ_ε(x, α) = { (|x| − ε)^2 + (1/3)α^2       if |x| − ε ≥ α,
              (1/(6α))(|x| − ε + α)^3      if ||x| − ε| < α,
              0                             if |x| − ε ≤ −α,        (6)

where 0 < α < ε is a smoothing parameter. The graphs of φ_ε(x, α) are depicted in Fig. 1. From this geometric view, it is clear that φ_ε(x, α) is a class of smoothing functions for |x|_ε^2.
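A minimal NumPy sketch of the piecewise definition (6), evaluated elementwise (our own naming; it assumes 0 < α < ε as stated above):

import numpy as np

def phi_eps(x, eps, alpha):
    """phi_eps(x, alpha) from (6): a smoothing of |x|_eps^2 for 0 < alpha < eps."""
    t = np.abs(x) - eps
    out = np.where(t >= alpha, t ** 2 + alpha ** 2 / 3.0, 0.0)   # also covers t <= -alpha
    mid = np.abs(t) < alpha
    return np.where(mid, (t + alpha) ** 3 / (6.0 * alpha), out)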

Besides the geometric approach, we now show that φ_ε(x, α) is a class of smoothing functions for |x|_ε^2 by algebraic verification. To this end, we compute the partial derivatives of φ_ε(x, α) as below:

∇_x φ_ε(x, α) = { 2(|x| − ε) sgn(x)                  if |x| − ε ≥ α,
                  (1/(2α))(|x| − ε + α)^2 sgn(x)     if ||x| − ε| < α,
                  0                                   if |x| − ε ≤ −α.        (7)

∇²_{xx} φ_ε(x, α) = { 2                    if |x| − ε ≥ α,
                      (|x| − ε + α)/α      if ||x| − ε| < α,
                      0                    if |x| − ε ≤ −α.        (8)

∇²_{xα} φ_ε(x, α) = { 0                                                 if |x| − ε ≥ α,
                      ((|x| − ε + α)(α − |x| + ε)/(2α^2)) sgn(x)        if ||x| − ε| < α,
                      0                                                 if |x| − ε ≤ −α.        (9)
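The three partial derivatives (7)–(9) can be coded with the same branch structure; this is our own sketch, mirroring the formulas above:

import numpy as np

def phi_eps_dx(x, eps, alpha):
    """Partial derivative (7) of phi_eps with respect to x."""
    t, s = np.abs(x) - eps, np.sign(x)
    out = np.where(t >= alpha, 2.0 * t * s, 0.0)
    return np.where(np.abs(t) < alpha, (t + alpha) ** 2 * s / (2.0 * alpha), out)

def phi_eps_dxx(x, eps, alpha):
    """Second partial derivative (8) of phi_eps with respect to x."""
    t = np.abs(x) - eps
    out = np.where(t >= alpha, 2.0, 0.0)
    return np.where(np.abs(t) < alpha, (t + alpha) / alpha, out)

def phi_eps_dxa(x, eps, alpha):
    """Mixed partial derivative (9) of phi_eps with respect to x and alpha."""
    t, s = np.abs(x) - eps, np.sign(x)
    mid = (t + alpha) * (alpha - t) * s / (2.0 * alpha ** 2)
    return np.where(np.abs(t) < alpha, mid, 0.0)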


Fig. 1 Graphs of φ_ε(x, α) with ε = 0.1 and α = 0.03, 0.06, 0.09

With the above, the following lemma shows some basic properties of φ_ε(x, α).

Lemma 2.1 Let φ_ε(x, α) be defined as in (6). Then, the following hold.

(a) For 0 < α < ε, there holds 0 ≤ φ_ε(x, α) − |x|_ε^2 ≤ (1/3)α^2.

(b) The function φ_ε(x, α) is twice continuously differentiable with respect to x for 0 < α < ε.

(c) lim_{α→0} φ_ε(x, α) = |x|_ε^2 and lim_{α→0} ∇_x φ_ε(x, α) = ∇(|x|_ε^2).

Proof (a) To complete the arguments, we need to discuss four cases.

(i) For |x| − ε ≥ α, it is clear that φ_ε(x, α) − |x|_ε^2 = (1/3)α^2.

(ii) For 0 < |x| − ε < α, i.e., 0 < x − ε < α or 0 < −x − ε < α, there are two subcases.

If 0 < x − ε < α, letting f(x) := φ_ε(x, α) − |x|_ε^2 = (1/(6α))(x − ε + α)^3 − (x − ε)^2 gives

f'(x) = (x − ε + α)^2/(2α) − 2(x − ε), ∀x ∈ (ε, ε + α),
f''(x) = (x − ε + α)/α − 2 < 0, ∀x ∈ (ε, ε + α).

This indicates that f'(x) is monotone decreasing on (ε, ε + α), which further implies f'(x) ≥ f'(ε + α) = 0, ∀x ∈ (ε, ε + α).

Thus, we obtain that f(x) is monotone increasing on (ε, ε + α). With this, we have f(x) ≤ f(ε + α) = (1/3)α^2, which yields

φ_ε(x, α) − |x|_ε^2 ≤ (1/3)α^2, ∀x ∈ (ε, ε + α).


If 0 < −x − ε < α, the arguments are similar to the above, and we omit them.

(iii) For −α < |x| − ε ≤ 0, it is clear that φ_ε(x, α) − |x|_ε^2 = (1/(6α))(|x| − ε + α)^3 ≤ α^3/(6α) = α^2/6 ≤ α^2/3.

(iv) For |x| − ε ≤ −α, we have φ_ε(x, α) − |x|_ε^2 = 0. Then, the desired result follows.

(b) To prove the twice continuous differentiability of φ_ε(x, α), we need to check that φ_ε(·, α), ∇_x φ_ε(·, α) and ∇²_{xx} φ_ε(·, α) are all continuous. Since they are piecewise functions, it suffices to check the junction points.

First, we check that φ_ε(·, α) is continuous.

(i) If |x| − ε = α, then φ_ε(x, α) = (4/3)α^2 from both branches, which implies φ_ε(·, α) is continuous.

(ii) If |x| − ε = −α, then φ_ε(x, α) = 0. Hence, φ_ε(·, α) is continuous.

Next, we check that ∇_x φ_ε(·, α) is continuous.

(i) If |x| − ε = α, then ∇_x φ_ε(x, α) = 2α sgn(x).

(ii) If |x| − ε = −α, then ∇_x φ_ε(x, α) = 0. From the above, it is clear to see that ∇_x φ_ε(·, α) is continuous.

Now we show that ∇²_{xx} φ_ε(·, α) is continuous.

(i) If |x| − ε = α, then ∇²_{xx} φ_ε(x, α) = 2.

(ii) If |x| − ε = −α, then ∇²_{xx} φ_ε(x, α) = 0. Hence, ∇²_{xx} φ_ε(·, α) is continuous.

(c) It is clear that lim_{α→0} φ_ε(x, α) = |x|_ε^2 holds by part (a). It remains to verify lim_{α→0} ∇_x φ_ε(x, α) = ∇(|x|_ε^2). First, we compute that

∇(|x|_ε^2) = { 2(|x| − ε) sgn(x)   if |x| − ε ≥ 0,
               0                    if |x| − ε < 0.        (10)

In light of (10), we proceed by discussing four cases.

(i) For |x| − ε ≥ α, we have ∇_x φ_ε(x, α) − ∇(|x|_ε^2) = 0. Then, the desired result follows.

(ii) For 0 < |x| − ε < α, we have

∇_x φ_ε(x, α) − ∇(|x|_ε^2) = (1/(2α))(|x| − ε + α)^2 sgn(x) − 2(|x| − ε) sgn(x),

which yields

lim_{α→0} (∇_x φ_ε(x, α) − ∇(|x|_ε^2)) = lim_{α→0} [((|x| − ε + α)^2 − 4α(|x| − ε))/(2α)] sgn(x).

We notice that |x| → ε when α → 0, and hence ((|x| − ε + α)^2 − 4α(|x| − ε))/(2α) → 0/0. Then, applying L'Hôpital's rule yields

lim_{α→0} ((|x| − ε + α)^2 − 4α(|x| − ε))/(2α) = lim_{α→0} (α − (|x| − ε)) = 0.

This implies lim_{α→0} (∇_x φ_ε(x, α) − ∇(|x|_ε^2)) = 0, which is the desired result.


(iii) For −α < |x| − ε ≤ 0, we have ∇_x φ_ε(x, α) − ∇(|x|_ε^2) = (1/(2α))(|x| − ε + α)^2 sgn(x). Then, applying L'Hôpital's rule gives

lim_{α→0} ((|x| − ε + α)^2)/(2α) = lim_{α→0} (|x| − ε + α) = 0.

Thus, we prove that lim_{α→0} (∇_x φ_ε(x, α) − ∇(|x|_ε^2)) = 0 under this case.

(iv) For |x| − ε ≤ −α, we have ∇_x φ_ε(x, α) − ∇(|x|_ε^2) = 0. Then, the desired result follows clearly.

Now, we use the family of smoothing functions φ_ε to replace the square of the ε-insensitive loss function in (5) to obtain the first smooth support vector regression. In other words, we consider

min_ω F_{ε,α}(ω) := (1/2) ω^T ω + (C/2) 1^T Φ_ε(Āω − y, α),        (11)

where ω := (x, b) ∈ IR^{n+1} and Φ_ε(Ax + 1b − y, α) ∈ IR^m is defined componentwise by Φ_ε(Ax + 1b − y, α)_i = φ_ε(A_i x + b − y_i, α).
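A sketch of the smoothed objective F_{ε,α}(ω) in (11); it reuses phi_eps from the sketch after (6), stacks ω = (x, b), and takes Ā = (A, 1) (our own code and names):

import numpy as np

def F_eps_alpha(omega, A_bar, y, C, eps, alpha):
    """F_{eps,alpha}(omega) = 0.5*omega'omega + (C/2)*sum_i phi_eps(A_bar_i'omega - y_i, alpha)."""
    r = A_bar @ omega - y                    # residuals A_i x + b - y_i
    return 0.5 * omega @ omega + 0.5 * C * np.sum(phi_eps(r, eps, alpha))

# Typical setup: A_bar = np.hstack([A, np.ones((m, 1))]), omega = np.append(x, b).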

This is a strongly convex unconstrained optimization problem with a twice continuously differentiable objective function. Noting that lim_{α→0} φ_ε(x, α) = |x|_ε^2, we see that

min_ω F_{ε,0}(ω) := lim_{α→0} F_{ε,α}(ω) = (1/2) ω^T ω + (C/2) Σ_{i=1}^m |Ā_i^T ω − y_i|_ε^2,        (12)

which is exactly the problem (5).

The following theorem shows that the unique solution of the smooth problem (11) approaches the unique solution of the problem (12) as α → 0. Indeed, it plays the same role as [7, Theorem 2.2].

Theorem 2.1 Let F_{ε,α}(ω) and F_{ε,0}(ω) be defined as in (11) and (12), respectively. Then, the following hold.

(a) There exists a unique solution ω̄_α to min_{ω∈IR^{n+1}} F_{ε,α}(ω) and a unique solution ω̄ to min_{ω∈IR^{n+1}} F_{ε,0}(ω).

(b) For all 0 < α < ε, we have the following inequality:

‖ω̄_α − ω̄‖^2 ≤ (1/6) C m α^2.        (13)

Moreover, ω̄_α converges to ω̄ as α → 0 with an upper bound given by (13).


Proof (a) In view of φ_ε(x, α) − |x|_ε^2 ≥ 0 in Lemma 2.1(a), we see that the level sets

L_v(F_{ε,α}(ω)) := {ω ∈ IR^{n+1} | F_{ε,α}(ω) ≤ v},
L_v(F_{ε,0}(ω)) := {ω ∈ IR^{n+1} | F_{ε,0}(ω) ≤ v}

satisfy

L_v(F_{ε,α}(ω)) ⊆ L_v(F_{ε,0}(ω)) ⊆ {ω ∈ IR^{n+1} | ω^T ω ≤ 2v}        (14)

for any v ≥ 0. Hence, we obtain that L_v(F_{ε,α}(ω)) and L_v(F_{ε,0}(ω)) are compact (closed and bounded) subsets of IR^{n+1}. Then, by the strong convexity of F_{ε,0}(ω) and F_{ε,α}(ω) with α > 0, each of the problems min_{ω∈IR^{n+1}} F_{ε,α}(ω) and min_{ω∈IR^{n+1}} F_{ε,0}(ω) has a unique solution.

(b) From the optimality conditions and the strong convexity of F_{ε,0}(ω) and F_{ε,α}(ω) with α > 0, we know that

F_{ε,0}(ω̄_α) − F_{ε,0}(ω̄) ≥ ∇F_{ε,0}(ω̄)^T(ω̄_α − ω̄) + (1/2)‖ω̄_α − ω̄‖^2 ≥ (1/2)‖ω̄_α − ω̄‖^2,        (15)
F_{ε,α}(ω̄) − F_{ε,α}(ω̄_α) ≥ ∇F_{ε,α}(ω̄_α)^T(ω̄ − ω̄_α) + (1/2)‖ω̄ − ω̄_α‖^2 ≥ (1/2)‖ω̄ − ω̄_α‖^2.        (16)

Note that F_{ε,α}(ω) ≥ F_{ε,0}(ω) because φ_ε(x, α) − |x|_ε^2 ≥ 0. Then, adding up (15) and (16) along with this fact yields

‖ω̄_α − ω̄‖^2 ≤ (F_{ε,α}(ω̄) − F_{ε,0}(ω̄)) − (F_{ε,α}(ω̄_α) − F_{ε,0}(ω̄_α))
             ≤ F_{ε,α}(ω̄) − F_{ε,0}(ω̄)
             = (C/2) 1^T Φ_ε(Āω̄ − y, α) − (C/2) Σ_{i=1}^m |Ā_i^T ω̄ − y_i|_ε^2
             = (C/2) Σ_{i=1}^m φ_ε(Ā_i^T ω̄ − y_i, α) − (C/2) Σ_{i=1}^m |Ā_i^T ω̄ − y_i|_ε^2
             ≤ (1/6) C m α^2,

where the last inequality is due to Lemma 2.1(a). It is clear that ω̄_α converges to ω̄ as α → 0 with an upper bound given by the above. Then, the proof is complete.

Next, we focus on the optimality condition of the minimization problem (11), which is indeed sufficient and necessary for (11) and has the form of

∇_ω F_{ε,α}(ω) = 0.


With this, we define a function H_φ : IR^{n+2} → IR^{n+2} by

H_φ(z) = ( α, ∇_ω F_{ε,α}(ω) ) = ( α, ω + C Σ_{i=1}^m ∇_x φ_ε(Ā_i^T ω − y_i, α) Ā_i ),        (17)

where z := (α, ω) ∈ IR^{n+2}. From Lemma 2.1 and the strong convexity of F_{ε,α}(ω), it is easy to see that if H_φ(z) = 0, then α = 0 and ω solves (11); and for any z ∈ IR_{++} × IR^{n+1}, the function H_φ is continuously differentiable. In addition, the Jacobian of H_φ can be calculated as below:

H'_φ(z) = [ 1                     0
            ∇²_{ωα} F_{ε,α}(ω)    ∇²_{ωω} F_{ε,α}(ω) ],        (18)

where

∇²_{ωα} F_{ε,α}(ω) = C Σ_{i=1}^m ∇²_{xα} φ_ε(Ā_i^T ω − y_i, α) Ā_i,
∇²_{ωω} F_{ε,α}(ω) = I + C Σ_{i=1}^m ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) Ā_i Ā_i^T.

From (8), we can see ∇²_{xx} φ_ε(x, α) ≥ 0, which implies that C Σ_{i=1}^m ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) Ā_i Ā_i^T is positive semidefinite. Hence, ∇²_{ωω} F_{ε,α}(ω) is positive definite. This helps us to prove that H'_φ(z) is invertible at any z ∈ IR_{++} × IR^{n+1}. In fact, if there exists a vector d := (d_1, d_2) ∈ IR × IR^{n+1} such that H'_φ(z) d = 0, then we have

( d_1
  d_1 ∇²_{ωα} F_{ε,α}(ω) + ∇²_{ωω} F_{ε,α}(ω) d_2 ) = 0.

This implies that d = 0, and hence H'_φ(z) is invertible at any z ∈ IR_{++} × IR^{n+1}.
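To make (17) and (18) concrete, here is a sketch (our own code) that assembles H_φ(z) and its Jacobian H'_φ(z); it reuses phi_eps_dx, phi_eps_dxx and phi_eps_dxa from the sketch after (9):

import numpy as np

def H_phi(z, A_bar, y, C, eps):
    """H_phi(z) = (alpha, omega + C * sum_i grad_x phi_eps(A_bar_i'omega - y_i, alpha) * A_bar_i)."""
    alpha, omega = z[0], z[1:]
    r = A_bar @ omega - y
    grad = omega + C * (A_bar.T @ phi_eps_dx(r, eps, alpha))
    return np.concatenate(([alpha], grad))

def H_phi_jac(z, A_bar, y, C, eps):
    """Jacobian (18): first row (1, 0); below it the mixed and second derivatives of F_{eps,alpha}."""
    alpha, omega = z[0], z[1:]
    r = A_bar @ omega - y
    n1 = omega.size
    J = np.zeros((n1 + 1, n1 + 1))
    J[0, 0] = 1.0
    J[1:, 0] = C * (A_bar.T @ phi_eps_dxa(r, eps, alpha))                                    # via (9)
    J[1:, 1:] = np.eye(n1) + C * (A_bar.T @ (phi_eps_dxx(r, eps, alpha)[:, None] * A_bar))   # via (8)
    return J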

3 The second smooth support vector machine

In this section, we consider another type of smoothing functions ψ_{ε,p}(x, α) : IR × IR_+ → IR_+, which is defined by

ψ_{ε,p}(x, α) = { 0                                            if 0 ≤ |x| ≤ ε − α,
                  (α/(p−1)) ((p−1)(|x| − ε + α)/(pα))^p        if ε − α < |x| < ε + α/(p−1),
                  |x| − ε                                       if |x| ≥ ε + α/(p−1),        (19)

where p ≥ 2. The graphs of ψ_{ε,p}(x, α) are depicted in Fig. 2, which clearly verify that ψ_{ε,p}(x, α) is a family of smoothing functions for |x|_ε.
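A NumPy sketch of the piecewise definition (19), evaluated elementwise (our own naming):

import numpy as np

def psi_eps_p(x, eps, alpha, p=2):
    """psi_{eps,p}(x, alpha) from (19): a smoothing of |x|_eps for p >= 2."""
    ax = np.abs(x)
    out = np.where(ax >= eps + alpha / (p - 1), ax - eps, 0.0)   # outer branches
    mid = (ax > eps - alpha) & (ax < eps + alpha / (p - 1))
    middle = (alpha / (p - 1)) * ((p - 1) * (ax - eps + alpha) / (p * alpha)) ** p
    return np.where(mid, middle, out)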

As shown in Lemma 3.1 below, ψ_{ε,p}(x, α) is a family of smoothing functions for |x|_ε; hence, ψ_{ε,p}^2(x, α) is also a family of smoothing functions for |x|_ε^2. Then, we can employ ψ_{ε,p}^2 to replace the square of the ε-insensitive loss function in (5) in the same way as in Sect. 2. The graphs of ψ_{ε,p}^2(x, α), in comparison with φ_ε(x, α), are depicted in Fig. 3. In fact, there is a relation between ψ_{ε,p}^2(x, α) and φ_ε(x, α), shown in Proposition 3.1.

Fig. 2 Graphs of ψ_{ε,p}(x, α) with ε = 0.1, α = 0.03, 0.06, 0.09 and p = 2

Fig. 3 Graphs of |x|_ε^2, φ_ε(x, α) and ψ_{ε,p}^2(x, α) with ε = 0.1, α = 0.06, 0.09 and p = 2


In other words, we obtain an alternative strongly convex unconstrained optimization problem for (5):

min_ω  (1/2) ω^T ω + (C/2) Σ_{i=1}^m ψ_{ε,p}^2(Ā_i^T ω − y_i, α).        (20)

However, the smooth function ψ_{ε,p}^2(x, α) is not twice differentiable with respect to x, and hence the objective function of (20) is not twice differentiable although it is smooth.

Then, we still cannot apply a Newton-type method to solve (20). To overcome this, we take another smoothing approach. Before presenting the idea of this smoothing technique, the following two lemmas regarding properties of ψ_{ε,p}(x, α) are needed. To this end, we also compute the partial derivatives of ψ_{ε,p}(x, α) as below:

∇_x ψ_{ε,p}(x, α) = { 0                                           if 0 ≤ |x| ≤ ε − α,
                      sgn(x) ((p−1)(|x| − ε + α)/(pα))^{p−1}      if ε − α < |x| < ε + α/(p−1),
                      sgn(x)                                       if |x| ≥ ε + α/(p−1).

∇_α ψ_{ε,p}(x, α) = { 0                                                                        if 0 ≤ |x| ≤ ε − α,
                      (((ε − |x|)(p−1) + α)/(pα)) ((p−1)(|x| − ε + α)/(pα))^{p−1}              if ε − α < |x| < ε + α/(p−1),
                      0                                                                        if |x| ≥ ε + α/(p−1).
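The two partial derivatives above can be coded with the same branches; again a sketch with our own names:

import numpy as np

def psi_eps_p_dx(x, eps, alpha, p=2):
    """Partial derivative of psi_{eps,p} with respect to x."""
    ax, s = np.abs(x), np.sign(x)
    out = np.where(ax >= eps + alpha / (p - 1), s, 0.0)
    mid = (ax > eps - alpha) & (ax < eps + alpha / (p - 1))
    middle = s * ((p - 1) * (ax - eps + alpha) / (p * alpha)) ** (p - 1)
    return np.where(mid, middle, out)

def psi_eps_p_da(x, eps, alpha, p=2):
    """Partial derivative of psi_{eps,p} with respect to alpha."""
    ax = np.abs(x)
    mid = (ax > eps - alpha) & (ax < eps + alpha / (p - 1))
    factor = ((eps - ax) * (p - 1) + alpha) / (p * alpha)
    middle = factor * ((p - 1) * (ax - eps + alpha) / (p * alpha)) ** (p - 1)
    return np.where(mid, middle, 0.0)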

Lemma 3.1 Let ψ_{ε,p}(x, α) be defined as in (19). Then, we have

(a) ψ_{ε,p}(x, α) is smooth with respect to x for any p ≥ 2;

(b) lim_{α→0} ψ_{ε,p}(x, α) = |x|_ε for any p ≥ 2.

Proof (a) To prove the result, we need to check that both ψ_{ε,p}(·, α) and ∇_x ψ_{ε,p}(·, α) are continuous.

(i) If |x| = ε − α, then ψ_{ε,p}(x, α) = 0.

(ii) If |x| = ε + α/(p−1), then ψ_{ε,p}(x, α) = α/(p−1). From (i) and (ii), it is clear to see that ψ_{ε,p}(·, α) is continuous.

Moreover, (i) If |x| = ε − α, then ∇_x ψ_{ε,p}(x, α) = 0.

(ii) If |x| = ε + α/(p−1), then ∇_x ψ_{ε,p}(x, α) = sgn(x). In view of (i) and (ii), we see that ∇_x ψ_{ε,p}(·, α) is continuous.

(b) To proceed, we discuss four cases.

(1) If 0 ≤ |x| ≤ ε − α, then ψ_{ε,p}(x, α) − |x|_ε = 0. Then, the desired result follows.

(2) If ε − α ≤ |x| ≤ ε, then ψ_{ε,p}(x, α) − |x|_ε = (α/(p−1)) ((p−1)(|x| − ε + α)/(pα))^p. Hence,

lim_{α→0} (ψ_{ε,p}(x, α) − |x|_ε) = lim_{α→0} (α/(p−1)) ((p−1)(|x| − ε + α)/(pα))^p
                                  = lim_{α→0} (α/(p−1)) · lim_{α→0} ((p−1)(|x| − ε + α)/(pα))^p.

It is clear that the first limit is zero, so we only need to show that the second limit is bounded. To this end, we rewrite it as

lim_{α→0} ((p−1)(|x| − ε + α)/(pα))^p = lim_{α→0} ((p−1)/p)^p ((|x| − ε + α)/α)^p.

We notice that |x| → ε when α → 0 so that (|x| − ε + α)/α → 0/0. Therefore, by applying L'Hôpital's rule, we obtain

lim_{α→0} (|x| − ε + α)/α = 1,

which implies that lim_{α→0} (ψ_{ε,p}(x, α) − |x|_ε) = 0 under this case.

(3) If ε ≤ |x| ≤ ε + α/(p−1), then

ψ_{ε,p}(x, α) − |x|_ε = (α/(p−1)) ((p−1)(|x| − ε + α)/(pα))^p − (|x| − ε).

We have shown in case (2) that

lim_{α→0} (α/(p−1)) ((p−1)(|x| − ε + α)/(pα))^p = 0.

It is also obvious that lim_{α→0} (|x| − ε) = 0. Hence, we obtain lim_{α→0} (ψ_{ε,p}(x, α) − |x|_ε) = 0 under this case.

(4) If |x| ≥ ε + α/(p−1), the desired result follows since it is clear that ψ_{ε,p}(x, α) − |x|_ε = 0. From all the above, the proof is complete.

Lemma 3.2 Let ψ_{ε,p}(x, α) be defined as in (19). Then, we have

(a) ψ_{ε,p}(x, α) sgn(x) is smooth with respect to x for any p ≥ 2;

(b) lim_{α→0} ψ_{ε,p}(x, α) sgn(x) = |x|_ε sgn(x) for any p ≥ 2.

Proof (a) First, we observe that ψ_{ε,p}(x, α) sgn(x) can be written as

ψ_{ε,p}(x, α) sgn(x) = { 0                                                      if 0 ≤ |x| ≤ ε − α,
                         (α/(p−1)) ((p−1)(|x| − ε + α)/(pα))^p sgn(x)           if ε − α < |x| < ε + α/(p−1),
                         (|x| − ε) sgn(x)                                        if |x| ≥ ε + α/(p−1).

Note that sgn(x) is continuous except at x = 0 and ψ_{ε,p}(x, α) = 0 near x = 0; then applying Lemma 3.1(a) yields that ψ_{ε,p}(x, α) sgn(x) is continuous. Furthermore, by simple calculations, we have

∇_x (ψ_{ε,p}(x, α) sgn(x)) = ∇_x ψ_{ε,p}(x, α) sgn(x)
  = { 0                                           if 0 ≤ |x| ≤ ε − α,
      ((p−1)(|x| − ε + α)/(pα))^{p−1}             if ε − α < |x| < ε + α/(p−1),
      1                                            if |x| ≥ ε + α/(p−1).        (21)

Mimicking the arguments as in Lemma 3.1(a), we can verify that ∇_x (ψ_{ε,p}(x, α) sgn(x)) is continuous. Thus, the desired result follows.

(b) By Lemma 3.1(b), it is easy to see that lim_{α→0} ψ_{ε,p}(x, α) sgn(x) = |x|_ε sgn(x). Then, the desired result follows.

Note that |x|_ε^2 is smooth with

∇(|x|_ε^2) = 2|x|_ε sgn(x) = { 2(|x| − ε) sgn(x)   if |x| > ε,
                               0                    if |x| ≤ ε,

being continuous (but not differentiable). Then, we consider the optimality condition of (12), that is,

∇_ω F_{ε,0}(ω) = ω + C Σ_{i=1}^m |Ā_i^T ω − y_i|_ε sgn(Ā_i^T ω − y_i) Ā_i = 0,        (22)

which is indeed sufficient and necessary for (5). Hence, solving (22) is equivalent to solving (5).

Using the family of smoothing functions ψ_{ε,p} to replace the ε-insensitive loss function in (22) leads to a system of smooth equations. More specifically, we define a function H_ψ : IR^{n+2} → IR^{n+2} by

H_ψ(z) = H_ψ(α, ω) = ( α, ω + C Σ_{i=1}^m ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) Ā_i ),

where z := (α, ω) ∈ IR^{n+2}. From Lemma 3.1, it is easy to see that if H_ψ(z) = 0, then α = 0 and ω is the solution of the equation (22), i.e., the solution of (12). Moreover, for any z ∈ IR_{++} × IR^{n+1}, the function H_ψ is continuously differentiable with

H'_ψ(z) = [ 1      0
            E(ω)   I + D(ω) ],        (23)

where

E(ω) = C Σ_{i=1}^m ∇_α ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) Ā_i,
D(ω) = C Σ_{i=1}^m ∇_x ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) Ā_i Ā_i^T.
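Analogously to the sketch after (18), H_ψ(z) and H'_ψ(z) can be assembled as follows (our own code, reusing psi_eps_p, psi_eps_p_dx and psi_eps_p_da from the earlier sketches):

import numpy as np

def H_psi(z, A_bar, y, C, eps, p=2):
    """H_psi(z) = (alpha, omega + C * sum_i psi_{eps,p}(r_i, alpha) * sgn(r_i) * A_bar_i)."""
    alpha, omega = z[0], z[1:]
    r = A_bar @ omega - y
    grad = omega + C * (A_bar.T @ (psi_eps_p(r, eps, alpha, p) * np.sign(r)))
    return np.concatenate(([alpha], grad))

def H_psi_jac(z, A_bar, y, C, eps, p=2):
    """Jacobian (23): [[1, 0], [E(omega), I + D(omega)]]."""
    alpha, omega = z[0], z[1:]
    r = A_bar @ omega - y
    s = np.sign(r)
    n1 = omega.size
    J = np.zeros((n1 + 1, n1 + 1))
    J[0, 0] = 1.0
    J[1:, 0] = C * (A_bar.T @ (psi_eps_p_da(r, eps, alpha, p) * s))             # E(omega)
    D = C * (A_bar.T @ ((psi_eps_p_dx(r, eps, alpha, p) * s)[:, None] * A_bar))
    J[1:, 1:] = np.eye(n1) + D                                                  # I + D(omega)
    return J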

Because ∇_x ψ_{ε,p}(Ā_i^T ω − y_i, α) sgn(Ā_i^T ω − y_i) is nonnegative for any α > 0 from (21), we see that I + D(ω) is positive definite at any z ∈ IR_{++} × IR^{n+1}. Following similar arguments as in Sect. 2, we obtain that H'_ψ(z) is invertible at any z ∈ IR_{++} × IR^{n+1}.

Proposition 3.1 Let φ_ε(x, α) be defined as in (6) and ψ_{ε,p}(x, α) be defined as in (19). Then, the following hold.

(a) For p ≥ 2, we have φ_ε(x, α) ≥ ψ_{ε,p}^2(x, α) ≥ |x|_ε^2.

(b) For p ≥ q ≥ 2, we have ψ_{ε,q}(x, α) ≥ ψ_{ε,p}(x, α).

Proof (a) First, we show that φ_ε(x, α) ≥ ψ_{ε,p}^2(x, α) holds. To proceed, we discuss four cases.

(i) If |x| ≤ ε − α, then φ_ε(x, α) = 0 = ψ_{ε,p}^2(x, α).

(ii) If ε − α < |x| < ε + α/(p−1), then |x| ≤ ε + α/(p−1), which is equivalent to |x| − ε + α ≤ pα/(p−1). Thus, we have

φ_ε(x, α)/ψ_{ε,p}^2(x, α) = α^{2p−3} p^{2p} / (6(p−1)^{2p−2} (|x| − ε + α)^{2p−3}) ≥ p^3/(6(p−1)) ≥ 1,

which implies φ_ε(x, α) ≥ ψ_{ε,p}^2(x, α).

(iii) For ε + α/(p−1) ≤ |x| < ε + α, letting t := |x| − ε ∈ [α/(p−1), α) yields

φ_ε(x, α) − ψ_{ε,p}^2(x, α) = (1/(6α))(t + α)^3 − t^2 = t( t^2/(6α) − t/2 + α/2 ) + α^2/6 ≥ 0,

where the last inequality follows from the fact that the discriminant of t^2/(6α) − t/2 + α/2 is less than 0 and 1/(6α) > 0. Then, φ_ε(x, α) − ψ_{ε,p}^2(x, α) > 0.

(iv) If |x| ≥ ε + α, then it is clear that φ_ε(x, α) = (|x| − ε)^2 + (1/3)α^2 ≥ (|x| − ε)^2 = ψ_{ε,p}^2(x, α).

Now we show the other part, ψ_{ε,p}(x, α) ≥ |x|_ε, which is equivalent to verifying ψ_{ε,p}^2(x, α) ≥ |x|_ε^2. Again, we discuss four cases.

(i) If |x| ≤ ε − α, then ψ_{ε,p}(x, α) = 0 = |x|_ε.

(ii) If ε − α < |x| ≤ ε, then ε − α < |x|, which says |x| − ε + α > 0. Thus, we have ψ_{ε,p}(x, α) ≥ 0 = |x|_ε.

(iii) For ε < |x| < ε + α/(p−1), we let t := |x| − ε ∈ (0, α/(p−1)) and define the function

f(t) = (α/(p−1)) ((p−1)(t + α)/(pα))^p − t

on (0, α/(p−1)). Note that f(|x| − ε) = ψ_{ε,p}(x, α) − |x|_ε for |x| ∈ (ε, ε + α/(p−1)), and observe that

f'(t) = ((p−1)(t + α)/(pα))^{p−1} − 1 ≤ ((p−1)(α/(p−1) + α)/(pα))^{p−1} − 1 = 0.

This means f(t) is monotone decreasing on (0, α/(p−1)). Since f(α/(p−1)) = 0, we have f(t) ≥ 0 for t ∈ (0, α/(p−1)), which implies ψ_{ε,p}(x, α) ≥ |x|_ε for |x| ∈ (ε, ε + α/(p−1)).

(iv) If |x| ≥ ε + α/(p−1), then it is clear that ψ_{ε,p}(x, α) = |x| − ε = |x|_ε.

(b) For p ≥ q ≥ 2, it is obvious to see that

ψ_{ε,q}(x, α) = ψ_{ε,p}(x, α) for |x| ∈ [0, ε − α] ∪ [ε + α/(q−1), +∞).

If |x| ∈ [ε + α/(p−1), ε + α/(q−1)), then ψ_{ε,p}(x, α) = |x|_ε ≤ ψ_{ε,q}(x, α) from the above. Thus, we only need to prove the case of |x| ∈ (ε − α, ε + α/(p−1)).

Consider |x| ∈ (ε − α, ε + α/(p−1)) and t := |x| − ε + α; we observe that t/α ≤ p/(p−1). Then, we verify that

ψ_{ε,q}(x, α)/ψ_{ε,p}(x, α) = [(q−1)^{q−1} p^p / ((p−1)^{p−1} q^q)] (α/t)^{p−q}
                            ≥ [(q−1)^{q−1} p^p / ((p−1)^{p−1} q^q)] ((p−1)/p)^{p−q}
                            = (p/q)^q ((q−1)/(p−1))^{q−1}
                            = (1 + (p−q)/q)^q / (1 + (p−q)/(q−1))^{q−1}
                            ≥ 1,

where the last inequality is due to (1 + (p−q)/x)^x being increasing for x > 0. Thus, the proof is complete.
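The ordering in Proposition 3.1(a) can also be checked numerically with the sketches given after (6) and (19); the values of ε, α and p below are arbitrary choices of ours:

import numpy as np

eps, alpha, p = 0.1, 0.06, 2
x = np.linspace(-0.3, 0.3, 2001)
phi = phi_eps(x, eps, alpha)
psi_sq = psi_eps_p(x, eps, alpha, p) ** 2
loss_sq = np.maximum(0.0, np.abs(x) - eps) ** 2
# Proposition 3.1(a): phi >= psi^2 >= |x|_eps^2 (up to floating-point noise).
assert np.all(phi >= psi_sq - 1e-12) and np.all(psi_sq >= loss_sq - 1e-12)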

4 A smoothing Newton algorithm

In Sects. 2 and 3, we constructed two systems of smooth equations: H_φ(z) = 0 and H_ψ(z) = 0. We briefly describe the difference between them.

In general, the ways we arrive at H_φ(z) = 0 and H_ψ(z) = 0 are slightly different. To obtain H_φ(z) = 0, we first use the twice continuously differentiable functions φ_ε(x, α) to replace |x|_ε^2 in problem (5), and then write out its optimality condition. In contrast, to obtain H_ψ(z) = 0, we write out the optimality condition of problem (5) first, and then use the smoothing functions ψ_{ε,p}(x, α) to replace the ε-insensitive loss function in (22). For convenience, we denote H̃(z) ∈ {H_φ(z), H_ψ(z)}. In other words, H̃(z) possesses the property that if H̃(z) = 0, then α = 0 and ω solves (12). In view of this, we apply a Newton-type method to the system of smooth equations H̃(z) = 0 at each iteration and let α → 0 so that the solution of the problem (12) can be found.

Algorithm 4.1 (A smoothing Newton method)

Step 0 Choose δ ∈ (0, 1), σ ∈ (0, 1/2), and α_0 > 0. Take τ ∈ (0, 1) such that τα_0 < 1. Let ω^0 ∈ IR^{n+1} be an arbitrary vector. Set z^0 := (α_0, ω^0) and e^0 := (1, 0, . . . , 0) ∈ IR^{n+2}.

Step 1 If H̃(z^k) = 0, stop.

Step 2 Define the functions Γ, β by

Γ(z) := ‖H̃(z)‖^2 and β(z) := τ min{1, Γ(z)}.        (24)

Compute Δz^k := (Δα^k, Δω^k) by

H̃(z^k) + H̃'(z^k) Δz^k = α_0 β(z^k) e^0.

Step 3 Let θ_k be the maximum of the values 1, δ, δ^2, . . . such that

Γ(z^k + θ_k Δz^k) ≤ [1 − 2σ(1 − τα_0) θ_k] Γ(z^k).        (25)

Step 4 Set z^{k+1} := z^k + θ_k Δz^k and k := k + 1. Go to Step 1.
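A compact sketch of Algorithm 4.1 (our own code, not the authors' implementation). H and H_jac stand for H̃ and H̃' (for example H_phi / H_phi_jac from the earlier sketches, with the data arguments bound by a lambda), Γ(z) = ‖H̃(z)‖^2 is the merit function, and the backtracking loop realizes (25):

import numpy as np

def smoothing_newton(H, H_jac, z0, alpha0, tau=0.5, delta=0.5, sigma=1e-4,
                     tol=1e-10, max_iter=200):
    """Algorithm 4.1: smoothing Newton iteration on H_tilde(z) = 0.
    Assumes z0[0] == alpha0 > 0 and tau * alpha0 < 1."""
    z = np.asarray(z0, dtype=float).copy()
    e0 = np.zeros_like(z); e0[0] = 1.0
    for _ in range(max_iter):
        Hz = H(z)
        gamma = Hz @ Hz                       # Gamma(z) = ||H(z)||^2
        if np.sqrt(gamma) <= tol:             # Step 1
            break
        beta = tau * min(1.0, gamma)          # (24)
        # Step 2: solve H(z) + H'(z) dz = alpha0 * beta(z) * e0
        dz = np.linalg.solve(H_jac(z), alpha0 * beta * e0 - Hz)
        # Step 3: largest theta in {1, delta, delta^2, ...} satisfying (25)
        theta = 1.0
        while theta > 1e-14:
            z_new = z + theta * dz
            Hn = H(z_new)
            if Hn @ Hn <= (1.0 - 2.0 * sigma * (1.0 - tau * alpha0) * theta) * gamma:
                break
            theta *= delta
        z = z_new                             # Step 4
    return z

# Example call (assuming A_bar, y, C, eps are defined):
# z0 = np.concatenate(([0.5], np.zeros(A_bar.shape[1])))
# z_star = smoothing_newton(lambda z: H_phi(z, A_bar, y, C, eps),
#                           lambda z: H_phi_jac(z, A_bar, y, C, eps), z0, alpha0=0.5)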

Proposition 4.1 Suppose that the sequence {z^k} is generated by Algorithm 4.1. Then, the following results hold.

(a) {Γ(z^k)} is monotonically decreasing.

(b) {‖H̃(z^k)‖} and {β(z^k)} are monotonically decreasing.

(c) Let N(τ) := {z ∈ IR_+ × IR^{n+1} : α_0 β(z) ≤ α}; then z^k ∈ N(τ) for any k ∈ K and 0 < α_{k+1} ≤ α_k.

(d) The algorithm is well defined.

Proof Since the proof is very similar to [6, Remark 2.1], we omit it here.

Lemma 4.1 Let λ̄ := max_i λ_i( Σ_{i=1}^m Ā_i Ā_i^T ). Then, for any z ∈ IR_{++} × IR^{n+1}, we have

(a) 1 ≤ λ_i(H'_φ(z)) ≤ 1 + 2λ̄, i = 1, . . . , n + 2;

(b) 1 ≤ λ_i(H'_ψ(z)) ≤ 1 + λ̄, i = 1, . . . , n + 2.

Proof (a) H_φ(z) is continuously differentiable at any z ∈ IR_{++} × IR^{n+1}, and by (18) it is easy to see that {1, λ_1(∇²_{ωω} F_{ε,α}(ω)), . . . , λ_{n+1}(∇²_{ωω} F_{ε,α}(ω))} are the eigenvalues of H'_φ(z). From the representation of ∇²_{xx} φ_ε in (8), we have 0 ≤ ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) ≤ 2.

Since ∇²_{ωω} F_{ε,α}(ω) = I + Σ_{i=1}^m ∇²_{xx} φ_ε(Ā_i^T ω − y_i, α) Ā_i Ā_i^T, we obtain

1 ≤ λ_i(∇²_{ωω} F_{ε,α}(ω)) ≤ 1 + 2λ̄, i = 1, . . . , n + 1.        (26)

Thus, the result in (a) holds.

2 Distributed classification algorithms Kernel support vector machines Linear support vector machines Parallel tree learning?. 3 Distributed clustering