(1)

Machine Learning Techniques (機器學習技巧)

Lecture 4: Soft-Margin SVM

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science

& Information Engineering

National Taiwan University (國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/22

(2)

Soft-Margin SVM

Agenda

Lecture 4: Soft-Margin SVM

Soft-Margin SVM: Primal

Soft-Margin SVM: Dual

Soft-Margin SVM: Solution

Soft-Margin SVM: Selection

(3)

Soft-Margin SVM Soft-Margin SVM: Primal

Cons of Hard-Margin SVM

recall: SVM can still overfit :-(

[figure: the same data separated under transform Φ1 versus Φ4]

part of reasons: Φ; other part: insisting on separable

if always insisting on separable (=⇒ shatter), have power to overfit to noise
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 2/22

(4)

Soft-Margin SVM Soft-Margin SVM: Primal

Give Up on Some Examples

want: give up on some noisy examples

pocket:
    min_{b,w}  Σ_{n=1}^{N} Jy_n ≠ sign(w^T z_n + b)K

hard-margin SVM:
    min_{b,w}  (1/2) w^T w
    s.t.  y_n (w^T z_n + b) ≥ 1  for all n

combination:
    min_{b,w}  (1/2) w^T w + C · Σ_{n=1}^{N} Jy_n ≠ sign(w^T z_n + b)K
    s.t.  y_n (w^T z_n + b) ≥ 1   for correct n
          y_n (w^T z_n + b) ≥ −∞  for incorrect n

C: trade-off of large margin & noise tolerance

(5)

Soft-Margin SVM Soft-Margin SVM: Primal

Soft-Margin SVM (1/2)

min_{b,w}  (1/2) w^T w + C · Σ_{n=1}^{N} Jy_n ≠ sign(w^T z_n + b)K
s.t.  y_n (w^T z_n + b) ≥ 1 − ∞ · Jy_n ≠ sign(w^T z_n + b)K

• J·K: non-linear—not QP anymore :-( (dual? kernel?)
• cannot distinguish small error (slightly away from fat boundary) from large error (far away from fat boundary)

record ‘margin violation’ by ξ_n —linear constraints
penalize with margin violation instead of error count —quadratic objective

soft-margin SVM:
    min_{b,w,ξ}  (1/2) w^T w + C · Σ_{n=1}^{N} ξ_n
    s.t.  y_n (w^T z_n + b) ≥ 1 − ξ_n  and  ξ_n ≥ 0  for all n

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 4/22

(6)

Soft-Margin SVM Soft-Margin SVM: Primal

Soft-Margin SVM (2/2)

record ‘margin violation’ by ξ_n; penalize with margin violation:

    min_{b,w,ξ}  (1/2) w^T w + C · Σ_{n=1}^{N} ξ_n
    s.t.  y_n (w^T z_n + b) ≥ 1 − ξ_n  and  ξ_n ≥ 0  for all n

[figure: a fat boundary with one example inside the margin; ξ_n measures its violation]

parameter C: trade-off of large margin & margin violation
• large C: want less margin violation
• small C: want large margin
• QP of d̃ + 1 + N variables, 2N constraints

next: remove dependence on d̃ by soft-margin SVM primal ⇒ dual?
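To make the variable and constraint counts concrete, here is a minimal sketch of this primal QP in Python, assuming the cvxopt solver is available, labels y_n ∈ {−1, +1}, and the identity transform (z_n = x_n); the function name and the tiny ridge on the objective are my own additions, not part of the lecture.

```python
import numpy as np
from cvxopt import matrix, solvers

def softmargin_primal(X, y, C):
    """Soft-margin primal: variables u = [b, w, xi], i.e. 1 + d + N variables, 2N constraints."""
    N, d = X.shape
    D = 1 + d + N
    P = np.zeros((D, D))
    P[1:1 + d, 1:1 + d] = np.eye(d)                      # (1/2) w^T w
    P += 1e-8 * np.eye(D)                                # tiny ridge, only for numerical stability
    q = np.hstack([np.zeros(1 + d), C * np.ones(N)])     # ... + C * sum_n xi_n
    # y_n (w^T x_n + b) >= 1 - xi_n  rewritten as  -y_n*b - y_n*x_n^T w - xi_n <= -1
    G_margin = np.hstack([-y[:, None].astype(float), -y[:, None] * X, -np.eye(N)])
    # xi_n >= 0                      rewritten as  -xi_n <= 0
    G_slack = np.hstack([np.zeros((N, 1 + d)), -np.eye(N)])
    G = np.vstack([G_margin, G_slack])
    h = np.hstack([-np.ones(N), np.zeros(N)])
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol['x']).ravel()
    return u[0], u[1:1 + d], u[1 + d:]                   # b, w, xi
```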

(7)

Soft-Margin SVM Soft-Margin SVM: Primal

Fun Time

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 6/22

(8)

Soft-Margin SVM Soft-Margin SVM: Dual

Lagrange Dual

primal:
    min_{b,w,ξ}  (1/2) w^T w + C · Σ_{n=1}^{N} ξ_n
    s.t.  y_n (w^T z_n + b) ≥ 1 − ξ_n  and  ξ_n ≥ 0  for all n

Lagrange function with Lagrange multipliers α_n and β_n:

    L(b, w, ξ, α, β) = (1/2) w^T w + C · Σ_{n=1}^{N} ξ_n
                       + Σ_{n=1}^{N} α_n · (1 − ξ_n − y_n (w^T z_n + b)) + Σ_{n=1}^{N} β_n · (−ξ_n)

want: Lagrange dual
    max_{α_n ≥ 0, β_n ≥ 0}  ( min_{b,w,ξ}  L(b, w, ξ, α, β) )

(9)

Soft-Margin SVM Soft-Margin SVM: Dual

Simplify ξ_n and β_n

max_{α_n ≥ 0, β_n ≥ 0}  ( min_{b,w,ξ}  (1/2) w^T w + C · Σ_{n=1}^{N} ξ_n
                          + Σ_{n=1}^{N} α_n · (1 − ξ_n − y_n (w^T z_n + b)) + Σ_{n=1}^{N} β_n · (−ξ_n) )

• ∂L/∂ξ_n = 0 = C − α_n − β_n

• no loss of optimality if solving with implicit constraint β_n = C − α_n and explicit constraint 0 ≤ α_n ≤ C: β_n removed

• ξ_n can also be removed :-), like how we removed b

max_{0 ≤ α_n ≤ C, β_n = C − α_n}  ( min_{b,w}  (1/2) w^T w + Σ_{n=1}^{N} α_n (1 − y_n (w^T z_n + b)) )

(the crossed-out term Σ_{n=1}^{N} (C − α_n − β_n) · ξ_n vanishes under β_n = C − α_n)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 8/22

(10)

Soft-Margin SVM Soft-Margin SVM: Dual

Other Simplifications

max_{0 ≤ α_n ≤ C, β_n = C − α_n}  ( min_{b,w}  (1/2) w^T w + Σ_{n=1}^{N} α_n (1 − y_n (w^T z_n + b)) )

familiar? :-) inner problem same as hard-margin SVM

• ∂L/∂b = 0: no loss of optimality if solving with constraint Σ_{n=1}^{N} α_n y_n = 0

• ∂L/∂w_i = 0: no loss of optimality if solving with constraint w = Σ_{n=1}^{N} α_n y_n z_n

standard dual can be derived using the same steps as Lecture 2

(11)

Soft-Margin SVM Soft-Margin SVM: Dual

Standard Soft-Margin SVM Dual

min_α  (1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} α_n α_m y_n y_m z_n^T z_m − Σ_{n=1}^{N} α_n

subject to  Σ_{n=1}^{N} y_n α_n = 0;
            0 ≤ α_n ≤ C, for n = 1, 2, . . . , N;

implicitly  w = Σ_{n=1}^{N} α_n y_n z_n;
            β_n = C − α_n, for n = 1, 2, . . . , N

—only difference to hard-margin: upper bound on α_n

another (convex) QP, with N variables & 2N + 1 constraints

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/22
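A sketch of feeding this standard dual to a generic QP solver, again assuming cvxopt is available; K is a precomputed N × N kernel (or z_n^T z_m Gram) matrix, and the small ridge on Q is only a numerical safeguard, not part of the formulation.

```python
import numpy as np
from cvxopt import matrix, solvers

def softmargin_dual(K, y, C):
    """min_a (1/2) a^T Q a - 1^T a  s.t.  y^T a = 0, 0 <= a_n <= C  (N variables, 2N+1 constraints)."""
    N = len(y)
    Q = np.outer(y, y) * K + 1e-8 * np.eye(N)            # q_{n,m} = y_n y_m K(x_n, x_m), plus tiny ridge
    p = -np.ones(N)                                      # the  - sum_n alpha_n  part
    G = np.vstack([-np.eye(N), np.eye(N)])               # lower bounds -a_n <= 0, upper bounds a_n <= C
    h = np.hstack([np.zeros(N), C * np.ones(N)])
    A = matrix(np.asarray(y, dtype=float).reshape(1, -1))  # equality constraint y^T a = 0
    sol = solvers.qp(matrix(Q), matrix(p), matrix(G), matrix(h), A, matrix(0.0))
    return np.array(sol['x']).ravel()                    # alpha_1, ..., alpha_N
```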

(12)

Soft-Margin SVM Soft-Margin SVM: Dual

Fun Time

(13)

Soft-Margin SVM Soft-Margin SVM: Solution

Kernel Soft-Margin SVM

Kernel Soft-Margin SVM Algorithm
1  q_{n,m} = y_n y_m K(x_n, x_m); c = −1_N; (P, r) for equality/lower-bound/upper-bound constraints
2  α ← QP(Q, c, P, r)
3  b ← ?
4  return SVs and their α_n as well as b such that for new x,
       g_SVM(x) = sign( Σ_{SV indices n} α_n y_n K(x_n, x) + b )

• almost the same as hard-margin
• more flexible than hard-margin—primal/dual always solvable

remaining question: step 3?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/22

(14)

Soft-Margin SVM Soft-Margin SVM: Solution

Solving for b

hard-margin SVM
• complementary slackness: α_n (1 − y_n (w^T z_n + b)) = 0
• SV (α_m > 0) ⇒ b = y_m − w^T z_m

soft-margin SVM
• complementary slackness: α_n (1 − ξ_n − y_n (w^T z_n + b)) = 0 and (C − α_n) ξ_n = 0
• SV (α_m > 0) ⇒ b = y_m − y_m ξ_m − w^T z_m
• unbounded SV (α_m < C) ⇒ ξ_m = 0

solve unique b with unbounded SV (x_m, y_m):

    b = y_m − Σ_{n=1}^{N} α_n y_n K(x_n, x_m)

—range of b otherwise
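A small sketch of this step 3, reusing the alpha/K/y conventions of the dual sketch above; it assumes at least one unbounded SV exists, otherwise b is only determined within a range, as noted.

```python
import numpy as np

def solve_b(alpha, K, y, C, tol=1e-6):
    """b from any unbounded SV (0 < alpha_m < C, hence xi_m = 0)."""
    free = np.flatnonzero((alpha > tol) & (alpha < C - tol))
    m = free[0]                                   # any free SV yields the same b
    return y[m] - np.sum(alpha * y * K[:, m])     # b = y_m - sum_n alpha_n y_n K(x_n, x_m)
```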

(15)

Soft-Margin SVM Soft-Margin SVM: Solution

Soft-Margin Gaussian SVM in Action

[figure: soft-margin Gaussian SVM decision boundaries for C = 1, C = 10, C = 100]

large C =⇒ less noise tolerance =⇒ ‘overfit’?

warning: SVM can still overfit :-(

soft-margin Gaussian SVM: need careful selection of (γ, C)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 14/22
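A rough scikit-learn illustration of the same effect on toy two-moons data (standing in for the slide's dataset, so the exact numbers will differ): as C grows, training error drops but the boundary gets wigglier and test accuracy can suffer.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# toy data standing in for the slide's figure
X, y = make_moons(n_samples=200, noise=0.3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for C in (1, 10, 100):
    clf = SVC(kernel='rbf', gamma=1.0, C=C).fit(X_tr, y_tr)
    # larger C tolerates fewer margin violations: training accuracy rises, test accuracy may drop
    print(C, clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```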

(16)

Soft-Margin SVM Soft-Margin SVM: Solution

Physical Meaning of α_n

complementary slackness:
    α_n (1 − ξ_n − y_n (w^T z_n + b)) = 0 and (C − α_n) ξ_n = 0

• non-SV (α_n = 0): ξ_n = 0, ‘away from’/on fat boundary
• unbounded SV (0 < α_n < C): ξ_n = 0, on fat boundary, locates b
• bounded SV (α_n = C): ξ_n = violation amount, ‘violate’/on fat boundary

α_n can be used for data analysis
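A sketch of reading these three categories off a fitted scikit-learn SVC, whose dual_coef_ stores y_n·α_n for the support vectors only; the toy data and the 1e-6 tolerance are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=1)
C = 10.0
clf = SVC(kernel='rbf', gamma=1.0, C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_).ravel()            # alpha_n of each support vector
free = clf.support_[alpha < C - 1e-6]             # 0 < alpha_n < C: on the fat boundary, locate b
bounded = clf.support_[alpha >= C - 1e-6]         # alpha_n = C: margin violators (or exactly on it)
print(len(free), "free SVs,", len(bounded), "bounded SVs,",
      len(y) - clf.support_.size, "non-SVs")      # non-SVs have alpha_n = 0
```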

(17)

Soft-Margin SVM Soft-Margin SVM: Solution

Fun Time

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 16/22

(18)

Soft-Margin SVM Soft-Margin SVM: Selection

Practical Need: Model Selection


complicated even for

(C, γ) of Gaussian SVM

more combinations if including other kernels or parameters

how to select?

validation :-)

(19)

Soft-Margin SVM Soft-Margin SVM: Selection

Selection by Cross Validation

[figure: E_cv(C, γ) over a 3 × 3 grid of (C, γ) values:
    0.3500  0.3250  0.3250
    0.2000  0.2250  0.2750
    0.1750  0.2250  0.2000 ]

E_cv(C, γ): ‘non-smooth’ function of (C, γ) —difficult to optimize

proper models can be chosen by V-fold cross validation on a few grid values of (C, γ)

E_cv: very popular criterion for soft-margin SVM

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/22
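A minimal sketch of such a grid search with V-fold cross validation, using scikit-learn's GridSearchCV on toy data; the grid values and V = 5 are placeholders rather than recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=1)
grid = GridSearchCV(SVC(kernel='rbf'),
                    param_grid={'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10]},
                    cv=5)                          # V = 5 folds over a 3x3 grid of (C, gamma)
grid.fit(X, y)
print(grid.best_params_, "E_cv =", 1 - grid.best_score_)
```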

(20)

Soft-Margin SVM Soft-Margin SVM: Selection

Leave-One-Out CV Error for SVM

recall: E_loocv = E_cv with N folds

claim: E_loocv ≤ #SV / N

for (x_N, y_N): if optimal α_N = 0 (non-SV)
=⇒ (α_1, α_2, . . . , α_{N−1}) still optimal when leaving out (x_N, y_N)
—key: what if there were a better α? (it could be extended with α_N = 0 to beat the original optimal solution, a contradiction)

SVM: g⁻ = g when leaving out a non-SV
• e_non-SV = err(g⁻, non-SV) = err(g, non-SV) = 0
• e_SV ≤ 1

[figure: hard-margin example with boundary x1 − x2 − 1 = 0 and margin 0.707]

motivation from hard-margin SVM: only SVs needed

scaled #SV bounds leave-one-out CV error
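A quick numerical check of the bound on a small toy set (leave-one-out trains N models, so N is kept small); the data and parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=60, noise=0.3, random_state=1)   # small N: LOOCV trains N models
clf = SVC(kernel='rbf', gamma=1.0, C=1.0)
e_loocv = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
n_sv = clf.fit(X, y).support_.size
print(e_loocv, "<=", n_sv / len(y))                # E_loocv <= #SV / N
```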

(21)

Soft-Margin SVM Soft-Margin SVM: Selection

Selection by # SV

[figure: nSV(C, γ) over the same 3 × 3 grid of (C, γ) values:
    38  37  37
    27  21  17
    21  18  19 ]

nSV(C, γ): ‘non-smooth’ function of (C, γ) —difficult to optimize
—just an upper bound!

dangerous models can be ruled out by nSV on a few grid values of (C, γ)

nSV: often used as a safety check if computing E_cv not allowed

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 20/22
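A sketch of that safety check on toy data: count support vectors over a few (C, γ) values and rule out settings whose nSV, and hence the upper bound on E_loocv, is large; grid values are placeholders.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=1)
for C in (0.1, 1, 10):
    for gamma in (0.1, 1, 10):
        n_sv = SVC(kernel='rbf', C=C, gamma=gamma).fit(X, y).support_.size
        print(f"C={C:<4} gamma={gamma:<4} #SV={n_sv}")   # rule out (C, gamma) with too many SVs
```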

(22)

Soft-Margin SVM Soft-Margin SVM: Selection

Fun Time

(23)

Soft-Margin SVM Soft-Margin SVM: Selection

Summary

Lecture 4: Soft-Margin SVM

Soft-Margin SVM: Primal
    add margin violations ξ_n
Soft-Margin SVM: Dual
    add upper bound to α_n
Soft-Margin SVM: Solution
    formulated by bounded/unbounded SVs
Soft-Margin SVM: Selection
    cross-validation, or approximately nSV

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 22/22
