Machine Learning Techniques (機器學習技巧)
Lecture 4: Soft-Margin SVM
Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/22
Soft-Margin SVM
Agenda
Lecture 4: Soft-Margin SVM
Soft-Margin SVM: Primal
Soft-Margin SVM: Dual
Soft-Margin SVM: Solution
Soft-Margin SVM: Selection
Soft-Margin SVM: Primal

Cons of Hard-Margin SVM
recall: SVM can still overfit :-(
[figures: decision boundaries under transforms Φ1 and Φ4]
• part of reasons: Φ
• other part: insisting on separable
if always insisting on separable (⟹ shatter), have power to overfit to noise
Soft-Margin SVM: Primal

Give Up on Some Examples
want: give up on some noisy examples

  min_{b,w} Σ_{n=1}^N ⟦y_n ≠ sign(wᵀz_n + b)⟧

hard-margin SVM:

  min_{b,w} (1/2) wᵀw
  s.t. y_n(wᵀz_n + b) ≥ 1 for all n

combination:

  min_{b,w} (1/2) wᵀw + C · Σ_{n=1}^N ⟦y_n ≠ sign(wᵀz_n + b)⟧
  s.t. y_n(wᵀz_n + b) ≥ 1 for correct n
       y_n(wᵀz_n + b) ≥ −∞ for incorrect n

C: trade-off of large margin & noise tolerance
Soft-Margin SVM: Primal

Soft-Margin SVM (1/2)

  min_{b,w} (1/2) wᵀw + C · Σ_{n=1}^N ⟦y_n ≠ sign(wᵀz_n + b)⟧
  s.t. y_n(wᵀz_n + b) ≥ 1 − ∞ · ⟦y_n ≠ sign(wᵀz_n + b)⟧

• ⟦·⟧: non-linear, so not QP anymore :-( (dual? kernel?)
• cannot distinguish small error (slightly away from fat boundary) from large error (a...w...a...y... from fat boundary)
• record 'margin violation' by ξ_n (linear constraints)
• penalize with margin violation instead of error count (quadratic objective)

soft-margin SVM:

  min_{b,w,ξ} (1/2) wᵀw + C · Σ_{n=1}^N ξ_n
  s.t. y_n(wᵀz_n + b) ≥ 1 − ξ_n and ξ_n ≥ 0 for all n
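At the optimum the slack satisfies ξ_n = max(0, 1 − y_n(wᵀz_n + b)), so the primal above is equivalent to unconstrained hinge-loss minimization. A minimal numpy sketch of this equivalent form (not from the lecture; the function name, step size, and epoch count are illustrative):

```python
import numpy as np

def soft_margin_primal_sgd(X, y, C=1.0, lr=0.01, epochs=300):
    """Minimize (1/2) w'w + C * sum_n max(0, 1 - y_n (w'x_n + b))
    by batch subgradient descent; the hinge term equals the optimal
    slack xi_n, so this is the soft-margin primal in unconstrained form."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1            # points with xi_n > 0
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

A QP solver would return the exact optimum; the subgradient sketch only approximates it, but makes the role of C visible: larger C pushes harder on every margin violator.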
Soft-Margin SVM: Primal

Soft-Margin SVM (2/2)
• record 'margin violation' by ξ_n
• penalize with margin violation

  min_{b,w,ξ} (1/2) wᵀw + C · Σ_{n=1}^N ξ_n
  s.t. y_n(wᵀz_n + b) ≥ 1 − ξ_n and ξ_n ≥ 0 for all n

[figure: fat boundary with a point marked as a margin violation]
• parameter C: trade-off of large margin & margin violation
  • large C: want less margin violation
  • small C: want large margin
• QP of d̃ + 1 + N variables and 2N constraints

next: remove dependence on d̃ via soft-margin SVM primal ⇒ dual?
Soft-Margin SVM: Primal
Fun Time
Soft-Margin SVM: Dual

Lagrange Dual
primal:

  min_{b,w,ξ} (1/2) wᵀw + C · Σ_{n=1}^N ξ_n
  s.t. y_n(wᵀz_n + b) ≥ 1 − ξ_n and ξ_n ≥ 0 for all n

Lagrange function with Lagrange multipliers α_n and β_n:

  L(b, w, ξ, α, β) = (1/2) wᵀw + C · Σ_{n=1}^N ξ_n
                     + Σ_{n=1}^N α_n · (1 − ξ_n − y_n(wᵀz_n + b)) + Σ_{n=1}^N β_n · (−ξ_n)

want: Lagrange dual

  max_{α_n ≥ 0, β_n ≥ 0} ( min_{b,w,ξ} L(b, w, ξ, α, β) )
Soft-Margin SVM: Dual

Simplify ξ_n and β_n

  max_{α_n ≥ 0, β_n ≥ 0} min_{b,w,ξ} ( (1/2) wᵀw + C · Σ_{n=1}^N ξ_n
    + Σ_{n=1}^N α_n · (1 − ξ_n − y_n(wᵀz_n + b)) + Σ_{n=1}^N β_n · (−ξ_n) )

• ∂L/∂ξ_n = 0 = C − α_n − β_n
• no loss of optimality if solving with implicit constraint β_n = C − α_n and explicit constraint 0 ≤ α_n ≤ C: β_n removed
• ξ can also be removed :-), like how we removed b

  max_{0 ≤ α_n ≤ C, β_n = C − α_n} min_{b,w} ( (1/2) wᵀw + Σ_{n=1}^N α_n (1 − y_n(wᵀz_n + b)) )

(the crossed-out term Σ_{n=1}^N (C − α_n − β_n) · ξ_n vanishes)
Soft-Margin SVM: Dual

Other Simplifications

  max_{0 ≤ α_n ≤ C, β_n = C − α_n} min_{b,w} ( (1/2) wᵀw + Σ_{n=1}^N α_n (1 − y_n(wᵀz_n + b)) )

familiar? :-)
• inner problem same as hard-margin SVM
• ∂L/∂b = 0: no loss of optimality if solving with constraint Σ_{n=1}^N α_n y_n = 0
• ∂L/∂w_i = 0: no loss of optimality if solving with constraint w = Σ_{n=1}^N α_n y_n z_n
standard dual can be derived using the same steps as Lecture 2 (hard-margin SVM dual)
Soft-Margin SVM: Dual

Standard Soft-Margin SVM Dual

  min_α (1/2) Σ_{n=1}^N Σ_{m=1}^N α_n α_m y_n y_m z_nᵀz_m − Σ_{n=1}^N α_n
  subject to Σ_{n=1}^N y_n α_n = 0; 0 ≤ α_n ≤ C, for n = 1, 2, ..., N;
  implicitly w = Σ_{n=1}^N α_n y_n z_n; β_n = C − α_n, for n = 1, 2, ..., N

only difference to hard-margin: upper bound on α_n

another (convex) QP, with N variables & 2N + 1 constraints
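For small N, the dual above can be handed to any general-purpose constrained solver. A sketch using scipy (an assumption outside the lecture; in practice a dedicated QP/SMO solver would be used):

```python
import numpy as np
from scipy.optimize import minimize

def soft_margin_dual(Z, y, C=1.0):
    """min_a 0.5 a'Qa - 1'a  s.t.  y'a = 0,  0 <= a_n <= C,
    with Q[n, m] = y_n y_m z_n'z_m; returns alpha and w = sum_n a_n y_n z_n."""
    N = len(y)
    Yz = y[:, None] * Z                       # row n is y_n z_n'
    Q = Yz @ Yz.T
    obj = lambda a: 0.5 * a @ Q @ a - a.sum()
    jac = lambda a: Q @ a - np.ones(N)
    cons = [{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}]
    res = minimize(obj, np.zeros(N), jac=jac, method="SLSQP",
                   bounds=[(0.0, C)] * N, constraints=cons)
    alpha = res.x
    return alpha, Yz.T @ alpha                # implicit w
```

The box bounds 0 ≤ α_n ≤ C are exactly the "only difference to hard-margin" noted above.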
Soft-Margin SVM: Dual
Fun Time
Soft-Margin SVM: Solution

Kernel Soft-Margin SVM
Kernel Soft-Margin SVM Algorithm
1. q_{n,m} = y_n y_m K(x_n, x_m); c = −1_N; (P, r) for the equality/lower-bound/upper-bound constraints
2. α ← QP(Q, c, P, r)
3. b ← ?
4. return SVs and their α_n as well as b such that for new x,
   g_SVM(x) = sign( Σ_{SV indices n} α_n y_n K(x_n, x) + b )

• almost the same as hard-margin
• more flexible than hard-margin: primal/dual always solvable

remaining question: step 3?
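Steps 1 and 4 of the algorithm above translate directly to numpy; a sketch with a Gaussian kernel, assuming α and b already come from the QP (helper names are illustrative):

```python
import numpy as np

def gaussian_K(X1, X2, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def g_svm(alpha, b, X, y, X_new, gamma=1.0, tol=1e-8):
    """g_SVM(x) = sign( sum over SV indices n of alpha_n y_n K(x_n, x) + b ),
    keeping only the support vectors (alpha_n > 0)."""
    sv = alpha > tol
    K = gaussian_K(X[sv], X_new, gamma)       # shape (#SV, #new points)
    return np.sign((alpha[sv] * y[sv]) @ K + b)
```

Step 1's matrix is then `(y[:, None] * y[None, :]) * gaussian_K(X, X, gamma)`.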
Soft-Margin SVM: Solution

Solving for b

hard-margin SVM
• complementary slackness: α_n (1 − y_n(wᵀz_n + b)) = 0
• SV (α_m > 0) ⇒ b = y_m − wᵀz_m

soft-margin SVM
• complementary slackness: α_n (1 − ξ_n − y_n(wᵀz_n + b)) = 0 and (C − α_n) ξ_n = 0
• SV (α_m > 0) ⇒ b = y_m − y_m ξ_m − wᵀz_m
• unbounded SV (α_m < C) ⇒ ξ_m = 0

solve unique b with an unbounded SV (x_m, y_m):

  b = y_m − Σ_{n=1}^N α_n y_n K(x_n, x_m)

(only a range of b otherwise)
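The b-formula above translates directly; a sketch assuming α comes from the dual solve (the tolerance and function names are illustrative, not from the lecture):

```python
import numpy as np

def gaussian_kernel(x1, x2, gamma=1.0):
    """Gaussian kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def solve_b(alpha, X, y, C, gamma=1.0, tol=1e-6):
    """Pick any unbounded SV (tol < alpha_m < C - tol, hence xi_m = 0) and use
    b = y_m - sum_n alpha_n y_n K(x_n, x_m)."""
    m = next(i for i, a in enumerate(alpha) if tol < a < C - tol)
    return y[m] - sum(alpha[n] * y[n] * gaussian_kernel(X[n], X[m], gamma)
                      for n in range(len(y)))
```

If every SV is bounded (α_n = C), `next(...)` raises StopIteration, matching the slide's "only a range of b" caveat.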
Soft-Margin SVM: Solution

Soft-Margin Gaussian SVM in Action
[figures: decision boundaries for C = 1, C = 10, C = 100]
• large C ⟹ less noise tolerance ⟹ 'overfit'?
• warning: SVM can still overfit :-(

soft-margin Gaussian SVM: need careful selection of (γ, C)
Soft-Margin SVM: Solution

Physical Meaning of α_n
complementary slackness: α_n (1 − ξ_n − y_n(wᵀz_n + b)) = 0 and (C − α_n) ξ_n = 0
• non-SV (α_n = 0): ξ_n = 0, 'away from' or on the fat boundary
• unbounded SV (0 < α_n < C): ξ_n = 0, on the fat boundary, locates b
• bounded SV (α_n = C): ξ_n = violation amount, 'violating' or on the fat boundary

α_n can be used for data analysis
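The three cases can be read off α directly; a small helper for such data analysis (the function name and tolerance are illustrative, not from the lecture):

```python
import numpy as np

def sv_status(alpha, C, tol=1e-6):
    """Label each example by its dual variable: non-SV (alpha = 0),
    unbounded SV (0 < alpha < C, exactly on the fat boundary), or
    bounded SV (alpha = C, on or violating the fat boundary)."""
    alpha = np.asarray(alpha)
    out = np.where(alpha <= tol, "non-SV",
          np.where(alpha >= C - tol, "bounded SV", "unbounded SV"))
    return out.tolist()
```

Bounded SVs flagged this way are exactly the candidates for noisy or hard examples.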
Soft-Margin SVM: Solution
Fun Time
Soft-Margin SVM: Selection

Practical Need: Model Selection
• complicated even for the (C, γ) of Gaussian SVM
• more combinations if including other kernels or parameters

how to select? validation :-)
Soft-Margin SVM: Selection

Selection by Cross Validation
[table: E_cv over a 3×3 grid of (C, γ)]
0.3500  0.3250  0.3250
0.2000  0.2250  0.2750
0.1750  0.2250  0.2000
• E_cv(C, γ): a 'non-smooth' function of (C, γ), difficult to optimize
• proper models can be chosen by V-fold cross validation on a few grid values of (C, γ)

E_cv: very popular criterion for soft-margin SVM
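V-fold selection over a (C, γ) grid can be sketched generically; here `train` and `error` stand in for any SVM fit/evaluation routines (all names are illustrative, not from the lecture):

```python
import numpy as np

def cv_grid_select(X, y, Cs, gammas, train, error, V=5):
    """V-fold cross validation over a grid of (C, gamma) values;
    `train(X, y, C, gamma)` fits a model, `error(model, X, y)` scores it.
    Returns (E_cv, C*, gamma*) for the best grid point."""
    folds = np.array_split(np.random.permutation(len(y)), V)
    best = None
    for C in Cs:
        for g in gammas:
            e = 0.0
            for v in range(V):
                val = folds[v]
                trn = np.concatenate([folds[u] for u in range(V) if u != v])
                model = train(X[trn], y[trn], C, g)
                e += error(model, X[val], y[val]) / V
            if best is None or e < best[0]:
                best = (e, C, g)
    return best
```

Because E_cv(C, γ) is non-smooth, this exhaustive grid scan (rather than gradient-based optimization) is the standard practice the slide alludes to.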
Soft-Margin SVM: Selection

Leave-One-Out CV Error for SVM
recall: E_loocv = E_cv with N folds
claim: E_loocv ≤ #SV / N
• for (x_N, y_N): if optimal α_N = 0 (non-SV) ⟹ (α_1, α_2, ..., α_{N−1}) still optimal when leaving out (x_N, y_N)
  key: what if there were a better α_n?
• SVM: g⁻ = g when leaving out a non-SV
  e_non-SV = err(g⁻, non-SV) = err(g, non-SV) = 0
  e_SV ≤ 1
[figure: boundary x1 − x2 − 1 = 0 with margin 0.707]

motivation from hard-margin SVM: only SVs needed
scaled #SV bounds leave-one-out CV error
Soft-Margin SVM: Selection

Selection by # SV
[table: nSV over a 3×3 grid of (C, γ)]
38  37  37
27  21  17
21  18  19
• nSV(C, γ): a 'non-smooth' function of (C, γ), difficult to optimize
• just an upper bound!
• dangerous models can be ruled out by nSV on a few grid values of (C, γ)

nSV: often used as a safety check if computing E_cv is not allowed
Soft-Margin SVM: Selection

Fun Time

Soft-Margin SVM: Selection
Summary
Lecture 4: Soft-Margin SVM
• Soft-Margin SVM: Primal: add margin violations ξ_n
• Soft-Margin SVM: Dual: adds upper bound to α_n
• Soft-Margin SVM: Solution: formulated by bounded/unbounded SVs
• Soft-Margin SVM: Selection: cross-validation, or approximately nSV