Machine Learning Techniques (機器學習技法)
Lecture 4: Soft-Margin Support Vector Machine
Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)
Roadmap
1 Embedding Numerous Features: Kernel Models
  Lecture 3: Kernel Support Vector Machine — kernel as a shortcut to (transform + inner product) to remove dependence on $\tilde{d}$: allowing a spectrum of simple (linear) models to infinite-dimensional (Gaussian) ones with margin control
  Lecture 4: Soft-Margin Support Vector Machine
    Motivation and Primal Problem
    Dual Problem
    Messages behind Soft-Margin SVM
    Model Selection
2 Combining Predictive Features: Aggregation Models
3 Distilling Implicit Features: Extraction Models
Cons of Hard-Margin SVM
recall: SVM can still overfit :-(
[figure omitted: the same noisy data fit with a Φ1 (linear) transform vs. a Φ4 (fourth-order) transform]
• part of the reason: the powerful transform Φ
• the other part: insisting on separable
if always insisting on separable (=⇒ shatter), the model has the power to overfit to noise
Give Up on Some Examples
want: give up on some noisy examples, minimizing the error count
$$\min_{b,w}\ \sum_{n=1}^{N} [\![\, y_n \ne \mathrm{sign}(w^T z_n + b) \,]\!]$$
hard-margin SVM:
$$\min_{b,w}\ \tfrac{1}{2} w^T w \quad \text{s.t. } y_n (w^T z_n + b) \ge 1 \text{ for all } n$$
combination:
$$\min_{b,w}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} [\![\, y_n \ne \mathrm{sign}(w^T z_n + b) \,]\!]$$
$$\text{s.t. } y_n (w^T z_n + b) \ge 1 \text{ for correct } n; \quad y_n (w^T z_n + b) \ge -\infty \text{ for incorrect } n$$
C: trade-off of large margin & noise tolerance
Soft-Margin SVM (1/2)
$$\min_{b,w}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} [\![\, y_n \ne \mathrm{sign}(w^T z_n + b) \,]\!]$$
$$\text{s.t. } y_n (w^T z_n + b) \ge 1 - \infty \cdot [\![\, y_n \ne \mathrm{sign}(w^T z_n + b) \,]\!]$$
• $[\![\cdot]\!]$: non-linear, not QP anymore :-( — what about dual? kernel?
• cannot distinguish small error (slightly away from fat boundary) from large error (a...w...a...y... from fat boundary)
• record 'margin violation' by $\xi_n$ — linear constraints
• penalize with margin violation instead of error count — quadratic objective
soft-margin SVM:
$$\min_{b,w,\xi}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} \xi_n \quad \text{s.t. } y_n (w^T z_n + b) \ge 1 - \xi_n \text{ and } \xi_n \ge 0 \text{ for all } n$$
Soft-Margin SVM (2/2)
• record 'margin violation' by $\xi_n$
• penalize with margin violation
$$\min_{b,w,\xi}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} \xi_n \quad \text{s.t. } y_n (w^T z_n + b) \ge 1 - \xi_n \text{ and } \xi_n \ge 0 \text{ for all } n$$
[figure omitted: a fat boundary with one example inside it; the amount by which the example falls short of the boundary is its violation $\xi_n$]
• parameter C: trade-off of large margin & margin violation
  • large C: want less margin violation
  • small C: want a large margin
• QP of $\tilde{d} + 1 + N$ variables and 2N constraints (a solver sketch follows below)
next: remove the dependence on $\tilde{d}$ by soft-margin SVM primal ⇒ dual?
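Before moving to the dual, here is a minimal sketch of the primal QP just stated, assuming the cvxopt package is available; the helper name svm_primal and the tiny diagonal ridge on P are illustrative choices, not part of the lecture.

```python
# Sketch: soft-margin SVM primal as a QP over u = [b; w; xi]
import numpy as np
from cvxopt import matrix, solvers

def svm_primal(Z, y, C=1.0):
    N, d = Z.shape                      # N examples in the d~-dimensional Z-space
    L = 1 + d + N                       # d~ + 1 + N variables, as on the slide
    P = np.zeros((L, L))
    P[1:1 + d, 1:1 + d] = np.eye(d)     # objective: (1/2) w^T w ...
    P += 1e-9 * np.eye(L)               # illustrative ridge; keeps the KKT system solvable
    q = np.hstack([np.zeros(1 + d), C * np.ones(N)])  # ... + C * sum_n xi_n
    # y_n (w^T z_n + b) >= 1 - xi_n  <=>  -y_n b - y_n z_n^T w - xi_n <= -1
    G1 = np.hstack([-y[:, None], -y[:, None] * Z, -np.eye(N)])
    # xi_n >= 0  <=>  -xi_n <= 0
    G2 = np.hstack([np.zeros((N, 1 + d)), -np.eye(N)])
    G, h = np.vstack([G1, G2]), np.hstack([-np.ones(N), np.zeros(N)])  # 2N rows
    u = np.ravel(solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))['x'])
    return u[0], u[1:1 + d], u[1 + d:]  # b, w, xi
```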
Fun Time
At the optimal solution of
$$\min_{b,w,\xi}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} \xi_n \quad \text{s.t. } y_n (w^T z_n + b) \ge 1 - \xi_n \text{ and } \xi_n \ge 0 \text{ for all } n,$$
assume that $y_1 (w^T z_1 + b) = -10$. What is the corresponding $\xi_1$?
1. 1
2. 11
3. 21
4. 31
Reference Answer: 2
$\xi_1$ is simply $1 - y_1 (w^T z_1 + b)$ when $y_1 (w^T z_1 + b) \le 1$.
Lagrange Dual
primal:
$$\min_{b,w,\xi}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} \xi_n \quad \text{s.t. } y_n (w^T z_n + b) \ge 1 - \xi_n \text{ and } \xi_n \ge 0 \text{ for all } n$$
Lagrange function with Lagrange multipliers $\alpha_n$ and $\beta_n$:
$$\mathcal{L}(b, w, \xi, \alpha, \beta) = \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} \xi_n + \sum_{n=1}^{N} \alpha_n \bigl(1 - \xi_n - y_n (w^T z_n + b)\bigr) + \sum_{n=1}^{N} \beta_n \cdot (-\xi_n)$$
want: Lagrange dual
$$\max_{\alpha_n \ge 0,\ \beta_n \ge 0} \Bigl( \min_{b,w,\xi}\ \mathcal{L}(b, w, \xi, \alpha, \beta) \Bigr)$$
Simplify ξ_n and β_n
$$\max_{\alpha_n \ge 0,\ \beta_n \ge 0} \Bigl( \min_{b,w,\xi}\ \tfrac{1}{2} w^T w + C \cdot \sum_{n=1}^{N} \xi_n + \sum_{n=1}^{N} \alpha_n \bigl(1 - \xi_n - y_n (w^T z_n + b)\bigr) + \sum_{n=1}^{N} \beta_n \cdot (-\xi_n) \Bigr)$$
• $\frac{\partial \mathcal{L}}{\partial \xi_n} = 0 = C - \alpha_n - \beta_n$
• no loss of optimality if solving with implicit constraint $\beta_n = C - \alpha_n$ and explicit constraint $0 \le \alpha_n \le C$: $\beta_n$ removed
• ξ can also be removed :-), like how we removed b, because the leftover ξ-term vanishes under $\beta_n = C - \alpha_n$:
$$\max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n} \Bigl( \min_{b,w,\xi}\ \tfrac{1}{2} w^T w + \sum_{n=1}^{N} \alpha_n \bigl(1 - y_n (w^T z_n + b)\bigr) + \underbrace{\sum_{n=1}^{N} (C - \alpha_n - \beta_n) \cdot \xi_n}_{=\ 0} \Bigr)$$
Other Simplifications
$$\max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n} \Bigl( \min_{b,w}\ \tfrac{1}{2} w^T w + \sum_{n=1}^{N} \alpha_n \bigl(1 - y_n (w^T z_n + b)\bigr) \Bigr)$$
familiar? :-)
• inner problem same as hard-margin SVM
• $\frac{\partial \mathcal{L}}{\partial b} = 0$: no loss of optimality if solving with constraint $\sum_{n=1}^{N} \alpha_n y_n = 0$
• $\frac{\partial \mathcal{L}}{\partial w_i} = 0$: no loss of optimality if solving with constraint $w = \sum_{n=1}^{N} \alpha_n y_n z_n$
standard dual can be derived using the same steps as Lecture 2
Standard Soft-Margin SVM Dual
$$\min_{\alpha}\ \tfrac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m z_n^T z_m - \sum_{n=1}^{N} \alpha_n$$
subject to $\sum_{n=1}^{N} y_n \alpha_n = 0$; $0 \le \alpha_n \le C$, for $n = 1, 2, \ldots, N$;
implicitly $w = \sum_{n=1}^{N} \alpha_n y_n z_n$; $\beta_n = C - \alpha_n$, for $n = 1, 2, \ldots, N$
— only difference to hard-margin: upper bound on $\alpha_n$
another (convex) QP, with N variables & 2N + 1 constraints (a solver sketch follows below)
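A minimal sketch of this dual with a Gaussian kernel, again assuming cvxopt; the function name svm_dual and the gamma parameter are illustrative, not from the lecture.

```python
# Sketch: standard soft-margin SVM dual as a QP in alpha
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y, C=1.0, gamma=1.0):
    N = len(y)
    sq = np.sum(X ** 2, axis=1)                       # Gaussian kernel matrix
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    Q = (y[:, None] * y[None, :]) * K                 # q_{n,m} = y_n y_m K(x_n, x_m)
    p = matrix(-np.ones(N))                           # minimize (1/2) a^T Q a - 1^T a
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))    # -alpha_n <= 0 and alpha_n <= C:
    h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))  # the only change vs. hard-margin
    A = matrix(y.astype(float).reshape(1, -1))        # equality: sum_n y_n alpha_n = 0
    sol = solvers.qp(matrix(Q), p, G, h, A, matrix(0.0))
    return np.ravel(sol['x']), K                      # alpha (K reused for b below)
```

Here cvxopt's solvers.qp plays the role of the QP(·) routine named on the next slide; dropping the upper-bound rows of (G, h) recovers the hard-margin dual.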
Fun Time
In the soft-margin SVM, assume that we want to increase the parameter C by 2. How shall the corresponding dual problem be changed?
1. the upper bound of $\alpha_n$ shall be halved
2. the upper bound of $\alpha_n$ shall be decreased by 2
3. the upper bound of $\alpha_n$ shall be increased by 2
4. the upper bound of $\alpha_n$ shall be doubled
Reference Answer: 3
Because C is exactly the upper bound of $\alpha_n$, increasing C by 2 in the primal problem is equivalent to increasing the upper bound by 2 in the dual problem.
Kernel Soft-Margin SVM
Kernel Soft-Margin SVM Algorithm
1. $q_{n,m} = y_n y_m K(x_n, x_m)$; $p = -1_N$; (A, c) for the equality/lower-bound/upper-bound constraints
2. $\alpha \leftarrow \mathrm{QP}(Q_D, p, A, c)$
3. $b \leftarrow$ ?
4. return SVs and their $\alpha_n$ as well as b such that for new x,
$$g_{\mathrm{SVM}}(x) = \mathrm{sign}\Bigl( \sum_{\text{SV indices } n} \alpha_n y_n K(x_n, x) + b \Bigr)$$
• almost the same as hard-margin
• more flexible than hard-margin — primal/dual always solvable
remaining question: step 3?
Solving for b
hard-margin SVM
complementary slackness: $\alpha_n \bigl(1 - y_n (w^T z_n + b)\bigr) = 0$
• SV ($\alpha_s > 0$) ⇒ $b = y_s - w^T z_s$
soft-margin SVM
complementary slackness: $\alpha_n \bigl(1 - \xi_n - y_n (w^T z_n + b)\bigr) = 0$ and $(C - \alpha_n)\, \xi_n = 0$
• SV ($\alpha_s > 0$) ⇒ $b = y_s - y_s \xi_s - w^T z_s$
• free ($\alpha_s < C$) ⇒ $\xi_s = 0$
solve unique b with a free SV $(x_s, y_s)$:
$$b = y_s - \sum_{\text{SV indices } n} \alpha_n y_n K(x_n, x_s)$$
— only a range of b can be determined otherwise (a sketch continuing the dual QP follows below)
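Continuing the svm_dual sketch above, a free SV pins down b; the tolerance tol is an illustrative numerical choice.

```python
# Sketch: recover b from a free SV (0 < alpha_s < C), continuing svm_dual above
import numpy as np

def solve_b(alpha, y, K, C, tol=1e-6):
    sv = alpha > tol                       # support vectors (alpha_n > 0)
    free = sv & (alpha < C - tol)          # free SVs have xi_s = 0
    s = np.flatnonzero(free)[0]            # any free SV works
    # b = y_s - sum over SV indices n of alpha_n y_n K(x_n, x_s)
    return y[s] - np.sum(alpha[sv] * y[sv] * K[sv, s])
```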
Soft-Margin Gaussian SVM in Action
[figure omitted: decision boundaries for C = 1, C = 10, C = 100]
• large C =⇒ less noise tolerance =⇒ 'overfit'?
• warning: SVM can still overfit :-(
soft-margin Gaussian SVM: needs careful selection of (γ, C) (a small sweep over C follows below)
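A quick way to watch the effect of C, assuming scikit-learn is available; SVC with an RBF kernel is a soft-margin Gaussian SVM, and make_moons is just a stand-in noisy data set.

```python
# Sketch: larger C tolerates less noise, so training error typically shrinks
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)  # stand-in noisy data
for C in (1, 10, 100):
    clf = SVC(kernel='rbf', gamma=1.0, C=C).fit(X, y)
    print(C, 1 - clf.score(X, y))  # 0/1 training error as C grows
```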
Physical Meaning of α_n
complementary slackness: $\alpha_n \bigl(1 - \xi_n - y_n (w^T z_n + b)\bigr) = 0$ and $(C - \alpha_n)\, \xi_n = 0$
• non-SV ($\alpha_n = 0$): $\xi_n = 0$, 'away from' or on the fat boundary
• free SV ($0 < \alpha_n < C$): $\xi_n = 0$, on the fat boundary, locates b
• bounded SV ($\alpha_n = C$): $\xi_n$ = violation amount, 'violating' or on the fat boundary
$\alpha_n$ can be used for data analysis
Fun Time
For a data set of size 10000, after solving SVM, assume that there are 1126 support vectors, and 1000 of those support vectors are bounded. What is the possible range of $E_{\mathrm{in}}(g_{\mathrm{SVM}})$ in terms of 0/1 error?
1. $0.0000 \le E_{\mathrm{in}}(g_{\mathrm{SVM}}) \le 0.1000$
2. $0.1000 \le E_{\mathrm{in}}(g_{\mathrm{SVM}}) \le 0.1126$
3. $0.1126 \le E_{\mathrm{in}}(g_{\mathrm{SVM}}) \le 0.5000$
4. $0.1126 \le E_{\mathrm{in}}(g_{\mathrm{SVM}}) \le 1.0000$
Reference Answer: 1
The bounded support vectors are the only ones that could violate the fat boundary: $\xi_n \ge 0$. If $\xi_n \ge 1$, then the violation causes a 0/1 error on the example. On the other hand, it is also possible that $\xi_n < 1$, and in that case the violation does not cause a 0/1 error.
Practical Need: Model Selection
[figure omitted: a grid of decision boundaries under different (C, γ) combinations]
• complicated even for the (C, γ) of Gaussian SVM
• more combinations if including other kernels or parameters
how to select? validation :-)
Selection by Cross Validation
[figure omitted: $E_{\mathrm{cv}}$ over a 3 × 3 grid of (C, γ) values]
  0.3500  0.3250  0.3250
  0.2000  0.2250  0.2750
  0.1750  0.2250  0.2000
• $E_{\mathrm{cv}}(C, \gamma)$: a 'non-smooth' function of (C, γ) — difficult to optimize
• proper models can be chosen by V-fold cross validation on a few grid values of (C, γ)
$E_{\mathrm{cv}}$: a very popular criterion for soft-margin SVM (a grid-search sketch follows below)
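A minimal grid-search sketch with scikit-learn (assumed available); the grid values mirror the 3 × 3 table above but are otherwise illustrative.

```python
# Sketch: V-fold cross validation over a few (C, gamma) grid values
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
grid = {'C': [1, 10, 100], 'gamma': [1, 10, 100]}
search = GridSearchCV(SVC(kernel='rbf'), grid, cv=5).fit(X, y)  # V = 5 folds
print(search.best_params_, 1 - search.best_score_)  # chosen (C, gamma) and its E_cv
```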
Leave-One-Out CV Error for SVM
recall: $E_{\mathrm{loocv}} = E_{\mathrm{cv}}$ with N folds; claim: $E_{\mathrm{loocv}} \le \frac{\#\mathrm{SV}}{N}$
• for $(x_N, y_N)$: if the optimal $\alpha_N = 0$ (non-SV) =⇒ $(\alpha_1, \alpha_2, \ldots, \alpha_{N-1})$ still optimal when leaving out $(x_N, y_N)$
  — key: what if there were a better $\alpha_n$? (appending $\alpha_N = 0$ would then give a better solution of the full problem, a contradiction)
• SVM: $g^- = g$ when leaving out a non-SV, so $e_{\text{non-SV}} = \mathrm{err}(g^-, \text{non-SV}) = \mathrm{err}(g, \text{non-SV}) = 0$, while $e_{\mathrm{SV}} \le 1$
[figure omitted: separating hyperplane $x_1 - x_2 - 1 = 0$ with margin 0.707]
motivation from hard-margin SVM: only SVs are needed
scaled #SV bounds the leave-one-out CV error (a numeric check follows below)
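A numeric check of the claim on a tiny data set, assuming scikit-learn; LeaveOneOut runs N fits, so N is kept small here.

```python
# Sketch: check E_loocv <= #SV / N empirically
from sklearn.datasets import make_moons
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=60, noise=0.25, random_state=7)
clf = SVC(kernel='rbf', gamma=1.0, C=10)
e_loocv = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
n_sv = clf.fit(X, y).n_support_.sum()   # #SV of the model trained on all N examples
print(e_loocv, n_sv / len(y))           # expect the left number <= the right one
```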
Selection by # SV
[figure omitted: nSV over the same 3 × 3 grid of (C, γ) values]
  38  37  37
  27  21  17
  21  18  19
• nSV(C, γ): a 'non-smooth' function of (C, γ) — difficult to optimize
• just an upper bound!
• dangerous models can be ruled out by nSV on a few grid values of (C, γ)
nSV: often used as a safety check if computing $E_{\mathrm{cv}}$ is too time-consuming (a counting sketch follows below)
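A counting sketch with scikit-learn (assumed available); n_support_ reports the number of SVs per class after fitting.

```python
# Sketch: nSV as a cheap safety check over a (C, gamma) grid
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
for C in (1, 10, 100):
    for gamma in (1, 10, 100):
        n_sv = SVC(kernel='rbf', C=C, gamma=gamma).fit(X, y).n_support_.sum()
        print(C, gamma, n_sv / len(y))  # a large ratio flags a dangerous model
```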
Fun Time
For a data set of size 10000, after solving SVM on some parameters, assume that there are 1126 support vectors, and 1000 of those support vectors are bounded. Which of the following cannot be $E_{\mathrm{loocv}}$ with those parameters?
1. 0.0000
2. 0.0805
3. 0.1111
4. 0.5566
Reference Answer: 4
Note that the upper bound of $E_{\mathrm{loocv}}$ is $\frac{1126}{10000} = 0.1126$.
Summary
1 Embedding Numerous Features: Kernel Models
  Lecture 4: Soft-Margin Support Vector Machine
    Motivation and Primal Problem: add margin violations $\xi_n$
    Dual Problem: upper-bound $\alpha_n$ by C
    Messages behind Soft-Margin SVM: bounded/free SVs for data analysis
    Model Selection: cross-validation, or approximately nSV
  • next: other kernel models for soft binary classification
2 Combining Predictive Features: Aggregation Models
3 Distilling Implicit Features: Extraction Models