Linear Support Vector Machine Standard Large-Margin Problem
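Distance to Hyperplane: Preliminary

$$\max_{\mathbf{w}} \ \text{margin}(\mathbf{w})$$
subject to every $y_n \mathbf{w}^T \mathbf{x}_n > 0$
$$\text{margin}(\mathbf{w}) = \min_{n=1,\ldots,N} \text{distance}(\mathbf{x}_n, \mathbf{w})$$

'shorten' x and w: distance needs $w_0$ and $(w_1, \ldots, w_d)$ differently (to be derived)

$$b = w_0; \qquad \mathbf{w} = \begin{bmatrix} w_1 \\ \vdots \\ w_d \end{bmatrix}; \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix} \quad (\text{the constant coordinate } x_0 = 1 \text{ is dropped})$$

for this part: $h(\mathbf{x}) = \text{sign}(\mathbf{w}^T \mathbf{x} + b)$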
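
The 'shortened' notation can be illustrated with a minimal Python sketch. The function names (`split_weights`, `hypothesis`) and the numbers are hypothetical and chosen only for illustration; the sketch checks that the padded form $w_0 x_0 + \cdots + w_d x_d$ with $x_0 = 1$ gives the same score as $\mathbf{w}^T \mathbf{x} + b$.

```python
def split_weights(w_full):
    """Split (w_0, w_1, ..., w_d) into the bias b = w_0 and w = (w_1, ..., w_d)."""
    return w_full[0], list(w_full[1:])


def hypothesis(x, b, w):
    """h(x) = sign(w^T x + b), where x holds only (x_1, ..., x_d)."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    # Assumption: a score of exactly 0 is mapped to -1 (the tie is arbitrary).
    return 1 if score > 0 else -1


# Hypothetical padded weights (w_0, w_1, w_2) and padded input (x_0 = 1, x_1, x_2).
w_full = [-1.0, 1.0, 1.0]
x_padded = [1.0, 2.0, 0.5]

b, w = split_weights(w_full)
x = x_padded[1:]  # drop the constant coordinate x_0 = 1

padded_score = sum(w_i * x_i for w_i, x_i in zip(w_full, x_padded))
short_score = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
label = hypothesis(x, b, w)
```

Both scores agree, so only the notation changes; the split matters because the distance formula treats $b$ and $\mathbf{w}$ differently.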
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/28
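
Distance to Hyperplane

want: $\text{distance}(\mathbf{x}, b, \mathbf{w})$, with hyperplane $\mathbf{w}^T \mathbf{x}' + b = 0$

consider $\mathbf{x}'$, $\mathbf{x}''$ on hyperplane
1. $\mathbf{w}^T \mathbf{x}' = -b$, $\mathbf{w}^T \mathbf{x}'' = -b$
2. $\mathbf{w} \perp$ hyperplane: $\mathbf{w}^T (\mathbf{x}'' - \mathbf{x}') = 0$, where $\mathbf{x}'' - \mathbf{x}'$ is a vector on the hyperplane
3. distance = project $(\mathbf{x} - \mathbf{x}')$ to $\perp$ hyperplane

[Figure: a point $\mathbf{x}$ at distance dist(x, h) from the hyperplane, with $\mathbf{x}'$ and $\mathbf{x}''$ on the hyperplane and the normal vector $\mathbf{w}$]

$$\text{distance}(\mathbf{x}, b, \mathbf{w}) = \left| \frac{\mathbf{w}^T}{\|\mathbf{w}\|} (\mathbf{x} - \mathbf{x}') \right| \overset{(1)}{=} \frac{1}{\|\mathbf{w}\|} \left| \mathbf{w}^T \mathbf{x} + b \right|$$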
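
The three steps above can be checked numerically. The following is a small sketch in plain Python; the hyperplane $3x_1 + 4x_2 - 5 = 0$ and the points are hypothetical values picked so that the arithmetic is exact.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def distance(x, b, w):
    """distance(x, b, w) = |w^T x + b| / ||w||."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


# Hypothetical hyperplane: 3*x1 + 4*x2 - 5 = 0.
w = [3.0, 4.0]
b = -5.0

# Step 1: two points on the hyperplane satisfy w^T x = -b.
x_prime = [3.0, -1.0]
x_double_prime = [-1.0, 2.0]

# Step 2: w is orthogonal to any vector lying on the hyperplane.
on_plane = [p2 - p1 for p2, p1 in zip(x_double_prime, x_prime)]
orthogonality = dot(w, on_plane)

# Step 3: project (x - x') onto the unit normal w / ||w||.
x = [3.0, 4.0]
diff = [x_i - p_i for x_i, p_i in zip(x, x_prime)]
projection = abs(dot(w, diff)) / math.sqrt(dot(w, w))

closed_form = distance(x, b, w)
```

The projection and the closed form agree, and the result does not depend on which point $\mathbf{x}'$ on the hyperplane is used.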
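
Distance to Separating Hyperplane

$$\text{distance}(\mathbf{x}, b, \mathbf{w}) = \frac{1}{\|\mathbf{w}\|} \left| \mathbf{w}^T \mathbf{x} + b \right|$$

• separating hyperplane: for every $n$, $y_n(\mathbf{w}^T \mathbf{x}_n + b) > 0$
• distance to separating hyperplane: $\text{distance}(\mathbf{x}_n, b, \mathbf{w}) = \frac{1}{\|\mathbf{w}\|} \, y_n(\mathbf{w}^T \mathbf{x}_n + b)$

$$\max_{b,\mathbf{w}} \ \text{margin}(b, \mathbf{w})$$
subject to every $y_n(\mathbf{w}^T \mathbf{x}_n + b) > 0$
$$\text{margin}(b, \mathbf{w}) = \min_{n=1,\ldots,N} \frac{1}{\|\mathbf{w}\|} \, y_n(\mathbf{w}^T \mathbf{x}_n + b)$$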
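
As a rough illustration of the margin definition, here is a sketch in plain Python. The four-point data set and the candidate hyperplanes are hypothetical; the helper returns `None` when the candidate does not separate the data, since the margin is only defined for separating hyperplanes.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def margin(X, y, b, w):
    """Return min_n y_n (w^T x_n + b) / ||w||, or None if (b, w) does not separate."""
    scores = [y_n * (dot(w, x_n) + b) for x_n, y_n in zip(X, y)]
    if min(scores) <= 0:
        return None  # some example is on the wrong side or on the hyperplane
    return min(scores) / math.sqrt(dot(w, w))


# Hypothetical toy data: two negative and two positive examples in 2-D.
X = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
y = [-1, -1, 1, 1]

separating = margin(X, y, b=-1.0, w=[1.0, -1.0])    # hyperplane x1 - x2 - 1 = 0
not_separating = margin(X, y, b=0.0, w=[0.0, 1.0])  # hyperplane x2 = 0
```

Because $y_n(\mathbf{w}^T \mathbf{x}_n + b)$ is positive for every example of a separating hyperplane, it can replace the absolute value in the distance formula.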
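
Margin of Special Separating Hyperplane

$$\max_{b,\mathbf{w}} \ \text{margin}(b, \mathbf{w})$$
subject to every $y_n(\mathbf{w}^T \mathbf{x}_n + b) > 0$
$$\text{margin}(b, \mathbf{w}) = \min_{n=1,\ldots,N} \frac{1}{\|\mathbf{w}\|} \, y_n(\mathbf{w}^T \mathbf{x}_n + b)$$

• $\mathbf{w}^T \mathbf{x} + b = 0$ same as $3\mathbf{w}^T \mathbf{x} + 3b = 0$: scaling does not matter
• special scaling: only consider separating $(b, \mathbf{w})$ such that $\min_{n=1,\ldots,N} y_n(\mathbf{w}^T \mathbf{x}_n + b) = 1 \Longrightarrow \text{margin}(b, \mathbf{w}) = \frac{1}{\|\mathbf{w}\|}$

$$\max_{b,\mathbf{w}} \ \frac{1}{\|\mathbf{w}\|}$$
subject to every $y_n(\mathbf{w}^T \mathbf{x}_n + b) > 0$
$$\min_{n=1,\ldots,N} y_n(\mathbf{w}^T \mathbf{x}_n + b) = 1$$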
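
The special scaling can be sketched as follows. The helper name `canonical_scaling` and the data are hypothetical; the sketch divides a separating $(b, \mathbf{w})$ by its smallest score so that $\min_n y_n(\mathbf{w}^T \mathbf{x}_n + b) = 1$, then checks that the margin is unchanged and equals $1/\|\mathbf{w}\|$.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def smallest_score(X, y, b, w):
    """min_n y_n (w^T x_n + b)."""
    return min(y_n * (dot(w, x_n) + b) for x_n, y_n in zip(X, y))


def canonical_scaling(X, y, b, w):
    """Rescale a separating (b, w) so that the smallest score is exactly 1."""
    s = smallest_score(X, y, b, w)
    if s <= 0:
        raise ValueError("(b, w) does not separate the data")
    return b / s, [w_i / s for w_i in w]


# Hypothetical toy data and a separating hyperplane written with a factor of 3.
X = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
y = [-1, -1, 1, 1]
b, w = -3.0, [3.0, -3.0]

margin_before = smallest_score(X, y, b, w) / math.sqrt(dot(w, w))

b_c, w_c = canonical_scaling(X, y, b, w)
margin_after = 1.0 / math.sqrt(dot(w_c, w_c))  # margin = 1 / ||w|| after scaling
```

The rescaled pair describes the same hyperplane, which is why restricting attention to this scaling loses nothing.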
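
Standard Large-Margin Hyperplane Problem

$$\max_{b,\mathbf{w}} \ \frac{1}{\|\mathbf{w}\|} \quad \text{subject to} \quad \min_{n=1,\ldots,N} y_n(\mathbf{w}^T \mathbf{x}_n + b) = 1$$

necessary constraints: $y_n(\mathbf{w}^T \mathbf{x}_n + b) \ge 1$ for all $n$
original constraint: $\min_{n=1,\ldots,N} y_n(\mathbf{w}^T \mathbf{x}_n + b) = 1$
want: optimal $(b, \mathbf{w})$ here (inside), that is, the optimum under the looser necessary constraints should still satisfy the original constraint

if optimal $(b, \mathbf{w})$ outside, e.g. $y_n(\mathbf{w}^T \mathbf{x}_n + b) > 1.126$ for all $n$: can scale $(b, \mathbf{w})$ to "more optimal" $\left(\frac{b}{1.126}, \frac{\mathbf{w}}{1.126}\right)$, which still satisfies every necessary constraint but has a smaller $\|\mathbf{w}\|$ (contradiction!)

final change: max $\Longrightarrow$ min, remove $\sqrt{\cdot}$, add $\frac{1}{2}$

$$\min_{b,\mathbf{w}} \ \frac{1}{2} \mathbf{w}^T \mathbf{w}$$
subject to $y_n(\mathbf{w}^T \mathbf{x}_n + b) \ge 1$ for all $n$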
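
To see the relaxed problem in action, the sketch below runs a brute-force grid search over $(b, w_1, w_2)$ on a hypothetical four-point data set. This is only an illustration under stated assumptions (a coarse grid with step 0.25 that happens to contain the optimum); it is not a quadratic programming solver and would not scale beyond toy problems.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


# Hypothetical toy data: two negative and two positive examples in 2-D.
X = [[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]]
y = [-1, -1, 1, 1]


def feasible(b, w):
    """Check the relaxed constraints y_n (w^T x_n + b) >= 1 for all n."""
    return all(y_n * (dot(w, x_n) + b) >= 1 for x_n, y_n in zip(X, y))


# Assumed search grid: -3.0, -2.75, ..., 3.0 for each of b, w1, w2.
grid = [i * 0.25 for i in range(-12, 13)]

best = None
for b, w1, w2 in product(grid, repeat=3):
    w = [w1, w2]
    if not feasible(b, w):
        continue
    objective = 0.5 * dot(w, w)  # (1/2) w^T w
    if best is None or objective < best[0]:
        best = (objective, b, w)

objective, b, w = best
# Smallest score at the optimum: the slide argues it must equal 1.
tightest = min(y_n * (dot(w, x_n) + b) for x_n, y_n in zip(X, y))
```

On this data the search lands on $b = -1$, $\mathbf{w} = (1, -1)$, and the smallest score at that point is exactly 1, so the optimum of the relaxed problem also satisfies the original equality constraint.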