Machine Learning Techniques (機器學習技法)

Lecture 1: Linear Support Vector Machine

Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)

Course Introduction

Course History

NTU version:
• 15–17 weeks (2+ hours per week)
• highly praised, with English and blackboard teaching

Coursera version:
• 8 weeks of ‘foundations’ (previous course) + 8 weeks of ‘techniques’ (this course)
• Mandarin teaching, to reach more learners in need
• slides teaching, improved with Coursera’s quiz and homework mechanisms

goal: try making the Coursera version even better than the NTU version


Course Design

from Foundations to Techniques: a mixture of philosophical illustrations, key theory, core algorithms, usage in practice, and hopefully jokes :-)

three major techniques surrounding feature transforms:
• Embedding Numerous Features: how to exploit and regularize numerous features? —inspires the Support Vector Machine (SVM) model
• Combining Predictive Features: how to construct and blend predictive features? —inspires the Adaptive Boosting (AdaBoost) model
• Distilling Implicit Features: how to identify and learn implicit features? —inspires the Deep Learning model

allows students to use ML professionally


Fun Time

Which of the following descriptions of this course is true?
1. the course will be taught in Taiwanese
2. the course will tell me the techniques that create the android Lieutenant Commander Data in Star Trek
3. the course will be 16 weeks long
4. the course will focus on three major techniques

Reference Answer: 4
1. no, my Taiwanese is unfortunately not good enough for teaching (yet)
2. no, although what we teach may serve as building blocks
3. no, unless you have also joined the previous course
4. yes, let’s get started!


Roadmap

1. Embedding Numerous Features: Kernel Models

Lecture 1: Linear Support Vector Machine
• Course Introduction
• Large-Margin Separating Hyperplane
• Standard Large-Margin Problem
• Support Vector Machine
• Reasons behind Large-Margin Hyperplane

2. Combining Predictive Features: Aggregation Models

3. Distilling Implicit Features: Extraction Models

Large-Margin Separating Hyperplane

Linear Classification Revisited

PLA/pocket: h(x) = sign(s)

[figure: perceptron diagram computing the score s from inputs x_0, x_1, x_2, ..., x_d and outputting h(x)]

plausible err = 0/1 (small flipping noise); minimize it specially (linear separable case)

linear (hyperplane) classifiers: h(x) = sign(w^T x)
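
To make the review concrete, here is a minimal NumPy sketch (my own illustration, not from the original slides) of such a linear classifier trained with PLA; it assumes linearly separable data with labels y_n ∈ {−1, +1} and pads x_0 = 1 so that w_0 plays the role of the threshold:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning algorithm: find some w with sign(w^T x_n) = y_n.

    X: (N, d) inputs; y: (N,) labels in {-1, +1}; assumes linear separability.
    """
    Xb = np.hstack([np.ones((len(X), 1)), X])   # pad x_0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iters):
        mistakes = np.where(np.sign(Xb @ w) != y)[0]
        if len(mistakes) == 0:
            return w                            # every example classified correctly
        w += y[mistakes[0]] * Xb[mistakes[0]]   # correct the first mistake
    return w
```

Any w returned this way separates the data, but many different w do; which separating line is best is exactly the question of the next slide.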

Which Line Is Best?

[figure: three lines that all separate the same data]

PLA? depends on randomness

VC bound? whichever you like!
E_out(w) ≤ E_in(w) + Ω(H), where E_in(w) = 0 for every separating line and Ω(H) is governed by d_VC = d + 1, the same for all three

You? the rightmost one, possibly :-)


Why Rightmost Hyperplane?

informal argument: if there is (Gaussian-like) noise on future x ≈ x_n:

x_n further from hyperplane ⇐⇒ tolerates more noise ⇐⇒ more robust to overfitting

distance to closest x_n ⇐⇒ amount of noise tolerance ⇐⇒ robustness of hyperplane

rightmost one: more robust, because of larger distance to closest x_n


Fat Hyperplane

• robust separating hyperplane: fat, i.e. far from both sides of the examples
• robustness ≡ fatness: distance to closest x_n

goal: find the fattest separating hyperplane


Large-Margin Separating Hyperplane

max_w  fatness(w)
subject to  w classifies every (x_n, y_n) correctly
fatness(w) = min_{n=1,...,N} distance(x_n, w)

restated with formal terms:

max_w  margin(w)
subject to  every y_n w^T x_n > 0
margin(w) = min_{n=1,...,N} distance(x_n, w)

• fatness: formally called margin
• correctness: y_n = sign(w^T x_n)

goal: find the largest-margin separating hyperplane


Fun Time

Consider two examples (v, +1) and (−v, −1), where v ∈ R² (without padding the v_0 = 1). Which of the following hyperplanes is the largest-margin separating one for the two examples? You are highly encouraged to visualize by considering, for instance, v = (3, 2).
1. x_1 = 0
2. x_2 = 0
3. v_1 x_1 + v_2 x_2 = 0
4. v_2 x_1 + v_1 x_2 = 0

Reference Answer: 3

Here the largest-margin separating hyperplane (line) must be the perpendicular bisector of the line segment between v and −v. Hence v is a normal vector of the largest-margin line. The result can be extended to the more general case of v ∈ R^d.
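
To verify the answer numerically (my own snippet, not part of the original quiz): for v = (3, 2), each candidate line a_1 x_1 + a_2 x_2 = 0 has margin equal to the smaller point-to-line distance |a^T x| / ‖a‖ over the two examples:

```python
import numpy as np

v = np.array([3.0, 2.0])
points = [v, -v]   # the examples (v, +1) and (-v, -1)

# normal vectors a of the four candidate lines a^T x = 0
candidates = {
    "x1 = 0":            np.array([1.0, 0.0]),
    "x2 = 0":            np.array([0.0, 1.0]),
    "v1 x1 + v2 x2 = 0": np.array([v[0], v[1]]),
    "v2 x1 + v1 x2 = 0": np.array([v[1], v[0]]),
}

for name, a in candidates.items():
    margin = min(abs(a @ x) / np.linalg.norm(a) for x in points)
    print(f"{name:20s} margin = {margin:.3f}")
# the third candidate attains the largest margin, ||v|| = sqrt(13) ≈ 3.606
```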


Standard Large-Margin Problem

Distance to Hyperplane: Preliminary

max_w  margin(w)
subject to  every y_n w^T x_n > 0
margin(w) = min_{n=1,...,N} distance(x_n, w)

‘shorten’ x and w: the distance treats w_0 and (w_1, ..., w_d) differently (to be derived), so separate them:

b = w_0;  w = (w_1, ..., w_d);  x = (x_1, ..., x_d), no longer padded with x_0 = 1

for this part: h(x) = sign(w^T x + b)


Distance to Hyperplane

want: distance(x, b, w), with hyperplane w^T x′ + b = 0

consider x′, x′′ on the hyperplane:
1. w^T x′ = −b, w^T x′′ = −b
2. w ⊥ hyperplane: w^T (x′′ − x′) = 0, where (x′′ − x′) is a vector on the hyperplane
3. distance = projection of (x − x′) onto w, the normal direction of the hyperplane

[figure: point x, hyperplane through x′ and x′′, normal vector w, distance dist(x, h)]

distance(x, b, w) = |(w^T / ‖w‖)(x − x′)| = (1/‖w‖) |w^T x + b|
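
In code, the derived formula is a one-liner (a sketch, not from the slides):

```python
import numpy as np

def distance_to_hyperplane(x, b, w):
    """Distance from point x to the hyperplane w^T x + b = 0: |w^T x + b| / ||w||."""
    return abs(w @ x + b) / np.linalg.norm(w)

# example: the line x1 + x2 - 1 = 0 and the origin; distance = 1/sqrt(2)
print(distance_to_hyperplane(np.zeros(2), -1.0, np.array([1.0, 1.0])))
```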


Distance to Separating Hyperplane

distance(x, b, w) = (1/‖w‖) |w^T x + b|

separating hyperplane: for every n, y_n (w^T x_n + b) > 0

distance to a separating hyperplane: since y_n agrees with the sign of (w^T x_n + b), the absolute value can be dropped:
distance(x_n, b, w) = (1/‖w‖) y_n (w^T x_n + b)

max_{b,w}  margin(b, w)
subject to  every y_n (w^T x_n + b) > 0
margin(b, w) = min_{n=1,...,N} (1/‖w‖) y_n (w^T x_n + b)
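
Combining the two boxes above, a small helper (an illustrative sketch, not from the slides) evaluates margin(b, w) on a dataset:

```python
import numpy as np

def margin(b, w, X, y):
    """margin(b, w) = min_n y_n (w^T x_n + b) / ||w||.

    A non-positive result means (b, w) is not a separating hyperplane,
    i.e. some y_n (w^T x_n + b) <= 0.
    """
    return (y * (X @ w + b)).min() / np.linalg.norm(w)
```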


Margin of Special Separating Hyperplane

max_{b,w}  margin(b, w)
subject to  every y_n (w^T x_n + b) > 0
margin(b, w) = min_{n=1,...,N} (1/‖w‖) y_n (w^T x_n + b)

• w^T x + b = 0 is the same as 3w^T x + 3b = 0: scaling does not matter
• special scaling: only consider separating (b, w) such that min_{n=1,...,N} y_n (w^T x_n + b) = 1
  =⇒ margin(b, w) = 1/‖w‖

max_{b,w}  1/‖w‖
subject to  min_{n=1,...,N} y_n (w^T x_n + b) = 1
(the earlier constraint “every y_n (w^T x_n + b) > 0” is implied by the scaling condition, so it can be dropped)
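
A quick numeric illustration of the scaling freedom (my own example, not from the slides): rescaling (b, w) by any positive constant leaves margin(b, w) unchanged, so one may always normalize min_n y_n (w^T x_n + b) to 1, after which the margin is exactly 1/‖w‖:

```python
import numpy as np

X = np.array([[3.0, 2.0], [-3.0, -2.0]])   # the two earlier Fun Time examples
y = np.array([1.0, -1.0])
b, w = 0.0, np.array([3.0, 2.0])

def margin(b, w):
    return (y * (X @ w + b)).min() / np.linalg.norm(w)

print(margin(b, w), margin(3 * b, 3 * w))   # identical: scaling does not matter

scale = (y * (X @ w + b)).min()             # apply the special scaling
b, w = b / scale, w / scale
print((y * (X @ w + b)).min())              # exactly 1 now
print(margin(b, w), 1 / np.linalg.norm(w))  # equal: margin(b, w) = 1/||w||
```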


Standard Large-Margin Hyperplane Problem

max_{b,w}  1/‖w‖
subject to  min_{n=1,...,N} y_n (w^T x_n + b) = 1

• necessary constraints: y_n (w^T x_n + b) ≥ 1 for all n
• original constraint: min_{n=1,...,N} y_n (w^T x_n + b) = 1
• want: the optimal (b, w) to stay inside the original constraint

if the optimal (b, w) were outside, e.g. y_n (w^T x_n + b) > 1.126 for all n, we could scale (b, w) to the “more optimal” (b/1.126, w/1.126): contradiction! hence relaxing = 1 to ≥ 1 loses nothing

final change: max =⇒ min, remove the square root inside ‖w‖, and add 1/2:

min_{b,w}  (1/2) w^T w
subject to  y_n (w^T x_n + b) ≥ 1 for all n

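
The final problem has a quadratic objective and linear inequality constraints, i.e. it is a quadratic program. As a closing sketch (my own illustration with a generic solver, not the lecture's method), it can be handed directly to scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

def linear_svm(X, y):
    """Solve min_{b,w} (1/2) w^T w  s.t.  y_n (w^T x_n + b) >= 1 for all n.

    Variable u = (b, w_1, ..., w_d); assumes linearly separable data,
    otherwise the constraints are infeasible.
    """
    N, d = X.shape
    objective = lambda u: 0.5 * u[1:] @ u[1:]
    constraints = [{"type": "ineq",   # scipy convention: fun(u) >= 0
                    "fun": lambda u: y * (X @ u[1:] + u[0]) - 1}]
    res = minimize(objective, x0=np.zeros(d + 1),
                   method="SLSQP", constraints=constraints)
    return res.x[0], res.x[1:]

# toy check with the two Fun Time examples, v = (3, 2)
X = np.array([[3.0, 2.0], [-3.0, -2.0]])
y = np.array([1.0, -1.0])
b, w = linear_svm(X, y)
print(b, w, 1 / np.linalg.norm(w))   # margin should be ||v|| = sqrt(13) ≈ 3.606
```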
