Machine Learning Techniques (機器學習技法)

Lecture 1: Linear Support Vector Machine

Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)

Course Introduction

Course History

NTU version:
• 15-17 weeks (2+ hours each week)
• highly praised, with English and blackboard teaching

Coursera version:
• 8 weeks of 'foundations' (previous course) + 8 weeks of 'techniques' (this course)
• Mandarin teaching, to reach a wider audience in need
• slides teaching, improved with Coursera's quiz and homework mechanisms

goal: try making the Coursera version even better than the NTU version

Course Design

from foundations to techniques: a mixture of philosophical illustrations, key theory, core algorithms, usage in practice, and hopefully jokes :-)

three major techniques surrounding feature transforms:
• Embedding Numerous Features: how to exploit and regularize numerous features?
—inspires the Support Vector Machine (SVM) model
• Combining Predictive Features: how to construct and blend predictive features?
—inspires the Adaptive Boosting (AdaBoost) model
• Distilling Implicit Features: how to identify and learn implicit features?
—inspires the Deep Learning model

allows students to use ML professionally

Fun Time

Which of the following descriptions of this course is true?
1 the course will be taught in Taiwanese
2 the course will tell me the techniques that create the android Lieutenant Commander Data in Star Trek
3 the course will be 16 weeks long
4 the course will focus on three major techniques

Reference Answer: 4
1 no, my Taiwanese is unfortunately not good enough for teaching (yet)
2 no, although what we teach may serve as building blocks
3 no, unless you have also joined the previous course
4 yes, let's get started!

Roadmap

1 Embedding Numerous Features: Kernel Models
  Lecture 1: Linear Support Vector Machine
    Course Introduction
    Large-Margin Separating Hyperplane
    Standard Large-Margin Problem
    Support Vector Machine
    Reasons behind Large-Margin Hyperplane
2 Combining Predictive Features: Aggregation Models
3 Distilling Implicit Features: Extraction Models

Large-Margin Separating Hyperplane

Linear Classification Revisited

PLA/pocket: $h(x) = \text{sign}(s)$ with score $s = w^T x = \sum_{i=0}^{d} w_i x_i$
[figure: perceptron combining inputs $x_0, x_1, x_2, \ldots, x_d$ into the score $s$ and output $h(x)$]

plausible err = 0/1 (small flipping noise), minimized specially in the (linear separable) case

linear (hyperplane) classifiers: $h(x) = \text{sign}(w^T x)$
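
A minimal sketch of such a classifier in NumPy (my own illustration, not from the lecture; assumes the usual padding $x_0 = 1$):

```python
import numpy as np

def h(x, w):
    """Linear (hyperplane) classifier h(x) = sign(w^T x), with x padded so x_0 = 1."""
    return np.sign(w @ x)

w = np.array([-1.0, 2.0, 3.0])   # (w_0, w_1, w_2)
x = np.array([1.0, 0.5, 0.1])    # padded input, x_0 = 1
print(h(x, w))                   # 1.0: x lies on the positive side
```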

Which Line Is Best?

• PLA? depending on randomness
• VC bound? whichever you like!
  $E_{\text{out}}(w) \le \underbrace{E_{\text{in}}(w)}_{0} + \underbrace{\Omega(\mathcal{H})}_{d_{\text{VC}} = d+1}$
• You? the rightmost one, possibly :-)

Why Rightmost Hyperplane?

informal argument: if there is (Gaussian-like) noise on future $x \approx x_n$:

$x_n$ further from hyperplane ⇐⇒ tolerate more noise ⇐⇒ more robust to overfitting
distance to closest $x_n$ ⇐⇒ amount of noise tolerance ⇐⇒ robustness of hyperplane

rightmost one: more robust, because of larger distance to closest $x_n$

Fat Hyperplane

• robust separating hyperplane: fat —far from both sides of examples
• robustness ≡ fatness: distance to closest $x_n$

goal: find the fattest separating hyperplane

Large-Margin Separating Hyperplane

$\max_{w} \ \text{fatness}(w)$
subject to $w$ classifies every $(x_n, y_n)$ correctly
$\text{fatness}(w) = \min_{n=1,\dots,N} \text{distance}(x_n, w)$

in formal terms:

$\max_{w} \ \text{margin}(w)$
subject to every $y_n w^T x_n > 0$
$\text{margin}(w) = \min_{n=1,\dots,N} \text{distance}(x_n, w)$

• fatness: formally called margin
• correctness: $y_n = \text{sign}(w^T x_n)$

goal: find the largest-margin separating hyperplane

Fun Time

Consider two examples $(v, +1)$ and $(-v, -1)$ where $v \in \mathbb{R}^2$ (without padding the $v_0 = 1$). Which of the following hyperplanes is the largest-margin separating one for the two examples? You are highly encouraged to visualize by considering, for instance, $v = (3, 2)$.

1 $x_1 = 0$
2 $x_2 = 0$
3 $v_1 x_1 + v_2 x_2 = 0$
4 $v_2 x_1 + v_1 x_2 = 0$

Reference Answer: 3

Here the largest-margin separating hyperplane (line) must be the perpendicular bisector of the line segment between $v$ and $-v$. Hence $v$ is a normal vector of the largest-margin line. The result extends to the more general case of $v \in \mathbb{R}^d$.
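
To check the answer numerically for $v = (3, 2)$, here is a small NumPy sketch (my own illustration, not part of the lecture) that computes each candidate line's margin on the two examples:

```python
import numpy as np

v = np.array([3.0, 2.0])
X = np.array([v, -v])        # the two examples
y = np.array([1.0, -1.0])    # their labels

# candidate hyperplanes w^T x = 0, listed by their normal vectors w
candidates = {
    "x1 = 0":            np.array([1.0, 0.0]),
    "x2 = 0":            np.array([0.0, 1.0]),
    "v1 x1 + v2 x2 = 0": np.array([3.0, 2.0]),
    "v2 x1 + v1 x2 = 0": np.array([2.0, 3.0]),
}

for name, w in candidates.items():
    # signed distance of each example to the line; all positive means separating
    signed = y * (X @ w) / np.linalg.norm(w)
    print(f"{name}: margin = {signed.min():.3f}")
# option 3 wins with margin sqrt(13) ≈ 3.606, versus 3, 2, and 3.328
```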

Standard Large-Margin Problem

Distance to Hyperplane: Preliminary

$\max_{w} \ \text{margin}(w)$ subject to every $y_n w^T x_n > 0$, where $\text{margin}(w) = \min_{n=1,\dots,N} \text{distance}(x_n, w)$

'shorten' $x$ and $w$: the distance treats $w_0$ and $(w_1, \ldots, w_d)$ differently (to be derived), so separate them:

$b = w_0, \qquad w = \begin{bmatrix} w_1 \\ \vdots \\ w_d \end{bmatrix}; \qquad \text{drop } x_0 = 1, \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}$

for this part: $h(x) = \text{sign}(w^T x + b)$

Distance to Hyperplane

want: $\text{distance}(x, b, w)$, with hyperplane $w^T x' + b = 0$

consider $x'$, $x''$ on the hyperplane:
1 $w^T x' = -b$ and $w^T x'' = -b$
2 $w \perp$ hyperplane: $\underbrace{w^T (x'' - x')}_{\text{vector on hyperplane}} = 0$
3 distance = projection of $(x - x')$ onto $w$, the $\perp$ of the hyperplane

[figure: point $x$, points $x'$ and $x''$ on the hyperplane, normal vector $w$, and dist($x$, $h$)]

$\text{distance}(x, b, w) = \left| \frac{w^T}{\|w\|} (x - x') \right| = \frac{1}{\|w\|} \left| w^T x + b \right|$
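
As a sanity check of the formula, a minimal NumPy sketch (my own illustration, not from the lecture):

```python
import numpy as np

def distance(x, b, w):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# example: the line x1 + x2 - 2 = 0 in R^2
w, b = np.array([1.0, 1.0]), -2.0
print(distance(np.array([0.0, 0.0]), b, w))  # 2/sqrt(2) ≈ 1.414
print(distance(np.array([1.0, 1.0]), b, w))  # 0.0: the point lies on the hyperplane
```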

Distance to Separating Hyperplane

$\text{distance}(x, b, w) = \frac{1}{\|w\|} \left| w^T x + b \right|$

separating hyperplane: for every $n$, $y_n (w^T x_n + b) > 0$

distance to a separating hyperplane (the absolute value can be dropped):
$\text{distance}(x_n, b, w) = \frac{1}{\|w\|} y_n (w^T x_n + b)$

$\max_{b, w} \ \text{margin}(b, w)$
subject to every $y_n (w^T x_n + b) > 0$
$\text{margin}(b, w) = \min_{n=1,\dots,N} \frac{1}{\|w\|} y_n (w^T x_n + b)$
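
The margin of a candidate $(b, w)$ on a dataset is then a one-liner; a minimal NumPy sketch (my own illustration, not from the lecture):

```python
import numpy as np

def margin(b, w, X, y):
    """margin(b, w) = min_n y_n (w^T x_n + b) / ||w||; positive iff (b, w) separates (X, y)."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])  # toy data
y = np.array([-1.0, -1.0, 1.0, 1.0])
print(margin(-1.0, np.array([1.0, -2.0]), X, y))  # > 0: this (b, w) separates
```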

Margin of Special Separating Hyperplane

$\max_{b, w} \ \text{margin}(b, w)$
subject to every $y_n (w^T x_n + b) > 0$
$\text{margin}(b, w) = \min_{n=1,\dots,N} \frac{1}{\|w\|} y_n (w^T x_n + b)$

• $w^T x + b = 0$ is the same hyperplane as $3 w^T x + 3 b = 0$: scaling does not matter
• special scaling: only consider separating $(b, w)$ such that $\min_{n=1,\dots,N} y_n (w^T x_n + b) = 1$, which gives $\text{margin}(b, w) = \frac{1}{\|w\|}$

$\max_{b, w} \ \frac{1}{\|w\|}$
subject to $\min_{n=1,\dots,N} y_n (w^T x_n + b) = 1$ (the constraint "every $y_n (w^T x_n + b) > 0$" is then implied)
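
Spelling out why scaling does not matter: for any $\kappa > 0$,

$$\text{margin}(\kappa b, \kappa w) = \min_{n=1,\dots,N} \frac{y_n (\kappa w^T x_n + \kappa b)}{\|\kappa w\|} = \min_{n=1,\dots,N} \frac{\kappa \, y_n (w^T x_n + b)}{\kappa \|w\|} = \text{margin}(b, w),$$

so any separating $(b, w)$ can be rescaled to achieve the special scaling without changing its margin or the hyperplane itself.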

Standard Large-Margin Hyperplane Problem

$\max_{b, w} \ \frac{1}{\|w\|}$ subject to $\min_{n=1,\dots,N} y_n (w^T x_n + b) = 1$

• necessary constraints: $y_n (w^T x_n + b) \ge 1$ for all $n$
• original constraint: $\min_{n=1,\dots,N} y_n (w^T x_n + b) = 1$
• want: optimal $(b, w)$ here (inside the relaxed feasible region)
• if the optimal $(b, w)$ were outside, e.g. $y_n (w^T x_n + b) > 1.126$ for all $n$
—can scale $(b, w)$ to the "more optimal" $\left( \frac{b}{1.126}, \frac{w}{1.126} \right)$ (contradiction!)

final change: max $\Longrightarrow$ min, remove the square root in $\|w\| = \sqrt{w^T w}$, add $\frac{1}{2}$:

$\min_{b, w} \ \frac{1}{2} w^T w$
subject to $y_n (w^T x_n + b) \ge 1$ for all $n$
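
This is a convex quadratic program, so an off-the-shelf QP solver can handle it directly. A minimal sketch, assuming NumPy plus the cvxpy package (my choice of solver interface for illustration; the lecture only establishes that the problem is a QP):

```python
import numpy as np
import cvxpy as cp

# toy linearly separable data: X is N x d, labels y in {-1, +1}
X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
N, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# min_{b,w} (1/2) w^T w  subject to  y_n (w^T x_n + b) >= 1 for all n
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
print("margin = 1/||w|| =", 1 / np.linalg.norm(w.value))
```

On this toy set the solver returns the fattest separating line; the examples achieving $y_n (w^T x_n + b) = 1$ are exactly the ones sitting on the margin boundary.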
