
Linear Support Vector Machine / Support Vector Machine

Solving a Particular Standard Problem

min_{b,w}  (1/2) w^T w
subject to  y_n (w^T x_n + b) ≥ 1  for all n

X = [ (0,0); (2,2); (2,0); (3,0) ],   y = [ −1; −1; +1; +1 ]

constraints:
  −b ≥ 1                     (i)
  −2w_1 − 2w_2 − b ≥ 1       (ii)
   2w_1 + 0w_2 + b ≥ 1       (iii)
   3w_1 + 0w_2 + b ≥ 1       (iv)

(i) & (iii) ⟹ w_1 ≥ +1;    (ii) & (iii) ⟹ w_2 ≤ −1
⟹ (1/2) w^T w ≥ 1

(w_1 = 1, w_2 = −1, b = −1) is at the lower bound and satisfies (i)-(iv):
g_SVM(x) = sign(x_1 − x_2 − 1)

SVM? :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/28
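A minimal numeric check (using NumPy; not part of the original slides) that the hand-derived solution satisfies constraints (i)-(iv), attains the lower bound on (1/2) w^T w, and reproduces g_SVM(x) = sign(x_1 − x_2 − 1):

```python
import numpy as np

# Toy data from the slide
X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([-1, -1, +1, +1])

# Hand-derived optimal solution
w = np.array([1.0, -1.0])
b = -1.0

print(y * (X @ w + b))     # [1. 1. 1. 2.]: every constraint y_n(w^T x_n + b) >= 1 holds
print(0.5 * w @ w)         # 1.0: the objective sits exactly at the lower bound
print(np.sign(X @ w + b))  # [-1. -1.  1.  1.]: g_SVM(x) = sign(x_1 - x_2 - 1) matches y
```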


Support Vector Machine (SVM)

optimal solution: (w_1 = 1, w_2 = −1, b = −1)
margin(b, w) = 1/‖w‖ = 1/√2 ≈ 0.707
boundary: x_1 − x_2 − 1 = 0

examples on the boundary: 'locate' the fattest hyperplane
other examples: not needed

call a boundary example a support vector (candidate)

support vector machine (SVM): learn the fattest hyperplane
(with the help of support vectors)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/28
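To see which examples end up on the fat boundary, here is a small sketch (again assuming NumPy) computing each example's distance to the hyperplane x_1 − x_2 − 1 = 0; the three examples at distance 0.707 are the support vector candidates, while (3, 0) is not needed:

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])
w = np.array([1.0, -1.0])
b = -1.0

# distance of each x_n to the hyperplane w^T x + b = 0
dist = np.abs(X @ w + b) / np.linalg.norm(w)
print(np.round(dist, 3))   # [0.707 0.707 0.707 1.414]
```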


Solving General SVM

min_{b,w}  (1/2) w^T w
subject to  y_n (w^T x_n + b) ≥ 1  for all n

not easy manually, of course :-)
gradient descent? not easy with constraints

luckily:
• (convex) quadratic objective function of (b, w)
• linear constraints of (b, w)
that is, quadratic programming

quadratic programming (QP): 'easy' optimization problem

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 19/28


Quadratic Programming

optimal (b, w) = ?
  min_{b,w}  (1/2) w^T w
  subject to  y_n (w^T x_n + b) ≥ 1,  for n = 1, 2, ..., N

optimal u ← QP(Q, p, A, c)
  min_u  (1/2) u^T Q u + p^T u
  subject to  a_m^T u ≥ c_m,  for m = 1, 2, ..., M

objective function:
  u = [ b ; w ];   Q = [ 0  0_d^T ; 0_d  I_d ];   p = 0_{d+1}

constraints:
  a_n^T = y_n [ 1  x_n^T ];   c_n = 1;   M = N

SVM with general QP solver: easy if you've read the manual :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 20/28
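To make the (Q, p, A, c) mapping concrete, here is a minimal sketch (assuming NumPy; not part of the original slides) that builds these quantities for the four-example problem of page 17, using the slide's notation:

```python
import numpy as np

# data from page 17: d = 2 features, N = 4 examples
X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
N, d = X.shape

# Q = [[0, 0_d^T], [0_d, I_d]]: no penalty on b, identity on w
Q = np.zeros((d + 1, d + 1))
Q[1:, 1:] = np.eye(d)

# p = 0_{d+1}
p = np.zeros(d + 1)

# rows of A are a_n^T = y_n [1, x_n^T]; c_n = 1, so M = N constraints
A = y[:, None] * np.hstack([np.ones((N, 1)), X])
c = np.ones(N)

# e.g. a_2^T = -1 * [1, 2, 2] = [-1, -2, -2]
print(A[1])
```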


SVM with QP Solver

Linear Hard-Margin SVM Algorithm
1. Q = [ 0  0_d^T ; 0_d  I_d ];  p = 0_{d+1};  a_n^T = y_n [ 1  x_n^T ];  c_n = 1
2. [ b ; w ] ← QP(Q, p, A, c)
3. return b & w as g_SVM

hard-margin: nothing violates the 'fat boundary'
linear: works on x_n directly
want non-linear? z_n = Φ(x_n), remember? :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 21/28
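Below is a sketch of this algorithm with a concrete general-purpose QP solver. It assumes the third-party cvxopt package (any QP solver with the same standard form would do), and the helper name linear_hard_margin_svm is ours, not from the slides. cvxopt minimizes (1/2) u^T P u + q^T u subject to G u ≤ h, so the constraints a_n^T u ≥ c_n are passed as (−A) u ≤ −c:

```python
import numpy as np
from cvxopt import matrix, solvers   # assumed third-party QP solver

def linear_hard_margin_svm(X, y):
    N, d = X.shape
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)                              # (1/2) w^T w, no penalty on b
    p = np.zeros(d + 1)
    A = y[:, None] * np.hstack([np.ones((N, 1)), X])   # a_n^T = y_n [1, x_n^T]
    c = np.ones(N)                                     # c_n = 1

    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(Q), matrix(p), matrix(-A), matrix(-c))
    u = np.array(sol['x']).ravel()                     # u = [b; w]
    return u[0], u[1:]                                 # b, w of g_SVM(x) = sign(w^T x + b)

# toy data from page 17: should recover (approximately) b = -1, w = (1, -1)
X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
b, w = linear_hard_margin_svm(X, y)
print(b, w)
```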


Fun Time

Consider two negative examples with x_1 = (0, 0) and x_2 = (2, 2); two positive examples with x_3 = (2, 0) and x_4 = (3, 0), as shown on page 17 of the slides. Define u, Q, p, c_n as those listed on page 20 of the slides. What are the a_n^T that need to be fed into the QP solver?

1  a_1^T = [−1, 0, 0],  a_2^T = [−1, 2, 2],    a_3^T = [−1, 2, 0],  a_4^T = [−1, 3, 0]
2  a_1^T = [1, 0, 0],   a_2^T = [1, −2, −2],   a_3^T = [−1, 2, 0],  a_4^T = [−1, 3, 0]
3  a_1^T = [1, 0, 0],   a_2^T = [1, 2, 2],     a_3^T = [1, 2, 0],   a_4^T = [1, 3, 0]
4  a_1^T = [−1, 0, 0],  a_2^T = [−1, −2, −2],  a_3^T = [1, 2, 0],   a_4^T = [1, 3, 0]

Reference Answer: 4

We need a_n^T = y_n [ 1  x_n^T ].

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 22/28


Linear Support Vector Machine / Reasons behind Large-Margin Hyperplane

Why Large-Margin Hyperplane?

min_{b,w}  (1/2) w^T w
subject to  y_n (w^T z_n + b) ≥ 1  for all n

                  minimize    constraint
regularization    E_in        w^T w ≤ C
SVM               w^T w       E_in = 0 [and more]

SVM (large-margin hyperplane):
'weight-decay regularization' within E_in = 0

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 23/28


Large-Margin Restricts Dichotomies

consider 'large-margin algorithm' A_ρ:
either returns g with margin(g) ≥ ρ (if such g exists), or 0 otherwise

A_0: like PLA ⟹ shatters 'general' 3 inputs
A_{1.126}: more strict than SVM ⟹ cannot shatter any 3 inputs

fewer dichotomies ⟹ smaller 'VC dim.' ⟹ better generalization

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 24/28


VC Dimension of Large-Margin Algorithm

fewer dichotomies ⟹ smaller 'VC dim.'

considers d_VC(A_ρ) [data-dependent, need more than VC]
instead of d_VC(H) [data-independent, covered by VC]

d_VC(A_ρ) when X = unit circle in R^2:
• ρ = 0: just perceptrons (d_VC = 3)
• ρ > √3/2: cannot shatter any 3 inputs (d_VC < 3)
  (some inputs must be of distance ≤ √3 to each other)

generally, when X is in a radius-R hyperball:
d_VC(A_ρ) ≤ min(R^2/ρ^2, d) + 1 ≤ d + 1 = d_VC(perceptrons)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 25/28
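As a quick worked instance of the bound (added here for concreteness; not on the original slide): on the unit circle R = 1 and d = 2, so for any ρ > √3/2,

d_VC(A_ρ) ≤ min(R^2/ρ^2, d) + 1 < min(4/3, 2) + 1 = 4/3 + 1 ≈ 2.33,

hence d_VC(A_ρ) ≤ 2 < 3, matching 'cannot shatter any 3 inputs'; with ρ = 0 the min is attained by d and the bound falls back to d + 1 = 3, the perceptron value.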


Benefits of Large-Margin Hyperplanes

             large-margin hyperplanes   hyperplanes   hyperplanes + feature transform Φ
#            even fewer                 not many      many
boundary     simple                     simple        sophisticated

not many: good, for d_VC and generalization
sophisticated: good, for possibly better E_in

a new possibility: non-linear SVM
(large-margin hyperplanes + feature transform Φ)
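As a sketch of that new possibility (the specific Φ below is only an illustration we chose, not one prescribed by the slides): transform each x_n to z_n = Φ(x_n) and feed the Z data to the same Linear Hard-Margin SVM Algorithm of page 21, giving a boundary that is linear in Z-space but non-linear in X-space.

```python
import numpy as np

def phi(x):
    # an illustrative 2nd-order polynomial transform (one possible choice of Φ)
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x1 * x2, x2 * x2])

X = np.array([[0.0, 0.0], [2.0, 2.0], [2.0, 0.0], [3.0, 0.0]])
Z = np.array([phi(x) for x in X])   # z_n = Φ(x_n)
print(Z.shape)                      # (4, 5): same N examples, now in a 5-dimensional Z-space
# Z (with the same y) can now be fed to the hard-margin SVM QP in place of X.
```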
