# Machine Learning Techniques (機器學習技法)


### Lecture 2: Dual Support Vector Machine

Hsuan-Tien Lin (林軒田)

htlin@csie.ntu.edu.tw

### National Taiwan University (國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/23

Dual Support Vector Machine: Roadmap

### 1 Embedding Numerous Features: Kernel Models

**Lecture 1: Linear Support Vector Machine** — linear SVM: more **robust** and solvable with quadratic programming

**Lecture 2: Dual Support Vector Machine**

### 2 Combining Predictive Features: Aggregation Models

### 3 Distilling Implicit Features: Extraction Models

Dual Support Vector Machine: Motivation of Dual SVM

### Non-Linear Support Vector Machine Revisited

$$\min_{b,\mathbf{w}}\ \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{s.t. } y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1, \text{ for } n = 1, 2, \ldots, N$$

1. compute the QP parameters $(\mathrm{Q}, \mathbf{p}, \mathrm{A}, \mathbf{c})$
2. $[b; \mathbf{w}] \leftarrow \mathrm{QP}(\mathrm{Q}, \mathbf{p}, \mathrm{A}, \mathbf{c})$
3. return $b \in \mathbb{R}$ & $\mathbf{w} \in \mathbb{R}^{\tilde d}$ with $g_{\mathrm{SVM}}(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^T\boldsymbol{\Phi}(\mathbf{x}) + b)$

demanded: **not many** (large-margin), but **sophisticated** boundary (feature transform)

QP with $\tilde d + 1$ variables and $N$ constraints — challenging if $\tilde d$ large, or infinite

goal: SVM **without dependence on $\tilde d$**
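The three-step procedure above can be sketched with a general-purpose solver. A minimal sketch, using `scipy.optimize.minimize` in place of a dedicated QP routine, on assumed toy data (not from the lecture); the variable vector is $[b; w_1; w_2]$:

```python
import numpy as np
from scipy.optimize import minimize

# assumed toy separable data in 2D: z_n already transformed
Z = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# primal hard-margin SVM: min (1/2) w^T w  s.t.  y_n (w^T z_n + b) >= 1
def objective(u):                  # u = [b, w1, w2]; b unpenalized
    w = u[1:]
    return 0.5 * w @ w

cons = [{"type": "ineq",           # y_n (w^T z_n + b) - 1 >= 0
         "fun": lambda u, n=n: y[n] * (u[1:] @ Z[n] + u[0]) - 1.0}
        for n in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
b, w = res.x[0], res.x[1:]
print(w, b)                        # fattest separating hyperplane
```

For this data the optimum is $\mathbf{w} = (0.5, 0.5)$, $b = 0$; the solver needs $\tilde d + 1 = 3$ variables and $N = 4$ constraints, exactly the dependence on $\tilde d$ the lecture wants to remove.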

### TODO: SVM 'without' $\tilde d$

Original SVM: (convex) QP of $\tilde d + 1$ variables, $N$ constraints
'Equivalent' SVM: (convex) QP of $N$ variables, $N + 1$ constraints

Warning: the derivation involves some deeper math. We will

- introduce some necessary math to help understand
- ‘claim’ some results if details unnecessary — like how we ‘claimed’ Hoeffding

‘Equivalent’ SVM: based on some **dual problem** of Original SVM

### Key Tool: Lagrange Multipliers

Regularization by constrained-minimizing $E_{\text{in}}$:
$$\min_{\mathbf{w}}\ E_{\text{in}}(\mathbf{w}) \quad \text{s.t. } \mathbf{w}^T\mathbf{w} \le C$$
⇔ regularization by minimizing $E_{\text{aug}}$:
$$\min_{\mathbf{w}}\ E_{\text{aug}}(\mathbf{w}) = E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\mathbf{w}^T\mathbf{w}$$

- $C$ equivalent to some $\lambda \ge 0$ by checking optimality condition $\nabla E_{\text{in}}(\mathbf{w}) + \frac{2\lambda}{N}\mathbf{w} = \mathbf{0}$
- regularization: view $\lambda$ as **given** and solve ‘easily’
- dual SVM: view the $\lambda$’s as **unknown** variables to be solved

how many $\lambda$’s? $N$ — one per constraint; the unknown $\lambda$’s in dual SVM are the Lagrange multipliers $\alpha_n$

### Starting Point: Constrained to ‘Unconstrained’

$$\min_{b,\mathbf{w}}\ \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{s.t. } y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1, \text{ for } n = 1, 2, \ldots, N$$

Lagrange function: with Lagrange multipliers $\alpha_n$,
$$\mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) = \underbrace{\frac{1}{2}\mathbf{w}^T\mathbf{w}}_{\text{objective}} + \sum_{n=1}^{N} \alpha_n \underbrace{\left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right)}_{\text{constraint}}$$

Claim:
$$\text{SVM} \equiv \min_{b,\mathbf{w}} \left( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) \right) = \min_{b,\mathbf{w}} \left( \infty \text{ if violate};\ \frac{1}{2}\mathbf{w}^T\mathbf{w} \text{ if feasible} \right)$$

- any ‘violating’ $(b, \mathbf{w})$: $\max_{\boldsymbol{\alpha}} \left[ \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_n \alpha_n (\text{some positive}) \right] \to \infty$
- any ‘feasible’ $(b, \mathbf{w})$: $\max_{\boldsymbol{\alpha}} \left[ \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_n \alpha_n (\text{all non-positive}) \right] = \frac{1}{2}\mathbf{w}^T\mathbf{w}$

constraints now **hidden in max**
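A quick numeric sanity check of the claim, on assumed 1-D toy data: for a feasible $(b, \mathbf{w})$ every ‘Lagrange term’ is non-positive, so the inner max is attained at $\boldsymbol{\alpha} = \mathbf{0}$ and returns $\frac{1}{2}\mathbf{w}^T\mathbf{w}$, while a violating $(b, \mathbf{w})$ lets the max blow up (a coarse grid over $\boldsymbol{\alpha}$ stands in for the exact inner max):

```python
import numpy as np

# assumed 1-D toy data: positive at z = +1, negative at z = -1
Z = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])

def lagrangian(b, w, alpha):
    slack = 1.0 - y * (Z @ w + b)       # the terms 1 - y_n (w^T z_n + b)
    return 0.5 * w @ w + alpha @ slack

# feasible (b, w): every slack <= 0, so alpha > 0 only lowers L;
# the max over alpha >= 0 equals 0.5 w^T w (attained at alpha = 0)
w_ok, b_ok = np.array([2.0]), 0.0
vals = [lagrangian(b_ok, w_ok, np.array([a1, a2]))
        for a1 in np.linspace(0, 5, 6) for a2 in np.linspace(0, 5, 6)]
print(max(vals))                         # 2.0 == 0.5 * w^T w

# violating (b, w): some slack > 0, so L is unbounded in that alpha_n
w_bad, b_bad = np.array([0.1]), 0.0
print(lagrangian(b_bad, w_bad, np.array([1e6, 0.0])))  # huge
```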

### Fun Time

Consider two transformed examples $(\mathbf{z}_1, +1)$ and $(\mathbf{z}_2, -1)$ with $\mathbf{z}_1 = \mathbf{z}$ and $\mathbf{z}_2 = -\mathbf{z}$. What is the Lagrange function $\mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha})$ of hard-margin SVM?

Answer:
$$\mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1(1 - \mathbf{w}^T\mathbf{z} - b) + \alpha_2(1 - \mathbf{w}^T\mathbf{z} + b)$$

By definition, $\mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1(1 - y_1(\mathbf{w}^T\mathbf{z}_1 + b)) + \alpha_2(1 - y_2(\mathbf{w}^T\mathbf{z}_2 + b))$ with $(\mathbf{z}_1, y_1) = (\mathbf{z}, +1)$ and $(\mathbf{z}_2, y_2) = (-\mathbf{z}, -1)$.


Dual Support Vector Machine: Lagrange Dual SVM

### Lagrange Dual Problem

for any fixed $\boldsymbol{\alpha}'$ with all $\alpha_n' \ge 0$,
$$\min_{b,\mathbf{w}} \left( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) \right) \ge \min_{b,\mathbf{w}}\ \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}') \quad \text{because } \max \ge \text{any}$$

for **best** $\boldsymbol{\alpha}' \ge \mathbf{0}$ on the right-hand side,
$$\min_{b,\mathbf{w}} \left( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) \right) \ge \underbrace{\max_{\text{all } \alpha_n' \ge 0}\ \min_{b,\mathbf{w}}\ \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}')}_{\text{Lagrange dual problem}} \quad \text{because best is one of any}$$

Lagrange dual problem: **lower bound of original problem**

### Strong Duality of Quadratic Programming

$$\underbrace{\min_{b,\mathbf{w}} \left( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) \right)}_{\text{equivalent to original (primal) SVM}} \ge \underbrace{\max_{\text{all } \alpha_n \ge 0} \left( \min_{b,\mathbf{w}}\ \mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha}) \right)}_{\text{Lagrange dual}}$$

‘$\ge$’: weak duality; ‘$=$’: strong duality, true for QP if

- convex primal
- feasible primal (true if separable)
- linear constraints

—called constraint qualification

exists **primal-dual** optimal solution $(b, \mathbf{w}, \boldsymbol{\alpha})$ for **both sides**

### Solving Lagrange Dual: Simplifications (1/2)

$$\max_{\text{all } \alpha_n \ge 0} \left( \min_{b,\mathbf{w}}\ \underbrace{\frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n \left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right)}_{\mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha})} \right)$$

- **inner problem** ‘unconstrained’, at optimal: $\frac{\partial \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha})}{\partial b} = 0 = -\sum_{n=1}^{N} \alpha_n y_n$
- no loss of optimality if solving with constraint $\sum_{n=1}^{N} \alpha_n y_n = 0$
- but wait, $b$ can then be removed:

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0} \left( \min_{b,\mathbf{w}}\ \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n \left(1 - y_n \mathbf{w}^T\mathbf{z}_n\right) - \underbrace{\left(\sum_{n=1}^{N} \alpha_n y_n\right)}_{=\,0} \cdot b \right)$$

### Solving Lagrange Dual: Simplifications (2/2)

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0} \left( \min_{\mathbf{w}}\ \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n \left(1 - y_n \mathbf{w}^T\mathbf{z}_n\right) \right)$$

- **inner problem** ‘unconstrained’, at optimal: $\frac{\partial \mathcal{L}}{\partial w_i} = 0 = w_i - \sum_{n=1}^{N} \alpha_n y_n z_{n,i}$
- no loss of optimality if solving with constraint $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n$
- but wait! $\mathbf{w}$ can then be substituted:

$$\max_{\substack{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\\ \mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n}} \left( \frac{1}{2}\mathbf{w}^T\mathbf{w} - \mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n \right) \iff \max_{\substack{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\\ \mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n}} \left( -\frac{1}{2}\left\| \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n \right\|^2 + \sum_{n=1}^{N} \alpha_n \right)$$

### KKT Optimality Conditions

if primal-dual optimal $(b, \mathbf{w}, \boldsymbol{\alpha})$,

- **primal feasible**: $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1$
- **dual feasible**: $\alpha_n \ge 0$
- **dual-inner** optimal: $\sum_n y_n\alpha_n = 0$; $\mathbf{w} = \sum_n \alpha_n y_n \mathbf{z}_n$
- **primal-inner** optimal (at optimal all ‘Lagrange terms’ disappear): $\alpha_n \left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) = 0$

—called **Karush-Kuhn-Tucker (KKT) conditions**, necessary for optimality [& sufficient here]

will use KKT to ‘solve’ $(b, \mathbf{w})$ from optimal $\boldsymbol{\alpha}$

### Fun Time

For a single variable $w$, consider minimizing $\frac{1}{2}w^2$ subject to two linear constraints $w \ge 1$ and $w \le 3$. We know that the Lagrange function $\mathcal{L}(w, \boldsymbol{\alpha}) = \frac{1}{2}w^2 + \alpha_1(1 - w) + \alpha_2(w - 3)$. Which of the following equations that contain $\boldsymbol{\alpha}$ are among the KKT conditions of the optimization problem?

1. $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$
2. $w = \alpha_1 - \alpha_2$
3. $\alpha_1(1 - w) = 0$ and $\alpha_2(w - 3) = 0$
4. all of the above

Answer: 4. Condition 1 contains dual-feasible constraints; 2 contains dual-inner-optimal constraints; 3 contains primal-inner-optimal constraints.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/23
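The answer can also be checked numerically: the optimum of this toy problem is $w^* = 1$, stationarity $w = \alpha_1 - \alpha_2$ together with complementary slackness (which forces $\alpha_2 = 0$ since $w^* < 3$) pins down $\alpha_1 = 1$. A small sketch:

```python
# KKT check for: min (1/2) w^2  s.t.  w >= 1, w <= 3 (the Fun Time problem)
w_star, a1, a2 = 1.0, 1.0, 0.0

assert 1 <= w_star <= 3              # primal feasibility
assert a1 >= 0 and a2 >= 0           # dual feasibility        (condition 1)
assert w_star == a1 - a2             # stationarity dL/dw = 0  (condition 2)
assert a1 * (1 - w_star) == 0        # complementary slackness (condition 3)
assert a2 * (w_star - 3) == 0
print("all KKT conditions hold at w* = 1, alpha = (1, 0)")
```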


Dual Support Vector Machine: Solving Dual SVM

### Dual Formulation of Support Vector Machine

standard hard-margin SVM dual:

$$\min_{\boldsymbol{\alpha}}\ \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m \mathbf{z}_n^T \mathbf{z}_m - \sum_{n=1}^{N} \alpha_n$$
$$\text{subject to } \sum_{n=1}^{N} y_n\alpha_n = 0;\ \alpha_n \ge 0, \text{ for } n = 1, 2, \ldots, N$$

(convex) QP of $N$ variables & $N + 1$ constraints, as promised

how to solve? yeah, we know QP! :-)
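Since the dual is itself a convex QP, a generic solver suffices for small $N$. A minimal sketch on assumed toy data, with `scipy.optimize.minimize` standing in for a real QP solver, also recovering $(\mathbf{w}, b)$ from the optimal $\boldsymbol{\alpha}$ via the KKT conditions:

```python
import numpy as np
from scipy.optimize import minimize

# assumed toy data (not from the slides)
Z = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)
Q = (y[:, None] * Z) @ (y[:, None] * Z).T   # q_{n,m} = y_n y_m z_n^T z_m

# min (1/2) a^T Q a - sum(a)  s.t.  y^T a = 0,  a >= 0
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
               x0=np.zeros(N),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}],
               bounds=[(0, None)] * N)
alpha = res.x
w = (alpha * y) @ Z                          # w = sum_n alpha_n y_n z_n
sv = np.argmax(alpha)                        # any example with alpha_n > 0
b = y[sv] - w @ Z[sv]                        # b = y_s - w^T z_s
print(alpha.round(3), w, b)
```

For this data the dual optimum is $\boldsymbol{\alpha} = (0.25, 0, 0.25, 0)$, giving the same $\mathbf{w} = (0.5, 0.5)$, $b = 0$ that the primal QP would return; only 2 of the 4 examples end up with $\alpha_n > 0$.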

### Dual SVM with QP Solver

optimal $\boldsymbol{\alpha} \leftarrow \mathrm{QP}(\mathrm{Q}_D, \mathbf{p}, \mathrm{A}, \mathbf{c})$, matching the solver form

$$\min_{\boldsymbol{\alpha}}\ \frac{1}{2}\boldsymbol{\alpha}^T \mathrm{Q}_D \boldsymbol{\alpha} + \mathbf{p}^T\boldsymbol{\alpha} \quad \text{subject to } \mathbf{a}_i^T\boldsymbol{\alpha} \ge c_i, \text{ for } i = 1, 2, \ldots$$

- $q_{n,m} = y_n y_m \mathbf{z}_n^T \mathbf{z}_m$; $\mathbf{p} = -\mathbf{1}_N$
- $\mathbf{a}_\ge = \mathbf{y}$, $\mathbf{a}_\le = -\mathbf{y}$; $\mathbf{a}_n = n$-th unit direction
- $c_\ge = 0$, $c_\le = 0$; $c_n = 0$

note: many solvers treat equality ($\mathbf{a}_\ge$, $\mathbf{a}_\le$) and bound ($\mathbf{a}_n$) constraints specially for numerical stability

### Dual SVM with Special QP Solver

optimal $\boldsymbol{\alpha} \leftarrow \mathrm{QP}(\mathrm{Q}_D, \mathbf{p}, \mathrm{A}, \mathbf{c})$, with $q_{n,m} = y_n y_m \mathbf{z}_n^T \mathbf{z}_m$, often non-zero

- if $N = 30{,}000$, dense $\mathrm{Q}_D$ ($N$ by $N$ symmetric) takes > 3G RAM
- need special solver for **not storing whole** $\mathrm{Q}_D$
- utilizing special constraints properly to scale up to large $N$

usually better to use a **special solver** in practice
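The '> 3G RAM' figure follows from simple arithmetic; the sketch below assumes 8-byte doubles and storing only the symmetric half of $\mathrm{Q}_D$ (a full dense double matrix would need about twice that):

```python
# back-of-envelope memory for dense symmetric Q_D at N = 30,000
N = 30_000
entries = N * (N + 1) // 2      # upper triangle incl. diagonal
gib = entries * 8 / 2**30       # 8 bytes per double, in GiB
print(round(gib, 2))            # about 3.35 GiB even storing only half
```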

### Optimal (b, w)

KKT conditions: if primal-dual optimal $(b, \mathbf{w}, \boldsymbol{\alpha})$,

- primal feasible: $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1$
- dual feasible: $\alpha_n \ge 0$
- dual-inner optimal: $\sum_n y_n\alpha_n = 0$; $\mathbf{w} = \sum_n \alpha_n y_n \mathbf{z}_n$
- **primal-inner** optimal (at optimal all ‘Lagrange terms’ disappear): $\alpha_n \left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) = 0$ (complementary slackness)

optimal $\boldsymbol{\alpha}$ ⟹ optimal $\mathbf{w}$? easy above! optimal $b$? equality from complementary slackness:
if one $\alpha_n > 0 \Rightarrow b = y_n - \mathbf{w}^T\mathbf{z}_n$

$\alpha_n > 0 \Rightarrow$ on fat boundary (SV!)

### Fun Time

Consider two transformed examples $(\mathbf{z}_1, +1)$ and $(\mathbf{z}_2, -1)$ with $\mathbf{z}_1 = \mathbf{z}$ and $\mathbf{z}_2 = -\mathbf{z}$. After solving the dual problem of hard-margin SVM, assume that the optimal $\alpha_1$ and $\alpha_2$ are both strictly positive. What is the optimal $b$?

1. $-1$
2. $0$
3. $1$
4. not certain with the descriptions above

Answer: 2. With the descriptions, at the optimal $(b, \mathbf{w})$, $b = +1 - \mathbf{w}^T\mathbf{z} = -1 + \mathbf{w}^T\mathbf{z}$. That is, $\mathbf{w}^T\mathbf{z} = 1$ and $b = 0$.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/23


Dual Support Vector Machine: Messages behind Dual SVM

### Support Vectors (Revisited)

- on boundary: examples that ‘locate’ the fattest hyperplane; others: not needed
- examples with $\alpha_n > 0$: on boundary
- call examples with $\alpha_n > 0$ **support vectors** (SV)
- SV (positive $\alpha_n$) ⊆ SV candidates (on boundary)

[figure: separating hyperplane $x_1 - x_2 - 1 = 0$ with margin 0.707]

- only SV needed to compute $\mathbf{w}$: $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n = \sum_{\text{SV}} \alpha_n y_n \mathbf{z}_n$
- only SV needed to compute $b$: $b = y_n - \mathbf{w}^T\mathbf{z}_n$ with any SV $(\mathbf{z}_n, y_n)$

SVM: learn fattest hyperplane by identifying **support vectors** with **dual** optimal solution
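The SV-only sums can be checked on a hand-worked example (assumed data; the stated $\boldsymbol{\alpha}$ is the dual optimum for it, derived by hand): dropping all zero-$\alpha_n$ terms changes nothing, so only the two SVs matter.

```python
import numpy as np

# assumed toy data with hand-derived optimal dual solution
Z = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])     # only examples 0 and 2 are SVs

sv = alpha > 0
w_all = (alpha * y) @ Z                      # sum over all n
w_sv = (alpha[sv] * y[sv]) @ Z[sv]           # sum over SVs only
print(np.allclose(w_all, w_sv))              # zero-alpha terms drop out
b = y[sv][0] - w_sv @ Z[sv][0]               # b = y_s - w^T z_s, any SV s
print(w_sv, b)
```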

### Representation of Fattest Hyperplane

$$\mathbf{w}_{\mathrm{SVM}} = \sum_{n=1}^{N} \alpha_n (y_n \mathbf{z}_n) \text{ from dual solution;} \quad \mathbf{w}_{\mathrm{PLA}} = \sum_{n=1}^{N} \beta_n (y_n \mathbf{z}_n) \text{ by # mistake corrections}$$

- $\mathbf{w}$ = linear combination of $y_n \mathbf{z}_n$
- also true for GD/SGD-based LogReg/LinReg when $\mathbf{w}_0 = \mathbf{0}$
- call $\mathbf{w}$ ‘represented’ by data

SVM: represent $\mathbf{w}$ by **SVs only**

### Summary: Two Forms of Hard-Margin SVM

Primal hard-margin SVM:
$$\min_{b,\mathbf{w}}\ \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{sub. to } y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1, \text{ for } n = 1, 2, \ldots, N$$

- $\tilde d + 1$ variables, $N$ constraints — suitable when $\tilde d + 1$ small
- physical meaning: locate specially-scaled $(b, \mathbf{w})$

Dual hard-margin SVM:
$$\min_{\boldsymbol{\alpha}}\ \frac{1}{2}\boldsymbol{\alpha}^T \mathrm{Q}_D \boldsymbol{\alpha} - \mathbf{1}^T\boldsymbol{\alpha} \quad \text{s.t. } \mathbf{y}^T\boldsymbol{\alpha} = 0;\ \alpha_n \ge 0 \text{ for } n = 1, \ldots, N$$

- $N$ variables, $N + 1$ simple constraints — suitable when $N$ small
- physical meaning: locate SVs $(\mathbf{z}_n, y_n)$ & their $\alpha_n$

both eventually result in optimal $(b, \mathbf{w})$ for fattest hyperplane $g_{\mathrm{SVM}}(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^T\boldsymbol{\Phi}(\mathbf{x}) + b)$

### Are We Done Yet?

goal: SVM **without dependence on $\tilde d$**

$$\min_{\boldsymbol{\alpha}}\ \frac{1}{2}\boldsymbol{\alpha}^T \mathrm{Q}_D \boldsymbol{\alpha} - \mathbf{1}^T\boldsymbol{\alpha} \quad \text{subject to } \mathbf{y}^T\boldsymbol{\alpha} = 0;\ \alpha_n \ge 0, \text{ for } n = 1, 2, \ldots, N$$

- $N$ variables, $N + 1$ constraints: no dependence on $\tilde d$?
- $q_{n,m} = y_n y_m \mathbf{z}_n^T \mathbf{z}_m$: inner product in $\mathbb{R}^{\tilde d}$ — $O(\tilde d)$ via naïve computation!

no dependence only if **avoiding naïve computation** (next lecture :-))
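To see the hidden $\tilde d$-dependence concretely: each entry of $\mathrm{Q}_D$ is a length-$\tilde d$ inner product, so even building $\mathrm{Q}_D$ costs $O(N^2 \tilde d)$ naively. A small sketch on assumed random data (the vectorized form is faster in practice but still touches all $N^2 \tilde d$ products):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_tilde = 6, 500
Z = rng.standard_normal((N, d_tilde))           # transformed examples z_n
y = np.where(rng.standard_normal(N) > 0, 1.0, -1.0)

# naive: each q_{n,m} = y_n y_m z_n^T z_m is an O(d_tilde) inner product
Q_naive = np.array([[y[n] * y[m] * (Z[n] @ Z[m]) for m in range(N)]
                    for n in range(N)])

# vectorized equivalent of the same matrix
YZ = y[:, None] * Z
Q_vec = YZ @ YZ.T
print(np.allclose(Q_naive, Q_vec))              # same Q_D either way
```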

### Fun Time

Consider applying dual hard-margin SVM on $N = 5566$ examples and getting 1126 SVs. Which of the following can be the number of examples that are on the fat boundary — that is, SV candidates?

1. 0
2. 1024
3. 1234
4. 9999

Answer: 3. Because SVs are always on the fat boundary, # SVs ≤ # SV candidates ≤ $N$.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 22/23


Dual Support Vector Machine: Summary

### 1 Embedding Numerous Features: Kernel Models

**Lecture 2: Dual Support Vector Machine** — an equivalent QP with $N$ variables and $N + 1$ constraints, almost without dependence on $\tilde d$

### 3 Distilling Implicit Features: Extraction Models
