# Machine Learning Techniques (機器學習技法)


### Lecture 1: Large-Margin Linear Classification

Hsuan-Tien Lin (林軒田)

htlin@csie.ntu.edu.tw

Department of Computer Science and Information Engineering, National Taiwan University (國立台灣大學資訊工程系)

### Roadmap

1. Large-Margin Separating Hyperplane
2. Standard Large-Margin Problem
3. Support Vector Machine
4. Reasons behind Large-Margin Hyperplane

### Large-Margin Separating Hyperplane

Linear (hyperplane) classifiers: h(x) = sign(w^T x), where the score s = w^T x is computed from the features x_0 = 1, x_1, ..., x_d. A plausible error measure is err = 0/1, which can be minimized to zero when the data is linearly separable.

Which separating hyperplane is best?

- PLA? depends on randomness
- VC bound? whichever you like!

  E_out(w) ≤ E_in(w) + Ω(H), with E_in(w) = 0 for any separating hyperplane and Ω(H) the same complexity term for all of them

You? the **rightmost one**, possibly :-)

Why the rightmost one?

- if there is (Gaussian-like) noise on future points x ≈ x_n:
  x_n farther from the hyperplane ⇐⇒ tolerates more noise ⇐⇒ more robust to overfitting
- distance to the closest x_n ⇐⇒ amount of noise tolerance ⇐⇒ robustness of the hyperplane

rightmost one: **more robust**, because of the larger distance to the closest x_n

A **robust** separating hyperplane is **fat**: far from the examples on both sides.

goal: find the **fattest** separating hyperplane

Formally:

  max_w      fatness(w)
  subject to w classifies every (x_n, y_n) correctly
             fatness(w) = min_{n=1,...,N} distance(x_n, w)

fatness: formally called **margin**; correctness means y_n = sign(w^T x_n). So:

  max_w      margin(w)
  subject to every y_n w^T x_n > 0
             margin(w) = min_{n=1,...,N} distance(x_n, w)

goal: find the **largest-margin** separating hyperplane


### Standard Large-Margin Problem

The bias and the weights will need to be treated differently (to be derived), so separate them from now on:

  b = w_0;  w = (w_1, ..., w_d);  x = (x_1, ..., x_d) without x_0

next: h(x) = sign(w^T x + b)

### Distance to Hyperplane

want: distance(x, b, w), where the hyperplane is {x' : w^T x' + b = 0}.

Consider x' and x'' on the hyperplane:

1. w^T x' = −b and w^T x'' = −b
2. w ⊥ hyperplane: w^T (x'' − x') = 0, because (x'' − x') is a vector on the hyperplane
3. distance = projection of (x − x') onto the ⊥ direction w:

  distance(x, b, w) = | (w^T / ‖w‖) (x − x') | = (1/‖w‖) |w^T x + b|
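The distance formula can be checked numerically; a minimal Python sketch (the hyperplane and test points here are made up for illustration):

```python
import math

def distance_to_hyperplane(x, b, w):
    """distance(x, b, w) = |w^T x + b| / ||w||"""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(wx + b) / norm

# illustrative hyperplane x1 - x2 - 1 = 0, i.e. w = (1, -1), b = -1;
# the point (2, 0) lies at distance |2 - 0 - 1| / sqrt(2) = 1/sqrt(2)
print(distance_to_hyperplane((2, 0), -1, (1, -1)))  # ≈ 0.7071
```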

distance(x, b, w) = (1/‖w‖) |w^T x + b|

- for a **separating** hyperplane: y_n (w^T x_n + b) > 0 for every n
- so for x_n, the distance to a separating hyperplane drops the absolute value:

  distance(x_n, b, w) = (1/‖w‖) y_n (w^T x_n + b)

The problem becomes:

  max_{b,w}  margin(b, w)
  subject to every y_n (w^T x_n + b) > 0
             margin(b, w) = min_{n=1,...,N} (1/‖w‖) y_n (w^T x_n + b)

Two observations:

- scaling does not matter: (w, b) and (1126w, 1126b) describe the same hyperplane with the same margin
- **special scaling**: only consider separating (w, b) such that min_{n=1,...,N} y_n (w^T x_n + b) = 1, which gives margin(b, w) = 1/‖w‖

So the problem simplifies to:

  max_{b,w}  1/‖w‖
  subject to min_{n=1,...,N} y_n (w^T x_n + b) = 1
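The scaling invariance is easy to sanity-check in code; a small sketch (the four labeled points are made up for illustration):

```python
import math

def margin(b, w, X, y):
    """margin(b, w) = min_n y_n (w^T x_n + b) / ||w||"""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(yn * (sum(wi * xi for wi, xi in zip(w, xn)) + b) / norm
               for xn, yn in zip(X, y))

X = [(2, 0), (3, 0), (0, 0), (2, 2)]
y = [+1, +1, -1, -1]
b, w = -1, (1, -1)

m1 = margin(b, w, X, y)
m2 = margin(1126 * b, tuple(1126 * wi for wi in w), X, y)
print(abs(m1 - m2) < 1e-9)  # scaling (w, b) by 1126 leaves the margin unchanged
```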

final changes:

- max 1/‖w‖ =⇒ min (1/2) w^T w: maximizing 1/‖w‖ is minimizing ‖w‖; remove the square root and add the constant 1/2 for convenience
- min_n (...) = 1 =⇒ (...) ≥ 1 for every n: relaxing the equality does not change the optimum, because if all (...) > 1 at a candidate optimum, scaling (w, b) down would stay feasible while decreasing (1/2) w^T w, a contradiction

**Standard large-margin problem:**

  min_{b,w}  (1/2) w^T w
  subject to y_n (w^T x_n + b) ≥ 1, for n = 1, 2, ..., N


### Support Vector Machine

Solving a particular instance of

  min_{b,w}  (1/2) w^T w
  subject to y_n (w^T x_n + b) ≥ 1

with four examples x_1 = (0, 0), x_2 = (2, 2), x_3 = (2, 0), x_4 = (3, 0) and y = (−1, −1, +1, +1):

  −b ≥ 1                  (i)
  −2w_1 − 2w_2 − b ≥ 1    (ii)
  2w_1 + b ≥ 1            (iii)
  3w_1 + b ≥ 1            (iv)

(i) & (iii) =⇒ w_1 ≥ +1; (ii) & (iii) =⇒ w_2 ≤ −1; hence (1/2) w^T w ≥ 1.

(w_1 = 1, w_2 = −1, b = −1) attains this **lower bound** and satisfies (i)−(iv), so

  gSVM(x) = sign(x_1 − x_2 − 1)

SVM? :-)
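The claimed optimum can be verified mechanically; a minimal Python check, assuming the four-point dataset above ((0,0) and (2,2) labeled −1, (2,0) and (3,0) labeled +1):

```python
X = [(0, 0), (2, 2), (2, 0), (3, 0)]
y = [-1, -1, +1, +1]
b, w = -1, (1, -1)

# every constraint y_n (w^T x_n + b) >= 1 holds, with equality on the boundary examples
slack = [yn * (sum(wi * xi for wi, xi in zip(w, xn)) + b) for xn, yn in zip(X, y)]
print(slack)                           # [1, 1, 1, 2]
print(0.5 * sum(wi * wi for wi in w))  # objective value 1.0, matching the lower bound
```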

### Support Vector Machine (SVM)

optimal solution: (w_1 = 1, w_2 = −1, b = −1), boundary x_1 − x_2 − 1 = 0, with

  margin(b, w) = 1/‖w‖ = 1/√2 ≈ 0.707

- examples on the boundary locate the fattest hyperplane; the other examples are not needed
- call the boundary examples **support vectors**

support vector **machine** (SVM): learn the fattest hyperplane (with the help of the **support vectors**)

The optimization problem

  min_{b,w}  (1/2) w^T w
  subject to y_n (w^T x_n + b) ≥ 1

luckily has:

- a convex quadratic objective function of (b, w)
- **linear constraints of (b, w)**

=⇒ quadratic programming (QP): an 'easy' optimization problem

How to reach the optimal (b, w) of

  min_{b,w}  (1/2) w^T w
  subject to y_n (w^T x_n + b) ≥ 1, for n = 1, 2, ..., N?

Match the standard form accepted by a general QP solver:

  optimal u ← QP(Q, p, A, c):
  min_u      (1/2) u^T Q u + p^T u
  subject to a_m^T u ≥ c_m, for m = 1, 2, ..., M

objective function: u = [b; w]; Q = [0, 0_d^T; 0_d, I_d]; p = 0_{d+1}
constraints: a_n^T = y_n [1  x_n^T]; c_n = 1; M = N

SVM with a general QP solver: easy, if you've read the manual :-)
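The mapping to QP form can be sketched in plain Python; this builds Q, p, a_n, c_n from a toy dataset and evaluates the QP objective and constraints at a candidate u = [b; w] (the dataset and candidate are illustrative, not output of an actual solver):

```python
def svm_qp_setup(X, y):
    """Hard-margin SVM as a QP: min 1/2 u^T Q u + p^T u s.t. a_n^T u >= c_n, with u = [b; w]."""
    d = len(X[0])
    Q = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i in range(1, d + 1):
        Q[i][i] = 1.0            # identity on the w-part; zero row/column for b
    p = [0.0] * (d + 1)
    A = [[yn * v for v in (1.0,) + tuple(xn)] for xn, yn in zip(X, y)]  # a_n = y_n [1; x_n]
    c = [1.0] * len(X)           # one constraint per example: M = N
    return Q, p, A, c

X = [(0, 0), (2, 2), (2, 0), (3, 0)]
y = [-1, -1, +1, +1]
Q, p, A, c = svm_qp_setup(X, y)

u = [-1.0, 1.0, -1.0]  # candidate [b; w1; w2]
obj = 0.5 * sum(u[i] * Q[i][j] * u[j] for i in range(3) for j in range(3)) \
      + sum(pi * ui for pi, ui in zip(p, u))
feasible = all(sum(aj * uj for aj, uj in zip(am, u)) >= cm for am, cm in zip(A, c))
print(obj, feasible)  # 1.0 True
```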

Putting it together, the **linear hard-margin SVM algorithm**:

1. build Q, p, a_n, c_n from the data
2. [b; w] ← QP(Q, p, A, c)
3. return b & w as gSVM

want a non-linear boundary? transform first: z_n = Φ(x_n), remember? :-)


### Reasons behind Large-Margin Hyperplane

  min_{b,w}  (1/2) w^T w
  subject to y_n (w^T x_n + b) ≥ 1

|                | minimize | constraint          |
|----------------|----------|---------------------|
| regularization | E_in     | w^T w ≤ C           |
| SVM            | w^T w    | E_in = 0 [and more] |

SVM (large-margin hyperplane): **'weight-decay regularization' within E_in = 0**

Large-Margin Linear Classification Reasons behind Large-Margin Hyperplane

### Large-Margin Restricts Dichotomies

consider ‘large-margin algorithm’A

:

either

, or 0 otherwise

### A 1.126 : more strict than SVM = ⇒ no-shatter some 3 inputs

ρ

fewer dichotomies =⇒ smaller ‘VC dim.’ =⇒

### better generalization

### VC Dimension of Large-Margin Algorithm

fewer dichotomies =⇒ smaller 'VC dimension' d_VC(A_ρ). Consider X = the unit circle in R^2:

- ρ = 0: just perceptrons (d_VC = 3)
- ρ > √3/2: cannot shatter any 3 inputs (d_VC < 3), because some two of the three inputs must be of distance ≤ √3

generally, when X lies in a radius-R hyperball:

  d_VC(A_ρ) ≤ min(R^2/ρ^2, d) + 1 ≤ d + 1 (the d_VC of perceptrons)
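The general bound is easy to tabulate; a small sketch of the bound as a function of ρ (the sample values are purely illustrative):

```python
import math

def dvc_bound(R, rho, d):
    """d_VC(A_rho) <= min(R^2 / rho^2, d) + 1, for X inside a radius-R hyperball, rho > 0."""
    return min(R * R / (rho * rho), d) + 1

# unit circle (R = 1) in R^2 (d = 2):
print(dvc_bound(1, math.sqrt(3) / 2, 2))  # 4/3 + 1 ≈ 2.33, so d_VC <= 2 < 3
print(dvc_bound(1, 0.1, 2))               # tiny margin: bound falls back to d + 1 = 3
```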

Why is this hypothesis set a good deal?

- **not many** hyperplanes survive the margin requirement: good for d_VC and generalization
- a **sophisticated** boundary is still possible through feature transforms: good for a possibly better E_in

a new possibility: **non-linear SVM**, combining a large margin with a sophisticated boundary

Large-Margin Linear Classification Reasons behind Large-Margin Hyperplane

### Fun Time

(27)

Large-Margin Linear Classification Reasons behind Large-Margin Hyperplane

### fewer dichotomies and better generalization
