# Machine Learning Techniques (機器學習技法)

## Lecture 5: Kernel Logistic Regression

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science and Information Engineering, National Taiwan University (國立台灣大學資訊工程系)

## Roadmap

### 1 Embedding Numerous Features: Kernel Models

Lecture 4: Soft-Margin Support Vector Machine — allow some margin violations ξn while penalizing them by C; equivalent to upper-bounding αn by C in the dual

### 2 Combining Predictive Features: Aggregation Models

Kernel Logistic Regression / Soft-Margin SVM as Regularized Model

## Wrap-Up

soft-margin SVM preferred in practice; linear: LIBLINEAR; non-linear: LIBSVM


## Slack Variables ξn

• record 'margin violation' by ξn; penalize with margin violation

• on any (b, w):

ξn = max( 1 − yn(wᵀzn + b), 0 )

• (xn, yn) violating margin: ξn = 1 − yn(wᵀzn + b)

• (xn, yn) not violating margin: ξn = 0

'unconstrained' form of soft-margin SVM:

min over (b, w):  (1/2) wᵀw + C Σn max( 1 − yn(wᵀzn + b), 0 )
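The slack definition above can be evaluated directly. A minimal pure-Python sketch, with a hand-made hyperplane and points (none of them from the slides):

```python
# Sketch: slack variable xi_n = max(1 - y_n (w^T z_n + b), 0)
# for a candidate hyperplane (b, w); all numbers below are illustrative.

def slack(w, b, z, y):
    """Margin violation of example (z, y) under hyperplane (b, w)."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return max(1.0 - y * score, 0.0)

# a point safely beyond the margin has no violation ...
print(slack([1.0, 0.0], 0.0, [2.0, 0.0], +1))   # 0.0
# ... a point inside the margin has a small violation ...
print(slack([1.0, 0.0], 0.0, [0.5, 0.0], +1))   # 0.5
# ... and a misclassified point has a violation larger than 1
print(slack([1.0, 0.0], 0.0, [1.0, 0.0], -1))   # 2.0
```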


## Unconstrained Form

min over (b, w):  (1/2) wᵀw + C Σn max( 1 − yn(wᵀzn + b), 0 )

familiar? :-) this looks just like

min:  (λ/N) wᵀw + (1/N) Σ err

i.e. L2-regularized error minimization with a special err. why not solve this? :-)

• not QP: the dual and kernel tricks do not directly apply

• max(·, 0) not differentiable, harder to solve
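As a sanity check on the unconstrained objective, a small sketch that evaluates (1/2)wᵀw + C Σn max(1 − yn(wᵀzn + b), 0) on toy data (names and numbers are illustrative, not from the slides):

```python
# Sketch: evaluating the 'unconstrained' soft-margin SVM objective directly.
# Note max(., 0) makes it piecewise-linear: continuous but not differentiable
# at the hinge, which is exactly why it is harder to solve than QP.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def svm_objective(w, b, data, C):
    hinge = sum(max(1.0 - y * (dot(w, x) + b), 0.0) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge

data = [([2.0], +1), ([-2.0], -1), ([0.5], +1)]
print(svm_objective([1.0], 0.0, data, C=1.0))   # 0.5*1 + 1*(0 + 0 + 0.5) = 1.0
```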


## SVM as Regularized Model

| model | minimize | constraint |
| --- | --- | --- |
| regularization by constraint | Ein | wᵀw ≤ C |
| hard-margin SVM | wᵀw | Ein = 0 [and more] |
| L2 regularization | (λ/N) wᵀw + Ein | |
| soft-margin SVM | (1/2) wᵀw + CN Êin | |

large margin ⇐⇒ fewer hyperplanes ⇐⇒ L2 regularization of short w

soft margin ⇐⇒ special êrr

larger C ⇐⇒ smaller λ ⇐⇒ less regularization

viewing SVM as regularized model: allows extending/connecting to other learning models
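The C ⇐⇒ λ correspondence can be checked numerically: with λ = 1/(2C), the regularized objective (λ/N)wᵀw + (1/N)Σ err equals the soft-margin objective (1/2)wᵀw + C Σ err up to the constant factor CN, so they share the same minimizer. A small sketch under these assumptions (all numbers illustrative):

```python
# Sketch: with lam = 1/(2C), obj_regularized == obj_soft_margin / (C * N),
# so scaling one objective never changes where its minimum is.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def obj_soft_margin(w, err_sum, C):
    return 0.5 * dot(w, w) + C * err_sum

def obj_regularized(w, err_sum, lam, N):
    return (lam / N) * dot(w, w) + err_sum / N

w, err_sum, C, N = [3.0, -1.0], 7.0, 2.0, 5
lam = 1.0 / (2.0 * C)
print(obj_soft_margin(w, err_sum, C))                # 0.5*10 + 2*7 = 19.0
print(obj_regularized(w, err_sum, lam, N) * C * N)   # same value up to rounding
```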


Kernel Logistic Regression Soft-Margin SVM as Regularized Model

## Fun Time

When viewing soft-margin SVM as regularized model, a larger C corresponds to

### 1

a larger λ, that is, stronger regularization

### 2

a smaller λ, that is, stronger regularization

### 3

a larger λ, that is, weaker regularization

### 4

a smaller λ, that is, weaker regularization

Comparing the formulations on page 4 of the slides, we see that C corresponds to

### 2λ1

. So larger C corresponds to smaller λ, which surely means weaker regularization.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 6/20


Kernel Logistic Regression / SVM versus Logistic Regression

## Algorithmic Error Measure of SVM

linear score s = wᵀzn + b

• err0/1(s, y) = ⟦ys ≤ 0⟧

• êrrSVM(s, y) = max(1 − ys, 0): upper bound of err0/1 — often called the hinge error measure

êrrSVM: algorithmic error measure by convex upper bound of err0/1


## Connection between SVM and Logistic Regression

linear score s = wᵀzn + b

• err0/1(s, y) = ⟦ys ≤ 0⟧

• êrrSVM(s, y) = max(1 − ys, 0): upper bound of err0/1

• errSCE(s, y) = log2(1 + exp(−ys)): another upper bound of err0/1, used in logistic regression

as ys → −∞: êrrSVM(s, y) ≈ −ys and (ln 2) · errSCE(s, y) ≈ −ys

as ys → +∞: êrrSVM(s, y) = 0 and (ln 2) · errSCE(s, y) ≈ 0

SVM ≈ L2-regularized logistic regression
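A quick numeric check of the two upper bounds and their asymptotic agreement (pure-Python sketch; the score grid and sample values are illustrative):

```python
import math

# Sketch: the three error measures on a common linear score s, y in {-1, +1}.
def err_01(s, y):
    return 1.0 if y * s <= 0 else 0.0

def err_svm(s, y):                       # hinge error measure
    return max(1.0 - y * s, 0.0)

def err_sce(s, y):                       # scaled cross-entropy (log base 2)
    return math.log2(1.0 + math.exp(-y * s))

# both convex measures upper-bound err_01 everywhere on a grid of scores
for s in [k / 4.0 for k in range(-40, 41)]:
    for y in (-1, +1):
        assert err_svm(s, y) >= err_01(s, y)
        assert err_sce(s, y) >= err_01(s, y)

# for very negative ys, (ln 2) * err_sce behaves like -ys, close to the hinge
s = -20.0
print(err_svm(s, +1), math.log(2) * err_sce(s, +1))   # 21.0 vs about 20.0
```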


## Linear Models for Binary Classification

PLA: minimize err0/1 specially

• pros: efficient if linearly separable

• cons: works only if linearly separable, otherwise needing pocket

soft-margin SVM: minimize regularized êrrSVM by QP

• pros: 'easy' optimization & theoretical guarantee

• cons: loose bound of err0/1 for very negative ys

logistic regression for classification: minimize regularized errSCE by GD/SGD/...

• pros: 'easy' optimization & regularization guard

• cons: loose bound of err0/1 for very negative ys

regularized LogReg =⇒ approximate SVM

SVM =⇒ approximate LogReg (?)


## Fun Time

We know that êrrSVM(s, y) is an upper bound of err0/1(s, y). When is the upper bound tight? That is, when is êrrSVM(s, y) = err0/1(s, y)?

1. ys ≥ 0
2. ys ≤ 0
3. ys ≥ 1
4. ys ≤ 1

Reference answer: 3. By plotting the figure, we can easily see that êrrSVM(s, y) = err0/1(s, y) if and only if ys ≥ 1. In that case, both error functions evaluate to 0.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/20


Kernel Logistic Regression / SVM for Soft Binary Classification

## SVM for Soft Binary Classification

Naive Idea 1:

1. run SVM and get (bSVM, wSVM)
2. return g(x) = θ(wSVMᵀΦ(x) + bSVM)

• 'direct' use of similarity — works reasonably well

• no LogReg flavor

Naive Idea 2:

1. run SVM and get (bSVM, wSVM)
2. run LogReg with (bSVM, wSVM) as w0
3. return LogReg solution as g(x)

• not really 'easier' than original LogReg

• SVM flavor (kernel?) lost

want: flavors from both sides
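Naive Idea 1 amounts to squashing the SVM score through the logistic function θ. A minimal sketch, assuming the identity transform Φ(x) = x for illustration:

```python
import math

# Sketch: soft prediction g(x) = theta(w_svm^T x + b_svm);
# the toy weights below are stand-ins, not outputs of a real SVM solver.

def theta(s):
    """Logistic function theta(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def soft_classify(x, w_svm, b_svm):
    score = sum(wi * xi for wi, xi in zip(w_svm, x)) + b_svm
    return theta(score)

print(soft_classify([2.0], [1.0], 0.0))   # about 0.88: confidently positive
print(soft_classify([0.0], [1.0], 0.0))   # 0.5: exactly on the boundary
```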


## A Possible Model: Two-Level Learning

g(x) = θ( A · (wSVMᵀΦ(x) + bSVM) + B )

• SVM flavor: fix hyperplane direction by wSVM — kernel applies

• LogReg flavor: fine-tune hyperplane to match maximum likelihood by scaling (A) and shifting (B)

• often A > 0 if wSVM reasonably good, B ≈ 0 if bSVM reasonably good

new LogReg problem:

min over (A, B):  (1/N) Σn log( 1 + exp( −yn ( A · (wSVMᵀΦ(xn) + bSVM) + B ) ) )

two-level learning: LogReg on SVM-transformed data
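The two-level idea can be sketched as a tiny gradient descent over only (A, B), treating the SVM scores zn = wSVMᵀΦ(xn) + bSVM as fixed inputs (the toy scores, learning rate, and step count are assumptions, not from the slides):

```python
import math

# Sketch: fit scaling A and shift B by gradient descent on the logistic loss
#   (1/N) sum_n log(1 + exp(-y_n (A z_n + B)))
# with SVM scores z_n held fixed — LogReg on SVM-transformed data.

def fit_scale_shift(scores, labels, lr=0.1, steps=2000):
    A, B = 1.0, 0.0
    N = len(scores)
    for _ in range(steps):
        gA = gB = 0.0
        for z, y in zip(scores, labels):
            t = A * z + B
            p = 1.0 / (1.0 + math.exp(-y * t))  # theta(y t)
            g = -y * (1.0 - p)                  # d/dt of log(1 + exp(-y t))
            gA += g * z
            gB += g
        A -= lr * gA / N
        B -= lr * gB / N
    return A, B

scores = [2.0, 1.0, -1.0, -2.0]   # stand-in SVM scores
labels = [+1, +1, -1, -1]
A, B = fit_scale_shift(scores, labels)
print(A > 0, abs(B) < 1e-6)        # True True: direction kept, shift stays ~0
```

On this symmetric toy set the shift B stays at 0, matching the slide's remark that B ≈ 0 when bSVM is already reasonably good.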


## Probabilistic SVM

1. run SVM on D to get (bSVM, wSVM) [or the equivalent α], and transform D to z′n = wSVMᵀΦ(xn) + bSVM
   — actual model performs this step in a more complicated manner

2. run LogReg on {(z′n, yn)} to get (A, B)
   — actual model adds some special regularization here

3. return g(x) = θ( A · (wSVMᵀΦ(x) + bSVM) + B )

• soft binary classifier, not having the same boundary as the SVM classifier — because of B

• how to solve the LogReg: GD/SGD/or anything — because only two variables

kernel SVM =⇒ approx. LogReg in Z-space

exact LogReg in Z-space?
