
© Deng Cai, College of Computer Science, Zhejiang University


Academic year: 2021


So Far…

Our goal (supervised learning):

To learn a set of discriminant functions

Bayesian framework

We could design an optimal classifier if we knew:

$P(\omega_i)$: priors, and $P(x \mid \omega_i)$: class-conditional densities

Using training data to estimate $P(\omega_i)$ and $P(x \mid \omega_i)$

Directly learning discriminant functions from the training data

We only know the form of the discriminant functions

Linear Regression

Logistic Regression

SVM

These are all linear models.

Nonlinear Distributed Data

Impossible to separate with a hyperplane


Generalized Linear Function & Kernel Methods

Deng Cai (蔡登)

College of Computer Science Zhejiang University

[email protected]


A Circle from 2D to 3D

Mapping a circle in 2D (a special case) to 3D gives a result that is linearly separable.
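The figure for this slide did not survive extraction, so the following is only a minimal sketch of the idea. It assumes the commonly used map $\phi(x_1, x_2) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$, which may differ from the mapping on the original slide. The helper names `phi`, `ring`, and `g`, the two radii, and the threshold are all illustrative choices.

```python
import math

def phi(x1, x2):
    """Map a 2D point to 3D (assumed mapping, not necessarily the slide's)."""
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def ring(radius, n=12):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = ring(1.0)   # class 1: points on a circle of radius 1
outer = ring(2.0)   # class 2: points on a circle of radius 2

def g(z, threshold=2.5):
    """Linear discriminant in 3D: the plane z1 + z3 = threshold."""
    return threshold - (z[0] + z[2])

inner_scores = [g(phi(*p)) for p in inner]
outer_scores = [g(phi(*p)) for p in outer]
```

No line separates the two rings in 2D, but after the mapping $z_1 + z_3 = x_1^2 + x_2^2$ is the squared radius, so a single plane in 3D splits them.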


Generalized Linear Discriminant Functions

Recall the Linear Discriminant Function

$g(x) = w^t x + w_0$

$g(x)$ positive implies class 1

$g(x)$ negative implies class 2

Generalized Linear Discriminant

Add additional terms involving the products of features

For example, 

Given: [x1, x2, x3]

Make it: [x1, x2, x3, x1x2, x2x3, x1x2x3] by adding products of features.

Learn a discriminant function that is linear in the new feature space
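The expansion above can be sketched in a few lines. The product terms are the ones listed on the slide; the weights `w` and bias `w0` are hypothetical values chosen only to show that the discriminant stays linear in the expanded features.

```python
def expand(x):
    """Augment [x1, x2, x3] with the product terms listed on the slide."""
    x1, x2, x3 = x
    return [x1, x2, x3, x1 * x2, x2 * x3, x1 * x2 * x3]

def g(x, w, w0):
    """Discriminant that is linear in the expanded features."""
    y = expand(x)
    return w0 + sum(wi * yi for wi, yi in zip(w, y))

# Hypothetical weights: only the x1*x2 term matters.
w = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
w0 = -0.5

score_a = g([1, 1, 0], w, w0)   # x1*x2 = 1, so the score is positive
score_b = g([1, 0, 1], w, w0)   # x1*x2 = 0, so the score is negative
```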


Quadratic Discriminant Function

Obtained by adding pair-wise products of features:

$g(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i}^{d} w_{ij} x_i x_j$

Linear part, $w_0 + \sum_{i=1}^{d} w_i x_i$: $(d+1)$ parameters

Quadratic part, $\sum_{i=1}^{d} \sum_{j=i}^{d} w_{ij} x_i x_j$: $d(d+1)/2$ additional parameters

$g(x)$ positive implies class 1; $g(x)$ negative implies class 2

$g(x) = 0$ represents a hyperquadric (hyperparaboloid, hyperellipsoid, hyperhyperboloid), as opposed to the hyperplane of the linear discriminant case.

Adding more terms such as $w_{ijk} x_i x_j x_k$ results in polynomial discriminant functions.


Generalized Discriminant Function

A generalized linear discriminant function can be written as

$g(x) = \sum_{i=1}^{\hat{d}} a_i y_i(x)$

Equivalently,

$g(x) = a^t y$

$a = [a_1, a_2, \ldots, a_{\hat{d}}]^t$: weights in the augmented feature space. Note that the function is linear in $a$.

$y = [y_1(x), y_2(x), \ldots, y_{\hat{d}}(x)]^t$: also called the augmented feature vector.

$\hat{d}$: dimensionality of the augmented feature space.

Setting $y_i(x)$ to be monomials results in polynomial discriminant functions.
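As a small illustration of $g(x) = a^t y$, the sketch below uses a scalar input with the monomials $1, x, x^2$ as the augmented features. The weight vector `a` is an illustrative choice, not taken from the slides.

```python
def y(x):
    """Augmented feature vector of monomials for a scalar input x."""
    return [1.0, x, x * x]

def g(x, a):
    """Generalized linear discriminant g(x) = a^t y(x): linear in a, not in x."""
    return sum(ai * yi for ai, yi in zip(a, y(x)))

a = [-1.0, 1.0, 2.0]   # illustrative weights: g(x) = -1 + x + 2x^2

# Class 1 where g > 0, class 2 otherwise.
labels = {x: (1 if g(x, a) > 0 else 2) for x in (-2.0, 0.0, 1.0)}
```

With these weights class 1 is the region $x < -1$ or $x > 0.5$, which is not a single interval. A discriminant that is linear in $x$ could not produce that region, while one that is linear in $y$ can.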


Phi Function

The discriminant function $g(x)$ is not linear in $x$, but is linear in $y$.

The mapping $y = [y_1(x), y_2(x), \ldots, y_{\hat{d}}(x)]^t$ takes a $d$-dimensional vector $x$ and maps it to a $\hat{d}$-dimensional space. The mapping $y$ is called the phi-function.

When the input patterns $x$ are not linearly separable in the input space, the right phi-function maps them to a space where the patterns are linearly separable.

Unfortunately, the curse of dimensionality makes it hard to capitalize on this in practice. A complete QDF involves $(d+1)(d+2)/2$ terms; even for a modest value such as $d = 50$, this is 1326 terms.
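To make the growth concrete, the sketch below evaluates the $(d+1)(d+2)/2$ count for a few values of $d$. The function `poly_terms` goes one step beyond the slide: it uses the standard count of monomials of total degree at most `degree`, which is an addition here and not something the slide states.

```python
from math import comb

def qdf_terms(d):
    """Number of terms in a complete quadratic discriminant: (d+1)(d+2)/2."""
    return (d + 1) * (d + 2) // 2

def poly_terms(d, degree):
    """Number of monomials of total degree <= degree in d variables."""
    return comb(d + degree, degree)

counts = {d: qdf_terms(d) for d in (2, 10, 50)}
cubic_50 = poly_terms(50, 3)
```

For `degree=2` the two functions agree, and moving to cubic terms at $d = 50$ already multiplies the count by more than an order of magnitude.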


Representer Theorem


Kernelized Ridge Regression

Woodbury matrix identity
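The equations on this slide were lost in extraction, so the following is a sketch of the standard textbook form of kernel ridge regression and not necessarily the slide's notation. It assumes the dual coefficients solve $(K + \lambda I)\alpha = y$ and that predictions are $f(x) = \sum_i \alpha_i k(x_i, x)$. This is the form that a Woodbury-type identity yields from the primal ridge solution, so that only inner products between samples are needed. The RBF kernel, the bandwidth `sigma`, the regularizer `lam`, and the toy data are all illustrative assumptions.

```python
import math

def rbf(x, z, sigma=1.0):
    """RBF kernel on scalars (sigma is an assumed bandwidth)."""
    return math.exp(-((x - z) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(xs, ys, lam=1e-3, kernel=rbf):
    """Dual coefficients alpha = (K + lam * I)^{-1} y."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, ys)

def predict(x, xs, alpha, kernel=rbf):
    """f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, xs))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]            # a nonlinear target
alpha = fit(xs, ys)
fitted = [predict(x, xs, alpha) for x in xs]
```

The linear system has one unknown per training sample, regardless of the dimensionality of the feature space the kernel implicitly uses.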

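[Equations not recoverable from the source: the ridge regression objective (an argmin) and its kernelized solution.]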


Support Vector Machine

The hyperplane of maximum margin is supported by the points (vectors) that lie on the margin. These are called support vectors.

Non-support vectors can move freely without affecting the position of the hyperplane, as long as they do not cross the margin.


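The final classifier is

$f(x) = \operatorname{sgn}(w^t x + b) = \operatorname{sgn}\left(\sum_i \alpha_i y_i \langle x_i, x \rangle + b\right)$

where $y_i \in \{-1, +1\}$ here denotes the class label of training sample $x_i$.

Note: for non-support vectors, the corresponding $\alpha_i$ is zero.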
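The role of the zero coefficients can be checked on a toy problem. The sketch below assumes the usual dual form of the classifier, $\operatorname{sgn}(\sum_i \alpha_i y_i \langle x_i, x \rangle + b)$. The three training points are hypothetical, and the `alpha` values are the dual solution worked out by hand for this particular set, not the output of a solver.

```python
def dot(u, v):
    """Standard inner product."""
    return sum(a * b for a, b in zip(u, v))

def sgn(t):
    return 1 if t >= 0 else -1

# Hypothetical training set. The first two points lie on the margin;
# the third lies well beyond it, so its coefficient is zero.
X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]
labels = [+1, -1, +1]
alpha = [0.5, 0.5, 0.0]   # hand-derived for this toy set (assumption)
b = 0.0

def classify(x, kernel=dot):
    """sgn(sum_i alpha_i * y_i * k(x_i, x) + b) over all training points."""
    return sgn(sum(a * y * kernel(xi, x)
                   for a, y, xi in zip(alpha, labels, X)) + b)

def classify_sv_only(x, kernel=dot):
    """The same classifier using only the support vectors (alpha > 0)."""
    sv = [(a, y, xi) for a, y, xi in zip(alpha, labels, X) if a > 0]
    return sgn(sum(a * y * kernel(xi, x) for a, y, xi in sv) + b)
```

Dropping the third point changes nothing, which is the sense in which only the support vectors define the classifier. Passing a different `kernel` in place of `dot` gives the kernelized version of the same expression.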


Kernels

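Let $\kappa(x, x') \ge 0$ be some measure of similarity between objects $x, x' \in \mathcal{X}$, where $\mathcal{X}$ is some abstract space; we will call $\kappa$ a kernel function.

Typically the function is symmetric and non-negative

Examples

Linear kernels: $\kappa(x, x') = \langle x, x' \rangle$

Polynomial kernels: $\kappa(x, x') = (\langle x, x' \rangle + 1)^p$

RBF kernels: $\kappa(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$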
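The three example kernels can be written directly as functions. This sketch assumes their common textbook forms: the inner product, the inner product plus one raised to a power, and a Gaussian of the squared distance. The parameter names `degree` and `sigma` and the sample points are illustrative.

```python
import math

def dot(x, z):
    """Standard inner product."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=2):
    return (dot(x, z) + 1) ** degree

def rbf_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def gram(points, kernel):
    """Matrix of pairwise kernel values for a list of points."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram(points, rbf_kernel)
```

The resulting matrix `K` is symmetric with ones on the diagonal for the RBF kernel, since every point is at distance zero from itself.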


The advantages of kernel methods

Non-linear classifiers

The kernel determines the nonlinearity of the learned function.

Samples that cannot be represented as feature vectors

But we can still compute the similarity of two samples

String kernels

Graph kernels
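As one concrete case of a similarity defined directly on strings, the sketch below implements a simple spectrum kernel, which compares two strings by the length-`k` substrings they share. The slides do not say which string kernel is meant, so this choice and the default `k=2` are assumptions made for illustration.

```python
from collections import Counter

def spectrum(s, k):
    """Counts of all length-k substrings of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Inner product of the k-substring count vectors of s and t."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    return sum(cs[m] * ct[m] for m in cs if m in ct)

same = spectrum_kernel("abab", "abab")
close = spectrum_kernel("abab", "abba")
far = spectrum_kernel("abab", "cccc")
```

The strings are never turned into explicit fixed-length feature vectors by the caller; only the pairwise similarity value is computed, and that value is all a kernel method needs.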
