Implementation of the MLP Kernel
Cheng-Yuan Liou* and Wei-Chen Cheng
Department of Computer Science and Information Engineering
National Taiwan University, Republic of China
*cyliou@csie.ntu.edu.tw
25 Nov. 2008, 17:40-19:00
Auckland
Related works

Year  Contribution                          People
1992  Support vector machine                Boser
1994  ICONIP, weight design, upper bound    Liou, Yu
1995  ICNN, Perth, AIR                      Liou, Yu
2000  ICS, SIR                              Liou, Chen, Huang
2007  ICONIP                                Liou, Cheng
Liou and Yu, 1995, ICNN, Perth

The number of distinct representations shrinks from layer to layer, $Y_m \ll Y_{m-1}$, with the layer sizes bounded by
$$n_m \le \left\lceil \frac{Y_{m-1}}{n_{m-1}} \right\rceil .$$
Example: $Y_0 = 14$ distinct input patterns $X$, then $Y_1 = 6$, $Y_2 = 4$, $Y_3 = 2$.
Different patterns $x^p$ and $x^q$ can share one representation $y^{(p,1)}$: every layer is a many-to-one mapping.
At the last layer $Y_L = C$, the number of classes; here $C = 2$.
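As a concrete illustration (not the authors' code), a few lines of Python can count the number of distinct representations $Y_m$ at each layer by counting distinct output rows; the sign activation, the random weights, and the sizes below are assumptions made only for this sketch.

```python
import numpy as np

def count_representations(X, weights):
    """Return [Y_0, Y_1, ..., Y_L]: distinct representations per layer."""
    counts = [len(np.unique(X, axis=0))]           # Y_0: distinct input patterns
    Y = X
    for W in weights:                              # one weight matrix per layer
        Y = np.sign(Y @ W)                         # binary (+/-1) representation
        counts.append(len(np.unique(Y, axis=0)))   # Y_m: distinct rows at layer m
    return counts

# Random weights for shape only; designed or SIR-trained weights would give a
# shrinking schedule such as Y_0 = 14, Y_1 = 6, Y_2 = 4, Y_3 = 2.
X = np.sign(np.random.randn(14, 8))
weights = [np.random.randn(8, 6), np.random.randn(6, 4), np.random.randn(4, 2)]
print(count_representations(X, weights))
```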
The $m$th layer maps the representations of layer $m-1$ onto fewer representations, $y^{(p,m-2)} \rightarrow y^{(p,m-1)} \rightarrow y^{(p,m)}$, with
$$n_m \le \left\lceil \frac{Y_{m-1}}{n_{m-1}} \right\rceil, \qquad Y_m = \left|\left\{ y^{(p,m)} \right\}\right|, \qquad Y_L = C .$$
$W_m$ by design: guaranteed (Liou and Yu, 1994, ICONIP, Seoul).
$W_m$ by training: SIR (Liou, Chen, Huang, 2000, ICS; Liou, Cheng, 2007, ICONIP).
This is the reason why the training proceeds layer after layer, each layer trained independently [Liou and Yu, 1995, ICNN, Perth]; a skeleton of that scheme is sketched below.
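A minimal skeleton of the layer-after-layer scheme, assuming a tanh layer; `train_layer_SIR`, `build_network`, and the layer form are illustrative names for this sketch, not the authors' implementation.

```python
import numpy as np

def train_layer_SIR(Y_prev, labels, n_m, epochs=100):
    """Placeholder: fit W_m on the frozen outputs of layer m-1.
    The actual SIR attract/repel update is sketched later in the talk."""
    W = 0.1 * np.random.randn(Y_prev.shape[1], n_m)
    # ... SIR updates of W would go here ...
    return W

def build_network(X, labels, layer_sizes):
    """Train the layers one after another; earlier layers stay fixed."""
    weights, Y = [], X
    for n_m in layer_sizes:
        W = train_layer_SIR(Y, labels, n_m)
        weights.append(W)
        Y = np.tanh(Y @ W)        # these outputs feed the next layer's training
    return weights
```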
ICNN, 1995, Perth, AIR tree

[Figure: an MLP (input layer x with bias, first and second hidden layers, output layer giving y) and the AIR tree L1, L2, L3 of the binary representations produced at each layer, e.g. (000), (001), (010), (011), (100), (110), (111) in the hidden layers and (0), (1) at the output.]
Conclusions of AIR, 1995
• BP cannot correct the latent error neurons by adjusting their succeeding layers.
• The AIR tree can trace the errors in a latent layer that is near the front input layer.
• The front layers must send right signals to their succeeding layers.
• The front layers must be trained layer after layer in order to get right signals.
• Split the function of supervised BP into categorization and calibration.
• Reduced number of representations: $Y_m \ll Y_{m-1}$.
AIR, 1995
• Supervised BP
• Identified the function of MLP
– Classification = Categorization + Calibration
– The front sector categorizes, using the differences of classes; the output sector calibrates to the class labels: $x \rightarrow$ categorize $\rightarrow$ calibrate $\rightarrow y$.
Weight design
ICONIP, 1994, Seoul
• Weight design for each layer
• Number of neurons (E.B. Baum, 1988); a short sizing sketch follows
– Upper bound for the first hidden layer: $n_1 \le \left\lceil \frac{P}{D} \right\rceil$ ($P$ patterns in $D$ input dimensions)
– Upper bound for the hidden layers: $n_m \le \left\lceil \frac{Y_{m-1}}{n_{m-1}} \right\rceil$
– $Y_L = C$, the number of classes, guaranteed
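A small sketch of the two sizing rules as reconstructed above; the helper names and the example numbers (in particular $D = 3$) are illustrative assumptions.

```python
import math

def first_hidden_size(P, D):
    """Baum's (1988) bound for the first hidden layer: ceil(P / D)
    hidden units for P patterns in D input dimensions."""
    return math.ceil(P / D)

def next_hidden_size(Y_prev, n_prev):
    """Per-layer upper bound as reconstructed above: ceil(Y_{m-1} / n_{m-1})."""
    return math.ceil(Y_prev / n_prev)

# Illustrative numbers only: P = 14 patterns (as in the earlier slide), D = 3 assumed.
print(first_hidden_size(14, 3))   # 5
print(next_hidden_size(6, 5))     # 2
```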
Continuous border
[Figure: the 1st hidden layer divides and conquers the continuous border.]
Devise training for layers: SIR, ICS 2000
• Categorization sector
• Uses the differences between classes implicitly, through two pair energies (a short sketch follows):
$$E_{\text{rep}} = -\tfrac{1}{2}\left\| y^{(p,m)} - y^{(q,m)} \right\|^{2} \quad \text{(inter-class pairs)}$$
$$E_{\text{att}} = \tfrac{1}{2}\left\| y^{(p,m)} - y^{(q,m)} \right\|^{2} \quad \text{(intra-class pairs)}$$
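The two energies written out as plain functions; a minimal sketch assuming numpy vectors for the layer outputs $y^{(p,m)}$ and $y^{(q,m)}$.

```python
import numpy as np

def E_att(y_p, y_q):
    """Attracting energy for an intra-class pair of layer outputs."""
    return 0.5 * np.sum((y_p - y_q) ** 2)

def E_rep(y_p, y_q):
    """Repelling energy for an inter-class pair of layer outputs."""
    return -0.5 * np.sum((y_p - y_q) ** 2)
```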
SIR kernel
[Figure: input space and output space; the input data $x^p = y^{(p,0)}$ are mapped by $W_1$ to $y^{(p,1)}$; the five patterns $x_1, \dots, x_5$ map to $y^{(1,1)}, \dots, y^{(5,1)}$.]

Intra-class pattern pairs:
$U_1 = \left\{ (x_1, x_2), (x_1, x_3), (x_2, x_3) \right\}$
$U_2 = \left\{ (x_4, x_5) \right\}$
Inter-class pattern pairs:
$V_{1,2} = \left\{ (x_1, x_4), (x_1, x_5), (x_2, x_4), (x_2, x_5), (x_3, x_4), (x_3, x_5) \right\}$
An enumeration sketch follows.
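A short sketch (assumed helper name) of how the pair sets $U_1$, $U_2$, and $V_{1,2}$ can be enumerated from class labels.

```python
from itertools import combinations

def pair_sets(labels):
    """Enumerate intra-class pairs U_k and inter-class pairs V_{k,l}
    from a list of class labels; pattern indices are 0-based here."""
    classes = sorted(set(labels))
    idx = {c: [i for i, l in enumerate(labels) if l == c] for c in classes}
    U = {c: list(combinations(idx[c], 2)) for c in classes}
    V = {(a, b): [(i, j) for i in idx[a] for j in idx[b]]
         for a, b in combinations(classes, 2)}
    return U, V

# The five-pattern example from the figure (x1..x5):
U, V = pair_sets([1, 1, 1, 2, 2])
print(U[1])        # U_1: (x1,x2), (x1,x3), (x2,x3)
print(U[2])        # U_2: (x4,x5)
print(V[(1, 2)])   # V_{1,2}: the six inter-class pairs
```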
SIR kernel
[Figure: the same five patterns in input and output space, with the pair sets $U_1$ and $U_2$.]

The closest inter-class pair in the output space:
$$(x_3, x_4) = \arg\min_{(x_i, x_j) \in V_{1,2}} \left\| y^{(i,1)} - y^{(j,1)} \right\|$$
SIR kernel
[Figure: the same five patterns in input and output space, with the pair sets $U_1$, $U_2$, and $V_{1,2}$.]

The closest inter-class pair and the most distant intra-class pair in the output space (see the selection sketch below):
$$(x_3, x_4) = \arg\min_{(x_i, x_j) \in V_{1,2}} \left\| y^{(i,1)} - y^{(j,1)} \right\|$$
$$(x_1, x_2) = \arg\max_{(x_p, x_q) \in U_1 \text{ or } U_2} \left\| y^{(p,1)} - y^{(q,1)} \right\|$$
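The two selections above as a short sketch; the function name and the flat pair lists are assumptions (the intra-class list holds the pairs of $U_1$ and $U_2$ together).

```python
import numpy as np

def select_pairs(Y, intra_pairs, inter_pairs):
    """Y holds one layer-output row y^(p,1) per pattern (0-based indices).
    Returns (closest inter-class pair, most distant intra-class pair)."""
    dist = lambda pq: np.linalg.norm(Y[pq[0]] - Y[pq[1]])
    closest_inter = min(inter_pairs, key=dist)     # argmin over V_{1,2}
    farthest_intra = max(intra_pairs, key=dist)    # argmax over U_1 or U_2
    return closest_inter, farthest_intra
```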
SIR kernel
[Figure: the same five patterns; the repelling energy acts on the closest inter-class pair.]

$$E_{\text{rep}}(x_3, x_4) = -\tfrac{1}{2}\left\| y^{(3,1)} - y^{(4,1)} \right\|^{2}$$
SIR kernel

The input data $x^p = y^{(p,0)}$ are mapped by $W_1$ to $y^{(p,1)}$. Update $W_1$ by the following two equations,
$$\nabla W_1 \leftarrow \eta_1 \frac{\partial E_{\text{att}}(x^p, x^q)}{\partial W_1} + \eta_2 \frac{\partial E_{\text{rep}}(x^r, x^s)}{\partial W_1}, \qquad W_1 \leftarrow W_1 - \nabla W_1 .$$
In this work we set $\eta_1 = 0.01$ and $\eta_2 = 0.1$. This means that the force of repelling is stronger than the force of attracting.
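A hedged sketch of one such update step, assuming a bias-free tanh layer $y = \tanh(W_1 x)$ and the energies defined earlier; only $\eta_1 = 0.01$ and $\eta_2 = 0.1$ come from the slide, the rest is illustrative.

```python
import numpy as np

ETA_ATT, ETA_REP = 0.01, 0.1       # eta_1 (attract), eta_2 (repel) from the slide

def layer(W, x):
    return np.tanh(W @ x)

def grad_half_sqdist(W, x_a, x_b):
    """d/dW of (1/2)||y_a - y_b||^2 for y = tanh(W x)."""
    y_a, y_b = layer(W, x_a), layer(W, x_b)
    d = y_a - y_b
    return np.outer(d * (1 - y_a ** 2), x_a) - np.outer(d * (1 - y_b ** 2), x_b)

def sir_step(W, intra_pair, inter_pair):
    """One update: attract the chosen intra-class pair, repel the inter-class one."""
    (xp, xq), (xr, xs) = intra_pair, inter_pair
    grad_att = grad_half_sqdist(W, xp, xq)        # dE_att/dW_1
    grad_rep = -grad_half_sqdist(W, xr, xs)       # dE_rep/dW_1, since E_rep is the negated form
    return W - (ETA_ATT * grad_att + ETA_REP * grad_rep)
```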
Two-Class Problem
• The border of the data is $x_2 = \frac{1}{10}\left( x_1^{3} + x_1 \right)$ (a data-generation sketch follows).
• All input values are in the range $[-1, 1]$.
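A small sketch of such a two-class set, assuming the border as reconstructed above; which class lies on which side of the curve, the sample size, and the uniform sampling are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))       # (x1, x2) in [-1, 1]^2
border = (X[:, 0] ** 3 + X[:, 0]) / 10.0        # border curve x2 = (x1^3 + x1)/10
labels = np.where(X[:, 1] > border, 1, 2)       # class 1 above the curve (assumed)
```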
ICONIP 2008 Liou, C.-Y. 19
Two-Class Problem
• The SIR kernel layer has five neurons, $n_m = 5$.
• The supervised BP uses two hidden layers of five neurons each, $n_{\text{MLP}}^{1} = n_{\text{MLP}}^{2} = 5$.
• SVM kernel: $K(u, v) = \left( u^{T} v + 1 \right)^{3}$ (a kernel sketch follows).
[Figure: decision regions of the supervised BP, the SIR kernel, and the SVM.]
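The quoted kernel written out as a callable Gram-matrix function; using it with scikit-learn's SVC is an assumed tool choice, not something stated in the slides.

```python
import numpy as np
from sklearn.svm import SVC

def cubic_kernel(U, V):
    """Gram matrix of the polynomial kernel K(u, v) = (u^T v + 1)^3."""
    return (U @ V.T + 1.0) ** 3

svm = SVC(kernel=cubic_kernel)    # then svm.fit(X_train, y_train), svm.predict(X_test)
```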
Three-Class Problem
Three-Class Problem
• A SOM is used to analyze the output y of each layer (a sketch follows).
• The class color of each input pattern is plotted on its winner neuron.
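A hedged sketch of this analysis step; the `minisom` package, the 10x10 grid, and the iteration count are assumptions made for the sketch.

```python
import numpy as np
from minisom import MiniSom   # assumed third-party SOM implementation

def winner_cells(Y, labels, grid=10, iters=5000):
    """Fit a small SOM on one layer's outputs Y and collect, for every
    map cell, the class labels of the patterns that win there."""
    som = MiniSom(grid, grid, Y.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(Y, iters)
    cells = {}
    for y, c in zip(Y, labels):
        cells.setdefault(som.winner(y), []).append(c)
    return cells
```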
Three-Class Problem
Three-class problem, $n_m = 5$
Real World Data
• The patterns of each dataset are divided into 5 partitions.
• The testing accuracy is the average over the 5-fold cross validation (a sketch of the protocol follows).
• The SVM uses a Gaussian kernel.
• The parameters C and γ are listed in the table on the next slide.
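A sketch of this evaluation protocol; scikit-learn, the stratified split, and the fixed seed are assumptions, with C and gamma taken per dataset from the parameter table.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def svm_cv_accuracy(X, y, C, gamma):
    """Mean test accuracy of a Gaussian-kernel SVM over 5-fold cross validation."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(clf, X, y, cv=cv).mean()
```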
Real World Data

Dataset                   SIR kernel (n_m, L_max)   (n_c^1, n_c^2)   SVM C   SVM γ   BP n_MLP^1   n_MLP^2
Parkinsons                (5, 1)                    (20, 5)          50      0.05    10           30
Wisconsin Breast Cancer   (5, 1)                    (30, 7)          50      0.05    10           30
iris                      (5, 3)                    (11, 5)          50      0.05    10           20
Real World Data

                          Testing Accuracy                       Training Accuracy
Dataset                   Supervised BP   SVM      SIR kernel    Supervised BP   SVM      SIR kernel
Parkinsons                92.82%          88.20%   91.28%        99.87%          98.33%   100%
Wisconsin Breast Cancer   96.42%          95.57%   96.00%        97.53%          98.89%   100%
iris                      96.00%          94.66%   97.33%        97.50%          99.67%   100%
Summary
• Class to point, guaranteed: $Y_L = C$.
• Widely separated class points.
• Weights by design or by training.
• Class labels are not used.
• The SIR kernel can be used in an SVM.
• Hairy network techniques can be used in the calibration sector.
• Suitable for multi-class problems.
Thank You