Implementation of the MLP Kernel
Cheng-Yuan Liou* and Wei-Chen Cheng
Department of Computer Science and Information Engineering
National Taiwan University, Republic of China
*cyliou@csie.ntu.edu.tw
25 Nov. 2008, 17:40-19:00
Auckland
Related works

Year  Contribution                          People
1992  Support vector machine                Boser
1994  ICONIP, weight design, upper bound    Liou, Yu
1995  ICNN, Perth, AIR                      Liou, Yu
2000  ICS, SIR                              Liou, Chen, Huang
2007  ICONIP                                Liou, Cheng
Liou and Yu, 1995, ICNN, Perth

The number of distinct representations shrinks from layer to layer, $Y_m \ll Y_{m-1}$, with the layer sizes bounded by
$$n_m \le \left\lceil \frac{Y_{m-1}}{n_{m-1}} \right\rceil .$$
Example: $Y_0 = 14$ distinct input patterns $X$, then $Y_1 = 6$, $Y_2 = 4$, $Y_3 = 2$.
Different patterns $x^p$ and $x^q$ can share one representation $y^{(p,1)}$: every layer is a many-to-one mapping.
At the last layer $Y_L = C$, the number of classes; here $C = 2$.
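As a concrete illustration (not the authors' code), a few lines of Python can count the number of distinct representations $Y_m$ at each layer by counting distinct output rows; the sign activation, the random weights, and the sizes below are assumptions made only for this sketch.

```python
import numpy as np

def count_representations(X, weights):
    """Return [Y_0, Y_1, ..., Y_L]: distinct representations per layer."""
    counts = [len(np.unique(X, axis=0))]           # Y_0: distinct input patterns
    Y = X
    for W in weights:                              # one weight matrix per layer
        Y = np.sign(Y @ W)                         # binary (+/-1) representation
        counts.append(len(np.unique(Y, axis=0)))   # Y_m: distinct rows at layer m
    return counts

# Random weights for shape only; designed or SIR-trained weights would give a
# shrinking schedule such as Y_0 = 14, Y_1 = 6, Y_2 = 4, Y_3 = 2.
X = np.sign(np.random.randn(14, 8))
weights = [np.random.randn(8, 6), np.random.randn(6, 4), np.random.randn(4, 2)]
print(count_representations(X, weights))
```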
The $m$th layer maps the representations of layer $m-1$ onto fewer representations, $y^{(p,m-2)} \rightarrow y^{(p,m-1)} \rightarrow y^{(p,m)}$, with
$$n_m \le \left\lceil \frac{Y_{m-1}}{n_{m-1}} \right\rceil, \qquad Y_m = \left|\left\{ y^{(p,m)} \right\}\right|, \qquad Y_L = C .$$
$W_m$ by design: guaranteed (Liou and Yu, 1994, ICONIP, Seoul).
$W_m$ by training: SIR (Liou, Chen, Huang, 2000, ICS; Liou, Cheng, 2007, ICONIP).
This is the reason why the training proceeds layer after layer, each layer trained independently [Liou and Yu, 1995, ICNN, Perth]; a skeleton of that scheme is sketched below.
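A minimal skeleton of the layer-after-layer scheme, assuming a tanh layer; `train_layer_SIR`, `build_network`, and the layer form are illustrative names for this sketch, not the authors' implementation.

```python
import numpy as np

def train_layer_SIR(Y_prev, labels, n_m, epochs=100):
    """Placeholder: fit W_m on the frozen outputs of layer m-1.
    The actual SIR attract/repel update is sketched later in the talk."""
    W = 0.1 * np.random.randn(Y_prev.shape[1], n_m)
    # ... SIR updates of W would go here ...
    return W

def build_network(X, labels, layer_sizes):
    """Train the layers one after another; earlier layers stay fixed."""
    weights, Y = [], X
    for n_m in layer_sizes:
        W = train_layer_SIR(Y, labels, n_m)
        weights.append(W)
        Y = np.tanh(Y @ W)        # these outputs feed the next layer's training
    return weights
```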
ICNN, 1995, Perth, AIR tree

[Figure: an MLP (input layer x with bias, first and second hidden layers, output layer giving y) and the AIR tree L1, L2, L3 of the binary representations produced at each layer, e.g. (000), (001), (010), (011), (100), (110), (111) in the hidden layers and (0), (1) at the output.]
Conclusions of AIR, 1995
• BP cannot correct the latent error neurons by adjusting their succeeding layers.
• The AIR tree can trace the errors in a latent layer that is near the front input layer.
• The front layers must send right signals to their succeeding layers.
• The front layers must be trained layer after layer in order to get right signals.
• Split the function of supervised BP into categorization and calibration.
• Reduced number of representations: $Y_m \ll Y_{m-1}$.
AIR, 1995
• Supervised BP
• Identified the function of MLP
– Classification = Categorization + Calibration
– The front sector categorizes, using the differences of classes; the output sector calibrates to the class labels: $x \rightarrow$ categorize $\rightarrow$ calibrate $\rightarrow y$.
Weight design
ICONIP, 1994, Seoul
• Weight design for each layer
• Number of neurons (E.B. Baum, 1988); a short sizing sketch follows
– Upper bound for the first hidden layer: $n_1 \le \left\lceil \frac{P}{D} \right\rceil$ ($P$ patterns in $D$ input dimensions)
– Upper bound for the hidden layers: $n_m \le \left\lceil \frac{Y_{m-1}}{n_{m-1}} \right\rceil$
– $Y_L = C$, the number of classes, guaranteed
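A small sketch of the two sizing rules as reconstructed above; the helper names and the example numbers (in particular $D = 3$) are illustrative assumptions.

```python
import math

def first_hidden_size(P, D):
    """Baum's (1988) bound for the first hidden layer: ceil(P / D)
    hidden units for P patterns in D input dimensions."""
    return math.ceil(P / D)

def next_hidden_size(Y_prev, n_prev):
    """Per-layer upper bound as reconstructed above: ceil(Y_{m-1} / n_{m-1})."""
    return math.ceil(Y_prev / n_prev)

# Illustrative numbers only: P = 14 patterns (as in the earlier slide), D = 3 assumed.
print(first_hidden_size(14, 3))   # 5
print(next_hidden_size(6, 5))     # 2
```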
Continuous border
[Figure: the 1st hidden layer divides and conquers the continuous border.]
Devise training for layers: SIR, ICS 2000
• Categorization sector
• Uses the differences between classes implicitly, through two pair energies (a short sketch follows):
$$E_{\text{rep}} = -\tfrac{1}{2}\left\| y^{(p,m)} - y^{(q,m)} \right\|^{2} \quad \text{(inter-class pairs)}$$
$$E_{\text{att}} = \tfrac{1}{2}\left\| y^{(p,m)} - y^{(q,m)} \right\|^{2} \quad \text{(intra-class pairs)}$$
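The two energies written out as plain functions; a minimal sketch assuming numpy vectors for the layer outputs $y^{(p,m)}$ and $y^{(q,m)}$.

```python
import numpy as np

def E_att(y_p, y_q):
    """Attracting energy for an intra-class pair of layer outputs."""
    return 0.5 * np.sum((y_p - y_q) ** 2)

def E_rep(y_p, y_q):
    """Repelling energy for an inter-class pair of layer outputs."""
    return -0.5 * np.sum((y_p - y_q) ** 2)
```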
SIR kernel
[Figure: input space and output space; the input data $x^p = y^{(p,0)}$ are mapped by $W_1$ to $y^{(p,1)}$; the five patterns $x_1, \dots, x_5$ map to $y^{(1,1)}, \dots, y^{(5,1)}$.]

Intra-class pattern pairs:
$U_1 = \left\{ (x_1, x_2), (x_1, x_3), (x_2, x_3) \right\}$
$U_2 = \left\{ (x_4, x_5) \right\}$
Inter-class pattern pairs:
$V_{1,2} = \left\{ (x_1, x_4), (x_1, x_5), (x_2, x_4), (x_2, x_5), (x_3, x_4), (x_3, x_5) \right\}$
An enumeration sketch follows.
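A short sketch (assumed helper name) of how the pair sets $U_1$, $U_2$, and $V_{1,2}$ can be enumerated from class labels.

```python
from itertools import combinations

def pair_sets(labels):
    """Enumerate intra-class pairs U_k and inter-class pairs V_{k,l}
    from a list of class labels; pattern indices are 0-based here."""
    classes = sorted(set(labels))
    idx = {c: [i for i, l in enumerate(labels) if l == c] for c in classes}
    U = {c: list(combinations(idx[c], 2)) for c in classes}
    V = {(a, b): [(i, j) for i in idx[a] for j in idx[b]]
         for a, b in combinations(classes, 2)}
    return U, V

# The five-pattern example from the figure (x1..x5):
U, V = pair_sets([1, 1, 1, 2, 2])
print(U[1])        # U_1: (x1,x2), (x1,x3), (x2,x3)
print(U[2])        # U_2: (x4,x5)
print(V[(1, 2)])   # V_{1,2}: the six inter-class pairs
```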
SIR kernel
[Figure: the same five patterns in input and output space, with the pair sets $U_1$ and $U_2$.]

The closest inter-class pair in the output space:
$$(x_3, x_4) = \arg\min_{(x_i, x_j) \in V_{1,2}} \left\| y^{(i,1)} - y^{(j,1)} \right\|$$
SIR kernel
[Figure: the same five patterns in input and output space, with the pair sets $U_1$, $U_2$, and $V_{1,2}$.]

The closest inter-class pair and the most distant intra-class pair in the output space (see the selection sketch below):
$$(x_3, x_4) = \arg\min_{(x_i, x_j) \in V_{1,2}} \left\| y^{(i,1)} - y^{(j,1)} \right\|$$
$$(x_1, x_2) = \arg\max_{(x_p, x_q) \in U_1 \text{ or } U_2} \left\| y^{(p,1)} - y^{(q,1)} \right\|$$
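The two selections above as a short sketch; the function name and the flat pair lists are assumptions (the intra-class list holds the pairs of $U_1$ and $U_2$ together).

```python
import numpy as np

def select_pairs(Y, intra_pairs, inter_pairs):
    """Y holds one layer-output row y^(p,1) per pattern (0-based indices).
    Returns (closest inter-class pair, most distant intra-class pair)."""
    dist = lambda pq: np.linalg.norm(Y[pq[0]] - Y[pq[1]])
    closest_inter = min(inter_pairs, key=dist)     # argmin over V_{1,2}
    farthest_intra = max(intra_pairs, key=dist)    # argmax over U_1 or U_2
    return closest_inter, farthest_intra
```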
SIR kernel
[Figure: the same five patterns; the repelling energy acts on the closest inter-class pair.]

$$E_{\text{rep}}(x_3, x_4) = -\tfrac{1}{2}\left\| y^{(3,1)} - y^{(4,1)} \right\|^{2}$$
SIR kernel

The input data $x^p = y^{(p,0)}$ are mapped by $W_1$ to $y^{(p,1)}$. Update $W_1$ by the following two equations,
$$\nabla W_1 \leftarrow \eta_1 \frac{\partial E_{\text{att}}(x^p, x^q)}{\partial W_1} + \eta_2 \frac{\partial E_{\text{rep}}(x^r, x^s)}{\partial W_1}, \qquad W_1 \leftarrow W_1 - \nabla W_1 .$$
In this work we set $\eta_1 = 0.01$ and $\eta_2 = 0.1$. This means that the force of repelling is stronger than the force of attracting.
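A hedged sketch of one such update step, assuming a bias-free tanh layer $y = \tanh(W_1 x)$ and the energies defined earlier; only $\eta_1 = 0.01$ and $\eta_2 = 0.1$ come from the slide, the rest is illustrative.

```python
import numpy as np

ETA_ATT, ETA_REP = 0.01, 0.1       # eta_1 (attract), eta_2 (repel) from the slide

def layer(W, x):
    return np.tanh(W @ x)

def grad_half_sqdist(W, x_a, x_b):
    """d/dW of (1/2)||y_a - y_b||^2 for y = tanh(W x)."""
    y_a, y_b = layer(W, x_a), layer(W, x_b)
    d = y_a - y_b
    return np.outer(d * (1 - y_a ** 2), x_a) - np.outer(d * (1 - y_b ** 2), x_b)

def sir_step(W, intra_pair, inter_pair):
    """One update: attract the chosen intra-class pair, repel the inter-class one."""
    (xp, xq), (xr, xs) = intra_pair, inter_pair
    grad_att = grad_half_sqdist(W, xp, xq)        # dE_att/dW_1
    grad_rep = -grad_half_sqdist(W, xr, xs)       # dE_rep/dW_1, since E_rep is the negated form
    return W - (ETA_ATT * grad_att + ETA_REP * grad_rep)
```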
Two-Class Problem
• The border of the data is $x_2 = \frac{1}{10}\left( x_1^{3} + x_1 \right)$ (a data-generation sketch follows).
• All input values are in the range $[-1, 1]$.
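A small sketch of such a two-class set, assuming the border as reconstructed above; which class lies on which side of the curve, the sample size, and the uniform sampling are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))       # (x1, x2) in [-1, 1]^2
border = (X[:, 0] ** 3 + X[:, 0]) / 10.0        # border curve x2 = (x1^3 + x1)/10
labels = np.where(X[:, 1] > border, 1, 2)       # class 1 above the curve (assumed)
```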
ICONIP 2008 Liou, C.-Y. 19
Two-Class Problem
• The SIR kernel layer has five neurons, $n_m = 5$.
• The supervised BP uses two hidden layers of five neurons each, $n_{\text{MLP}}^{1} = n_{\text{MLP}}^{2} = 5$.
• SVM kernel: $K(u, v) = \left( u^{T} v + 1 \right)^{3}$ (a kernel sketch follows).
[Figure: decision regions of the supervised BP, the SIR kernel, and the SVM.]
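The quoted kernel written out as a callable Gram-matrix function; using it with scikit-learn's SVC is an assumed tool choice, not something stated in the slides.

```python
import numpy as np
from sklearn.svm import SVC

def cubic_kernel(U, V):
    """Gram matrix of the polynomial kernel K(u, v) = (u^T v + 1)^3."""
    return (U @ V.T + 1.0) ** 3

svm = SVC(kernel=cubic_kernel)    # then svm.fit(X_train, y_train), svm.predict(X_test)
```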
Three-Class Problem
Three-Class Problem
• A SOM is used to analyze the output y of each layer (a sketch follows).
• The class color of each input pattern is plotted on its winner neuron.
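A hedged sketch of this analysis step; the `minisom` package, the 10x10 grid, and the iteration count are assumptions made for the sketch.

```python
import numpy as np
from minisom import MiniSom   # assumed third-party SOM implementation

def winner_cells(Y, labels, grid=10, iters=5000):
    """Fit a small SOM on one layer's outputs Y and collect, for every
    map cell, the class labels of the patterns that win there."""
    som = MiniSom(grid, grid, Y.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(Y, iters)
    cells = {}
    for y, c in zip(Y, labels):
        cells.setdefault(som.winner(y), []).append(c)
    return cells
```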
Three-Class Problem
Three-class problem, $n_m = 5$
Real World Data
• The patterns of each dataset are divided into 5 partitions.
• The testing accuracy is the average over the 5-fold cross validation (a sketch of the protocol follows).
• The SVM uses a Gaussian kernel.
• The parameters C and γ are listed in the table on the next slide.
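A sketch of this evaluation protocol; scikit-learn, the stratified split, and the fixed seed are assumptions, with C and gamma taken per dataset from the parameter table.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def svm_cv_accuracy(X, y, C, gamma):
    """Mean test accuracy of a Gaussian-kernel SVM over 5-fold cross validation."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(clf, X, y, cv=cv).mean()
```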
Real World Data

Dataset                   SIR kernel (n_m, L_max)   (n_c^1, n_c^2)   SVM C   SVM γ   BP n_MLP^1   n_MLP^2
Parkinsons                (5, 1)                    (20, 5)          50      0.05    10           30
Wisconsin Breast Cancer   (5, 1)                    (30, 7)          50      0.05    10           30
iris                      (5, 3)                    (11, 5)          50      0.05    10           20
Real World Data

                          Testing Accuracy                       Training Accuracy
Dataset                   Supervised BP   SVM      SIR kernel    Supervised BP   SVM      SIR kernel
Parkinsons                92.82%          88.20%   91.28%        99.87%          98.33%   100%
Wisconsin Breast Cancer   96.42%          95.57%   96.00%        97.53%          98.89%   100%
iris                      96.00%          94.66%   97.33%        97.50%          99.67%   100%
Summary
• Class to point, guaranteed: $Y_L = C$.
• Widely separated class points.
• Weights by design or by training.
• Class labels are not used.
• The SIR kernel can be used in an SVM.
• Hairy network techniques can be used in the calibration sector.
• Suitable for multi-class problems.
Thank You