-Artificial Neural Network- Chapter 5 Back Propagation Network

(1)


Chaoyang University of Technology (朝陽科技大學)

Department of Information Management

Professor 李麗華

(2)

Introduction (1)

• BPN = Back Propagation Network

• BPN is a layered feedforward supervised network.

• BPN provides an effective means of allowing a computer to examine data patterns that may be incomplete or noisy.

• BPN can take various types of input, e.g., binary or real-valued data.

• The output range of a BPN depends on the transfer function used:

(1) If the sigmoid function is used, the output satisfies 0 < y < 1.

(2) If the hyperbolic tangent function is used, the output satisfies -1 < y < 1.
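The two transfer functions can be compared with a minimal Python sketch that uses only the standard `math` module. The helper names `sigmoid` and `tanh_transfer` are illustrative and do not come from the slides.

```python
import math


def sigmoid(net: float) -> float:
    """Logistic sigmoid f(net) = 1 / (1 + e^(-net)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def tanh_transfer(net: float) -> float:
    """Hyperbolic tangent transfer function; output lies in (-1, 1)."""
    return math.tanh(net)


# Compare the two output ranges for a few weighted sums
for net in (-5.0, 0.0, 5.0):
    print(f"net={net:+.1f}  sigmoid={sigmoid(net):.4f}  tanh={tanh_transfer(net):+.4f}")
```

Mathematically, both functions approach their bounds as |net| grows but never reach them. At net = 0 the sigmoid returns 0.5 and tanh returns 0.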

(3)


Introduction (2)

Architecture:

[Figure: input nodes X₁, X₂, …, X_n; hidden nodes H₁, H₂, …, H_h with biases θ₁, θ₂, …, θ_h; output nodes Y₁, Y₂, …, Y_j]

(4)

Introduction (3)

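• Input layer: [X₁, X₂, …, X_n].

• Hidden layer: there can be more than one hidden layer.

• Each hidden node derives its weighted sum (net₁, net₂, …, net_h) and transfers it into an output (H₁, H₂, …, H_h). These hidden outputs are the inputs used to compute the output layer.

• Output layer: [Y₁, …, Y_j].

• Weights: W_ij, the weight on the connection from node i to node j.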

• Transfer function: nonlinear sigmoid function

$$f(net_j) = \frac{1}{1 + e^{-net_j}}$$

(*) The nodes in the hidden layers organize themselves in a way that …

(5)

Introduction (4)

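• BPN has a broad range of applications:

– Pattern recognition (e.g., sample and character recognition)

– Prediction (e.g., stock market forecasting)

– Classification (e.g., customer segmentation)

– Learning (learning from data)

– Control (feedback and control)

– CRM (customer-service segmentation)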

(6)

Processing Steps (1)

The processing steps can be briefly described as follows.

1. Based on the problem domain, set up the network.

2. Randomly generate weights W_ij.

3. Feed a training pattern, [X₁, X₂, …, X_n], into the BPN.

4. For each node in each layer, compute the weighted sum and apply the transfer function, then feed the transferred values to the next layer until the output layer is reached.

5. The output pattern is compared to the desired output and an error is computed for each unit.

(7)

Processing Steps (2)

6. Propagate the error back to each node in the hidden layer.

7. Each hidden unit receives only a portion of the total error, weighted by its connections to the output units, and these errors are then propagated back toward the input layer.

8. Go back to step 4 until the error is sufficiently small.

9. Repeat from step 3 for the next training pattern.

(8)

Computation Processes(1/10)

• The detailed computation process of a BPN is as follows.

1. Set up the network according to the required numbers of input and output nodes, and choose suitable numbers of hidden layers and hidden nodes.

2. Randomly assign the weights.

3. Feed a training pattern into the network and perform the following computations.

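[Figure: inputs X₁ … X_n connect through weights W_ih to hidden nodes H₁ … H_h (weighted sums net₁ … net_h, biases θ₁ … θ_h), which connect through weights W_hj to outputs Y_j (biases θ_j)]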

(9)

Computation Processes(2/10)

4. Compute from the input layer to the hidden layer, for each hidden node h:

$$net_h = \sum_i W_{ih} X_i - \theta_h, \qquad H_h = f(net_h) = \frac{1}{1 + e^{-net_h}}$$

5. Compute from the hidden layer to the output layer, for each output node j:

$$net_j = \sum_h W_{hj} H_h - \theta_j, \qquad Y_j = f(net_j) = \frac{1}{1 + e^{-net_j}}$$
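Steps 4 and 5 together form a forward pass. The minimal Python sketch below assumes plain nested lists, with `w_ih[i][h]` holding W_ih and `w_hj[h][j]` holding W_hj. This layout and the function name `forward` are choices made for illustration and are not prescribed by the slides.

```python
import math


def sigmoid(net):
    """Transfer function f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_ih, theta_h, w_hj, theta_j):
    """Steps 4-5: propagate one input pattern through a single hidden layer.

    x       -- inputs X_i (length n)
    w_ih    -- w_ih[i][h] is the weight from input i to hidden node h
    theta_h -- hidden biases (length h)
    w_hj    -- w_hj[h][j] is the weight from hidden node h to output node j
    theta_j -- output biases (length m)
    """
    hidden = []
    for h in range(len(theta_h)):
        net_h = sum(w_ih[i][h] * x[i] for i in range(len(x))) - theta_h[h]
        hidden.append(sigmoid(net_h))            # H_h = f(net_h)
    outputs = []
    for j in range(len(theta_j)):
        net_j = sum(w_hj[h][j] * hidden[h] for h in range(len(hidden))) - theta_j[j]
        outputs.append(sigmoid(net_j))           # Y_j = f(net_j)
    return hidden, outputs


# Usage: with every weight and bias at 0, each net is 0 and each node outputs f(0) = 0.5
hidden, outputs = forward([1.0, -1.0], [[0.0] * 3, [0.0] * 3], [0.0] * 3,
                          [[0.0], [0.0], [0.0]], [0.0])
print(hidden, outputs)  # [0.5, 0.5, 0.5] [0.5]
```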



(10)

Computation Processes(3/10)

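6. Calculate the total error $E = \frac{1}{2}\sum_j (T_j - Y_j)^2$ and find the differences (δ) used for correction:

$$\delta_j = Y_j (1 - Y_j)(T_j - Y_j), \qquad \delta_h = H_h (1 - H_h) \sum_j W_{hj} \delta_j$$

7. Compute the corrections:

$$\Delta W_{hj} = \eta \delta_j H_h, \quad \Delta\theta_j = -\eta \delta_j, \quad \Delta W_{ih} = \eta \delta_h X_i, \quad \Delta\theta_h = -\eta \delta_h$$

8. Update the weights and biases:

$$W_{hj} \leftarrow W_{hj} + \Delta W_{hj}, \quad W_{ih} \leftarrow W_{ih} + \Delta W_{ih}, \quad \theta_j \leftarrow \theta_j + \Delta\theta_j, \quad \theta_h \leftarrow \theta_h + \Delta\theta_h$$

9. Repeat steps 4 to 8 until the error is sufficiently small.

10. Repeat steps 3 to 9 until all the training patterns are learned.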
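Steps 6 to 8 for a single pattern, plus a simple outer loop, can be sketched in Python as follows. This is a sketch under stated assumptions. The nested-list layout and the names `train_step` and `train` are hypothetical. The hidden deltas are computed with the old W_hj, before step 8 changes them. The loop also uses a common epoch-wise variant, with one update per pattern per sweep, instead of repeating steps 4 to 8 on one pattern until its error is small.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_ih, theta_h, w_hj, theta_j):
    """Steps 4-5: net = sum(W * input) - theta, then apply the sigmoid."""
    hidden = [sigmoid(sum(w_ih[i][h] * x[i] for i in range(len(x))) - theta_h[h])
              for h in range(len(theta_h))]
    outputs = [sigmoid(sum(w_hj[h][j] * hidden[h] for h in range(len(hidden))) - theta_j[j])
               for j in range(len(theta_j))]
    return hidden, outputs


def train_step(x, t, w_ih, theta_h, w_hj, theta_j, eta):
    """Steps 4-8 for one pattern; updates the lists in place and returns E before the update."""
    hidden, y = forward(x, w_ih, theta_h, w_hj, theta_j)
    # Step 6: deltas of the output nodes, then of the hidden nodes (using the old W_hj)
    delta_j = [y[j] * (1 - y[j]) * (t[j] - y[j]) for j in range(len(y))]
    delta_h = [hidden[h] * (1 - hidden[h])
               * sum(w_hj[h][j] * delta_j[j] for j in range(len(y)))
               for h in range(len(hidden))]
    # Steps 7-8: corrections and updates
    for h in range(len(hidden)):
        for j in range(len(y)):
            w_hj[h][j] += eta * delta_j[j] * hidden[h]   # dW_hj = eta * delta_j * H_h
    for j in range(len(y)):
        theta_j[j] -= eta * delta_j[j]                   # dtheta_j = -eta * delta_j
    for i in range(len(x)):
        for h in range(len(hidden)):
            w_ih[i][h] += eta * delta_h[h] * x[i]        # dW_ih = eta * delta_h * X_i
    for h in range(len(hidden)):
        theta_h[h] -= eta * delta_h[h]                   # dtheta_h = -eta * delta_h
    return 0.5 * sum((t[j] - y[j]) ** 2 for j in range(len(y)))


def train(patterns, w_ih, theta_h, w_hj, theta_j, eta, tol=1e-3, max_epochs=10000):
    """Steps 9-10 (epoch-wise variant): sweep all patterns until the summed error < tol.

    The summed error is measured before each pattern's update.
    """
    total = float("inf")
    for epoch in range(1, max_epochs + 1):
        total = sum(train_step(x, t, w_ih, theta_h, w_hj, theta_j, eta) for x, t in patterns)
        if total < tol:
            return epoch, total
    return max_epochs, total
```

Whether `train` reaches `tol` depends on the initial weights and η. On XOR, backpropagation can stall in a local minimum, so `max_epochs` bounds the loop.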

(11)

EX: Use BPN to solve XOR (1)

• Use BPN to solve the XOR problem

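• Let W₁₁ = 1, W₂₁ = -1, W₁₂ = -1, W₂₂ = 1, W₁₃ = 1, W₂₃ = 1, Θ₁ = 1, Θ₂ = 1, Θ₃ = 1, η = 10

X₁    X₂    T
 1     1    0
 1    -1    1
-1     1    1
-1    -1    0

[Figure: 2-2-1 network with inputs X₁, X₂, hidden nodes H₁, H₂ (biases Θ₁, Θ₂), output Y₁ (bias Θ₃), and weights W₁₁, W₂₁, W₁₂, W₂₂, W₁₃, W₂₃]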
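The pattern table and initial weights above can be checked with the short Python sketch below. It assumes the following reading of the figure: W₁₁ and W₂₁ feed H₁; W₁₂ and W₂₂ feed H₂; W₁₃ and W₂₃ connect H₁ and H₂ to Y₁; Θ₁ and Θ₂ are the hidden biases; and Θ₃ is the output bias. It also assumes net = Σ W·X − Θ, as in the computation slides. This mapping is an assumption, although it is consistent with the correction shown on the next slide.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


# Initial parameters from the slide (assumed mapping):
# W[(i, k)] connects input X_i to hidden node H_k; V[k] connects H_k to Y1 (W13, W23).
W = {(1, 1): 1.0, (2, 1): -1.0, (1, 2): -1.0, (2, 2): 1.0}
V = {1: 1.0, 2: 1.0}
theta = {1: 1.0, 2: 1.0, 3: 1.0}      # Theta1, Theta2 (hidden), Theta3 (output)

# XOR training table with bipolar inputs and binary targets: ((X1, X2), T)
xor_table = [((1, 1), 0), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), 0)]


def forward(x1, x2):
    """net = sum(W * input) - theta, followed by the sigmoid, for H1, H2 and Y1."""
    h = {k: sigmoid(W[(1, k)] * x1 + W[(2, k)] * x2 - theta[k]) for k in (1, 2)}
    y = sigmoid(V[1] * h[1] + V[2] * h[2] - theta[3])
    return h, y


for (x1, x2), t in xor_table:
    h, y = forward(x1, x2)
    print(f"X=({x1:+d},{x2:+d})  T={t}  H1={h[1]:.4f}  H2={h[2]:.4f}  Y={y:.4f}")
```

Before any training, both T = 0 patterns give Y ≈ 0.3865 and both T = 1 patterns give Y ≈ 0.4448, so all four outputs are still far from their targets.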

(12)

EX: BPN Solve XOR (2)

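• ΔW₁₁ = ηδ₁X₁ = (10)(-0.018)(-1) = 0.18

• ΔW₂₁ = ηδ₁X₂ = (10)(-0.018)(-1) = 0.18

• ΔΘ₁ = -ηδ₁ = -(10)(-0.018) = 0.18

• The weights after the first correction are:

W₁₁ = 1.18, W₂₁ = -0.82, W₁₂ = -0.82, W₂₂ = 1.18, W₁₃ = 0.754, W₂₃ = 0.754, Θ₁ = 1.18, Θ₂ = 1.18, Θ₃ = 1.915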
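The correction on this slide can be reproduced step by step. The Python sketch below makes two assumptions. First, the training pattern is (X₁, X₂) = (-1, -1) with T = 0, since the factor (-1) in ΔW₁₁ and ΔW₂₁ points to this row of the table. Second, it uses the same weight-to-node mapping and net = Σ W·X − Θ convention as the computation slides. Under those assumptions, it recomputes δ for the output and hidden nodes and applies steps 7 and 8 once.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


eta = 10.0
x1, x2, t = -1.0, -1.0, 0.0                       # assumed training pattern
W11, W21, W12, W22, W13, W23 = 1.0, -1.0, -1.0, 1.0, 1.0, 1.0
th1, th2, th3 = 1.0, 1.0, 1.0

# Forward pass (net = sum(W * input) - theta)
H1 = sigmoid(W11 * x1 + W21 * x2 - th1)
H2 = sigmoid(W12 * x1 + W22 * x2 - th2)
Y = sigmoid(W13 * H1 + W23 * H2 - th3)

# Step 6: deltas; the hidden deltas use the old W13 and W23
dY = Y * (1 - Y) * (t - Y)
d1 = H1 * (1 - H1) * W13 * dY
d2 = H2 * (1 - H2) * W23 * dY

# Steps 7-8: hidden-layer corrections
W11 += eta * d1 * x1
W21 += eta * d1 * x2
th1 -= eta * d1
W12 += eta * d2 * x1
W22 += eta * d2 * x2
th2 -= eta * d2
# Steps 7-8: output-layer corrections
W13 += eta * dY * H1
W23 += eta * dY * H2
th3 -= eta * dY

print(f"dY={dY:.4f}  d1={d1:.4f}  d2={d2:.4f}")
print(f"W11={W11:.3f} W21={W21:.3f} W12={W12:.3f} W22={W22:.3f} W13={W13:.3f} W23={W23:.3f}")
print(f"Theta1={th1:.3f} Theta2={th2:.3f} Theta3={th3:.3f}")
```

Printed to three decimals, this gives W₁₁ = 1.180, W₂₁ = -0.820, W₁₃ = 0.754 and Θ₃ = 1.916, with δ₁ ≈ -0.0180. The slide's 1.915 for Θ₃ differs only in the last digit, likely from rounding δ by hand.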

(13)

BPN Discussion

1. As the number of hidden nodes increases, convergence becomes slower, but the error can be reduced further.

2. A common rule of thumb for the number of hidden nodes is:

# of hidden nodes = (# of input nodes + # of output nodes) / 2, or

# of hidden nodes = (# of input nodes × # of output nodes)^(1/2)

3. Usually, one or two hidden layers are enough for learning a complex problem. Too many layers make learning very slow. When the problem is high-dimensional and very complex, an extra layer may be used.

4. The learning rate η is usually set in [0.1, 1.0], but the best value depends on how fast and how finely the network should learn.
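Both rules of thumb are easy to evaluate, as the minimal sketch below shows. The function name `hidden_node_estimates` is hypothetical. Both results are real numbers and still have to be rounded to a whole number of nodes. The slide does not specify a rounding rule, so that is a design choice.

```python
import math


def hidden_node_estimates(n_input, n_output):
    """Rule-of-thumb hidden-node counts from the discussion slide (before rounding)."""
    arithmetic = (n_input + n_output) / 2          # (input + output) / 2
    geometric = math.sqrt(n_input * n_output)      # (input * output)^(1/2)
    return arithmetic, geometric


print(hidden_node_estimates(2, 1))   # (1.5, 1.4142135623730951)
print(hidden_node_estimates(8, 2))   # (5.0, 4.0)
```

For the 2-input, 1-output XOR network, both estimates round up to the 2 hidden nodes used in the example.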

(14)

The Gradient Steepest Descent Method(SDM) (1)

• The gradient steepest descent method

• Recall:

$$net_j^n = \sum_i W_{ij} A_i^{n-1}, \qquad A_j^n = f(net_j^n)$$

• We want the difference between the computed output and the expected output to approach 0.

• Therefore, we want to obtain $\partial E / \partial W_{ij}$ so that we can update the weights to improve the network's results:

$$E = \frac{1}{2} \sum_j (T_j - A_j^n)^2, \qquad \Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}}$$

(15)

The Gradient Steepest Descent Method(SDM) (2)

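By the chain rule:

$$\frac{\partial E}{\partial W_{ij}} = \underbrace{\frac{\partial E}{\partial A_j^n}}_{(3)} \; \underbrace{\frac{\partial A_j^n}{\partial net_j^n}}_{(2)} \; \underbrace{\frac{\partial net_j^n}{\partial W_{ij}}}_{(1)}$$

For (1):

$$\frac{\partial net_j^n}{\partial W_{ij}} = \frac{\partial}{\partial W_{ij}} \Big( \sum_k W_{kj} A_k^{n-1} \Big) = A_i^{n-1}$$

For (2):

$$\frac{\partial A_j^n}{\partial net_j^n} = f'(net_j^n)$$

For (3-1), when n is the output layer:

$$\frac{\partial E}{\partial A_j^n} = \frac{\partial}{\partial A_j^n} \Big[ \frac{1}{2} \sum_k (T_k - A_k^n)^2 \Big] = -(T_j - A_j^n)$$

For (3-2), when n is a hidden layer:

$$\frac{\partial E}{\partial A_j^n} = \sum_k \frac{\partial E}{\partial net_k^{n+1}} \cdot \frac{\partial net_k^{n+1}}{\partial A_j^n} = \sum_k \frac{\partial E}{\partial net_k^{n+1}} W_{jk}$$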

(16)

The Gradient Steepest Descent Method(SDM) (3)

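From (1), (2), and (3) we obtain two types of values.

When n is the output layer, substituting (1), (2), and (3-1):

$$\frac{\partial E}{\partial W_{ij}} = -(T_j - A_j^n)\, f'(net_j^n)\, A_i^{n-1} \qquad \text{(B)}$$

or

$$\frac{\partial E}{\partial W_{ij}} = -\delta_j^n A_i^{n-1} \qquad \text{(A)}$$

where we define

$$\delta_j^n = (T_j - A_j^n)\, f'(net_j^n)$$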

(17)

The Gradient Steepest Descent Method(SDM) (4)

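When n is a hidden layer, substitute (1), (2), and (3-2). Since (A) implies $\partial E / \partial net_k^{n+1} = -\delta_k^{n+1}$, this gives:

$$\frac{\partial E}{\partial W_{ij}} = -\Big[\sum_k \delta_k^{n+1} W_{jk}\Big] f'(net_j^n)\, A_i^{n-1} \qquad \text{(B)}$$

or

$$\frac{\partial E}{\partial W_{ij}} = -\delta_j^n A_i^{n-1} \qquad \text{(A)}$$

where we define

$$\delta_j^n = \Big[\sum_k \delta_k^{n+1} W_{jk}\Big] f'(net_j^n)$$

In both cases, the weight update is:

$$\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}} = \eta\, \delta_j^n A_i^{n-1}, \qquad W_{ij} \leftarrow W_{ij} + \Delta W_{ij}$$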
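The derivation can be checked numerically. The delta rules give ∂E/∂W_ij = −δ_j A_i for both layers, and this should agree with a finite-difference estimate of the same derivative. The Python sketch below assumes a 2-2-1 network with the net = Σ W·A − θ convention from the computation slides. The input, target, biases and random weights are arbitrary illustrative values, and all function names are hypothetical.

```python
import math
import random


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def error_and_activations(x, t, w_ih, theta_h, w_hj, theta_j):
    """E = 1/2 * sum_j (T_j - Y_j)^2 for one pattern, with net = sum(W * A) - theta."""
    hidden = [sigmoid(sum(w_ih[i][h] * x[i] for i in range(len(x))) - theta_h[h])
              for h in range(len(theta_h))]
    y = [sigmoid(sum(w_hj[h][j] * hidden[h] for h in range(len(hidden))) - theta_j[j])
         for j in range(len(theta_j))]
    e = 0.5 * sum((t[j] - y[j]) ** 2 for j in range(len(t)))
    return e, hidden, y


def delta_rule_gradients(x, t, w_ih, theta_h, w_hj, theta_j):
    """dE/dW from the derivation: dE/dW_ij = -delta_j * A_i for both layers."""
    _, hidden, y = error_and_activations(x, t, w_ih, theta_h, w_hj, theta_j)
    d_out = [y[j] * (1 - y[j]) * (t[j] - y[j]) for j in range(len(y))]
    d_hid = [hidden[h] * (1 - hidden[h]) * sum(w_hj[h][j] * d_out[j] for j in range(len(y)))
             for h in range(len(hidden))]
    g_ih = [[-d_hid[h] * x[i] for h in range(len(hidden))] for i in range(len(x))]
    g_hj = [[-d_out[j] * hidden[h] for j in range(len(y))] for h in range(len(hidden))]
    return g_ih, g_hj


def numeric_gradient(matrix, r, c, loss, eps=1e-5):
    """Central-difference estimate of dE/d(matrix[r][c]); restores the entry afterwards."""
    original = matrix[r][c]
    matrix[r][c] = original + eps
    plus = loss()
    matrix[r][c] = original - eps
    minus = loss()
    matrix[r][c] = original
    return (plus - minus) / (2 * eps)


# Arbitrary 2-2-1 example (illustrative values only)
random.seed(0)
x, t = [0.5, -1.0], [1.0]
w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_hj = [[random.uniform(-1, 1)] for _ in range(2)]
theta_h, theta_j = [0.1, -0.2], [0.3]


def loss():
    return error_and_activations(x, t, w_ih, theta_h, w_hj, theta_j)[0]


g_ih, g_hj = delta_rule_gradients(x, t, w_ih, theta_h, w_hj, theta_j)
diffs = [abs(g_ih[i][h] - numeric_gradient(w_ih, i, h, loss)) for i in range(2) for h in range(2)]
diffs += [abs(g_hj[h][0] - numeric_gradient(w_hj, h, 0, loss)) for h in range(2)]
max_diff = max(diffs)
print(f"largest |delta-rule gradient - numeric gradient| = {max_diff:.2e}")
```

A tiny `max_diff` indicates that ΔW_ij = η δ_j A_i really is a step along −∂E/∂W_ij, which is what steepest descent requires.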

(18)

The Gradient Steepest Descent Method(SDM) (5)

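For the sigmoid function $f(net_j) = \frac{1}{1 + e^{-net_j}} = (1 + e^{-net_j})^{-1}$:

$$f'(net_j) = \big[-(1 + e^{-net_j})^{-2}\big]\big[-e^{-net_j}\big] = \frac{e^{-net_j}}{(1 + e^{-net_j})^2} = \frac{1}{1 + e^{-net_j}}\Big(1 - \frac{1}{1 + e^{-net_j}}\Big) = f(net_j)\big(1 - f(net_j)\big)$$

Therefore:

$$\delta_j^n = \begin{cases} (T_j - Y_j)\, Y_j (1 - Y_j) & \text{if } n \text{ is the output layer} \\ H_j (1 - H_j) \sum_k W_{jk} \delta_k^{n+1} & \text{if } n \text{ is a hidden layer} \end{cases}$$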
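The closed form f'(net) = f(net)(1 − f(net)) can be checked against a central-difference approximation, as in the minimal Python sketch below (function names are hypothetical). Because f' is expressed through f itself, an implementation can reuse the already-computed outputs Y_j and H_h instead of evaluating e^(−net) again. This is why the δ formulas contain Y_j(1 − Y_j) and H_h(1 − H_h).

```python
import math


def f(net):
    """Sigmoid transfer function f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


def f_prime(net):
    """Closed form from the derivation: f'(net) = f(net) * (1 - f(net))."""
    fn = f(net)
    return fn * (1.0 - fn)


def f_prime_numeric(net, eps=1e-6):
    """Central-difference approximation of f'(net) for comparison."""
    return (f(net + eps) - f(net - eps)) / (2.0 * eps)


for net in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"net={net:+.1f}  f(1-f)={f_prime(net):.6f}  numeric={f_prime_numeric(net):.6f}")
```

The derivative peaks at f'(0) = 0.25 and shrinks toward 0 as |net| grows. As a result, a saturated node (output near 0 or 1) produces a small δ.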



(19)

The Gradient Steepest Descent Method(SDM) (6)

• Learning computation

1. Compute the values of the hidden layer:

$$net_h = \sum_i W_{ih} X_i - \theta_h, \qquad H_h = f(net_h) = \frac{1}{1 + e^{-net_h}}$$

2. Compute the values of the output layer:

$$net_j = \sum_h W_{hj} H_h - \theta_j, \qquad Y_j = f(net_j) = \frac{1}{1 + e^{-net_j}}$$

3. Compute the value differences for correction:

$$\delta_j = Y_j (1 - Y_j)(T_j - Y_j), \qquad \delta_h = H_h (1 - H_h) \sum_j W_{hj} \delta_j$$

(20)

The Gradient Steepest Descent Method(SDM) (7)

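4. Compute the values to be updated:

$$\Delta W_{hj} = \eta \delta_j H_h, \quad \Delta\theta_j = -\eta \delta_j, \quad \Delta W_{ih} = \eta \delta_h X_i, \quad \Delta\theta_h = -\eta \delta_h$$

5. Update the weights and biases:

$$W_{hj} \leftarrow W_{hj} + \Delta W_{hj}, \quad \theta_j \leftarrow \theta_j + \Delta\theta_j, \quad W_{ih} \leftarrow W_{ih} + \Delta W_{ih}, \quad \theta_h \leftarrow \theta_h + \Delta\theta_h$$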
