
Data Flow Design for the Backpropagation Algorithm

Cheng-Yuan Liou∗ and Yen-Ting Kuo

Dept. of Computer Science and Information Engineering, National Taiwan University

Supported by National Science Council under Project NSC 90-2213-E-002-092

Abstract

We report a data flow [1] design for the multilayer network. Both back-propagation (BP) [2] learning and feedforward computation are built from a single basic module, where each neuron is regarded as a module. This design can be extended to various networks [3][4].

1 Introduction

Many data flow designs for neural networks have been developed for various purposes with varying success, such as that for the committee machine [5]. We report our design in this paper and omit a full review of the others. For distributed computation [6], the network training procedure has to be decomposed so that each neuron can be trained separately. The back-propagation algorithm trains the network layer by layer, performing forward and backward computations. According to the algorithm, the update formulas [3] are

forward computation

$$y^1_j = \sigma\left(\sum_{i=0}^{m_0} w^1_{ji} x_i + b^1_j\right) \tag{1}$$

for the j-th neuron of the first layer,

$$y^l_j = \sigma\left(\sum_{i=0}^{m_{l-1}} w^l_{ji} y^{l-1}_i + b^l_j\right), \quad l = 2 \ldots L \tag{2}$$

for the j-th neuron of the l-th layer;

∗Corresponding author. Email: [email protected], Phone: (02)23625336-515

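backward computation

$$\delta^L_j = (d^L_j - y^L_j)\,(y^L_j)' \tag{3}$$

for the j-th neuron of the output layer,

$$\delta^l_j = (y^l_j)' \sum_{i=1}^{m_{l+1}} \delta^{l+1}_i w^{l+1}_{ij}, \quad l = 1 \ldots L-1 \tag{4}$$

for the j-th neuron of the l-th hidden layer, and the update equation

$$w^l_{ji} \leftarrow w^l_{ji} + \Delta w^l_{ji} = w^l_{ji} + \eta\,\delta^l_j\,y^{l-1}_i. \tag{5}$$

In the above equations, w denotes the weight between two neurons, d is the desired response, and the prime in $(y^l_j)'$ denotes the derivative of the activation function at that neuron.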

x is the input and y denotes a neuron's output. σ is the activation function and η is a tunable learning rate. l indexes the layers, where 1 denotes the first hidden layer and L is the output layer; i and j index the neurons within a layer. Thus $y^l_j$ is the output of the j-th neuron in the l-th layer, $w^l_{ji}$ is the weight between the j-th neuron in the l-th layer and the i-th neuron in the (l − 1)-th layer, $b^l_j$ is the j-th neuron's bias, $d^l_j$ is the desired response of the j-th neuron in the l-th layer, and $\delta^l_j$ is the j-th neuron's delta value for weight correction. $m_0$ is the number of neurons in the input layer and $m_{l-1}$ is the number of neurons in the (l − 1)-th layer. All neurons use these equations to improve their weights. Each neuron uses the outputs of all neurons in the preceding layer as inputs. We will isolate each neuron together with its weights, inputs, desired response, and output. This allows us to implement the BP algorithm on a distributed parallel machine. In the next section, we present the basic module for a single neuron. Then we show a data flow [1] structure for multilayer networks constructed from this basic module.
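As a point of reference for the modular design, the conventional layer-by-layer form of equations 1 to 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `forward`, `backward`, and `update`, the nested-list weight layout, and the use of `tanh` as a stand-in activation (with derivative 1 − y²) are all assumptions made for this example. Biases are left fixed because equation 5 only updates weights.

```python
import math

def forward(x, weights, biases, sigma=math.tanh):
    """Eqs. (1)-(2): return the outputs y^1 ... y^L, one list per layer.

    weights[l][j][i] is the weight from input i to neuron j of layer l+1.
    sigma defaults to tanh, which is an assumption of this sketch.
    """
    ys, y_prev = [], list(x)
    for W, b in zip(weights, biases):
        y_prev = [sigma(sum(w * v for w, v in zip(row, y_prev)) + b_j)
                  for row, b_j in zip(W, b)]
        ys.append(y_prev)
    return ys

def backward(ys, weights, d, dsigma=lambda y: 1.0 - y * y):
    """Eqs. (3)-(4): return the deltas, one list per layer.

    dsigma maps a neuron's output y to the activation derivative
    (1 - y^2 matches the tanh stand-in used in forward).
    """
    L = len(weights)
    deltas = [None] * L
    # Eq. (3): output layer.
    deltas[L - 1] = [(d_j - y_j) * dsigma(y_j) for d_j, y_j in zip(d, ys[L - 1])]
    # Eq. (4): hidden layers, from the last hidden layer back to the first.
    for l in range(L - 2, -1, -1):
        W_next = weights[l + 1]
        deltas[l] = [dsigma(y_j) * sum(deltas[l + 1][i] * W_next[i][j]
                                       for i in range(len(W_next)))
                     for j, y_j in enumerate(ys[l])]
    return deltas

def update(x, ys, weights, deltas, eta):
    """Eq. (5): w_ji <- w_ji + eta * delta_j * (input i of that layer)."""
    for l, W in enumerate(weights):
        inputs = x if l == 0 else ys[l - 1]
        for j, row in enumerate(W):
            for i in range(len(row)):
                row[i] += eta * deltas[l][j] * inputs[i]
```

Note that `backward` needs the deltas of a whole layer before it can move to the preceding one; this layer-level coupling is what the modular design removes.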


Figure 1: Single neuron diagram.

2 The Basic Module

Fig. 1 shows the diagram of a single neuron. The forward equation is:

$$y^t = \sigma\left(\sum_{i=0}^{N} w^t_i x^t_i + b\right) \tag{6}$$

where t denotes time and the activation function is temporarily set to the bipolar sigmoid function $\sigma(u) = \frac{2}{1+e^{-u}} - 1$. According to the delta learning rule [2], the error-correction function is defined by:

$$\Delta w^t = \eta\,\frac{1}{2}\,(d^t - y^t)\left(1 - (y^t)^2\right)x^t. \tag{7}$$

The weight is updated according to:

$$w^{t+1} = w^t + \Delta w^t. \tag{8}$$
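The factor 1/2 and the term $1 - (y^t)^2$ in equation 7 come from the derivative of the bipolar sigmoid. A short derivation, writing $y = \sigma(u)$:

$$\sigma(u) = \frac{2}{1+e^{-u}} - 1, \qquad \sigma'(u) = \frac{2e^{-u}}{(1+e^{-u})^2},$$

$$1 - y^2 = (1+y)(1-y) = \frac{2}{1+e^{-u}} \cdot \frac{2e^{-u}}{1+e^{-u}} = \frac{4e^{-u}}{(1+e^{-u})^2},$$

$$\text{hence} \quad \sigma'(u) = \frac{1}{2}\left(1 - y^2\right).$$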

Combining Eqs. 6 and 7 and letting F be the update function for this neuron, we obtain:

$$(\Delta w^t, y^t) = F(w^t, x^t, d^t), \tag{9}$$

where F takes w (weight), x (input), and d (desired response) as its inputs and produces $\Delta w^t$ and $y^t$ as its outputs. The data structure for this neuron is plotted in Fig. 2. This structure is well known in the engineering community.
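The module F of equation 9 can be sketched in a few lines. This is an illustrative sketch only: the signature, the default learning rate `eta=0.1`, and passing the bias `b` as a separate argument are assumptions of this example, and the bias is not updated because equations 7 and 8 only concern the weights.

```python
import math

def bipolar_sigmoid(u):
    """sigma(u) = 2 / (1 + exp(-u)) - 1, with values in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-u)) - 1.0

def F(w, x, d, b=0.0, eta=0.1):
    """Eq. (9): map (w, x, d) to (delta_w, y) for one neuron.

    w, x : lists of weights and inputs of equal length
    d    : desired response of this neuron
    b, eta : bias and learning rate (hypothetical extra parameters)
    """
    # Eq. (6): forward computation.
    y = bipolar_sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
    # Eq. (7): error correction for the bipolar sigmoid.
    scale = eta * 0.5 * (d - y) * (1.0 - y * y)
    delta_w = [scale * x_i for x_i in x]
    return delta_w, y

# Usage: one update step according to Eq. (8).
w = [0.0, 0.0]
delta_w, y = F(w, [1.0, -1.0], d=1.0)
w = [w_i + dw_i for w_i, dw_i in zip(w, delta_w)]
```

Because F only reads its own weights, inputs, and desired response, it can be evaluated as soon as those three values are available, which is the firing rule a data flow machine expects.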

Figure 2: One module neuron data structure.

3 The Modular Design for Various Neural Networks

We can use this basic module to construct various kinds of neural networks, such as the multilayer network [3], the multilayer network with jump connections [3], the recurrent network [8][9], and the self-organized map [4]. In this section we show how to construct a multilayer network and a recurrent network.

3.1 Multilayer Network

A multilayer network can be transformed into its modular form in O(n) time, where n is the total number of neurons. With a language that supports pointers, such as C [7], we can allocate a memory space for each neuron and maintain a pointer to it. The algorithm is:

algorithm Modular Transform
    for L = 1 to number of layers
        for N = 1 to the number of neurons in layer L
            Neu ← allocate a memory space for this neuron.
            store all data of this neuron (L, N) in Neu.
            set Neu→forward destination to point to the neurons in the next layer.
            set Neu→backward destination to point to the neurons in the preceding layer.
            look-up table(N, L) ← Neu's location.
        end.
    end.

Table 1. Modular Transform Algorithm

This modular transform costs only a small amount of additional memory. Usually the BP algorithm is implemented using a matrix (or an array) to store the weights, the input vector, and the output vector. Each entry in the matrix corresponds to a neuron's relative position in the network. Instead of this matrix, we maintain a pointer to each neuron's record, which contains the synapses to all linked neurons. The key requirement of the module is that the desired response must be given in advance for every neuron, not only for the neurons in the output layer. Therefore, the BP algorithm (equations 1 to 5) must be reformulated in such a way that every hidden neuron can be treated as an independent neuron, as long as we can calculate its desired response. For this response, observe the two equations:

$$\delta^L_j = \frac{1}{2}\,(d^L_j - y^L_j)\left(1 - (y^L_j)^2\right), \tag{10}$$

$$\delta^l_j = \frac{1}{2}\left(1 - (y^l_j)^2\right)\sum_{i=1}^{m_{l+1}} \delta^{l+1}_i w^{l+1}_{ij}. \tag{11}$$

Equations 10 and 11 are the backward delta equations of the BP algorithm using the bipolar sigmoid function. Symbols are defined as in the preceding section. Equation 10 is for the neurons in the output layer and equation 11 is for the hidden layers. Comparing these two equations, where $y^L_j$ and $y^l_j$ play the same role and the factor $(d^L_j - y^L_j)$ in equation 10 corresponds to the sum in equation 11, we obtain the desired response for each neuron in the hidden layer,

$$d^l_j = y^l_j + \sum_{i=1}^{m_{l+1}} \delta^{l+1}_i w^{l+1}_{ij}. \tag{12}$$
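Equation 12 can be checked numerically: feeding the resulting desired response into the output-layer form (equation 10) should reproduce the hidden-layer delta of equation 11. The sketch below does this for one hidden neuron; the function names and the sample values are hypothetical and chosen only for illustration.

```python
def hidden_desired_response(y_j, deltas_next, weights_next):
    """Eq. (12): d_j = y_j + sum_i delta_i^{l+1} * w_ij^{l+1}.

    deltas_next[i]  : delta of neuron i in the next layer
    weights_next[i] : weight from this neuron j to neuron i in the next layer
    """
    return y_j + sum(dl * w for dl, w in zip(deltas_next, weights_next))

def module_delta(d_j, y_j):
    """Eq. (10) applied to any neuron that has a desired response."""
    return 0.5 * (d_j - y_j) * (1.0 - y_j * y_j)

def hidden_delta(y_j, deltas_next, weights_next):
    """Eq. (11): the usual back-propagated delta of a hidden neuron."""
    back = sum(dl * w for dl, w in zip(deltas_next, weights_next))
    return 0.5 * (1.0 - y_j * y_j) * back

# Illustrative values for one hidden neuron feeding two next-layer neurons.
y_j = 0.5
deltas_next = [0.5, 0.25]
weights_next = [1.0, 2.0]

d_j = hidden_desired_response(y_j, deltas_next, weights_next)
same = module_delta(d_j, y_j) == hidden_delta(y_j, deltas_next, weights_next)
```

The computed desired response is a pseudo-target and need not lie inside the output range (−1, 1) of the bipolar sigmoid; it only has to yield the correct delta.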

According to $\Delta w^L_{ij} = \eta\,\delta^L_i\,y^L_i$, the delta for neuron i in the output layer is:

$$\delta^L_i = \frac{\Delta w^L_{ij}}{\eta\,y^L_i}. \tag{13}$$

Substituting equation 13 into equation 12, the desired response for the j-th hidden neuron is:

$$d^l_j = y^l_j + \sum_{i=1}^{m_{l+1}} \frac{\Delta w^{l+1}_{ij}}{\eta\,y^{l+1}_i}\,w^{l+1}_{ij}, \quad l+1 = 2 \ldots L. \tag{14}$$

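[Flow chart: (1) Initialize weights W for all neurons. (2) Set E = 0. (3) Get pattern X and feed forward. (4) Compute cycle error E. (5) Adjust weights for the output layer using F(w, x, d). (6) More hidden layers? If yes, get the desired response using Eq. (14), adjust weights for the hidden layer using F(w, x, d), and repeat step 6. (7) More patterns? If yes, return to step 3. (8) E < Emax? If yes, STOP; if no, return to step 2.]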

Figure 3: Modified BP training flow of modular design network.

With equation 14 we obtain the desired response of every neuron, no matter which layer it belongs to. Therefore each neuron can be treated separately. To our knowledge this equation has not been discussed before.

In figure 3, we illustrate a flow chart of the modular design for the BP algorithm. The main difference from the conventional BP algorithm [2] is that we calculate each neuron's desired response before adjusting its weights. Figure 5 shows an example of a 1-3-2 modular design network.
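The modular transform and one pass of the training flow can be sketched together for a 1-3-2 network. This is a hedged sketch under several assumptions that are not taken from the paper: the class and function names, the initial weight range, the error measure E = ½ Σ (d − y)², and the choice to compute every Δw with the pre-update weights before applying any of them. For simplicity each module stores its own δ, so hidden desired responses are obtained from equation 12 rather than from equation 14.

```python
import math
import random

def bipolar_sigmoid(u):
    return 2.0 / (1.0 + math.exp(-u)) - 1.0

class Neuron:
    """One module: its own weights plus links to the neighbouring layers."""
    def __init__(self, n_inputs, rng):
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.b = 0.0              # bias, kept fixed in this sketch
        self.forward_dest = []    # modules in the next layer
        self.backward_dest = []   # modules in the preceding layer
        self.y = 0.0
        self.delta = 0.0

    def forward(self, x):
        self.y = bipolar_sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)
        return self.y

    def F(self, x, d, eta):
        """Eq. (9): return (delta_w, y) and remember this module's delta."""
        y = self.forward(x)
        self.delta = 0.5 * (d - y) * (1.0 - y * y)
        return [eta * self.delta * v for v in x], y

def build_network(sizes, seed=0):
    """Modular transform: one Neuron per (layer, index), visited once each."""
    rng = random.Random(seed)
    layers = [[Neuron(sizes[l - 1], rng) for _ in range(sizes[l])]
              for l in range(1, len(sizes))]
    lookup = {}
    for l, layer in enumerate(layers):
        for n, neu in enumerate(layer):
            neu.forward_dest = layers[l + 1] if l + 1 < len(layers) else []
            neu.backward_dest = layers[l - 1] if l > 0 else []
            lookup[(l + 1, n + 1)] = neu   # 1-based (layer, neuron) key
    return layers, lookup

def train_pattern(layers, x, d, eta=0.1):
    """Train on one pattern; return the error measured before the update."""
    inputs = [list(x)]
    for layer in layers:                       # feed forward
        inputs.append([neu.forward(inputs[-1]) for neu in layer])
    error = 0.5 * sum((dj - yj) ** 2 for dj, yj in zip(d, inputs[-1]))
    desired, pending = list(d), []
    for l in range(len(layers) - 1, -1, -1):   # output layer first
        for neu, d_j in zip(layers[l], desired):
            delta_w, _ = neu.F(inputs[l], d_j, eta)
            pending.append((neu, delta_w))
        if l > 0:                              # Eq. (12) for the layer below
            desired = [prev.y + sum(nxt.delta * nxt.w[j]
                                    for nxt in prev.forward_dest)
                       for j, prev in enumerate(layers[l - 1])]
    for neu, delta_w in pending:               # Eq. (8), applied at the end
        neu.w = [w + c for w, c in zip(neu.w, delta_w)]
    return error

layers, lookup = build_network([1, 3, 2])
errors = [train_pattern(layers, [0.5], [0.5, -0.5]) for _ in range(200)]
```

Each call to `F` touches only one module's data, so the calls within a layer are independent of one another and could be dispatched to separate processors.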

Similar to the multilayer feedforward network, a multilayer network can have jump connections (see figure 4) from a lower level to a higher level. Its forward and backward equations are similar to the BP equations 1 to 5. Its modular design is similar to that for the multilayer feedforward network.

Figure 4: An example of a multilayer network with jump connection.

3.2 Recurrent Network

A recurrent network [8][9] uses its outputs as its inputs. We show the modular design for training a recurrent network in Figure 6. The training procedure starts by feeding an input vector x into the modular network. The desired response is set to a target sequence. The output of the network is fed back to itself as the next input in each iteration. The procedure continues until the error is reduced to a satisfactory range. We treat the recurrent network as a normal feedforward network whose output destination is connected back to itself.
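The feedback loop described above can be sketched for the simplest case, a single layer of modules with as many outputs as inputs. This is an assumption-laden illustration rather than the procedure of Figure 6: the function name `train_recurrent`, the single-layer topology, the absence of biases, and the error measure are all choices made for this example.

```python
import math

def bipolar_sigmoid(u):
    return 2.0 / (1.0 + math.exp(-u)) - 1.0

def train_recurrent(weights, x0, targets, eta=0.1):
    """One pass over a target sequence; returns the summed error.

    weights[j] holds the weights of module j (updated in place).
    x0 is the initial input vector; targets is a list of desired outputs,
    one per iteration.
    """
    x = list(x0)
    error = 0.0
    for d in targets:
        # Every module computes its output from the current input (Eq. 6).
        y = [bipolar_sigmoid(sum(w * v for w, v in zip(row, x)))
             for row in weights]
        # Every module corrects its own weights (Eqs. 7 and 8).
        for j, row in enumerate(weights):
            scale = eta * 0.5 * (d[j] - y[j]) * (1.0 - y[j] ** 2)
            for i in range(len(row)):
                row[i] += scale * x[i]
        error += 0.5 * sum((dj - yj) ** 2 for dj, yj in zip(d, y))
        x = y   # the output is fed back as the next input
    return error

# Usage: two modules, one target in the sequence.
w = [[0.0, 0.0], [0.0, 0.0]]
e = train_recurrent(w, [1.0, -1.0], [[1.0, -1.0]], eta=0.5)
```

In practice the pass would be repeated until the returned error falls below a chosen threshold.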

The modular design is particularly useful for the data flow machine, where it is expected to achieve a high degree of parallel computation. The main achievement is that we can decompose neural networks into small modules, which enables us to feed each module into multi-speed processors in a way that conforms to the spirit of the data flow machine.

The self-organization neural network can be fitted to the data flow machine structure with fewer modifications, and we omit its discussion.

References

[1] Veen, A. H. (1986). Dataflow Machine Architecture. ACM Computing Surveys, Vol. 18, No. 4, Dec., pp. 365-396.

[2] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Representations by Back-propagating Errors. Nature (London), Vol. 323, pp. 533-536.

[3] Liou, C.-Y. (2001). Lecture Notes on Neural Networks. National Taiwan University, 526 U1180. (http://red.csie.ntu.edu.tw//NN/index.html)

[4] Kohonen, T. (1982). Self-organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, Vol. 43, pp. 59-69.

[5] Tresp, V. (2001). Committee Machines. Handbook for Neural Network Signal Processing, Yu Hen Hu and Jenq-Neng Hwang (eds.), CRC Press.

[6] McClelland, J. L., Rumelhart, D. E., and the PDP Research Group (1986). Parallel Distributed Processing. Cambridge: The MIT Press.

[7] Worthington, S. (1988). C Programming. Boston, MA: Boyd & Fraser Pub. Co.

[8] Hopfield, J. J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. Natl. Acad. Sci. 79: 2554-2558.

[9] Hopfield, J. J. (1984). Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons. Proc. Natl. Acad. Sci. 81: 3088-3092.

Figure 5: Training procedure of modular design network. The values in step 1 are randomly generated. Succeeding steps change the values according to the initial step.

Figure 6: The training procedure of recurrent network using modular design. The values in step 1 are randomly generated. Succeeding steps change the values according to the initial step.