Data Flow Design for the Backpropagation Algorithm
Cheng-Yuan Liou∗ and Yen-Ting Kuo
Dept. of Computer Science and Information Engineering, National Taiwan University. Supported by the National Science Council under Project NSC 90-2213-E-002-092.
Abstract
We report a data flow [1] design for the multilayer network. Both back-propagation (BP) [2] learning and feedforward computing are constructed with a single basic module, where each neuron is regarded as a module. This design can be extended to various networks [3][4].
1 Introduction
Many data flow designs for neural networks have been developed for various purposes, with varying success, such as the design for the committee machine [5]. We report our design in this paper and omit a full review of these designs. For distributed computation [6], the network training procedure has to be decomposed so that each neuron can be trained separately. The back-propagation algorithm trains a network layer by layer, performing forward and backward computations. The update formulas of the algorithm [3] are:
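Forward computation:

$$y_j^1 = \sigma\left(\sum_{i=0}^{m_0} w_{ji}^1 x_i + b_j^1\right) \qquad (1)$$

for the jth neuron of the first layer,

$$y_j^l = \sigma\left(\sum_{i=0}^{m_{l-1}} w_{ji}^l\, y_i^{l-1} + b_j^l\right), \quad l = 2, \ldots, L \qquad (2)$$

for the jth neuron of the lth layer.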
∗Corresponding author. Email: [email protected], Phone: (02)23625336-515
Backward computation:

$$\delta_j^L = (d_j^L - y_j^L)\,(y_j^L)' \qquad (3)$$

for the jth neuron of the output layer,

$$\delta_j^l = (y_j^l)' \sum_{i=1}^{m_{l+1}} \delta_i^{l+1}\, w_{ij}^{l+1}, \quad l = 1, \ldots, L-1 \qquad (4)$$

for the jth neuron of the lth hidden layer, and the update equation

$$w_{ji}^l \leftarrow w_{ji}^l + \Delta w_{ji}^l = w_{ji}^l + \eta\, \delta_j^l\, y_i^{l-1}. \qquad (5)$$

In the above equations, w denotes the weight between two neurons and d is the desired response.
$x$ is the input and $y$ denotes the neuron's output. $\sigma$ is the activation function, and $(y_j^l)'$ denotes its derivative evaluated at the jth neuron's net input. $\eta$ is a tunable learning rate. $l$ is the layer index, where 1 denotes the first hidden layer and $L$ the output layer; $i$ and $j$ index the neurons within a layer. Thus $y_j^l$ is the output of the jth neuron in the lth layer, $w_{ji}^l$ is the weight between the jth neuron in the lth layer and the ith neuron in the (l−1)th layer, $b_j^l$ is the jth neuron's bias, $d_j^l$ is the desired response of the jth neuron in the lth layer, and $\delta_j^l$ is the jth neuron's delta value for weight correction. $m_0$ is the number of neurons in the input layer and $m_{l-1}$ is the number of neurons in the (l−1)th layer; for $l = 1$ in Eq. 5, $y_i^0 = x_i$. All neurons use these equations to update their weights. Each neuron uses the outputs of all neurons in the immediately preceding layer as its inputs. We will isolate each neuron together with all its weights, inputs, desired response, and output. This allows us to implement the BP algorithm on distributed parallel machines. In the next section, we present the basic module for a single neuron. Then we show a data flow [1] structure for multilayer networks constructed with such basic modules.
Figure 1: Single neuron diagram.
2 The Basic Module
Fig. 1 shows the diagram of a single neuron. The forward equation is:
$$y^t = \sigma\left(\sum_{i=0}^{N} w_i^t x_i^t + b\right) \qquad (6)$$
where t denotes time and the activation function is temporarily set to the bipolar sigmoid function $\sigma(u) = \frac{2}{1+e^{-u}} - 1$. According to the delta learning rule [2], the error-correction function is defined by:
$$\Delta w^t = \eta\,\tfrac{1}{2}\,(d^t - y^t)\,\bigl(1 - (y^t)^2\bigr)\,x^t. \qquad (7)$$

And the weight is updated according to:

$$w^{t+1} = w^t + \Delta w^t. \qquad (8)$$
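In Eq. 7, the factor $\frac{1}{2}(1 - (y^t)^2)$ is the derivative of the bipolar sigmoid, $\sigma'(u) = \frac{1}{2}(1 - \sigma(u)^2)$, so Eq. 7 is the delta rule $\Delta w^t = \eta\,\delta^t x^t$ with $\delta^t = (d^t - y^t)\,\sigma'$. Combining Eqs. 6 and 7 and letting F denote the update function for this neuron, we obtain:

$$(\Delta w^t, y^t) = F(w^t, x^t, d^t), \qquad (9)$$

where F takes w (weight), x (input), and d (desired response) as its inputs and produces $\Delta w^t$ and $y^t$ as its outputs. The data structure for this neuron is plotted in Fig. 2. This structure is well known in the engineering community.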
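The module F of Eq. 9 can be sketched as a single function. The name `F`, its argument order, and the bias update are assumptions for illustration. In particular, the paper's Eq. 7 gives only $\Delta w$, so updating the bias as a weight on a constant input of 1 is an added assumption. The function only returns the changes and leaves Eq. 8 to the caller, which keeps the module free of side effects.

```python
import math


def bipolar_sigmoid(u):
    """Activation of Eq. 6: sigma(u) = 2 / (1 + e^-u) - 1."""
    return 2.0 / (1.0 + math.exp(-u)) - 1.0


def F(w, b, x, d, eta):
    """Hypothetical sketch of the single-neuron module (Eq. 9).

    Returns (dw, db, y): the weight changes of Eq. 7, a bias change (bias treated
    as a weight on a constant input of 1, an assumption), and the output of Eq. 6.
    The weights themselves are not modified; Eq. 8 is applied by the caller.
    """
    u = b + sum(wi * xi for wi, xi in zip(w, x))
    y = bipolar_sigmoid(u)
    delta = (d - y) * 0.5 * (1.0 - y * y)  # (d - y) * sigma'(u)
    dw = [eta * delta * xi for xi in x]
    db = eta * delta
    return dw, db, y


# Usage: one module step followed by the weight update of Eq. 8.
w, b = [0.0, 0.0], 0.0
dw, db, y = F(w, b, x=[1.0, 1.0], d=1.0, eta=0.5)
w = [wi + dwi for wi, dwi in zip(w, dw)]
b += db
```

With zero weights the net input is 0, so $y = 0$, $\delta = 0.5$, and each weight change is $0.5 \cdot 0.5 \cdot 1 = 0.25$.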
Figure 2: One module neuron data structure.
3 The Modular Design for Various Neural Networks
We can use this basic module to construct various kinds of neural networks, such as the multilayer network [3], the multilayer network with jump connections [3], the recurrent network [8][9], and the self-organizing map [4]. In this section we show how to construct a multilayer network and a recurrent network.
3.1 Multilayer Network
A multilayer network can be transformed into its modular form in O(n) time, where n is the total number of neurons. With a pointer-supporting language, such as C [7], we can allocate a memory space for each neuron and maintain a pointer to it. The algorithm is:
algorithm Modular Transform
    for L = 1 to number of layers
        for N = 1 to number of neurons in layer L
            Neu ← allocate a memory space for this neuron
            store all data of neuron (L, N) in Neu
            set Neu→forward destination to point to the neurons in the next layer
            set Neu→backward destination to point to the neurons in the preceding layer
            look-up table(N, L) ← location of Neu
        end
    end
end
Table 1. Modular Transform Algorithm

The modular transform costs little extra memory. Usually the BP algorithm is implemented using a matrix (or an array) to store the weights, the input vector, and the output vector, where each entry corresponds to a neuron's relative position in the network. Instead of this matrix, we maintain a pointer for each neuron, which holds the synapses to all linked neurons. The key requirement of the module is that the desired response of every neuron must be given in advance, not only for the neurons in the output layer. Therefore, the BP algorithm (Eqs. 1 to 5) must be reformulated so that every hidden neuron can be treated as an independent neuron, provided we can calculate its desired response. To obtain this response, consider the following two equations:
$$\delta_j^L = \tfrac{1}{2}\,(d_j^L - y_j^L)\,\bigl(1 - (y_j^L)^2\bigr), \qquad (10)$$

$$\delta_j^l = \tfrac{1}{2}\,\bigl(1 - (y_j^l)^2\bigr) \sum_{i=1}^{m_{l+1}} \delta_i^{l+1}\, w_{ij}^{l+1}. \qquad (11)$$

Equations 10 and 11 are the backward delta equations of the BP algorithm with the bipolar sigmoid function; the symbols are defined as in the preceding section. Equation 10 is for the neurons in the output layer and Eq. 11 is for the neurons in the hidden layers. Comparing the two equations and letting $y_j^l$ play the same role as $y_j^L$, Eq. 11 takes the form of Eq. 10 when the desired response of each neuron in a hidden layer is

$$d_j^l = y_j^l + \sum_{i=1}^{m_{l+1}} \delta_i^{l+1}\, w_{ij}^{l+1}. \qquad (12)$$
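From the update equation (Eq. 5), the weight change between neuron j of layer l and neuron i of layer l+1 is $\Delta w_{ij}^{l+1} = \eta\,\delta_i^{l+1}\,y_j^l$. For the output layer (l+1 = L), the delta of neuron i is therefore

$$\delta_i^L = \frac{\Delta w_{ij}^L}{\eta\, y_j^{L-1}}, \qquad (13)$$

and the same relation holds for every layer l+1. Substituting it into Eq. 12, the desired response of the jth hidden neuron is (provided $y_j^l \neq 0$)

$$d_j^l = y_j^l + \sum_{i=1}^{m_{l+1}} \frac{\Delta w_{ij}^{l+1}}{\eta\, y_j^l}\, w_{ij}^{l+1}, \qquad l+1 = 2, \ldots, L. \qquad (14)$$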
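The step from Eqs. 10 and 11 to the desired response can be checked numerically. The sketch below uses hypothetical numbers for one hidden neuron j feeding two upper-layer neurons, taken to be output neurons. It writes the upper-layer weight changes as in Eq. 5, $\Delta w_{ij}^{l+1} = \eta\,\delta_i^{l+1} y_j^l$. It then confirms that the desired response recovered from those changes, used as if it were an output target in Eq. 10, gives back the standard BP delta of Eq. 11. The division by $y_j^l$ means this recovery fails when $y_j^l = 0$.

```python
import math


def sigma(u):
    return 2.0 / (1.0 + math.exp(-u)) - 1.0


def dsigma(y):
    return 0.5 * (1.0 - y * y)  # sigma'(u) in terms of y = sigma(u)


# Hypothetical values: hidden neuron j feeding two upper-layer neurons i = 0, 1.
eta = 0.1
y_j = sigma(0.4)                    # y_j^l
w_up = [0.7, -0.3]                  # w_ij^{l+1}
y_up = [sigma(0.2), sigma(-0.5)]    # y_i^{l+1}
d_up = [0.5, -0.8]                  # d_i^{l+1}

# Eq. 10: deltas of the upper layer (taken to be the output layer)
delta_up = [(d - y) * dsigma(y) for d, y in zip(d_up, y_up)]

# Eq. 11: standard BP delta of the hidden neuron
delta_bp = dsigma(y_j) * sum(di * wi for di, wi in zip(delta_up, w_up))

# Eq. 12: desired response from the upper-layer deltas
d_eq12 = y_j + sum(di * wi for di, wi in zip(delta_up, w_up))

# Weight changes of the upper modules as in Eq. 5: dw_ij = eta * delta_i * y_j
dw_up = [eta * di * y_j for di in delta_up]

# Eqs. 13-14: the same desired response, recovered from the weight changes
d_eq14 = y_j + sum(dw / (eta * y_j) * wi for dw, wi in zip(dw_up, w_up))

# Using d_eq14 as a target in the output-layer form (Eq. 10) gives back Eq. 11
delta_modular = (d_eq14 - y_j) * dsigma(y_j)
```

Because the hidden neuron's update then reduces to the same module rule as an output neuron, the hidden layer needs no special backward formula of its own.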
[Figure 3 flowchart: initialize weights W for all neurons; set E = 0; get pattern X and feed forward; compute cycle error E; adjust weights for the output layer using F(w,x,d); while more hidden layers remain, get the desired responses using Eq. (14) and adjust that hidden layer's weights using F(w,x,d); if more patterns remain, get the next pattern; otherwise stop if E < Emax, else reset E = 0 and repeat.]
Figure 3: Modified BP training flow of modular design network.
With Eq. 14 we obtain the desired responses of all neurons, regardless of the layer they belong to.
Therefore each neuron can be treated separately.
To our knowledge this equation has not been discussed before.
In Figure 3, we illustrate a flow chart of the modular design for the BP algorithm. The main difference from the standard BP algorithm [2] is that we calculate each neuron's desired response before adjusting its weights. Figure 5 shows an example of a 1-3-2 modular design network.
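Putting the pieces together, the modular transform of Table 1 and the training flow of Fig. 3 can be sketched for a 1-3-2 network like that of Fig. 5. Each `Neuron` object holds its own weights, inputs, desired response, and output, plus references to its forward and backward destinations, in the spirit of Table 1. This is a hypothetical sketch under several assumptions. Python object references stand in for C pointers, and deterministic small weights replace the random initialization. Biases are updated like weights on a constant input of 1. All weight changes are applied only after every desired response has been computed, so Eq. 14 uses the pre-update weights, as standard BP does. The names `Neuron`, `build`, `feed_forward`, and `train_pattern`, and the training pattern, are illustrative.

```python
import math


def sigma(u):
    return 2.0 / (1.0 + math.exp(-u)) - 1.0


class Neuron:
    """Hypothetical module: one neuron with its own weights, inputs,
    desired response and output (Section 2), plus Table 1 links."""

    def __init__(self, n_in, seed):
        # Deterministic small weights stand in for the random values of step 1.
        self.w = [0.1 * math.sin(7.0 * seed + k) for k in range(n_in)]
        self.b = 0.0
        self.x, self.y, self.d = [0.0] * n_in, 0.0, 0.0
        self.dw, self.db = [0.0] * n_in, 0.0
        self.forward_dest = []   # neurons in the next layer
        self.backward_dest = []  # neurons in the preceding layer

    def F(self, eta):
        """Eq. 9: (dw, y) = F(w, x, d), with the bipolar sigmoid of Eq. 6."""
        self.y = sigma(self.b + sum(w * v for w, v in zip(self.w, self.x)))
        delta = (self.d - self.y) * 0.5 * (1.0 - self.y ** 2)
        self.dw = [eta * delta * v for v in self.x]
        self.db = eta * delta


def build(sizes):
    """Modular transform (Table 1): one Neuron per neuron, linked by references.
    sizes[0] is the input size; layers[0] is the first hidden layer."""
    layers = [[Neuron(sizes[l - 1], seed=10 * l + n) for n in range(sizes[l])]
              for l in range(1, len(sizes))]
    for lower, upper in zip(layers, layers[1:]):
        for a in lower:
            a.forward_dest = upper
        for c in upper:
            c.backward_dest = lower
    return layers


def feed_forward(layers, x):
    for layer in layers:
        for n in layer:
            # Inputs come from the backward destinations, or from x for the first layer.
            n.x = [p.y for p in n.backward_dest] if n.backward_dest else list(x)
            n.y = sigma(n.b + sum(w * v for w, v in zip(n.w, n.x)))
    return [n.y for n in layers[-1]]


def train_pattern(layers, x, d, eta):
    """One pass of the modified BP flow (Fig. 3) for one pattern.
    Returns the pattern's error before the update."""
    y = feed_forward(layers, x)
    for n, target in zip(layers[-1], d):
        n.d = target                          # output targets are given
    for l in range(len(layers) - 1, -1, -1):
        for n in layers[l]:
            n.F(eta)                          # compute this layer's changes with F(w, x, d)
        for j, h in enumerate(layers[l - 1] if l > 0 else []):
            # Eq. 14: desired response from the upper modules' weight changes
            h.d = h.y + sum(u.dw[j] / (eta * h.y) * u.w[j] for u in h.forward_dest)
    for layer in layers:                      # Eq. 8, applied after all desires exist
        for n in layer:
            n.w = [w + c for w, c in zip(n.w, n.dw)]
            n.b += n.db
    return 0.5 * sum((t - v) ** 2 for t, v in zip(d, y))


net = build([1, 3, 2])                # a 1-3-2 network as in Fig. 5
x, d = [0.8], [0.5, -0.5]             # hypothetical pattern and targets
errors = [train_pattern(net, x, d, eta=0.5) for _ in range(300)]
```

Once its desired response is set, each hidden neuron runs the same `F` as an output neuron, so the modules could in principle be dispatched to separate processors.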
Similar to the multilayer feedforward network, a multilayer network can have jump connections (see Figure 4) from lower levels to higher levels. Its forward and backward equations are similar to the BP equations 1 to 5, and its modular design is similar to that for the multilayer feedforward network.
Figure 4: An example of a multilayer network with jump connections.
3.2 Recurrent Network
A recurrent network [8][9] uses its outputs as its inputs. We show the modular design for training a recurrent network in Figure 6. The training procedure starts by feeding an input vector x into the modular network. The desired response is set to a target sequence. In each iteration, the output of the network is fed back to itself as the next input. The procedure continues until the error is reduced to a satisfactory range. We treat the recurrent network as a normal feedforward network whose output destination is connected back to itself.
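The recurrent procedure can be sketched in the same way, as a layer of modules whose output vector is connected back as the next input. This minimal illustration rests on assumptions the paper does not state. The network is a single layer of fully connected modules, and the weights start at zero. The target sequence is the simplest possible one, a constant vector. `train_recurrent` and its parameters (`max_iter`, `tol`) are hypothetical names.

```python
import math


def sigma(u):
    return 2.0 / (1.0 + math.exp(-u)) - 1.0


def F(w, b, x, d, eta):
    """Single-neuron module (Eq. 9) with the bipolar sigmoid; bias as a weight on input 1."""
    y = sigma(b + sum(wi * xi for wi, xi in zip(w, x)))
    delta = (d - y) * 0.5 * (1.0 - y * y)
    return [eta * delta * xi for xi in x], eta * delta, y


def train_recurrent(x0, target, eta=0.5, max_iter=5000, tol=1e-6):
    """Feed x0 in, then feed each output vector back as the next input until the
    squared error falls below tol. target(t) returns the desired vector at step t."""
    n = len(x0)
    W = [[0.0] * n for _ in range(n)]  # W[j]: weights of module j
    B = [0.0] * n
    x = list(x0)
    for t in range(max_iter):
        d = target(t)
        y = []
        for j in range(n):
            dw, db, yj = F(W[j], B[j], x, d[j], eta)
            W[j] = [w + c for w, c in zip(W[j], dw)]
            B[j] += db
            y.append(yj)
        err = sum((dj - yj) ** 2 for dj, yj in zip(d, y))
        if err < tol:
            break
        x = y  # output destination connected back to the input
    return W, B, y, t, err


W, B, y, steps, err = train_recurrent([0.0, 0.0], target=lambda t: [0.5, -0.3])
```

A time-varying target only changes what `target(t)` returns; whether training then converges depends on the sequence and is not claimed here.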
The modular design is particularly useful for data flow machines, where it is expected to achieve a high degree of parallel computation. The main achievement is that neural networks can be decomposed into small modules, each of which can be fed to processors of different speeds, in keeping with the spirit of the data flow machine.
The self-organizing neural network can be adapted to the data flow machine structure with fewer modifications, so we omit its discussion.
References
[1] Veen, A. H. (1986). Dataflow Machine Architecture. ACM Computing Surveys, Vol. 18, No. 4, Dec., pp. 365-396.
[2] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Representations by Back-propagating Errors. Nature (London), Vol. 323, pp. 533-536.
[3] Liou, C.-Y. (2001). Lecture Notes on Neural Networks. National Taiwan University, 526 U1180. (http://red.csie.ntu.edu.tw//NN/index.html)
[4] Kohonen, T. (1982). Self-organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, Vol. 43, pp. 59-69.
[5] Tresp, V. (2001). Committee Machines. In Y. H. Hu and J.-N. Hwang (eds.), Handbook of Neural Network Signal Processing. CRC Press.
[6] McClelland, J. L., Rumelhart, D. E., and the PDP Research Group (1986). Parallel Distributed Processing. Cambridge, MA: The MIT Press.
[7] Worthington, S. (1988). C Programming. Boston, MA: Boyd & Fraser Pub. Co.
[8] Hopfield, J. J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. Natl. Acad. Sci. USA, 79: 2554-2558.
[9] Hopfield, J. J. (1984). Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons. Proc. Natl. Acad. Sci. USA, 81: 3088-3092.
Figure 5: Training procedure of the modular design network. The values in step 1 are randomly generated; succeeding steps change these values, starting from step 1.
Figure 6: The training procedure of the recurrent network using the modular design. The values in step 1 are randomly generated; succeeding steps change these values, starting from step 1.