4 Learning from Examples
4.6 Generalized Net Description of BP Algorithm
In this section we construct the generalized net description of the backpropagation algorithm for multilayer feedforward neural networks. The section is mostly based on the works of Krawczak and Aladjov (2002) and Krawczak (2003b, 2003e).
We are interested in modelling the learning capabilities of the neural network, and so we will construct a generalized net representation of the neural network that can describe the changes of the weights between neurons. In the considerations, some of the absent elements will be involved in the description of the proper characteristic function Φ, which generates the new token characteristics. Additionally, the inputs of the neural network are also treated as tokens.
Three main parts can be distinguished in the description of the neural network learning process. The first part describes the process of simulation or propagation; in the second part the performance index of learning is introduced, while the third part describes the operations that are necessary to change the states of neurons (by changing the connections, i.e. weights).
Let us consider the generalized net representation of the backpropagation algorithm shown in Fig. 4.3. Each neuron or group of neurons in the neural network (e.g. a layer, or all neurons in the considered network) is represented by a token of α-type. The tokens of this type enter the net through the place $X_1$ and have the following initial characteristics

$$y(\alpha_{i(l)}) = \langle NN1,\ l,\ i(l),\ f_{i(l)} \rangle \tag{4.43}$$

for $i(l) = 1, 2, \dots, N(l)$, $l = 0, 1, \dots, L$, where
$NN1$ is the neural network identifier,
$i(l)$ is the number of the token (neuron) associated with the $l$-th layer,
$l$ is the present layer number,

$$f_{i(l)}\Bigl(\sum_{i(l-1)=1}^{N(l-1)} x_{i(l-1)}\, w_{i(l-1)\,i(l)}\Bigr) \tag{4.44}$$

is an activation function of the $i$-th neuron associated with the $l$-th layer of the neural network.
It should be mentioned that the characteristic (4.43) has a different form than that described by e.g. (3.36). There we were interested in modelling the neural network simulation process, in which the weights between neurons are already adjusted and, for a given network input, we are interested in the network output. In that case the tokens represented the states of the neurons (the values of the neurons' outputs). If we are interested in modelling the learning capabilities of the neural network, then we must construct a generalized net representation of the neural network that can describe the changes of the weights between neurons.
Here the inputs of the neural network are also treated as tokens, for which the activation function is $f_{i(0)}(\cdot) = 1.0$, for $i(0) = 1, 2, \dots, N(0)$.
The basic generalized net description of the backpropagation algorithm contains six transitions, see Fig. 4.3, which will be described one by one.
Every token $\alpha_{i(l)}$, $i(l) = 1, 2, \dots, N(l)$, $l = 0, 1, \dots, L$, is transferred from the place $X_1$ to the place $X_2$ as well as $X_3$ via the transition $Z_1$. We assume that the tokens are transferred sequentially according to increasing indexes $i(l) = 1, 2, \dots, N(l)$ for given $l = 0, 1, \dots, L$, in order to be aggregated with other tokens of the same level $l$ into one new token $\alpha_{(l)}$, representing the whole layer $l$, according to the following conditions of the transition $Z_1$

$$Z_1 = \left\langle \{X_1, X_2\},\ \{X_2, X_3\},\ \begin{array}{c|cc} & X_2 & X_3 \\ \hline X_1 & V_{1,2} & V_{1,3} \\ X_2 & V_{2,2} & V_{2,3} \end{array},\ \vee(X_1, X_2) \right\rangle \tag{4.45}$$

where
$V_{1,2} = \neg V_{1,3}$ = "if there is only one token $\alpha_{i(l)}$ in the place $X_1$", i.e.

$$\forall\,(\alpha_{j(l)} \in K_{X_1})\ \bigl(\mathrm{pr}_2 Y_{\alpha_{i(l)}} \neq \mathrm{pr}_2 Y_{\alpha_{j(l)}},\ j(l) \neq i(l)\bigr) \tag{4.46}$$

(where $K_{X_1}$ is a set of all tokens entering the net from the place $X_1$, $i(l), j(l) = 1, 2, \dots, N(l)$),
$V_{2,2}$ = "if there is more than one token $\alpha_{i(l)}$ and $\alpha_{j(l)}$ associated with the $l$-th layer", i.e.

$$\exists\,(\alpha_{i(l)} \in K_{X_1}\ \&\ \alpha_{j(l)} \in K_{X_2})\ \bigl(\mathrm{pr}_2 Y_{\alpha_{i(l)}} = \mathrm{pr}_2 Y_{\alpha_{j(l)}}\bigr) \tag{4.47}$$

$V_{2,3}$ = "if all tokens $\alpha_{i(l)}$, $i(l) = 1, 2, \dots, N(l)$, have been combined into one token", i.e.

$$\neg\exists\,(\alpha_{i(l)} \in K_{X_1}\ \&\ \alpha_{k(l)} \in K_{X_2})\ \bigl(\mathrm{pr}_2 Y_{\alpha_{i(l)}} = \mathrm{pr}_2 Y_{\alpha_{k(l)}}\bigr)\ \&\ \neg\exists\,(\alpha_{i(l)} \in K_{X_1}\ \&\ \alpha_{j(l)} \in K_{X_1})\ \bigl(\mathrm{pr}_2 Y_{\alpha_{i(l)}} = \mathrm{pr}_2 Y_{\alpha_{j(l)}},\ i \neq j\bigr). \tag{4.48}$$
As we are not interested in network topologies, there is no need to consider the separate neurons or even the separate layers; in this way all α-type tokens are aggregated into just one. Here, we are interested in changing the characteristics of the neurons, therefore the whole neural network is represented by one transition $Z_1$ with three places $X_1$, $X_2$, $X_3$.
First, we will aggregate the tokens associated with the neurons related to one layer. This aggregation is done in the transition $Z_1$ in the following manner. For each layer, $l = 0, 1, \dots, L$, the characteristics of the separate tokens representing neurons (4.43) are further processed with the matrices of connection weights, the neuron activation functions, the neuron outputs and so on (as was shown in detail in Chap. 3), in order to construct a new token representing the whole layer.
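The aggregation step can be illustrated by a minimal Python sketch. It is only an illustration under stated assumptions: the class names `NeuronToken` and `LayerToken`, their field names and the function `aggregate_layer` are hypothetical, and the characteristics are reduced to the components listed for (4.43) and for the layer token.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class NeuronToken:
    """Neuron token with the components of (4.43): NN1, l, i(l), f."""
    net_id: str
    layer: int
    index: int
    activation: Callable[[float], float]


@dataclass(frozen=True)
class LayerToken:
    """Layer token with the components NN1, l, [1, N(l)], F(l)."""
    net_id: str
    layer: int
    index_range: Tuple[int, int]
    activations: Tuple[Callable[[float], float], ...]


def aggregate_layer(tokens: List[NeuronToken]) -> LayerToken:
    """Merge the neuron tokens of one layer, taken in increasing index order."""
    if len({t.layer for t in tokens}) != 1:
        raise ValueError("all tokens must belong to the same layer")
    ordered = sorted(tokens, key=lambda t: t.index)
    return LayerToken(
        net_id=ordered[0].net_id,
        layer=ordered[0].layer,
        index_range=(1, len(ordered)),
        activations=tuple(t.activation for t in ordered),
    )


# Three neuron tokens of layer 1 arriving out of order (tanh is an assumed activation).
tokens = [NeuronToken("NN1", 1, i, math.tanh) for i in (3, 1, 2)]
layer_token = aggregate_layer(tokens)
print(layer_token.layer, layer_token.index_range)
```

The sketch checks only that the tokens share a layer number, which plays the role of the second projection used in the predicates (4.46)-(4.48).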
Fig. 4.3 The generalized net representation of the backpropagation algorithm (transitions Z1 to Z6; places X1 to X10, m1 to m7, n1 and n2)

The new token $\alpha_{(l)}$ associated with the $l$-th layer is transferred, according to the condition (4.48), from the place $X_2$ to the place $X_3$, and has the following characteristic

$$y(\alpha_{(l)}) = \langle NN1,\ l,\ [1, N(l)],\ F(l) \rangle \tag{4.49}$$

for $l = 0, 1, 2, \dots, L$, where
$NN1$ is the neural network identifier,
$l$ is the layer number,
$[1, N(l)]$ denotes $N(l)$ tokens (neurons) arranged in a sequence, starting from the first and ending at $N(l)$, associated with the $l$-th layer,
$F(l) = [f_{1(l)}, f_{2(l)}, \dots, f_{N(l)}]^T$ is a vector of the activation functions of the neurons associated with the $l$-th layer of the neural network.
The procedure described above is repeated for all layers, and as a result in the place $X_3$ we obtain L tokens, which represent the procedure of generating the neural network output.
It should be emphasized here that the essential information about the neurons' connectivity is contained in the characteristic function Φ of the transition $Z_1$.
The second transition $Z_2$ is devoted to the introduction of the performance index of the learning process. This kind of information is associated with the β-type token. The token β enters the input place $m_1$ with an initial characteristic composed of
$NN1$, the neural network identifier,
$E$, the performance index of the neural network learning,
$E_{max}$, the threshold value of the performance index, which must be reached.
The transition $Z_2$ has the place $X_4$ among its output places. For $l = 0, 1, 2, \dots, L$, the characteristics of the α-type tokens involve
$NN1$, the neural network identifier,
$l$, the layer number,
$[1, N(l)]$, the sequence of tokens (neurons) associated with the $l$-th layer,
$F(l)$, a vector of the activation functions of the neurons associated with the $l$-th layer of the neural network,
$W(l)$, which denotes the aggregated initial weights connecting the neurons of the $(l-1)$-st layer with the $l$-th layer neurons.
The β token now obtains the following characteristic in the place $m_2$

$$y(\beta) = \langle NN1,\ 0,\ E_{max} \rangle. \tag{4.55}$$
Then, we will consider the transition $Z_3$, in which the new tokens of γ-type are introduced. The token $\gamma_p$, $p = 1, 2, \dots, P$, where $p$ is the number of the training pattern, enters the place $n_1$ with the initial characteristic

$$y(\gamma_p) = \langle X_p(0),\ D_p,\ p \rangle$$

The outputs of all layers are calculated sequentially, layer by layer. The transition $Z_3$ describes the process of signal propagation within the neural network

$$Z_3 = \left\langle \{X_4, X_5, X_9, m_2, m_7, n_1\},\ \{X_5, X_6, m_3, n_2\},\ \begin{array}{c|cccc} & X_5 & X_6 & m_3 & n_2 \\ \hline X_4 & V_{4,5} & V_{4,6} & false & false \\ X_5 & V_{5,5} & V_{5,6} & false & false \\ X_9 & V_{9,5} & V_{9,6} & false & false \\ m_2 & false & false & true & false \\ m_7 & false & false & true & false \\ n_1 & false & false & false & true \end{array},\ \wedge\bigl(\vee(X_4, X_5, X_9),\ (m_2, m_7),\ n_1\bigr) \right\rangle \tag{4.57}$$

where
$V_{4,5} = V_{5,5} = V_{9,5}$ = "the previous layer does not have defined outputs",
$V_{4,6} = V_{5,6} = V_{9,6} = \neg V_{4,5}$,
$V_{1,2}$ = "all layers' outputs have assigned values for the current pattern".
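To make the role of the index matrix in (4.57) more tangible, the following Python sketch stores the predicates in a nested dictionary and returns the output places open to a token. It is a simplified illustration only: the function `route`, the dictionary layout and the state flag `previous_layer_undefined` are assumptions, and token merging, capacities and the characteristic function are ignored.

```python
from typing import Callable, Dict, List

Predicate = Callable[[dict], bool]


def waiting(state: dict) -> bool:
    """V_{4,5} = V_{5,5} = V_{9,5}: the previous layer has no defined outputs."""
    return state["previous_layer_undefined"]


def ready(state: dict) -> bool:
    """V_{4,6} = V_{5,6} = V_{9,6}: negation of the predicate above."""
    return not state["previous_layer_undefined"]


def never(state: dict) -> bool:
    return False


def always(state: dict) -> bool:
    return True


ALPHA_ROW: Dict[str, Predicate] = {"X5": waiting, "X6": ready, "m3": never, "n2": never}
BETA_ROW: Dict[str, Predicate] = {"X5": never, "X6": never, "m3": always, "n2": never}
GAMMA_ROW: Dict[str, Predicate] = {"X5": never, "X6": never, "m3": never, "n2": always}

# Rows are input places, columns are output places, as in the index matrix of Z3.
Z3: Dict[str, Dict[str, Predicate]] = {
    "X4": ALPHA_ROW, "X5": ALPHA_ROW, "X9": ALPHA_ROW,
    "m2": BETA_ROW, "m7": BETA_ROW,
    "n1": GAMMA_ROW,
}


def route(place: str, state: dict, index_matrix: Dict[str, Dict[str, Predicate]]) -> List[str]:
    """Return the output places whose predicate is true for a token in `place`."""
    return [out for out, predicate in index_matrix[place].items() if predicate(state)]


print(route("X4", {"previous_layer_undefined": True}, Z3))
print(route("X4", {"previous_layer_undefined": False}, Z3))
print(route("n1", {"previous_layer_undefined": False}, Z3))
```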
In the place $X_5$ the tokens of α-type, $\alpha_{(l)}$, $l = 0, 1, 2, \dots, L$, obtain the new characteristics as follows

$$y(\alpha_{(l)}) = \langle NN1,\ l,\ [1, N(l)],\ F(l),\ W(l),\ X(l) \rangle \tag{4.58}$$

where $X(l) = [x_1(l), x_2(l), \dots, x_{N(l)}(l)]^T$, $l = 1, 2, \dots, L$, is the vector of outputs of the neurons associated with the $l$-th layer, related to the nominal weights $W(l)$, $l = 1, 2, \dots, L$.
In the place $X_6$ there are tokens with the following characteristics, which contain the calculated neuron outputs for the pattern $p$

$$y(\alpha_{(l)}) = \langle NN1,\ l,\ [1, N(l)],\ F(l),\ W(l),\ X_p(l) \rangle \tag{4.59}$$

calculated for the nominal values of the weights $W(l)$ and states $X(l)$, $l = 1, 2, \dots, L$. In the place $m_3$ the token β preserves its characteristic as $y(\beta) = \langle NN1, 0, E_{max} \rangle$, and in the place $n_2$ the token γ also does not change its characteristic and remains as $y(\gamma_p) = \langle X_p(0), D_p, p \rangle$.
The next transition $Z_4$ describes the first stage of the estimation and weight adjustment process, which is related to the performance index computation; its output places are $X_7$ and $m_4$.
As a result of the computations performed within the transition $Z_4$, the token β obtains the new value of the performance index in the place $m_4$,

$$y(\beta) = \langle NN1,\ E',\ E_{max} \rangle$$

In the next transition, $Z_5$, with the place $X_8$, the delta factors, described in Sect. 4.3, are computed.
The next transition $Z_6$ describes the process of weight adjustment. Its predicates include "there are still unused patterns" and "if the performance index is below the given threshold $E_{max}$". The updated weight connections $W'(l) = [w'_1(l), w'_2(l), \dots, w'_{N(l)}(l)]^T$ are then calculated.
In the place $m_7$ the β token obtains the characteristic $y(\beta) = \langle NN1, E, E_{max} \rangle$, which is not final.
The final values of the weights satisfying the predefined stop condition are denoted by $W^*(l) = \mathrm{pr}_5 \langle NN1, l, [1, N(l)], F(l), W'(l) \rangle$, where the characteristics of the α-type tokens in the place $X_{10}$ are described by

$$y(\alpha_{(l)}) = \langle NN1,\ l,\ [1, N(l)],\ F(l),\ W'(l) \rangle \tag{4.70}$$

and the β token characteristic in the place $m_6$ is described by

$$y(\beta) = \langle NN1,\ E',\ E_{max} \rangle \tag{4.71}$$

while the final value of the performance index is equal to $E^* = \mathrm{pr}_2 \langle NN1, E', E_{max} \rangle$.
The generalized net representation of the backpropagation algorithm developed here describes the main features of the gradient descent based learning algorithms. This representation allows for modifying and testing other algorithms by changing a relatively small portion of the generalized net formal description.
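The sequence of computations that the transitions represent (propagation, performance index, delta factors, weight adjustment and the stop condition with the threshold) can be sketched in plain Python. This is a minimal illustration under stated assumptions and not the generalized net itself: tanh activations, pattern-by-pattern updates, the learning rate `eta` and all function names are choices made for the sketch.

```python
import math
import random


def forward(weights, x):
    """Propagation: outputs of every layer; weights[l][j][i] links neuron i to neuron j."""
    outputs = [x]
    for W in weights:
        prev = outputs[-1]
        outputs.append([math.tanh(sum(w * v for w, v in zip(row, prev))) for row in W])
    return outputs


def performance_index(weights, patterns):
    """E = 1/2 * sum over patterns of the squared output error."""
    return 0.5 * sum((d - y) ** 2
                     for x, target in patterns
                     for d, y in zip(target, forward(weights, x)[-1]))


def gradients(weights, x, target):
    """Delta factors propagated from the last layer backwards, for one pattern."""
    outputs = forward(weights, x)
    delta = [(y - d) * (1.0 - y * y) for y, d in zip(outputs[-1], target)]
    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        grads[l] = [[dj * v for v in outputs[l]] for dj in delta]
        if l > 0:  # deltas of the preceding layer; tanh'(net) = 1 - output^2
            delta = [(1.0 - v * v) * sum(weights[l][j][i] * delta[j] for j in range(len(delta)))
                     for i, v in enumerate(outputs[l])]
    return grads


def train(weights, patterns, eta=0.1, e_max=1e-3, max_epochs=5000):
    """Adjust the weights until E <= e_max or the epoch limit is reached."""
    for _ in range(max_epochs):
        if performance_index(weights, patterns) <= e_max:
            break
        for x, target in patterns:  # loop while there are still unused patterns
            for W, G in zip(weights, gradients(weights, x, target)):
                for row, grad_row in zip(W, G):
                    for i, g in enumerate(grad_row):
                        row[i] -= eta * g
    return weights, performance_index(weights, patterns)


random.seed(0)
weights = [[[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)],
           [[random.uniform(-0.5, 0.5) for _ in range(2)]]]
patterns = [([1.0, 0.0], [-0.5]), ([1.0, 1.0], [0.5])]  # first input acts as a bias
e_before = performance_index(weights, patterns)
weights, e_after = train(weights, patterns)
print(e_before, e_after)
```

Replacing the body of `train` by another update rule changes only one function, which mirrors the remark that other algorithms can be tested by modifying a small portion of the description.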
The generalized net methodology has been applied to describe the functioning of other types of neural networks, see e.g. Krawczak, Atanassov and Sotirov (2010), Sotirov (2003, 2005), Sotirov and Krawczak (2003, 2006, 2008a, 2008b), Sotirov, Krawczak and Kodogiannis (2006, 2007).
M. Krawczak: Multilayer Neural Networks, SCI 478, pp. 95–121.
DOI: 10.1007/978-3-319-00248-4_5 © Springer International Publishing Switzerland 2013
5 Learning as a Control Process
5.1 Introduction
The commonly used algorithm for learning of multilayer neural networks, the backpropagation algorithm described in Chap. 4, is a gradient descent method for searching for a minimum of a performance index of learning. The performance index, being the measure of neural network learning quality, is a multimodal function. Application of this kind of algorithm frequently ends at a local minimum. Various modifications of this algorithm still cannot avoid local minima. Until now, in practice, the only way of trying to find a near-global optimum has been to perform the computation several times with different initial weight values and then to choose the best solution.
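The practice of restarting from several initial weight values can be sketched as follows. The one-dimensional function `performance` is a hypothetical multimodal stand-in for a real learning error, and the step size, number of steps and number of restarts are arbitrary assumptions of this illustration.

```python
import math
import random


def performance(w):
    """Hypothetical multimodal performance index of a single weight."""
    return math.sin(3.0 * w) + 0.1 * w * w


def gradient(w):
    """Derivative of `performance` with respect to w."""
    return 3.0 * math.cos(3.0 * w) + 0.2 * w


def descend(w, eta=0.01, steps=2000):
    """Plain gradient descent; ends in a local minimum of the basin it starts in."""
    for _ in range(steps):
        w -= eta * gradient(w)
    return w


random.seed(0)
starts = [random.uniform(-5.0, 5.0) for _ in range(20)]
solutions = [descend(w0) for w0 in starts]
best = min(solutions, key=performance)
print(round(best, 3), round(performance(best), 3))
```

Different starting points end in different local minima, and only the comparison of the final values identifies the best one; nothing guarantees that the global minimum is among them.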
The backpropagation algorithm does not use the special layered structure of the multilayer networks. In this chapter we propose a new global algorithm for neural networks learning. The algorithm is based on the dynamic programming principle introduced by Bellman in the early 1950s (Bellman 1972, Bertsekas 1995), and allows, at least theoretically, for finding the global minimum of the learning error. The learning of a multilayer neural network is considered as a special case of the multistage optimal control problem, first proposed by Krawczak and Mizukami (1994), and developed by Krawczak (e.g. 1995a, 1995b, 1999b, 2000a, 2001a, 2001b, 2004d, 2004g, 2005b, 2006a). The gist of the new algorithm for learning of multilayer neural networks consists of aggregating neurons within separate layers and then considering such a system as a particular multistage optimal control problem. Thus, layers become stages, while weights become controls. The problem of optimal weight adjustment is converted into a problem of optimal control.
The multistage optimal control problem can be solved by application of dynamic programming (Bryson and Ho 1969, Cruz 1977, Roitenberg 1978, Luenberger 1984). For the new algorithm the return functions for each layer are defined, and minimization of these functions is performed layer by layer, starting from the last layer. This approach gives a real possibility of performing global optimisation. There are obstacles to the application of dynamic programming: one is the curse of dimensionality, i.e. the computational burden, and the second is the memory requirement, growing exponentially with the state and control dimensionality. Fortunately, there is a way to avoid this kind of difficulty by introducing some approximation of the return functions, see Jacobson and Mayne (1979) or Yakowitz and Rutherford (1984). They proposed a method to approximate the return function by considering the second-order terms in the Taylor expansion of the functions. It seems possible to use the first-order method only, but with application of the conjugate gradient algorithm, which converges to the inverse of the proper Hessian matrix.
In some sense, it is an application of the idea of neuro-dynamic programming, introduced by Bertsekas and Tsitsiklis (1996), to the neural network learning process. The term "neuro" is equivalent in this context to any kind of function approximation.
5.2 Multistage Neural Systems
Let us assume each neuron is given by the following expressions

$$x_{pj}(l) = f\bigl(net_{pj}(l)\bigr) \tag{5.1}$$

$$net_{pj}(l) = \sum_{i=1}^{N(l-1)} w_{ji}(l-1)\, x_{pi}(l-1) \tag{5.2}$$

where $x_{pj}(l)$ is the scalar output of the $j$-th neuron, $j = 1, 2, \dots, N(l)$, situated within the layer $l$, $l = 1, 2, \dots, L$, the index $p = 1, 2, \dots, P$ indicates the number of a pattern, and $f(net_{pj}(l))$ is the differentiable activation function of the neuron $j$, while $net_{pj}(l)$ is the input to the neuron $j$ coming from the layer $l-1$. The notation used is a little different from that used in the previous chapter in order to emphasize the stage-wise nature of the network.
Fig. 5.1 shows a multilayer neural network with distinct layers. Now, let us aggregate the neurons situated within each layer $l$, $l = 0, 1, 2, \dots, L$, in the way described by the following expressions

$$X(l) = [x_1(l), x_2(l), \dots, x_{N(l)}(l)]^T \quad \text{for } l = 0, 1, 2, \dots, L \tag{5.3}$$

$$W(l-1) = [w_1(l-1), w_2(l-1), \dots, w_{N(l)}(l-1)]^T \tag{5.4}$$

where $w_j(l-1) = [w_{j1}(l-1), w_{j2}(l-1), \dots, w_{jN(l-1)}(l-1)]^T$, for $l = 1, 2, \dots, L$, $j = 1, 2, \dots, N(l)$.
Using Equ. 5.3 and 5.4, we can rewrite Equ. 5.1 for the whole layer (stage) in the form

$$X(l) = F\bigl(W(l-1), X(l-1)\bigr) \quad \text{for } l = 1, 2, \dots, L \tag{5.5}$$

where $X(l)$ denotes the aggregated output of the layer $l$, while $W(l-1)$ denotes the aggregated weights connecting the $l$-th layer with the $(l-1)$-st layer, and $X(l-1)$ is the aggregated output of the $(l-1)$-st layer.

Fig. 5.1 A multilayer neural network with arranged neurons within each layer (layers 0, 1, 2, ..., L-1, L; inputs $x_1(0), x_2(0), \dots, x_{N(0)}(0)$; outputs $x_1(L), x_2(L), \dots, x_{N(L)}(L)$)

Equ. 5.5 expresses the dynamics of the multistage system depicted in Fig. 5.2. The system is assumed to have L stages, and the evolution of the system's state through these stages, $X(0), X(1), \dots, X(L)$, is governed by the equation, similar to (5.1), of the form

$$X(l+1) = F\bigl(W(l), X(l)\bigr), \quad \text{for } l = 0, 1, \dots, L-1 \tag{5.6}$$
where $F(l)$, $l = 0, 1, 2, \dots, L$, is an $N(l)$-dimensional vector of functions built of the separate activation functions of the neurons situated within the $l$-th layer. In the system theory nomenclature it is said that $X(l+1)$ denotes the output of the system in the $(l+1)$-st stage, while $W(l)$ and $X(l)$ denote the control and the input to the system associated with the $(l+1)$-st stage, respectively.
The performance index is denoted by $E$

$$E = \frac{1}{2} \sum_{p=1}^{P} \bigl(D_p - X_p(L)\bigr)^2 \tag{5.7}$$

where $D_p$, $p = 1, 2, \dots, P$, is an $N(L)$-dimensional vector of the desired network outputs, while $P$ is the number of training patterns.
This form of the performance index, in which only the output stage of the system is involved, is said in the control terminology to be in Mayer form (Sethi and Thompson, 1981).
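The state transition (5.6) and the performance index (5.7) can be put together in a short Python sketch. It assumes tanh activations, represents each control W(l) as a list of weight rows, and reads the square in (5.7) as the squared Euclidean norm; the function names are hypothetical.

```python
import math


def stage(W, X):
    """State transition (5.6): X(l+1) = F(W(l), X(l)), with tanh assumed for F."""
    return [math.tanh(sum(w * x for w, x in zip(row, X))) for row in W]


def trajectory(controls, X0):
    """States X(0), X(1), ..., X(L) generated by the controls W(0), ..., W(L-1)."""
    states = [X0]
    for W in controls:
        states.append(stage(W, states[-1]))
    return states


def performance_index(controls, patterns):
    """Mayer form (5.7): only the final state X_p(L) of each pattern enters E."""
    total = 0.0
    for X0, D in patterns:
        XL = trajectory(controls, X0)[-1]
        total += sum((d - x) ** 2 for d, x in zip(D, XL))
    return 0.5 * total


controls = [[[0.5, -0.3], [0.1, 0.8]],  # W(0): 2 inputs -> 2 neurons
            [[1.0, -1.0]]]              # W(1): 2 neurons -> 1 output
patterns = [([1.0, 0.0], [0.2]), ([0.0, 1.0], [-0.4])]
print(performance_index(controls, patterns))
```

Because the index depends on the controls only through the final state, it can be evaluated by running the state equation forward once per pattern.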
Under the definitions (5.6) and (5.7), it is possible to define the problem of weight adjustment as an optimisation problem in a precise manner, namely the problem is to find the sequence of controls $W(l)$, $l = 0, 1, \dots, L-1$, that minimize the performance index (5.7) subject to the state transition equation (5.6).

Fig. 5.2 A multilayer neural network as a multistage system (states $X(0), X(1), X(2), \dots, X(L-2), X(L-1), X(L)$ for layers 0 to L; controls $W(0), W(1), \dots, W(L-2), W(L-1)$; inputs at layer 0, outputs at layer L)
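For reference, the weight adjustment problem can be written compactly as below. This is only a restatement of (5.6) and (5.7); attaching the pattern index p to every state is an assumption of this sketch.

```latex
\begin{aligned}
\min_{W(0),\,\dots,\,W(L-1)} \quad & E = \tfrac{1}{2}\sum_{p=1}^{P}\bigl(D_p - X_p(L)\bigr)^2 \\
\text{subject to} \quad & X_p(l+1) = F\bigl(W(l), X_p(l)\bigr), \qquad l = 0, 1, \dots, L-1, \\
& X_p(0) \ \text{given}, \qquad p = 1, 2, \dots, P .
\end{aligned}
```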
5.3 Dynamic Programming Solution
The dynamic programming solution is based on two principles that are strictly related to the structure of the problem (Bertsekas 1995), in our case the learning problem of multilayer neural networks (Saratchandran 1991, Krawczak and Mizukami 1994, Krawczak 1995a, 1999a, 2000a, 2001a, 2002b, 2002c).
There are two fundamental principles determining multistage systems: one is the principle of causality and the second is the principle of optimality.
The first principle states: the state $X(l)$ and the sequence of controls

$$W[l, r-1] = [W(l), W(l+1), \dots, W(r-1)]$$

uniquely determine the state $X(r)$; it follows directly from (5.6). According to this principle there exists a transition function $G(X(l), W[l, r-1])$, which describes the state

$$X(r) = G\bigl(X(l), W[l, r-1], l, r\bigr). \tag{5.8}$$
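The transition function of (5.8) is simply the repeated application of the stage equation (5.6). A minimal Python sketch, assuming tanh activations and hypothetical weight values, is given below; the final assertion checks the causality property on one example.

```python
import math
from functools import reduce


def F(W, X):
    """One stage of (5.6); tanh is an assumed activation function."""
    return [math.tanh(sum(w * x for w, x in zip(row, X))) for row in W]


def G(X_l, controls, l, r):
    """Transition function (5.8): X(r) obtained from X(l) and W[l, r-1]."""
    if len(controls) != r - l:
        raise ValueError("W[l, r-1] must contain exactly r - l controls")
    return reduce(lambda X, W: F(W, X), controls, X_l)


W = [[[0.5, -0.3], [0.1, 0.8]],   # W(0)
     [[1.0, -1.0], [0.3, 0.3]],   # W(1)
     [[0.7, 0.2]]]                # W(2)
X0 = [1.0, -1.0]
X1 = G(X0, W[0:1], 0, 1)

# Causality: going from stage 0 to 3 equals going from 0 to 1 and then from 1 to 3.
assert G(X0, W, 0, 3) == G(X1, W[1:3], 1, 3)
```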
In this way the initial state $X(0)$ (the inputs to the neural network) and the control sequence $W[0, L-1]$ uniquely determine the sequence of states

$$X[1, L] = [X(1), X(2), \dots, X(L)]$$

and the performance index (5.7) can be defined as a function of the initial state $X(0)$ and of the sequence of controls $W[0, L-1]$.
It assures that there exists a function $V(X(0), W[0, L-1])$ that determines the value of the performance index in the form

$$E = V\bigl(X(0), W[0, L-1]\bigr). \tag{5.9}$$
The principle of optimality allows for splitting the performance index into two parts

$$E\bigl(X(0), W[0, L-1]\bigr) = E_1\bigl(X(0), W[0, l-1]\bigr) + E_2\bigl(X(l), W[l, L-1]\bigr). \tag{5.10}$$

The first term in (5.10) is a response of the system to the sequence of controls $W^*[0, l-1]$ that minimises $E_1$; the second term completes the optimisation process from the state $l$ to the state $L$ due to the application of the controls $W^*[l, L-1]$, which minimize $E_2$.
These two principles due to Bellman determine the dynamic programming methodology. The consequence of these principles is the principle of optimal feedback control, which requires the optimal control at any stage to be a function of the state at this stage. Within the optimal control theory, this dependence is defined as the optimal feedback control, which states that at the stage $l$ the control $W(l)$ may be expressed as a function of the state $X(l)$

$$W^*(l) = W^*\bigl(X(l), l\bigr). \tag{5.11}$$
From the optimal feedback control principle (Dyer and McReynolds 1970) it follows that there exists a return function, which in the neural networks case has the form given by Krawczak (2000a), and the corresponding optimal return function (5.13).
For large-scale systems, like neural networks, the optimisation in (5.13), i.e. finding the optimal controls $W^*[0, L-1]$, is a very troublesome problem. The minimization in (5.13), according to the principle of optimality (Bertsekas 1995), can be treated as a stage-by-stage process. Due to the principle of optimality (5.10), this minimization process for the whole network can be written in the form (5.15). The process of minimizing Equ. 5.15 can be performed in a recursive way and can be rewritten in the following form:
for the last, $L$-th layer

$$V\bigl(X(L)\bigr) = E\bigl(X(L)\bigr)$$

for the $(L-1)$-st layer

$$V\bigl(X(L-1), W^*[L-1, L-1]\bigr) = \min_{W(L-1)} V\bigl(X(L), W(L-1)\bigr)$$

. . .
for the $l$-th layer

$$V\bigl(X(l), W^*[l, L-1]\bigr) = \min_{W(l)} V\bigl(X(l+1), W(l), W^*[l+1, L-1]\bigr)$$

. . .
for the 0-th layer

$$V\bigl(X(0), W^*[0, L-1]\bigr) = \min_{W(0)} V\bigl(X(1), W(0), W^*[1, L-1]\bigr). \tag{5.16}$$
According to Equ. 5.16, the optimisation process runs backwards, starting from the output stage (the output layer) and ending at the 0-th stage (the input layer). Any stage (layer) can be described by a transition function (5.6)

$$X(l+1) = F\bigl(W(l), X(l)\bigr) \quad \text{for } l = 0, 1, \dots, L-1$$

which expresses the output $X(l+1)$ of the $(l+1)$-st layer as a function of the weights $W(l)$ and the output of the previous layer $X(l)$. Substituting (5.15) into (5.16), for $l = L-1$ we get the following backward transition equation

$$V\bigl(X(L-1), W^*(L-1)\bigr) = \min_{W(L-1)} E\Bigl(F\bigl(W(L-1), X(L-1)\bigr)\Bigr). \tag{5.17}$$
Minimization of (5.17) with respect to $W(L-1)$, subject to $V(X(L)) = E(X(L))$, will give the optimal values of the controls (weights) $W^*(L-1)$ for the $(L-1)$-st stage (layer). For the $(L-2)$-nd stage (layer) the optimisation process looks like

$$V\bigl(X(L-2), W^*(L-2)\bigr) = \min_{W(L-2)} V\Bigl(F\bigl(W(L-2), X(L-2)\bigr), W^*(L-1)\Bigr). \tag{5.18}$$
For any stage (layer) $l$ the optimisation can be noted in the following form

$$V\bigl(X(l), W^*(l)\bigr) = \min_{W(l)} V\Bigl(F\bigl(W(l), X(l)\bigr), W^*(l+1)\Bigr). \tag{5.19}$$
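The stage-by-stage minimization can be checked on a toy example. The sketch below is a brute-force illustration under strong assumptions: a scalar two-stage network with tanh activations, a single training pattern, and weights restricted to a small grid (`GRID`, `D` and the function names are hypothetical). It shows only that minimizing backwards, stage by stage, gives the same value as minimizing over all weights jointly.

```python
import math
from itertools import product

GRID = [i / 4.0 for i in range(-8, 9)]  # candidate weight values (an assumption)
D = 0.5                                  # desired output for the single pattern


def F(w, x):
    """Scalar stage transition, cf. (5.6)."""
    return math.tanh(w * x)


def E(x_last):
    """Performance index for one pattern, cf. (5.7)."""
    return 0.5 * (D - x_last) ** 2


def V_last(x):
    """Stage L-1, cf. (5.17): minimum over W(L-1) of E(F(W(L-1), X(L-1)))."""
    return min(E(F(w, x)) for w in GRID)


def V_first(x):
    """Stage 0, cf. (5.19): minimum over W(0) of the return of the next stage."""
    return min(V_last(F(w, x)) for w in GRID)


x0 = 1.0
stagewise = V_first(x0)
joint = min(E(F(w1, F(w0, x0))) for w0, w1 in product(GRID, GRID))
assert stagewise == joint
print(stagewise)
```

The grid search is used here only because it makes the minima exact and comparable; it is not a practical method for networks of realistic size.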
The main feature of the above equation is that the return function related to the $l$-th layer describes the learning error transformed to the $l$-th layer. The optimal values of the controls $W^*(l)$ for any layer $l$, $l = 0, 1, 2, \dots, L-1$, are obtained by minimization of the transformed learning error

$$V\bigl(X(l), W(l), W^*[l+1, L-1]\bigr)$$

of the $(l+1)$-st stage, which is expressed in terms of $X(l)$ and $W(l)$.
The return function $V(X(l), W(l), W^*[l+1, L-1])$ can be obtained as a sequence, calculated in a backward manner,

$$V\bigl(X(l), W^*[l, L-1]\bigr) = V\bigl(X(l+1), W^*[l+1, L-1]\bigr) \tag{5.20}$$

and at the last stage

$$V\bigl(X(L-1), W^*[L-1, L-1]\bigr) = \frac{1}{2} \sum_{p=1}^{P} \bigl(D_p - X_p(L)\bigr)^2. \tag{5.21}$$
The recursive relation (5.20) is valid for any arbitrary control $W[l, L-1]$ (Dyer and McReynolds 1970), not only for the optimal one

$$V\bigl(X(l), W[l, L-1]\bigr) = V\bigl(X(l+1), W[l+1, L-1]\bigr). \tag{5.22}$$
Minimization of the return function for the stage L can be performed in different ways. For more about the different optimisation methods, which can be applied to (5.17), see Bertsekas and Tsitsiklis (1996).
Generally, it is very difficult to find the solution of the dynamic programming equations. The only way to avoid the so-called curse of dimensionality is to apply some approximation technique for the return functions, in order to find the approximate solutions of the problem.
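A rough count shows why tabulating the return function is impractical. The sketch below assumes, purely for illustration, that every state and control component is discretized into the same number of grid points and that a layer of n neurons is fully connected to a preceding layer of the same size; the function name is hypothetical.

```python
def table_entries(points_per_axis: int, state_dim: int, control_dim: int) -> int:
    """Number of entries of a return function tabulated on a regular grid."""
    return points_per_axis ** (state_dim + control_dim)


for neurons in (2, 5, 10):
    # Hypothetical fully connected layer: `neurons` states and neurons**2 weights.
    print(neurons, table_entries(10, neurons, neurons * neurons))
```

Even with ten grid points per component, the exponent grows with the square of the layer width, which is the exponential memory requirement mentioned above.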
5.4 Return Function Approximations
In this section we consider a class of approximations based on the Taylor expansion of the return functions. This class is sometimes called differential dynamic programming (Yakowitz and Rutherford 1984). The term "differential dynamic programming" refers to nonlinear programming procedures based on dynamic programming. The idea of this kind of optimisation procedure was mentioned by Bellman and Dreyfus (1962), and then developed by Mayne (1966), Dyer and McReynolds (1970), Jacobson and Mayne (1970), Ohno (1978), Larson and Korsak (1970). For the learning process of neural networks, this methodology was introduced by the present author (Krawczak 1999b, 2000a, 2000b, 2001a, 2001b).
The differential dynamic programming method is a successive approximation technique. The procedure is initiated with some nonoptimal control $W[0, L-1]$, called the nominal control, which generates the nominal trajectory $X[0, L]$ through the recursive formula (5.6). Within each iteration, a successor control $W'[0, L-1]$ is determined, which as a result generates the performance index $E(W')$ of a lower value than $E(W)$. For the differentiable return functions $V(X, W)$ it is possible to derive an approximation of $V(X', W')$ consisting of the linear or quadratic part of the Taylor series expansion of $V(X, W)$, the Taylor expansion being done about the nominal trajectory and control.
Respectively, we deal with the first order differential dynamic programming and the second order differential dynamic programming.
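For orientation, the generic second-order Taylor expansion about the nominal pair can be written as follows. The notation is an assumption of this sketch: the weights are treated as a single vector, subscripts denote partial derivatives evaluated at the nominal values, and the first order method keeps only the linear terms.

```latex
V(X', W') \approx V(X, W) + V_X^{T}\,\delta X + V_W^{T}\,\delta W
  + \tfrac{1}{2}\Bigl(\delta X^{T} V_{XX}\,\delta X
  + 2\,\delta X^{T} V_{XW}\,\delta W
  + \delta W^{T} V_{WW}\,\delta W\Bigr),
\qquad \delta X = X' - X, \quad \delta W = W' - W .
```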
5.4.1 First Order Differential Dynamic Programming
The method is based on the first order expansion of the return function
where η is some positive constant (in the neural network field called the learning parameter), we obtain the return function
Instead of considering the return function for the whole system, $V(X(0), W[0, L-1])$, the backward properties (5.22) can be applied for the approximation of the return functions related to each stage.
For any $i < l$ the gradient of the return function
that can be written as
This property is obvious because the control $W(l)$ cannot have any influence on the states $X(i)$ for $i < l$.
By considering the partial derivatives of the return function, described by Equ. 5.22, with respect to the state $X(l)$
Equ. 5.26 and 5.27 can be rewritten in a shorter form
Using Equ. 5.26 and 5.27, the gradients of the return functions, required for Equ. 5.23 in order to derive the first order approximation, can be computed as a sequence of the equations performed from the last stage to the inputs.
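The backward computation of the gradients can be illustrated on a scalar chain of stages. The sketch below is an illustration under stated assumptions and not the author's formulas: it uses tanh activations, a single pattern, the chain rule applied from the last stage to the inputs, and the common update W' = W - η ∇E for the successor control; all names and numerical values are hypothetical.

```python
import math


def states(weights, x0):
    """Nominal trajectory X(0), ..., X(L) of a scalar chain, cf. (5.6)."""
    xs = [x0]
    for w in weights:
        xs.append(math.tanh(w * xs[-1]))
    return xs


def performance_index(weights, x0, d):
    """E = 1/2 (D - X(L))^2 for a single pattern."""
    return 0.5 * (d - states(weights, x0)[-1]) ** 2


def backward_gradients(weights, x0, d):
    """Gradients dE/dW(l), computed from the last stage to the inputs."""
    xs = states(weights, x0)
    v_x = xs[-1] - d                      # dE/dX(L)
    grads = [0.0] * len(weights)
    for l in reversed(range(len(weights))):
        slope = 1.0 - xs[l + 1] ** 2      # tanh'(net) expressed through the output
        grads[l] = v_x * slope * xs[l]    # dE/dW(l)
        v_x = v_x * slope * weights[l]    # dE/dX(l), passed to the previous stage
    return grads


weights, x0, d, eta = [0.4, -0.7, 1.2], 0.9, 0.3, 0.1
grads = backward_gradients(weights, x0, d)
successor = [w - eta * g for w, g in zip(weights, grads)]
print(performance_index(weights, x0, d), performance_index(successor, x0, d))
```

In this example the successor control gives a lower performance index than the nominal one, which is the property required of each iteration of the successive approximation.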
The first order differential dynamic programming algorithm can be formulated