Automatica
journal homepage: www.elsevier.com/locate/automatica
Adaptive critic motion control design of autonomous wheeled mobile robot by
dual heuristic programming
Wei-Song Lin ∗, Ping-Chieh Yang
Department of Electrical Engineering, National Taiwan University, Taiwan
Article info
Article history:
Received 21 January 2007
Received in revised form 2 September 2007
Accepted 19 March 2008
Available online 10 October 2008
Keywords:
Adaptive critic
Approximate dynamic programming
Mobile robot
Neural networks
Abstract
An autonomous wheeled mobile robot (WMR) must implement velocity and path tracking control subject to complex dynamical constraints. Conventionally, this control design is obtained by analysis and synthesis, or by having domain experts build control rules. This paper presents an adaptive critic motion control design that enables a WMR to generate its control ability autonomously by learning through trials. The design consists of an adaptive critic velocity control loop and a self-learning posture control loop. The neural networks in the velocity neuro-controller (VNC) are corrected with the dual heuristic programming (DHP) adaptive critic method. The designer expresses the control objective simply by specifying the primary utility function; the VNC then attempts to fulfill it through incremental optimization. The posture neuro-controller (PNC) learns by approximating the specialized inverse velocity model of the WMR so as to map planned positions to suitable velocity commands. Supervised drive supplies varied velocity commands for the PNC and VNC to set up their neural weights. During autonomous drive, the PNC halts learning while the VNC keeps correcting its neural weights to optimize the control performance. The proposed design is evaluated on an experimental WMR. The results show that the DHP adaptive critic design is a useful basis for autonomous control.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Autonomous wheeled mobile robots (WMR) rely on sensors to perceive their surroundings and on a motion controller to drive automatically (Chen & Redmill, 2004; Maurette, 2003; Meyrowitz, Blidberg, & Michelson, 1996). For motion control, a WMR should be capable of trajectory tracking, path following and stabilization. However, a WMR is a nonholonomic dynamic system with intrinsic nonlinearity, and commonly with unmodeled disturbance and unstructured, unmodeled dynamics (Greenwood, 1988). Unless its mass is negligible (Lee, Leung, & Tam, 1999), the motion control must deal with the complex dynamics (Bloch, Reyhanoglu, & McClamroch, 1992; Kanayama, Kimura, Miyazaki, & Noguchi, 1990). Conventionally, this control design relies on engineers to analyze the WMR system so as to synthesize an appropriate controller (Colbaugh, Barany, & Glass, 1998; Park, Cho, & Lee, 2001; Tsai, Wu, Chang, & Wang, 2002). But difficulties usually arise from the absence of an accurate WMR model. Fuzzy control design may skip building the model but needs domain experts to construct the fuzzy rules (Lee, Adams, & Ryoo, 1997; Pawlowski, Kozlowski, & Wroblewski, 2001). Controllers based on neural networks or neuro-fuzzy networks may construct the control function by learning from training samples (Fierro & Lewis, 1998; Jang & Sun, 1995; Narendra & Parthasathy, 1990). But preparing appropriate training samples usually needs an existing controller (Gu & Hu, 2002). Alternatively, the adaptive critic motion control design presented in this paper enables a WMR to develop its control ability autonomously. Neither a domain expert to build control rules nor an existing controller to generate training samples is required.
This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Jae-Bok Song under the direction of Editor Mituhiko Araki.
∗ Corresponding author. Tel.: +886 2 33663638; fax: +886 2 23638247.
E-mail addresses: [email protected] (W.-S. Lin), [email protected] (P.-C. Yang).
In our laboratory, an experimental WMR has been developed and its mathematical model has been formulated and identified (Lin, Huang, Chuang, & Liu, 2004). A hierarchical fuzzy control system has been implemented and shown to be able to conduct the motion of the WMR (Lin, Huang, & Chuang, 2005). Furthermore, the experimental WMR has been equipped with a stereovision module to enable autonomous path finding and collision avoidance (Lin, Chuang, & Tien, 2005). This paper assumes that the stereovision module foresees the nearest path and that the WMR system must generate the motion control function entirely through learning by trials. Essentially, this extends the definition of an autonomous robot to include autonomous development of the control ability. The idea is to obtain the control ability by learning through trials to fulfill the control objective. Neural networks are chosen as the basic learning model. Trials, which are supervised for the sake of safety, supply training inputs to set up the neural weights. Eventually, the motion control function is built without reference to any existing controller.
0005-1098/$ – see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2008.03.029
Fig. 1. A schematic top view of the mobile platform.
The dual heuristic programming (DHP) adaptive critic technique (Prokhorov & Wunsch, 1997; Werbos, 1992), which approximates dynamic programming, is invoked to develop the learning mechanism. Multilayered perceptrons (MLP) are used to construct the posture neuro-controller (PNC) and the velocity neuro-controller (VNC). PNC learns to map planned positions to suitable velocity commands. VNC learns to conduct the WMR motion so as to track the velocity commands. Supervised drive of the WMR at varied velocities supplies training inputs for PNC and VNC to set up their neural weights. During autonomous drive, PNC halts learning while VNC is corrected to optimize the control performance. The proposed design is successfully evaluated on the experimental WMR. The principal contribution is an autonomous control design scheme for mobile robots based on the DHP adaptive critic method.
This paper is organized as follows: Section 2 illustrates the architecture of the adaptive critic motion control system of the WMR. Section 3 presents the design of the DHP adaptive critic motion controller. Section 4 validates the proposed design on the experimental WMR. Section 5 is the conclusion.
2. Architecture of adaptive critic motion control system of WMR
The autonomous WMR of interest has a four-wheeled mechanism as shown in Fig. 1 and Table 1. While the front wheels are passive, the rear wheels are motorized independently to give the differential rotation configuration. Such a WMR with a stereovision module has been assembled in our laboratory for studying navigation and motion control (Lin et al., 2004; Lin, Huang et al., 2005; Lin, Chuang et al., 2005). The experimental WMR is completely autonomous because data are processed without any external aid; its sensors are the encoders attached to the motorized wheels and the stereovision module used to find the path.
Using the Lagrange formalism, the dynamical model of the WMR is obtained as (Lin et al., 2004; Yun & Yamamoto, 1993)

$$M(q)\dot{R} + C(q,\dot{q})R + F(\dot{q}) + G(q) + \tau_d = B(q)u \tag{1}$$

where $q = [x, y, \theta, \phi_l, \phi_r]^T$ is the generalized coordinate vector characterizing the WMR, $R = [v, \omega]^T$ in which $v$ is the linear velocity and $\omega$ is the angular velocity, and $u = [\tau_l, \tau_r]^T$ are the input torques generated by the left and right motors.

Table 1
Symbol                  Description
$I_m = 0.002$ kg m$^2$  Inertia of single motorized wheel and rotor set about a diameter
$v$                     Linear velocity of WMR
$\omega$                Angular velocity of WMR
$\theta$                Orientation of WMR
$\dot{\varphi}_l$       Angular velocity of left motorized wheel
$\dot{\varphi}_r$       Angular velocity of right motorized wheel

The parameter matrices in (1) are

$$M(q) = \begin{bmatrix} m + \dfrac{2I_w}{r^2} & 0 \\ 0 & I + \dfrac{2b^2 I_w}{r^2} \end{bmatrix}, \quad C(q,\dot{q}) = \begin{bmatrix} 0 & -m_c d\,\dot{\theta} \\ m_c d\,\dot{\theta} & 0 \end{bmatrix}, \quad B(q) = \begin{bmatrix} \dfrac{2}{r} & \dfrac{2}{r} \\ -\dfrac{2b}{r} & \dfrac{2b}{r} \end{bmatrix}$$

where $m = m_c + 2m_w$, and $F(\dot{q})$, $G(q)$ and $\tau_d$ are unknown terms corresponding to frictional, gravitational and disturbance forces, respectively. Conducting the WMR motion requires velocity and trajectory tracking control. Hierarchical fuzzy control was shown to be a feasible approach (Lin, Chuang et al., 2005), but it needs domain experts to construct the fuzzy rules. Alternatively, this paper seeks to build the motion control function entirely through learning by trials. The resulting design is called the adaptive critic motion control system; it consists mainly of a self-learning posture control loop and an adaptive critic velocity control loop.
Fig. 2 illustrates the design concept. The stereovision module finds a forward path. According to the forward path, the feedback positions and the physical limitations of the WMR, the path planner calculates the planned positions. PNC, which approximates the specialized inverse velocity model (Narendra & Parthasathy, 1990) of the WMR, maps planned positions to suitable velocity commands. VNC is a DHP adaptive critic design that invokes incremental optimization to generate the ability of velocity control through learning. Learning begins with supervised drive to set up the neural weights in VNC and PNC. Hence, the supervised drive should excite the WMR dynamics sufficiently over the working domain of interest so that the learning is complete. During autonomous drive, PNC halts learning while VNC is corrected successively to optimize the control performance.
3. Design of DHP adaptive critic motion controller
3.1. Adaptive critic velocity neuro-controller
Adaptive critic methods are usually practiced with model-based learning structures such as neural or neuro-fuzzy networks. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches and are capable of optimization over time under conditions of noise, uncertainty, and nonlinearity (Werbos, 1992, 2004). Heuristic dynamic programming (HDP), dual heuristic programming (DHP), globalized dual heuristic programming (GDHP), and their action-dependent companions are the main categories of adaptive critic designs (Prokhorov & Wunsch, 1997). They can be differentiated by the critic output. HDP uses the critic to estimate the value function in the Bellman equation of dynamic programming. In DHP, the critic approximates the derivatives of the value function to facilitate the computation in the gradient correcting rule. The critic in GDHP estimates both the value function and its derivatives. DHP has been shown to outperform HDP, while GDHP offers no observable improvement over DHP (Lendaris & Shannon, 1998; Prokhorov, Santiago, & Wunsch, 1995). In addition, incremental optimization based on dynamic programming is rigorous in theory. Stability of a trained DHP adaptive critic control system is governed by optimal control theory in the sense of dynamic programming (Bertsekas, 2005).
Fig. 2. Architecture of the adaptive critic motion control system.
Fig. 3. Architecture of the DHP adaptive critic velocity neuro-controller. The solid lines indicate signal paths, the dashed lines indicate data paths, and the round rectangular blocks represent neural networks.
As illustrated in Fig. 3, VNC contains blocks called the action network, critic network, shadow critic network, plant model and primary utility. The action network is responsible for producing suitable control signals, while the critic and shadow critic networks form the adaptive critic that critiques the action performance. The plant model can be either a mathematical formulation or a neural approximation of the WMR dynamics.
3.1.1. Neural computing of VNC
The action, critic and shadow critic networks are each implemented with three-layer perceptrons (Haykin, 1999). These neural networks have the common architecture shown in Fig. 4.
Fig. 4. Architecture of the three-layer perceptrons.
In the neural architecture, each hidden neuron has a hyperbolic tangent activation function to obtain its output as

$$\bar{y}_j(n) = a\tanh\left(b\sum_{i=0}^{I} w_{ji}(n)\,\bar{x}_i(n)\right), \quad \bar{x}_0(n) = 1, \quad (a, b) > 0 \tag{2}$$

where $n$ denotes the time sequence. Each output neuron has a linear activation function to obtain its output as

$$\bar{z}_k(n) = c\sum_{j=0}^{J} w_{kj}(n)\,\bar{y}_j(n), \quad \bar{y}_0(n) = 1, \quad c > 0. \tag{3}$$

The partial derivatives pertaining to the neural architecture are derived as follows:

$$\frac{\partial \bar{z}_k(n)}{\partial w_{kj}(n)} = c\,\bar{y}_j(n) \tag{4}$$

$$\frac{\partial \bar{z}_k(n)}{\partial w_{ji}(n)} = \frac{bc}{a}\,w_{kj}(n)\,[a - \bar{y}_j(n)][a + \bar{y}_j(n)]\,\bar{x}_i(n) \tag{5}$$

$$\frac{\partial \bar{z}_k(n)}{\partial \bar{x}_i(n)} = \sum_{j=1}^{J} \frac{bc}{a}\,w_{kj}(n)\,[a - \bar{y}_j(n)][a + \bar{y}_j(n)]\,w_{ji}(n). \tag{6}$$

Usually, (4) and (5) are called the sensitivity functions and (6) is called the Jacobian function. The DHP adaptive critic design needs these quantities to evaluate the correcting rules.
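The forward pass (2)-(3) and the derivatives (4)-(6) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `forward` and `derivatives`, the array layout (bias weight stored in column 0), and the test sizes are assumptions, not part of the original design.

```python
import numpy as np

def forward(x, W_hid, W_out, a=1.0, b=1.0, c=1.0):
    """Forward pass of the three-layer perceptron, Eqs. (2) and (3).

    x     : input vector of length I (the bias input x_0 = 1 is prepended here)
    W_hid : hidden weights w_ji, shape (J, I + 1), column 0 multiplies the bias
    W_out : output weights w_kj, shape (K, J + 1), column 0 multiplies the bias
    """
    x_bar = np.concatenate(([1.0], x))        # x_0(n) = 1
    y = a * np.tanh(b * (W_hid @ x_bar))      # Eq. (2), hidden outputs y_1..y_J
    y_bar = np.concatenate(([1.0], y))        # y_0(n) = 1
    z = c * (W_out @ y_bar)                   # Eq. (3)
    return x_bar, y, y_bar, z

def derivatives(x, W_hid, W_out, a=1.0, b=1.0, c=1.0):
    """Sensitivity functions (4), (5) and Jacobian function (6)."""
    x_bar, y, y_bar, _ = forward(x, W_hid, W_out, a, b, c)
    slope = (b * c / a) * (a - y) * (a + y)   # common factor in (5) and (6)
    dz_dwout = c * y_bar                      # Eq. (4): dz_k/dw_kj, identical for every k
    # Eq. (5): dz_k/dw_ji, shape (K, J, I + 1)
    dz_dwhid = W_out[:, 1:, None] * slope[None, :, None] * x_bar[None, None, :]
    # Eq. (6): dz_k/dx_i for the non-bias inputs, shape (K, I)
    dz_dx = (W_out[:, 1:] * slope) @ W_hid[:, 1:]
    return dz_dwout, dz_dwhid, dz_dx
```

A finite-difference comparison against `forward` is a convenient way to check that the analytic expressions agree with the network actually implemented.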
3.1.2. Plant model and Jacobian quantities
In Fig. 3, the plant model is used to predict the immediate future states and to calculate certain partial derivatives pertaining to the plant. It can be either the mathematical model or a neural approximation of the plant dynamics. Since the DHP adaptive critic design allows a partial or qualitative plant model (Shannon, 1999) and the WMR model is known (Lin et al., 2004), the plant model in VNC is implemented with (1), neglecting the unknown terms corresponding to the frictional, gravitational and disturbance forces. From (1), the simplified model equations are derived as

$$\dot{R} = -M^{-1}(q)\,C(q,\dot{q})\,R + M^{-1}(q)\,B(q)\,u. \tag{7}$$

Rewrite (7) as the following nonlinear mappings:

$$\dot{R}_i = f_i(R, u), \quad i = 1, 2, \ldots, S. \tag{8}$$

Then for the operating point $(R_n, u_n)$ at sampling time $t_n$, the first-order approximation of (7) is obtained as

$$\dot{R} = A(n)R + B(n)u + D(n) \tag{9}$$

where $A(n)$ and $B(n)$ are the Jacobian matrices of $f$ with respect to $R$ and $u$ evaluated at $(R_n, u_n)$, and $D(n)$ collects the remaining constant terms. Discretizing (9) over one sampling period gives

$$R(n+1) = \bar{A}(n)R(n) + \bar{B}(n)u(n) + \bar{D}(n) \tag{10}$$

where $\bar{A}(n) = e^{A(n)\Delta}$, $\bar{B}(n) = \int_0^{\Delta} e^{A(n)t}\,dt\;B(n)$, $\bar{D}(n) = \int_0^{\Delta} e^{A(n)t}\,dt\;D(n)$, and $\Delta$ represents the sampling period. In VNC, the plant model uses (10) to predict the states and calculates the following Jacobian quantities:

$$\frac{\partial R_i(n+1)}{\partial R_j(n)} = \bar{A}_{ij}(n), \quad \frac{\partial R_i(n+1)}{\partial u_k(n)} = \bar{B}_{ik}(n). \tag{11}$$
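As a rough illustration of Section 3.1.2, the sketch below linearizes the simplified dynamics (7) at an operating point and discretizes the result as in (10), so that the Jacobian quantities in (11) are simply the returned matrices. All numerical parameters, the input matrix `B_in`, and the helper names are assumptions chosen for illustration; they are not the Table 1 values. The matrix exponential is computed with a plain Taylor series, which is adequate only for small sampling periods.

```python
import numpy as np

# Hypothetical parameters for illustration only (not the values of Table 1).
m, I, I_w, m_c, d, r, b = 10.0, 1.0, 0.002, 8.0, 0.1, 0.1, 0.2
M_inv = np.linalg.inv(np.diag([m + 2 * I_w / r**2, I + 2 * b**2 * I_w / r**2]))
B_in = np.array([[1.0 / r, 1.0 / r], [-b / r, b / r]])  # assumed input matrix

def f(R, u):
    """Simplified dynamics (7) with theta_dot = omega."""
    v, w = R
    C = np.array([[0.0, -m_c * d * w], [m_c * d * w, 0.0]])
    return M_inv @ (-C @ R + B_in @ u)

def linearize(R, u):
    """First-order approximation at (R, u): R_dot ~ A R + B u + D."""
    v, w = R
    # Jacobian of the product C(omega) R with respect to R = [v, omega]
    dCR_dR = np.array([[0.0, -2.0 * m_c * d * w], [m_c * d * w, m_c * d * v]])
    A = -M_inv @ dCR_dR
    B = M_inv @ B_in
    D = f(R, u) - A @ R - B @ u
    return A, B, D

def expm_series(X, terms=25):
    """Matrix exponential by Taylor series (fine for small ||X||)."""
    out = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

def discretize(A, B, D, dt):
    """Return A_bar, B_bar, D_bar of (10) via one augmented exponential."""
    n, k = A.shape[0], B.shape[1]
    aug = np.zeros((n + k + 1, n + k + 1))
    aug[:n, :n] = A
    aug[:n, n:n + k] = B
    aug[:n, n + k] = D
    E = expm_series(aug * dt)
    return E[:n, :n], E[:n, n:n + k], E[:n, n + k]
```

With these helpers, `A_bar` and `B_bar` play the role of the Jacobians in (11), and `A_bar @ R + B_bar @ u + D_bar` is the one-step state prediction.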
3.1.3. Correcting the action network
In Fig. 3, $U(n)$ is the primary utility function, defined according to the specific application context. Since the objective of VNC is to control the WMR to track the velocity command as closely as possible, the primary utility function is defined as

$$U(n) = 0.25\,(v(n) - v_d(n))^2 + 0.25\,(\omega(n) - \omega_d(n))^2 \tag{12}$$

where $(v_d(n), \omega_d(n))$ is the velocity command. To achieve the control objective, the neural weights in the action network must be corrected to minimize not only the present value but also the sum of all future values of $U(n)$. According to dynamic programming (Bellman, 1957), this goal can be achieved by minimizing the secondary utility function, i.e. the value function, expressed as

$$J(n) = \sum_{k=0}^{\infty} \eta^k U(n+k) = U(n) + \eta J(n+1) \tag{13}$$

where $\eta$, $0 < \eta \le 1$, is a discount factor. Thus, using the gradient descent method, a suitable correcting rule of the action network is

$$\begin{aligned}
\Delta w_{km}(n) &= \alpha\,\frac{\partial J(n)}{\partial w_{km}(n)} = \alpha\,\frac{\partial J(n)}{\partial u_k(n)}\,\frac{\partial u_k(n)}{\partial w_{km}(n)} \\
&= \alpha\left[\frac{\partial U(n)}{\partial u_k(n)} + \eta\,\frac{\partial J(n+1)}{\partial u_k(n)}\right]\frac{\partial u_k(n)}{\partial w_{km}(n)} \\
&= \alpha\Bigg[\underbrace{\frac{\partial U(n)}{\partial u_k(n)}}_{\text{Utility}} + \eta\sum_{s}\underbrace{\lambda^{\circ}_{s}(n+1)}_{\text{Critic}}\,\underbrace{\frac{\partial R_s(n+1)}{\partial u_k(n)}}_{\text{Model}}\Bigg]\underbrace{\frac{\partial u_k(n)}{\partial w_{km}(n)}}_{\text{Action}}
\end{aligned} \tag{14}$$

where $\alpha$ is the learning rate and $w_{km}(n)$ is the $m$th neural weight associated with the $k$th output of the action network.
3.1.4. Correcting the shadow critic network and the critic network
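In (14), $\lambda^{\circ}_s(n+1) = \partial J(n+1)/\partial R_s(n+1)$ is unknown. The DHP design estimates this quantity with the adaptive critic, which is composed of the shadow critic and critic networks. They estimate the partial derivatives of the secondary utility function at the present and the immediate future sampling times as

$$\lambda_s(n) = \frac{\partial J(n)}{\partial R_s(n)}, \quad s = 1, 2, \ldots, S \quad \text{(shadow critic)} \tag{15}$$

$$\lambda^{\circ}_s(n+1) = \frac{\partial J(n+1)}{\partial R_s(n+1)}, \quad s = 1, 2, \ldots, S \quad \text{(critic)}. \tag{16}$$

Applying the chain rule to (13) gives the target value

$$\lambda^{\circ}_s(n) = \frac{\partial U(n)}{\partial R_s(n)} + \sum_{k=1}^{K}\frac{\partial U(n)}{\partial u_k(n)}\frac{\partial u_k(n)}{\partial R_s(n)} + \eta\sum_{s'=1}^{S}\lambda^{\circ}_{s'}(n+1)\left[\frac{\partial R_{s'}(n+1)}{\partial R_s(n)} + \sum_{k=1}^{K}\frac{\partial R_{s'}(n+1)}{\partial u_k(n)}\frac{\partial u_k(n)}{\partial R_s(n)}\right] \tag{17}$$

where $K$ denotes the dimension of the control vector $u(n)$. Since (12) shows that $U(n)$ is independent of $u(n)$, (17) can be rewritten as

$$\lambda^{\circ}_s(n) = \underbrace{\frac{\partial U(n)}{\partial R_s(n)}}_{\text{Utility}} + \eta\sum_{s'=1}^{S}\underbrace{\lambda^{\circ}_{s'}(n+1)}_{\text{Critic}}\Bigg[\underbrace{\frac{\partial R_{s'}(n+1)}{\partial R_s(n)}}_{\text{Model}} + \sum_{k=1}^{K}\underbrace{\frac{\partial R_{s'}(n+1)}{\partial u_k(n)}}_{\text{Model}}\,\underbrace{\frac{\partial u_k(n)}{\partial R_s(n)}}_{\text{Action}}\Bigg]. \tag{18}$$

In (18), $\lambda^{\circ}_{s'}(n+1)$ is the output of the critic network, $\partial R_{s'}(n+1)/\partial R_s(n)$ and $\partial R_{s'}(n+1)/\partial u_k(n)$ are the Jacobian functions of the plant model, $\partial u_k(n)/\partial R_s(n)$ is the Jacobian function of the action network, and $U(n)$ is a known function; therefore, $\lambda^{\circ}_s(n)$ can be calculated. The adaptive critic in DHP learns by updating the shadow critic network so that $\lambda(n)$ tracks $\lambda^{\circ}(n)$. Hence, an error measure for correcting the shadow critic network can be formulated as

$$E(n) = 0.5\sum_{s}\left[\lambda_s(n) - \lambda^{\circ}_s(n)\right]^2. \tag{19}$$

Then the gradient correcting rule is

$$\Delta w_{sm}(n) = \beta\,\frac{\partial E(n)}{\partial w_{sm}(n)} = \beta\left[\lambda_s(n) - \lambda^{\circ}_s(n)\right]\frac{\partial \lambda_s(n)}{\partial w_{sm}(n)} \tag{20}$$

where $\beta$ is the learning rate and $w_{sm}$ is the $m$th neural weight associated with the $s$th output of the shadow critic network. The critic network duplicates the corresponding neural weights of the shadow critic network, so it needs no correcting rule of its own. For learning convergence, however, the weights are duplicated only once every several (typically five) sampling times.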
3.2. Self-learning posture neuro-controller
PNC consists of two neural networks, referred to as the linear PNC and the angular PNC, which map planned positions to linear and angular velocity commands. The self-learning mechanism is constructed by identifying the specialized inverse velocity model of the WMR as shown in Fig. 5. For learning convergence, the specialized inverse velocity model and PNC are organized as standalone neural networks. The neural architecture is shown in Fig. 4. The linear PNC has twelve inputs organized from two planned positions and five feedback positions as follows:

$$\begin{aligned}
[\,&x(n+2) - x(n+1),\; y(n+2) - y(n+1),\; x(n+1) - x(n),\; y(n+1) - y(n),\\
&x(n-1) - x(n),\; y(n-1) - y(n),\; x(n-2) - x(n-1),\; y(n-2) - y(n-1),\\
&x(n-3) - x(n-2),\; y(n-3) - y(n-2),\; x(n-4) - x(n-3),\; y(n-4) - y(n-3)\,]^T.
\end{aligned} \tag{21}$$

Fig. 5. Scheme of learning the specialized inverse velocity model.
Here (21) contains multi-step displacements that imply the velocity, acceleration and jerk from which PNC determines the outputs. The outputs of the linear PNC are the linear velocity commands $v_d(n)$ and $v_d(n+1)$, where $v_d(n)$ is active and $v_d(n+1)$ is dummy. Similarly, the angular PNC has eighteen inputs organized as follows:

$$\begin{aligned}
[\,&x(n+2) - x(n+1),\; y(n+2) - y(n+1),\; \theta(n+2) - \theta(n+1),\\
&x(n+1) - x(n),\; y(n+1) - y(n),\; \theta(n+1) - \theta(n),\\
&x(n-1) - x(n),\; y(n-1) - y(n),\; \theta(n-1) - \theta(n),\\
&x(n-2) - x(n-1),\; y(n-2) - y(n-1),\; \theta(n-2) - \theta(n-1),\\
&x(n-3) - x(n-2),\; y(n-3) - y(n-2),\; \theta(n-3) - \theta(n-2),\\
&x(n-4) - x(n-3),\; y(n-4) - y(n-3),\; \theta(n-4) - \theta(n-3)\,]^T.
\end{aligned} \tag{22}$$

The outputs of the angular PNC are the angular velocity commands $\omega_d(n)$ and $\omega_d(n+1)$, where $\omega_d(n)$ is active and $\omega_d(n+1)$ is dummy. Fig. 5 shows the scheme of learning the specialized inverse velocity model. The specialized inverse, which is not necessarily the complete inverse, covers only the working domain excited by supervised drive. Therefore, no singularity of the WMR is encountered. Supervised drive must, however, supply velocity commands that are safe and rich enough to encompass the working domain required in autonomous drive. At the end of supervised drive, PNC duplicates the neural weights of the specialized inverse velocity model. During autonomous drive, the neural weights in PNC are kept constant so that VNC can incrementally optimize the velocity control.
Backpropagation with the Levenberg–Marquardt (LM) algorithm (Wilamowski, 2003) is used to correct the specialized inverse velocity model. Define the error of the velocity inferred by the specialized inverse velocity model as

$$\begin{aligned}
e(n) = [\,&(v_d(n-1) - v_{\mathrm{infer}}(n-1)),\; (v_d(n-2) - v_{\mathrm{infer}}(n-2)),\\
&(\omega_d(n-1) - \omega_{\mathrm{infer}}(n-1)),\; (\omega_d(n-2) - \omega_{\mathrm{infer}}(n-2))\,]^T.
\end{aligned} \tag{23}$$

For use in the LM algorithm, $e(n)$ is collected for $m$ sampling times. Then the error measure is constructed as

$$\varepsilon(W) = 0.5\,E^T E \tag{24}$$

where $E = [e^T(n), e^T(n-1), \ldots, e^T(n-m+1)]^T$. Then the neural weights in the specialized inverse velocity model are corrected as

$$W_{k+1} = W_k - [G^T G + \xi I]^{-1} G^T E \tag{25}$$

where $G = \nabla(\varepsilon(W))$ and $I$ is the identity matrix. When using (25), the scalar $\xi$ is decreased after each successful step, i.e. a reduction in the error measure, and increased only when a tentative step would increase the error measure. This provides a switching capability between the Gauss–Newton algorithm and the steepest descent method.
Fig. 6. Planning a smooth path with the arc-line algorithm.
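The update (25) with the adaptive damping of $\xi$ can be illustrated on a toy curve-fitting problem. The sketch below is not the PNC training code: it interprets $G$ as the Jacobian of the stacked error vector $E$ with respect to $W$ (the usual LM reading), and the factors 0.1 and 10 used to decrease and increase $\xi$, as well as the toy model $\tanh(W_0 x + W_1)$, are assumptions.

```python
import numpy as np

def lm_fit(residual, jacobian, W, xi=1e-2, steps=50):
    """Levenberg-Marquardt iteration following Eq. (25) with adaptive xi."""
    E = residual(W)
    err = 0.5 * E @ E                               # error measure, Eq. (24)
    for _ in range(steps):
        G = jacobian(W)                             # dE/dW, shape (len(E), len(W))
        step = np.linalg.solve(G.T @ G + xi * np.eye(W.size), G.T @ E)
        W_try = W - step                            # tentative step, Eq. (25)
        E_try = residual(W_try)
        err_try = 0.5 * E_try @ E_try
        if err_try < err:                           # successful step: accept, decrease xi
            W, E, err = W_try, E_try, err_try
            xi *= 0.1
        else:                                       # rejected step: keep W, increase xi
            xi *= 10.0
    return W, err

# Toy problem: recover the weights of tanh(W0 * x + W1) from noiseless samples.
x = np.linspace(-1.0, 1.0, 20)
target = np.tanh(0.8 * x - 0.3)

def residual(W):
    return target - np.tanh(W[0] * x + W[1])        # desired minus inferred, as in (23)

def jacobian(W):
    slope = 1.0 - np.tanh(W[0] * x + W[1]) ** 2
    return -np.column_stack((slope * x, slope))     # derivative of the residual

W_fit, err_fit = lm_fit(residual, jacobian, np.zeros(2))
```

A small $\xi$ makes the step close to Gauss–Newton, while a large $\xi$ shrinks it toward a short steepest-descent step, which is the switching behavior described above.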
3.3. Path planner
The stereovision module is responsible for locating the target and finding a collision-free path. According to the viewed path and the physical limitations of the WMR, the path planner plans a feasible, smooth path and calculates the planned positions for the next two steps. The arc-line algorithm (Nelson, 1989) is used to smooth the path. As illustrated in Fig. 6, this algorithm replaces the line segments around the intersection of two straight lines with a smooth curve. First, the start point $S(x_s, y_s, \theta_s)$ on the first line, the end point $E(x_e, y_e, \theta_e)$ on the second line, the intersection point $I(x_i, y_i, \theta_i)$ of these two lines, and the angle $(\phi_d = \theta_i - \theta_e)$ between these two lines are found. Then a value of curvature $(\gamma)$ is assigned to find the transition point $T(x_t, y_t, \theta_t)$ on the first line, the distance $\gamma\tan(\phi_d/2)$ to the intersection point, and the center point $C(x_c, y_c)$. Finally, the original straight line segments are replaced by the arc that starts at point $T$.
As shown in Fig. 7, denote the physical limitations of the WMR on maximum displacement and steering angle as $d_{\max}$ and $\phi_{\max}$. By constructing a displacement vector from the present position $(x_p, y_p, \theta_p)$ to a target position $(x_b, y_b)$ selected on the planned path, the desired displacement $d_p$ and steering angle $\phi_p$ can be determined. Then the planned linear and angular positions are calculated as

$$\begin{aligned}
x_p(n+1) &= x_p(n) + d_p\cos(\phi_p + \theta_p)\\
y_p(n+1) &= y_p(n) + d_p\sin(\phi_p + \theta_p)\\
\theta_p(n+1) &= \theta_p(n) + \phi_p.
\end{aligned} \tag{26}$$

When they violate the physical limitations, the maximum allowable values are used.
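A minimal sketch of the position update (26) with saturation is given below, together with the tangent distance $\gamma\tan(\phi_d/2)$ used by the arc-line smoothing. How $d_p$ and $\phi_p$ are extracted from the displacement vector (Euclidean distance and bearing relative to the current heading) is an assumption, as are the function names.

```python
import math

def tangent_distance(gamma, phi_d):
    """Distance from the transition point T to the intersection point I."""
    return gamma * math.tan(phi_d / 2.0)

def plan_next_position(pose, target, d_max, phi_max):
    """One planning step of Eq. (26).

    pose   : (x_p, y_p, theta_p), present position and orientation
    target : (x_b, y_b), target position selected on the planned path
    """
    x, y, th = pose
    dx, dy = target[0] - x, target[1] - y
    d_p = min(math.hypot(dx, dy), d_max)              # saturate the displacement
    phi = math.atan2(dy, dx) - th                     # bearing relative to heading (assumed)
    phi = math.atan2(math.sin(phi), math.cos(phi))    # wrap to (-pi, pi]
    phi_p = max(-phi_max, min(phi_max, phi))          # saturate the steering angle
    return (x + d_p * math.cos(phi_p + th),
            y + d_p * math.sin(phi_p + th),
            th + phi_p)
```

Calling `plan_next_position` twice, feeding the first result back in as the pose, would give the two planned positions that the PNC input vectors require.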
4. Validation of DHP adaptive critic motion control design
In the following validation, the VNC of the experimental WMR is implemented as follows. The action network has four inputs $[v(n), \omega(n), v_d(n), \omega_d(n)]^T$ and two outputs corresponding to the wheels' driving torques $[\tau_l(n), \tau_r(n)]^T$. The shadow critic network has four inputs and two outputs denoted by $[v(n), \omega(n), v_d(n), \omega_d(n)]^T$ and $[\lambda_1(n), \lambda_2(n)]^T$, respectively. The critic network is a duplicate of the shadow critic network except that its inputs and outputs are $[v(n+1), \omega(n+1), v_d(n), \omega_d(n)]^T$ and $[\lambda^{\circ}_1(n+1), \lambda^{\circ}_2(n+1)]^T$, respectively. The number of hidden neurons in each neural network is chosen by experience. The parameter values in the activation functions of (2) and (3) are $a, b, c = 1$. All neural weights in the action, critic and shadow critic networks are initialized with values chosen randomly in the range $[-0.1, 0.1]$.
Fig. 7. Planning a feasible position.
The validation begins with supervised drive, followed by autonomous drive. Supervised drive supplies 1000 sets of velocity commands calculated from the following equations:

$$\begin{aligned}
v_d(i) &= 0.5\left[\cos\left(\frac{6\pi(i-1)}{1000} + \pi\right) + 1\right]e^{-0.001i}, \quad i = 1, 2, \ldots, 1000\\
\omega_d(i) &= \left[\cos\left(\frac{6\pi(i-1)}{1000} + \pi\right) + 1\right]\sin\frac{i-1}{40}, \quad i = 1, 2, \ldots, 1000.
\end{aligned} \tag{27}$$

These velocity commands are fed sequentially into VNC to train the neural networks for 500 cycles. Note that supervised drive is responsible for supplying safe, sufficiently rich velocity commands without the corresponding control torques. Here, sufficiently rich velocity commands are those that cover the whole working domain required in autonomous drive. Each training cycle is therefore a trial to generate appropriate control torques by optimizing the secondary utility function. During each trial, the neural weights in VNC and PNC are corrected. Hence, PNC is simply equivalent to the specialized inverse velocity model excited by supervised drive. After the 500 training cycles are finished, the performance of VNC and PNC is examined. Finally, the trained WMR system is switched to autonomous drive and tested by tracking a right-turn path and a decaying sinusoidal path.
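The supervised velocity commands can be generated as in the short sketch below. It assumes one reading of (27), in which the cosine term plus one forms a non-negative envelope that multiplies the exponential decay for $v_d$ and the slow sine for $\omega_d$; the function name is hypothetical.

```python
import numpy as np

def supervised_commands(n_samples=1000):
    """Velocity commands under the assumed reading of Eq. (27)."""
    i = np.arange(1, n_samples + 1)
    envelope = np.cos(6.0 * np.pi * (i - 1) / 1000.0 + np.pi) + 1.0  # lies in [0, 2]
    v_d = 0.5 * envelope * np.exp(-0.001 * i)      # linear velocity command
    w_d = envelope * np.sin((i - 1) / 40.0)        # angular velocity command
    return v_d, w_d

v_d, w_d = supervised_commands()
```

Under this reading the linear command stays between 0 and 1 while the angular command changes sign, so the pair sweeps a range of speed and turning combinations during each training cycle.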
4.1. Performance of VNC
Figs. 8a and 8b compare the actual linear velocity with the desired value in the 50th and 500th training cycles. The velocity error in the 500th cycle is significantly smaller than that in the 50th cycle. The result for the angular velocity is similar and is not presented. After the 500 training cycles are finished, the WMR system is commanded to track the following velocity pattern:

$$v_{t2}(i) = \begin{cases}
0.002i, & 1 \le i \le 300\\
0.6, & 300 < i \le 600\\
1, & 600 < i \le 700\\
3.1 - 0.003i, & 700 < i \le 900\\
0.4, & 900 < i \le 1000
\end{cases}
\qquad
\omega_{t2}(i) = \begin{cases}
-0.002i, & 1 \le i \le 300\\
-0.6, & 300 < i \le 600\\
-1, & 600 < i \le 700\\
0.003i - 3.1, & 700 < i \le 900\\
-0.4, & 900 < i \le 1000.
\end{cases} \tag{28}$$

Figs. 9a and 9b show that both linear and angular velocity tracking are accurate. Apparently, the DHP adaptive critic learning algorithm converges and an appropriate VNC is obtained.
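The piecewise test pattern (28) is straightforward to tabulate, for example as follows. The function names are illustrative, and the sketch is meant for sample indices in the range 1 to 1000 only.

```python
import numpy as np

def v_t2(i):
    """Linear velocity test pattern of Eq. (28) for sample indices 1..1000."""
    i = np.atleast_1d(np.asarray(i, dtype=float))
    conds = [i <= 300, i <= 600, i <= 700, i <= 900, i <= 1000]
    vals = [0.002 * i,
            np.full_like(i, 0.6),
            np.full_like(i, 1.0),
            3.1 - 0.003 * i,
            np.full_like(i, 0.4)]
    # np.select takes the first matching condition; indices above 1000 give NaN
    return np.select(conds, vals, default=np.nan)

def w_t2(i):
    """Angular velocity test pattern of Eq. (28), the negative of v_t2."""
    return -v_t2(i)
```

The pattern is continuous at $i = 300$, $700$ and $900$ but steps from 0.6 to 1 after $i = 600$, so it exercises both ramp and step tracking.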
Fig. 8a. Result of linear velocity tracking in the 50th cycle.
Fig. 8b. Result of linear velocity tracking in the 500th cycle.
4.2. Performance of PNC
The trained WMR system is switched to autonomous drive and commanded to track a sequence of positions calculated with

$$y(i) = \cos\left(\frac{33\pi}{1000}\,x(i)\right), \quad i = 1, 2, \ldots, 1000. \tag{29}$$

During autonomous drive, the outputs of PNC are recorded. The recorded values corresponding to the linear and angular velocities are presented as the dashed curves (actual) in Figs. 10a and 10b. The solid curves (desired) are obtained by using the fuzzy posture controller designed by Lin, Huang et al. (2005). The two sets of curves are close to each other, which indicates that PNC performs as well as the fuzzy posture controller. The difference is that while the fuzzy posture controller was built by domain experts, PNC is obtained entirely by machine learning.
4.3. Performance of the trained WMR system
The trained WMR system is turned into autonomous drive and commanded to track a right-turn path and then a decaying sinusoidal path. The limitations on the posture control are dmax
=
Fig. 9a. Result of linear velocity tracking.
Fig. 9b. Result of angular velocity tracking.
Fig. 10a. Comparing the linear PNC output (actual) with that of using the fuzzy posture controller (desired).
Fig. 10b. Comparing the angular PNC output (actual) with that of using the fuzzy posture controller (desired).
Fig. 11. Results of tracking a right-turn path with and without using the arc-line algorithm.
Case 1: Tracking a right-turn path.
Fig. 11 compares the results of tracking a right-turn path without and with the arc-line algorithm. Without the arc-line algorithm, the WMR system makes a right turn as shown by the dotted curve in Fig. 11. However, because PNC has no knowledge of the right turn until it occurs, a large overshoot is found around the turn. On the other hand, when the arc-line algorithm with curvature $\gamma = 1.1$ is involved, the dashed curve in Fig. 11 shows that the overshoot disappears. This means that the stereovision module and the path planner enable shaping of the tracking path.
Case 2: Tracking a decaying sinusoidal path.
Fig. 12 shows the result of tracking a decaying sinusoidal path described by

$$y(i) = 1.1\sin\left(\frac{7\pi}{26}\,x(i)\right)e^{-0.35x(i)}, \quad i = 1, 2, \ldots, 1000. \tag{30}$$

The dashed curve shows that the path tracking is accurate. The dotted curve is obtained by using the fuzzy posture controller (Lin, Huang et al., 2005) instead of PNC. It appears that, because of optimization, the DHP adaptive critic motion control design performs better than the controller built by domain experts.
Fig. 12. Result of tracking a decaying sinusoidal path.
5. Conclusion
The DHP adaptive critic motion control design demonstrated autonomous development of control ability. It minimized the engineering task of analyzing and synthesizing the system dynamics to obtain an appropriate controller. Detailed formulations of the DHP adaptive critic motion control design were presented and explained. VNC corrected its neural weights by incremental optimization while PNC learned by approximating the specialized inverse velocity system. Only the primary utility function was required to define the control objective. Neither an existing controller, nor representative training samples, nor control rules built by domain experts were required. The proposed design was evaluated on the experimental WMR and successful results were obtained.
Acknowledgments
The authors would like to thank the editor, the associate editor and the reviewers for their valuable comments. The authors gratefully acknowledge the support of the National Science Council of Taiwan under grant NSC94-2213-E-002-049 and of National Taiwan University under grant NTU 95R0036-07.
References
Bellman, R. E. (1957). Dynamic programming. Princeton Univ. Press.
Bertsekas, D. P. (2005). Dynamic programming and optimal control: Vol. I (3rd ed.). Belmont, MA: Athena Scientific, pp. 369–373.
Bloch, A. M., Reyhanoglu, M., & McClamroch, N. H. (1992). Control and stabilization of nonholonomic dynamic systems. IEEE Transactions on Automatic Control, 37, 1746–1757.
Chen, Q., & Redmill, K. (2004). Ohio state university at the 2004 DARPA grand challenge: Developing a completely autonomous vehicle. IEEE Intelligent Systems, (Sept./Oct.), 8–11.
Colbaugh, R., Barany, E., & Glass, K. (1998). Adaptive control of nonholonomic robotic systems. Journal of Robotic Systems, 15(16), 365–393.
Fierro, R., & Lewis, F. L. (1998). Control of a nonholonomic mobile robot using neural networks. IEEE Transactions on Neural Networks, 9(13), 589–600.
Greenwood, D. T. (1988). Principles of dynamics. Prentice Hall.
Gu, D., & Hu, H. (2002). Neural predictive control for a car-like mobile robot. International Journal of Robotics and Autonomous Systems, 39(2–3), 1–15.
Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Prentice Hall International, Inc, pp. 156–169.
Jang, J. S., & Sun, C. T. (1995). Neuro-fuzzy modeling and control. Proceedings of the IEEE, 83(3), 378–406.
Kanayama, Y., Kimura, Y., Miyazaki, F., & Noguchi, T. (1990). A stable tracking control method for an autonomous mobile robot. Proceedings of IEEE International Conference on Robotics and Automation, 1, 384–389.
Lee, S., Adams, T. M., & Ryoo, B. (1997). A fuzzy navigation system for mobile construction robots. Automation in Construction, 6, 97–107.
Lee, T. H., Leung, F. H. F., & Tam, P. K. S. (1999). Position control for wheeled mobile robots using a fuzzy logic controller. In Proceedings of the 25th annual conference of the IEEE industrial electronics society 2 (pp. 525–528).
Lendaris, G. G., & Shannon, T. T. (1998). Application considerations for the DHP methodology. In Proceedings of the international joint conference on neural networks (pp. 1013–1018). Anchorage: IEEE Press.
Lin, W. S., Huang, C. L., Chuang, M. K., & Liu, G. C. (2004). Modeling a wheeled mobile robot for autonomous navigation design. In IASTED international conference on modeling, identification and control (pp. 275–280).
Transactions on Industrial Electronics, 36(12), 330–337.
Park, K. H., Cho, S. B., & Lee, Y. W. (2001). Optimal tracking control of a nonholonomic mobile robot. In Proceedings ISIE 2001 IEEE international symposium on industrial electronics: Vol. 3 (pp. 2073–2076).
Pawlowski, S., Kozlowski, K., & Wroblewski, W. (2001). Fuzzy logic implementation in mobile robot control. In Proceedings of the second international workshop on robot motion and control (pp. 65–70).
Prokhorov, D., & Wunsch, D. (1997). Adaptive critic designs. IEEE Transactions on Neural Networks, 8, 997–1007.
Prokhorov, D., Santiago, R., & Wunsch, D. (1995). Adaptive critic designs: A case study for neuro-control. Neural Networks, 8, 1367–1372.
Shannon, T. T. (1999). Partial, noisy and qualitative models for adaptive critic-based neuro-control. In Proceedings of international conference on neural networks. (pp. 2271–2275).
Tsai, P. S., Wu, T. F., Chang, F. R., & Wang, L. S. (2002). Tracking control of nonholonomic mobile robot using hybrid structure. In The 6th world multiconference on systemics, cybernetics and informatics (presented).
Werbos, P. (1992). Approximate dynamic programming for real-time control and neural modeling. In White, & Sofge (Eds.), Handbook of intelligent control (pp. 493–525). New York: Van Nostrand Reinhold.
Werbos, P. (2004). ADP: Goals, opportunities and principles. In J. Si, A. G. Barto, W. B. Powell, & D. Wunsch, II (Eds.), IEEE press series on computational intelligence, Handbook of learning and approximate dynamic programming (pp. 3–44).
Wilamowski, B. M. (2003). Neural network architectures and learning. In Proceedings of IEEE international conference on industrial technology, 1.1 (pp. 10–12).
Yun, X., & Yamamoto, Y. (1993). Internal dynamics of a wheeled mobile robot. In Proceeding of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1288–1293).
Wei-Song Lin has received the National Science Council Award for exceptional achievement in research seventeen times. From 1996 to 2002, he led the sensor calibration team of the Ocean Color Imager aboard the Formosa-1 satellite, the first scientific satellite of Taiwan, and received the Success Award from the National Space Program Office. In 2001, he received the Teaching Award from the Ministry of Education of Taiwan for contribution to engineering education. As a consultant to Taipower Company, he contributed to the computerized instrumentation and control of the fourth nuclear power plant of Taiwan. In collaboration with his colleagues, he won the Best Paper Award at the Ninth Conference on Image Processing and Pattern Recognition in 1996. He is listed in the 2006 Who's Who in Science and Engineering, the 2007 Who's Who in Asia, and the 2008 Who's Who in the World. He received the M.S. degree in electrical engineering from National Cheng Kung University in 1975, and the Ph.D. degree in electrical engineering from National Taiwan University in 1982. He began his career with Chunghwa Telecom Laboratories, developing a packet switching network. He pioneered microprocessor education at the Chunghwa Telecom Training Institute in 1979. He currently holds a Professor position with the Department of Electrical Engineering of National Taiwan University. His research interests include autonomous control; embedded computing controller design; neural-fuzzy systems; the use of approximate dynamic programming in control; active safety control of by-wire electrical vehicles; energy management of fuel-cell powered vehicles; the use of computational stereo in surveillance and navigation; and multi-spectral electromagnetic sensing.
Ping-Chieh Yang was born in Taipei City in 1981.
He received the bachelor's degree in power mechanical engineering from National Tsing Hua University in 2003, and the M.S. degree in electrical engineering from National Taiwan University in 2005. His research interests are autonomous mobile robots and the use of approximate dynamic programming. He currently holds an engineer position with National Instruments Company.