Adaptive critic motion control design of autonomous wheeled mobile robot by dual heuristic programming


Contents lists available at ScienceDirect

Automatica

journal homepage: www.elsevier.com/locate/automatica


Wei-Song Lin∗, Ping-Chieh Yang

Department of Electrical Engineering, National Taiwan University, Taiwan

Article info

Article history:
Received 21 January 2007
Received in revised form 2 September 2007
Accepted 19 March 2008
Available online 10 October 2008

Keywords:
Adaptive critic
Approximate dynamic programming
Mobile robot
Neural networks

Abstract

An autonomous wheeled mobile robot (WMR) must implement velocity and path-tracking control subject to complex dynamical constraints. Conventionally, this control design is obtained through analysis and synthesis, or by a domain expert who builds control rules. This paper presents an adaptive critic motion control design that enables a WMR to develop its control ability autonomously by learning through trials. The design consists of an adaptive critic velocity control loop and a self-learning posture control loop. The neural networks in the velocity neuro-controller (VNC) are corrected with the dual heuristic programming (DHP) adaptive critic method. The designer simply expresses the control objective by specifying the primary utility function, and VNC then attempts to fulfill it through incremental optimization. The posture neuro-controller (PNC) learns by approximating the specialized inverse velocity model of the WMR so as to map planned positions to suitable velocity commands. Supervised drive supplies various velocity commands for PNC and VNC to set up their neural weights. During autonomous drive, PNC stops learning while VNC keeps correcting its neural weights to optimize the control performance. The proposed design is evaluated on an experimental WMR. The results show that the DHP adaptive critic design is a useful basis for autonomous control.

1. Introduction

Autonomous wheeled mobile robots (WMRs) rely on sensors to perceive their surroundings and on a motion controller to drive automatically (Chen & Redmill, 2004; Maurette, 2003; Meyrowitz, Blidberg, & Michelson, 1996). For motion control, a WMR should be capable of trajectory tracking, path following and stabilization. However, a WMR is a nonholonomic dynamic system with intrinsic nonlinearity, commonly subject to unmodeled disturbances and unstructured, unmodeled dynamics (Greenwood, 1988). Unless its mass is negligible (Lee, Leung, & Tam, 1999), the motion controller must deal with these complex dynamics (Bloch, Reyhanoglu, & McClamroch, 1992; Kanayama, Kimura, Miyazaki, & Noguchi, 1990). Conventionally, this control design relies on engineers to analyze the WMR system and synthesize an appropriate controller (Colbaugh, Barany, & Glass, 1998; Park, Cho, & Lee, 2001; Tsai, Wu, Chang, & Wang, 2002), but difficulties usually arise from the absence of an accurate WMR model. Fuzzy control design may skip building the model but needs a domain expert to construct the fuzzy rules (Lee, Adams, & Ryoo, 1997; Pawlowski, Kozlowski, & Wroblewski, 2001). Controllers based on neural networks or neuro-fuzzy networks may construct the control function by learning from training samples (Fierro & Lewis, 1998; Jang & Sun, 1995; Narendra & Parthasarathy, 1990), but preparing appropriate training samples usually requires an existing controller (Gu & Hu, 2002). In contrast, the adaptive critic motion control design presented in this paper enables a WMR to develop its control ability autonomously. It requires neither a domain expert to build control rules nor an existing controller to generate training samples.

☆ This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Jae-Bok Song under the direction of Editor Mituhiko Araki.

∗ Corresponding author. Tel.: +886 2 33663638; fax: +886 2 23638247.

E-mail addresses: [email protected] (W.-S. Lin), [email protected] (P.-C. Yang).

In our laboratory, an experimental WMR has been developed, and its mathematical model has been formulated and identified (Lin, Huang, Chuang, & Liu, 2004). A hierarchical fuzzy control system has been implemented and shown to be able to conduct the motion of the WMR (Lin, Huang, & Chuang, 2005). Furthermore, the experimental WMR has been equipped with a stereovision module to enable autonomous path finding and collision avoidance (Lin, Chuang, & Tien, 2005). This paper assumes that the stereovision module foresees the nearest path and that the WMR system must generate its motion control function entirely through learning by trials. Essentially, this extends the definition of an autonomous robot to include the autonomous development of control ability. The idea is to obtain the control ability by learning through trials that fulfill the control objective. Neural networks are chosen as the basic learning model. Trials, in fact supervised trials for the sake of safety, supply training inputs to set up the neural weights. Eventually, the motion control function is built without reference to any existing controller.

Fig. 1. A schematic top view of the mobile platform.

The dual heuristic programming (DHP) adaptive critic technique (Prokhorov & Wunsch, 1997; Werbos, 1992), which approximates dynamic programming, is used to develop the learning mechanism. Multilayer perceptrons (MLPs) are used to construct the posture neuro-controller (PNC) and the velocity neuro-controller (VNC). PNC learns to map planned positions to suitable velocity commands. VNC learns to conduct the WMR motion so as to track the velocity commands. Supervised drive of the WMR at various velocities supplies training inputs for PNC and VNC to set up their neural weights. During autonomous drive, PNC stops learning while VNC continues to be corrected to optimize the control performance. The proposed design is successfully evaluated on the experimental WMR. The principal contribution is an autonomous control design scheme for mobile robots based on the DHP adaptive critic method.

This paper is organized as follows. Section 2 describes the architecture of the adaptive critic motion control system of the WMR. Section 3 presents the design of the DHP adaptive critic motion controller. Section 4 validates the proposed design on the experimental WMR. Section 5 concludes the paper.

2. Architecture of adaptive critic motion control system of WMR

The autonomous WMR considered here has a four-wheeled mechanism, as shown in Fig. 1 and Table 1. The front wheels are passive, while the rear wheels are motorized independently to give a differential-drive configuration. Such a WMR with a stereovision module has been assembled in our laboratory for studying navigation and motion control (Lin et al., 2004; Lin, Huang et al., 2005; Lin, Chuang et al., 2005). The experimental WMR is completely autonomous because data are processed without any external aid; its sensors are the encoders attached to the motorized wheels and the stereovision module used to find the path.

Using the Lagrange formalism, the dynamical model of the WMR is obtained as (Lin et al., 2004; Yun & Yamamoto, 1993)

$$M(q)\dot{R} + C(q,\dot{q})R + F(\dot{q}) + G(q) + \tau_d = B(q)u \quad (1)$$

where $q = [x, y, \theta, \phi_l, \phi_r]^T$ is the generalized coordinate vector characterizing the WMR, $R = [v, \omega]^T$, in which $v$ is the linear velocity and $\omega$ is the angular velocity, and $u = [\tau_l, \tau_r]^T$ are the input torques generated by the left and right motors.

Table 1 (excerpt)
$I_m = 0.002$ kg m$^2$: Inertia of single motorized wheel and rotor set about a diameter
$v$: Linear velocity of WMR
$\omega$: Angular velocity of WMR
$\theta$: Orientation of WMR
$\dot{\phi}_l$: Angular velocity of left motorized wheel
$\dot{\phi}_r$: Angular velocity of right motorized wheel

The parameter matrices in (1) are

$$M(q) = \begin{bmatrix} m + \dfrac{2I_w}{r^2} & 0 \\ 0 & I + \dfrac{2b^2 I_w}{r^2} \end{bmatrix},\quad C(q,\dot{q}) = \begin{bmatrix} 0 & -\dot{\theta} m_c d \\ \dot{\theta} m_c d & 0 \end{bmatrix},\quad B(q) = \begin{bmatrix} \dfrac{2}{r} & \dfrac{2}{r} \\ -\dfrac{2b}{r} & \dfrac{2b}{r} \end{bmatrix}$$

where $m = m_c + 2m_w$, and $F(\dot{q})$, $G(q)$ and $\tau_d$ are unknown terms

corresponding to frictional, gravitational and disturbance forces, respectively. Controlling the WMR motion requires implementing velocity and trajectory tracking control. Hierarchical fuzzy control has been shown to be a feasible approach (Lin, Chuang et al., 2005), but it needs domain experts to construct the fuzzy rules. Alternatively, this paper seeks to build the motion control function entirely through learning by trials. The proposed design, called the adaptive critic motion control system, consists mainly of a self-learning posture control loop and an adaptive critic velocity control loop.

Fig. 2 illustrates the design concept. The stereovision module finds a forward path. Based on the forward path, the feedback positions and the physical limitations of the WMR, the path planner calculates the planned positions. PNC, which approximates the specialized inverse velocity model (Narendra & Parthasarathy, 1990) of the WMR, maps planned positions to suitable velocity commands. VNC is a DHP adaptive critic design that uses incremental optimization to acquire the ability of velocity control through learning. Learning begins with supervised drive to set up the neural weights in VNC and PNC. Hence, the supervised drive should excite the WMR dynamics sufficiently over the working domain of interest for the learning to be complete. During autonomous drive, PNC stops learning while VNC is corrected successively to optimize the control performance.

3. Design of DHP adaptive critic motion controller

3.1. Adaptive critic velocity neuro-controller

Adaptive critic methods are usually practiced with model-based learning structures such as neural or neuro-fuzzy networks. They share common roots as generalizations of dynamic programming for neural reinforcement learning, and they are capable of optimization over time under conditions of noise, uncertainty, and nonlinearity (Werbos, 1992, 2004). Heuristic dynamic programming (HDP), dual heuristic programming (DHP), globalized dual heuristic programming (GDHP), and their action-dependent counterparts are the main categories of adaptive critic designs (Prokhorov & Wunsch, 1997). They are distinguished by the critic output. In HDP, the critic estimates the value function in the Bellman equation of dynamic programming. In DHP, the critic approximates the derivatives of the value function with respect to the states, which facilitates computing the gradient correcting rule. In GDHP, the critic estimates both the value function and its derivatives. DHP has been shown to outperform HDP, whereas GDHP showed no observable improvement (Lendaris & Shannon, 1998; Prokhorov, Santiago, & Wunsch, 1995). In addition, incremental optimization based on dynamic programming is theoretically rigorous. The stability of a trained DHP adaptive critic control system is governed by optimal control theory in the sense of dynamic programming (Bertsekas, 2005).

Fig. 2. Architecture of the adaptive critic motion control system.

Fig. 3. Architecture of the DHP adaptive critic velocity neuro-controller. The solid lines indicate signal paths, the dashed lines indicate data paths, and the rounded rectangular blocks represent neural networks.

As illustrated in Fig. 3, VNC contains blocks called the action network, critic network, shadow critic network, plant model and primary utility. The action network is responsible for producing suitable control signals, while the critic and shadow critic networks form the adaptive critic that critiques the action performance. The plant model can be either a mathematical formulation or a neural approximation of the WMR dynamics.

3.1.1. Neural computing of VNC

Fig. 4. Architecture of the three-layer perceptrons.

The action, critic and shadow critic networks are each implemented as three-layer perceptrons (Haykin, 1999). These neural networks share the common architecture shown in Fig. 4. In this architecture, each hidden neuron has a hyperbolic tangent activation function, giving the output

$$\bar{y}_j(n) = a\tanh\left(b\sum_{i=0}^{I} w_{ji}(n)\,\bar{x}_i(n)\right),\quad \bar{x}_0(n) = 1,\quad (a, b) > 0 \quad (2)$$

where $n$ denotes the time step. Each output neuron has a linear activation function, giving the output

$$\bar{z}_k(n) = c\sum_{j=0}^{J} w_{kj}(n)\,\bar{y}_j(n),\quad \bar{y}_0(n) = 1,\quad c > 0. \quad (3)$$

The partial derivatives pertaining to the neural architecture are derived as follows:

$$\frac{\partial \bar{z}_k(n)}{\partial w_{kj}(n)} = c\,\bar{y}_j(n) \quad (4)$$

$$\frac{\partial \bar{z}_k(n)}{\partial w_{ji}(n)} = \frac{bc}{a}\,w_{kj}(n)\,[a - \bar{y}_j(n)]\,[a + \bar{y}_j(n)]\,\bar{x}_i(n) \quad (5)$$

$$\frac{\partial \bar{z}_k(n)}{\partial \bar{x}_i(n)} = \sum_{j=1}^{J}\frac{bc}{a}\,w_{kj}(n)\,[a - \bar{y}_j(n)]\,[a + \bar{y}_j(n)]\,w_{ji}(n). \quad (6)$$

Usually, (4) and (5) are called the sensitivity functions and (6) is called the Jacobian function. The DHP adaptive critic design needs these quantities to evaluate the correcting rules.
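Equations (2)–(6) map directly onto array operations. The sketch below, assuming NumPy and a weight layout chosen for illustration (the class name and the bias-in-column-0 convention are hypothetical, not the authors' code), computes the forward pass, the sensitivity functions (4)–(5) and the Jacobian (6), and then checks (6) against central differences.

```python
import numpy as np

class ThreeLayerPerceptron:
    """Three-layer perceptron of Eqs. (2)-(3): tanh hidden layer, linear output layer.

    W_h has shape (J, I + 1); column 0 multiplies the bias input x_0 = 1.
    W_o has shape (K, J + 1); column 0 multiplies the bias unit y_0 = 1.
    """

    def __init__(self, n_in, n_hidden, n_out, a=1.0, b=1.0, c=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Uniform initialization in [-0.1, 0.1], as in the validation section.
        self.W_h = rng.uniform(-0.1, 0.1, (n_hidden, n_in + 1))
        self.W_o = rng.uniform(-0.1, 0.1, (n_out, n_hidden + 1))
        self.a, self.b, self.c = a, b, c

    def forward(self, x):
        x_bar = np.concatenate(([1.0], x))                  # prepend x_0 = 1
        y = self.a * np.tanh(self.b * (self.W_h @ x_bar))    # Eq. (2), j = 1..J
        y_bar = np.concatenate(([1.0], y))                   # prepend y_0 = 1
        z = self.c * (self.W_o @ y_bar)                      # Eq. (3)
        return x_bar, y_bar, z

    def _hidden_gain(self, y):
        # (bc/a)(a - y_j)(a + y_j): the factor shared by Eqs. (5) and (6)
        return (self.b * self.c / self.a) * (self.a - y) * (self.a + y)

    def sensitivities(self, x):
        """dz/dW_o per Eq. (4), shape (K, K, J+1); dz/dW_h per Eq. (5), shape (K, J, I+1)."""
        x_bar, y_bar, _ = self.forward(x)
        K = self.W_o.shape[0]
        dz_dWo = np.zeros((K,) + self.W_o.shape)
        for k in range(K):
            dz_dWo[k, k, :] = self.c * y_bar          # only row k of W_o affects z_k
        g = self._hidden_gain(y_bar[1:])
        dz_dWh = (self.W_o[:, 1:] * g)[:, :, None] * x_bar[None, None, :]
        return dz_dWo, dz_dWh

    def jacobian(self, x):
        """Eq. (6): dz_k/dx_i over the non-bias inputs, shape (K, I)."""
        _, y_bar, _ = self.forward(x)
        g = self._hidden_gain(y_bar[1:])
        return (self.W_o[:, 1:] * g) @ self.W_h[:, 1:]

# Usage: compare Eq. (6) with a central-difference estimate.
net = ThreeLayerPerceptron(n_in=4, n_hidden=6, n_out=2, seed=1)
x = np.array([0.3, -0.2, 0.5, 0.1])
eps = 1e-6
numeric = np.column_stack([
    (net.forward(x + eps * e)[2] - net.forward(x - eps * e)[2]) / (2 * eps)
    for e in np.eye(4)
])
print(np.max(np.abs(numeric - net.jacobian(x))))   # should be close to zero
```

The same finite-difference pattern can be pointed at `sensitivities` to validate (4)–(5) before wiring them into the correcting rules.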

3.1.2. Plant model and Jacobian quantities

In Fig. 3, the plant model is used to predict the immediate future states and to calculate certain partial derivatives pertaining to the plant. It can be either the mathematical model or a neural approximation of the plant dynamics. Since DHP adaptive critic design allows the use of a partial or qualitative plant model (Shannon, 1999) and the WMR model is known (Lin et al., 2004), the plant model in VNC is implemented with (1) while neglecting the unknown terms corresponding to the frictional, gravitational and disturbance forces. From (1), the simplified model equations are derived as

$$\dot{R} = -M^{-1}(q)\,C(q,\dot{q})\,R + M^{-1}(q)\,B(q)\,u. \quad (7)$$

Rewrite (7) as the following nonlinear mappings:

$$\dot{R}_i = f_i(R, u),\quad i = 1, 2, \ldots, S. \quad (8)$$
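The simplified model (7) is what the plant-model block evaluates. A minimal NumPy sketch follows, assuming the matrices $M(q)$, $C(q,\dot{q})$ and $B(q)$ given with (1), the kinematic relation $\dot{\theta} = \omega$, and hypothetical parameter values (`PARAMS`) that are not those of the experimental WMR.

```python
import numpy as np

# Hypothetical parameter values, for illustration only (not the experimental WMR's).
PARAMS = dict(m_c=20.0, m_w=1.0, I=1.5, I_w=0.01, r=0.1, b=0.25, d=0.05)

def model_matrices(R, p=PARAMS):
    """M(q), C(q, q_dot) and B(q) of Eq. (1) for the state R = [v, omega]."""
    v, omega = R
    m = p["m_c"] + 2.0 * p["m_w"]
    M = np.array([[m + 2.0 * p["I_w"] / p["r"] ** 2, 0.0],
                  [0.0, p["I"] + 2.0 * p["b"] ** 2 * p["I_w"] / p["r"] ** 2]])
    theta_dot = omega                      # heading rate equals the angular velocity
    C = np.array([[0.0, -theta_dot * p["m_c"] * p["d"]],
                  [theta_dot * p["m_c"] * p["d"], 0.0]])
    B = np.array([[2.0 / p["r"], 2.0 / p["r"]],
                  [-2.0 * p["b"] / p["r"], 2.0 * p["b"] / p["r"]]])
    return M, C, B

def f(R, u, p=PARAMS):
    """Eq. (7): R_dot = -M^-1 C R + M^-1 B u, with F, G and tau_d neglected."""
    M, C, B = model_matrices(R, p)
    # Solving M x = B u - C R is equivalent to forming M^-1 explicitly, but better conditioned.
    return np.linalg.solve(M, B @ u - C @ R)

print(f(np.array([0.5, 0.1]), np.array([0.2, 0.3])))
```

Equal torques on both wheels from rest produce pure linear acceleration, and opposite torques produce pure angular acceleration, which is a quick sanity check on the signs of $B(q)$.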

Then, for the operating point $(R_n, u_n)$ at sampling time $t_n$, the first-order approximation of (7), discretized over one sampling period, is obtained as

$$R(n+1) = \bar{A}(n)R(n) + \bar{B}(n)u(n) + \bar{D}(n) \quad (10)$$

where $\bar{A}(n) = e^{A(n)\Delta}$, $\bar{B}(n) = \int_0^{\Delta} e^{A(n)t}\,dt\;B(n)$, $\bar{D}(n) = \int_0^{\Delta} e^{A(n)t}\,dt\;D(n)$, and $\Delta$ represents the sampling period. In VNC, the plant model uses (10) to predict the states and calculates the following Jacobian quantities:

$$\frac{\partial R_i(n+1)}{\partial R_j(n)} = \bar{A}_{ij}(n),\qquad \frac{\partial R_i(n+1)}{\partial u_k(n)} = \bar{B}_{ik}(n). \quad (11)$$
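The discrete model (10) and the Jacobians (11) can be sketched as follows. This assumes the affine first-order expansion $f(R,u) \approx f(R_n,u_n) + A(R-R_n) + B(u-u_n)$, so that $D = f(R_n,u_n) - AR_n - Bu_n$; the text does not spell out $A(n)$, $B(n)$ and $D(n)$, so that choice is an assumption. All three integrals come from one exponential of an augmented matrix (a standard zero-order-hold construction), and `expm` is a small hand-rolled scaling-and-squaring routine used only to keep the example dependency-free.

```python
import numpy as np

def expm(X, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(X, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Xs = X / (2 ** s)
    E = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ Xs / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def jacobians(f, R_n, u_n, h=1e-6):
    """Central-difference A = df/dR and B = df/du at the operating point (R_n, u_n)."""
    S, K = len(R_n), len(u_n)
    A, B = np.zeros((S, S)), np.zeros((S, K))
    for j in range(S):
        e = np.zeros(S); e[j] = h
        A[:, j] = (f(R_n + e, u_n) - f(R_n - e, u_n)) / (2 * h)
    for k in range(K):
        e = np.zeros(K); e[k] = h
        B[:, k] = (f(R_n, u_n + e) - f(R_n, u_n - e)) / (2 * h)
    return A, B

def discretize(f, R_n, u_n, delta):
    """Return (A_bar, B_bar, D_bar) of Eq. (10) for the linearization at (R_n, u_n)."""
    A, B = jacobians(f, R_n, u_n)
    D = f(R_n, u_n) - A @ R_n - B @ u_n      # affine term (assumed form)
    S, K = B.shape
    X = np.zeros((S + K + 1, S + K + 1))
    X[:S, :S], X[:S, S:S + K], X[:S, S + K] = A, B, D
    E = expm(X * delta)
    # Top block row of exp(X * delta): [e^{A delta}, int e^{At} dt B, int e^{At} dt D]
    return E[:S, :S], E[:S, S:S + K], E[:S, S + K]

# Usage on a hypothetical scalar model R_dot = -R + 2u + 0.5 with delta = 0.1.
f_lin = lambda R, u: np.array([-1.0 * R[0] + 2.0 * u[0] + 0.5])
A_bar, B_bar, D_bar = discretize(f_lin, np.array([0.3]), np.array([0.1]), delta=0.1)
# Per Eq. (11): dR(n+1)/dR(n) = A_bar and dR(n+1)/du(n) = B_bar.
```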

3.1.3. Correcting the action network

In Fig. 3, $U(n)$ is the primary utility function, defined according to the specific application context. Since the objective of VNC is to control the WMR to track the velocity command as closely as possible, the primary utility function is defined as

$$U(n) = 0.25\left(v(n) - v_d(n)\right)^2 + 0.25\left(\omega(n) - \omega_d(n)\right)^2 \quad (12)$$

where $(v_d(n), \omega_d(n))$ is the velocity command. To achieve the control objective, the neural weights in the action network must be corrected to minimize not only the present value of $U(n)$ but also the discounted sum of all its future values. According to dynamic programming (Bellman, 1957), this goal can be achieved by minimizing the secondary utility function, i.e., the value function, expressed as

$$J(n) = \sum_{k=0}^{\infty}\eta^k U(n+k) = U(n) + \eta J(n+1) \quad (13)$$

where $\eta$, $0 < \eta \le 1$, is a discount factor. Thus, using the gradient descent method, a suitable correcting rule for the action network is

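$$\Delta w_{km}(n) = \alpha\frac{\partial J(n)}{\partial w_{km}(n)} = \alpha\frac{\partial J(n)}{\partial u_k(n)}\frac{\partial u_k(n)}{\partial w_{km}(n)} = \alpha\left[\frac{\partial U(n)}{\partial u_k(n)} + \eta\frac{\partial J(n+1)}{\partial u_k(n)}\right]\frac{\partial u_k(n)}{\partial w_{km}(n)}$$

$$= \alpha\Bigg[\underbrace{\frac{\partial U(n)}{\partial u_k(n)}}_{\text{Utility}} + \eta\sum_{s}\underbrace{\lambda^{\circ}_s(n+1)}_{\text{Critic}}\,\underbrace{\frac{\partial R_s(n+1)}{\partial u_k(n)}}_{\text{Model}}\Bigg]\underbrace{\frac{\partial u_k(n)}{\partial w_{km}(n)}}_{\text{Action}} \quad (14)$$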

where $\alpha$ is the learning rate and $w_{km}(n)$ is the $m$th neural weight associated with the $k$th output of the action network.
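The factorization in (14) (Utility, Critic, Model, Action) translates into a few matrix products. In this sketch, assuming NumPy, `action_weight_gradient` is a hypothetical helper; it sums over the control outputs $k$, which also covers networks whose hidden weights feed several outputs. Because $J(n)$ is to be minimized, the weight step is applied with a minus sign, as gradient descent requires.

```python
import numpy as np

def utility(R, R_d):
    """Primary utility, Eq. (12): 0.25 (v - v_d)^2 + 0.25 (omega - omega_d)^2."""
    e = np.asarray(R, dtype=float) - np.asarray(R_d, dtype=float)
    return 0.25 * float(e @ e)

def action_weight_gradient(lam_next, B_bar, du_dw, eta=0.9, dU_du=None):
    """dJ(n)/dw for the action network, following the factorization of Eq. (14).

    lam_next : critic output lambda°(n+1), shape (S,)
    B_bar    : plant-model Jacobian dR(n+1)/du(n), shape (S, K), from Eq. (11)
    du_dw    : action-network sensitivities du_k(n)/dw, shape (K, n_weights)
    dU_du    : dU(n)/du(n), shape (K,); zero for Eq. (12), which has no explicit u
    """
    K = B_bar.shape[1]
    dU_du = np.zeros(K) if dU_du is None else np.asarray(dU_du, dtype=float)
    dJ_du = dU_du + eta * (B_bar.T @ lam_next)   # Utility + eta * sum_s Critic * Model
    return dJ_du @ du_dw                          # ... times Action, summed over k

# Usage with a hypothetical linear action network u = W @ x, so du_k/dW[k, m] = x_m.
x = np.array([0.4, 0.1, 0.5, 0.0])
W = np.zeros((2, 4))
du_dw = np.zeros((2, W.size))
for k in range(2):
    du_dw[k, k * 4:(k + 1) * 4] = x               # row-major flattening of W
lam_next = np.array([0.2, -0.1])
B_bar = np.array([[0.05, 0.05], [-0.01, 0.01]])
alpha = 0.1
W -= alpha * action_weight_gradient(lam_next, B_bar, du_dw).reshape(W.shape)  # descent step
```

With the three-layer perceptron of (2)–(3), `du_dw` would come from the sensitivity functions (4)–(5) instead of the linear stand-in used here.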

3.1.4. Correcting the shadow critic network and the critic network

In (14), $\lambda^{\circ}_s(n+1) = \partial J(n+1)/\partial R_s(n+1)$ is unknown. The essence of DHP design is to estimate this quantity with the adaptive critic, which is composed of the shadow critic and critic networks. They estimate the partial derivatives of the secondary utility function at the present and the immediately following sampling times as

$$\lambda_s(n) = \frac{\partial J(n)}{\partial R_s(n)},\quad s = 1, 2, \ldots, S \quad \text{(shadow critic)} \quad (15)$$

$$\lambda^{\circ}_s(n+1) = \frac{\partial J(n+1)}{\partial R_s(n+1)},\quad s = 1, 2, \ldots, S \quad \text{(critic)} \quad (16)$$

where $K$ denotes the dimension of the control vector $u(n)$. Since (12) shows that $U(n)$ is independent of $u(n)$, (17) can be rewritten as

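$$\lambda^{\circ}_s(n) = \underbrace{\frac{\partial U(n)}{\partial R_s(n)}}_{\text{utility}} + \eta\sum_{s'=1}^{S}\underbrace{\lambda^{\circ}_{s'}(n+1)}_{\text{Critic}}\left[\underbrace{\frac{\partial R_{s'}(n+1)}{\partial R_s(n)}}_{\text{Model}} + \sum_{k=1}^{K}\underbrace{\frac{\partial R_{s'}(n+1)}{\partial u_k(n)}}_{\text{Model}}\,\underbrace{\frac{\partial u_k(n)}{\partial R_s(n)}}_{\text{Action}}\right]. \quad (18)$$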

In (18), $\lambda^{\circ}_{s'}(n+1)$ is the output of the critic network, $\partial R_{s'}(n+1)/\partial R_s(n)$ and $\partial R_{s'}(n+1)/\partial u_k(n)$ are the Jacobian functions of the plant model, $\partial u_k(n)/\partial R_s(n)$ is the Jacobian function of the action network, and $U(n)$ is a known function; therefore, $\lambda^{\circ}_s(n)$ can be calculated. The adaptive critic in DHP learns by updating the shadow critic network so that $\lambda(n)$ tracks $\lambda^{\circ}(n)$. Hence, an error measure for correcting the shadow critic network can be formulated as

$$E(n) = 0.5\sum_{s}\left(\lambda_s(n) - \lambda^{\circ}_s(n)\right)^2. \quad (19)$$

Then the gradient correcting rule is

$$\Delta w_{sm}(n) = \beta\frac{\partial E(n)}{\partial w_{sm}(n)} = \beta\left(\lambda_s(n) - \lambda^{\circ}_s(n)\right)\frac{\partial \lambda_s(n)}{\partial w_{sm}(n)} \quad (20)$$

where $\beta$ is the learning rate and $w_{sm}$ is the $m$th neural weight associated with the $s$th output of the shadow critic network. The critic network copies the corresponding neural weights of the shadow critic network, so it needs no correcting rule of its own. To aid learning convergence, however, the copy is made only once every several (typically five) sampling times.
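One adaptive critic update can be sketched as follows: the target $\lambda^{\circ}(n)$ of (18) in vector form, the shadow-critic correction of (19)–(20), and the periodic copy into the critic network. To keep it short, a linear map stands in for the three-layer perceptron (an assumption, not the paper's network), the helper names are hypothetical, and the update is applied with a minus sign so that $E(n)$ decreases.

```python
import numpy as np

def critic_target(R, R_d, lam_next, A_bar, B_bar, du_dR, eta=0.9):
    """Target lambda°(n) of Eq. (18) in vector form.

    lam_next : critic-network output lambda°(n+1), shape (S,)
    A_bar    : model Jacobian dR(n+1)/dR(n), shape (S, S), from Eq. (11)
    B_bar    : model Jacobian dR(n+1)/du(n), shape (S, K), from Eq. (11)
    du_dR    : action-network Jacobian du(n)/dR(n), shape (K, S)
    """
    dU_dR = 0.5 * (np.asarray(R, float) - np.asarray(R_d, float))  # gradient of Eq. (12)
    total = A_bar + B_bar @ du_dR      # dR_s'(n+1)/dR_s(n) through the model and the action
    return dU_dR + eta * (total.T @ lam_next)

class LinearCritic:
    """Hypothetical stand-in for the three-layer perceptron: lambda(x) = W @ [1, x]."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-0.1, 0.1, (n_out, n_in + 1))

    def __call__(self, x):
        return self.W @ np.concatenate(([1.0], x))

def shadow_critic_step(shadow, x_n, lam_target, beta=0.05):
    """One descent step on E(n) of Eq. (19); returns E(n) before the step."""
    x_bar = np.concatenate(([1.0], x_n))
    err = shadow(x_n) - lam_target           # lambda_s(n) - lambda°_s(n)
    shadow.W -= beta * np.outer(err, x_bar)  # Eq. (20) with d lambda_s / d W[s, :] = x_bar
    return 0.5 * float(err @ err)

COPY_EVERY = 5  # "typically five" sampling times between copies

def maybe_sync(step, shadow, critic):
    """Copy the shadow-critic weights into the critic network every COPY_EVERY steps."""
    if step % COPY_EVERY == 0:
        critic.W = shadow.W.copy()

# Usage on hypothetical numbers: inputs [v, omega, v_d, omega_d], S = K = 2.
shadow, critic = LinearCritic(4, 2, seed=0), LinearCritic(4, 2, seed=1)
x_n = np.array([0.4, 0.1, 0.5, 0.0])
x_next = np.array([0.45, 0.05, 0.5, 0.0])
A_bar, B_bar = np.eye(2), 0.01 * np.ones((2, 2))
du_dR = np.zeros((2, 2))
for n in range(1, 51):
    target = critic_target(x_n[:2], x_n[2:], critic(x_next), A_bar, B_bar, du_dR)
    shadow_critic_step(shadow, x_n, target)
    maybe_sync(n, shadow, critic)
```

Holding the critic fixed between copies keeps the target $\lambda^{\circ}$ from moving at every step, which is the convergence motivation the text gives for copying only every few sampling times.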

3.2. Self-learning posture neuro-controller

PNC consists of two neural networks, referred to as the linear PNC and the angular PNC, which map planned positions to linear and angular velocity commands, respectively. The self-learning mechanism is constructed by identifying the specialized inverse velocity model of the WMR, as shown in Fig. 5. For learning convergence, the specialized inverse velocity model and PNC are organized as standalone neural networks, each with the architecture shown in Fig. 4. The linear PNC has twelve inputs organized from two planned positions and five feedback positions as follows:

$$[x(n+2)-x(n+1),\ y(n+2)-y(n+1),\ x(n+1)-x(n),\ y(n+1)-y(n),\ x(n-1)-x(n),\ y(n-1)-y(n),$$
$$x(n-2)-x(n-1),\ y(n-2)-y(n-1),\ x(n-3)-x(n-2),\ y(n-3)-y(n-2),\ x(n-4)-x(n-3),\ y(n-4)-y(n-3)]^T \quad (21)$$

Fig. 5. Scheme of learning the specialized inverse velocity model.

The entries of (21) are multi-step displacements that implicitly encode the velocity, acceleration and jerk from which PNC determines its outputs. The output of the linear PNC is the linear velocity command $v_d(n)$ and $v_d(n+1)$, where $v_d(n)$ is active and $v_d(n+1)$ is a dummy output.

Similarly, the angular PNC has eighteen inputs organized as follows:

$$[x(n+2)-x(n+1),\ y(n+2)-y(n+1),\ \theta(n+2)-\theta(n+1),\ x(n+1)-x(n),\ y(n+1)-y(n),\ \theta(n+1)-\theta(n),$$
$$x(n-1)-x(n),\ y(n-1)-y(n),\ \theta(n-1)-\theta(n),\ x(n-2)-x(n-1),\ y(n-2)-y(n-1),\ \theta(n-2)-\theta(n-1),$$
$$x(n-3)-x(n-2),\ y(n-3)-y(n-2),\ \theta(n-3)-\theta(n-2),\ x(n-4)-x(n-3),\ y(n-4)-y(n-3),\ \theta(n-4)-\theta(n-3)]^T. \quad (22)$$

The output of the angular PNC is the angular velocity command $\omega_d(n)$ and $\omega_d(n+1)$, where $\omega_d(n)$ is active and $\omega_d(n+1)$ is a dummy output. Fig. 5 shows the scheme of learning the specialized inverse velocity model. The specialized inverse, which is not necessarily the complete inverse, covers only the working domain excited by supervised drive; therefore, no singularity of the WMR is encountered. Accordingly, supervised drive must supply sufficiently rich, safe velocity commands to encompass the working domain required in autonomous drive. At the end of supervised drive, PNC copies the neural weights of the specialized inverse velocity model. During autonomous drive, the neural weights in PNC are kept constant so that VNC can incrementally optimize the velocity control.
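The input vectors (21) and (22) are six pairwise pose differences each, taken over $(x, y)$ or $(x, y, \theta)$. A minimal sketch, assuming NumPy and a hypothetical `pose(k)` accessor that returns the planned pose for $k = n+1, n+2$ and the measured feedback pose for $k \le n$:

```python
import numpy as np

# (first, second) index offsets; each feature is pose(n + first) - pose(n + second),
# in the order used by Eqs. (21) and (22).
PAIRS = [(2, 1), (1, 0), (-1, 0), (-2, -1), (-3, -2), (-4, -3)]

def pnc_inputs(pose, n, use_theta):
    """Linear-PNC input of Eq. (21) (use_theta=False) or angular-PNC input of Eq. (22)."""
    dims = 3 if use_theta else 2
    feats = []
    for first, second in PAIRS:
        p1 = np.asarray(pose(n + first), dtype=float)[:dims]
        p2 = np.asarray(pose(n + second), dtype=float)[:dims]
        feats.extend(p1 - p2)
    return np.array(feats)

# Usage: a hypothetical straight-line history x = k, y = 2k, theta = 0.1k.
history = lambda k: (k, 2 * k, 0.1 * k)
print(pnc_inputs(history, n=10, use_theta=False))   # 12 entries, Eq. (21)
print(pnc_inputs(history, n=10, use_theta=True))    # 18 entries, Eq. (22)
```

Note the sign convention of (21)–(22): the two look-ahead pairs point forward in time, while the four feedback pairs point backward from the newer sample.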

Backpropagation with the Levenberg–Marquardt (LM) algorithm (Wilamowski, 2003) is used to correct the specialized inverse velocity model. Define the error of the velocity inferred by the specialized inverse velocity model as

$$e(n) = [v_d(n-1) - v_{infer}(n-1),\ v_d(n-2) - v_{infer}(n-2),\ \omega_d(n-1) - \omega_{infer}(n-1),\ \omega_d(n-2) - \omega_{infer}(n-2)]^T. \quad (23)$$

To apply the LM algorithm, $e(n)$ is collected over $m$ sampling times, and the error measure is constructed as

$$\varepsilon(W) = 0.5\,E^T E \quad (24)$$

where $E = [e^T(n), e^T(n-1), \ldots, e^T(n-m+1)]^T$. The neural weights in the specialized inverse velocity model are then corrected as

$$W_{k+1} = W_k - \left[G^T G + \xi I\right]^{-1} G^T E \quad (25)$$

where $G$ is the Jacobian of the error vector $E$ with respect to the weights $W$ (so that $G^T E = \nabla\varepsilon(W)$), and $I$ is the identity matrix. When using (25), the scalar $\xi$ is decreased after each successful step, i.e., one that reduces the error measure, and increased only when a tentative step would increase the error measure. This provides a switching capability between the Gauss–Newton algorithm (small $\xi$) and the steepest descent method (large $\xi$).

Fig. 6. Planning a smooth path with the arc-line algorithm.
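The LM rule (25) with its adaptive $\xi$ schedule can be sketched as below. It is a generic least-squares routine in NumPy, demonstrated on a hypothetical two-parameter fit $y = w_0\tanh(w_1 t)$ rather than on the specialized inverse velocity model; the adjustment factor of 10 for $\xi$ and the iteration count are assumptions.

```python
import numpy as np

def levenberg_marquardt(residual, jac, W, xi=1e-2, factor=10.0, iters=50):
    """Minimize eps(W) = 0.5 E^T E with the update of Eq. (25).

    residual(W) returns the error vector E; jac(W) returns G = dE/dW.
    xi is divided by `factor` after a successful step and multiplied by it
    when a tentative step would increase eps(W).
    """
    W = np.asarray(W, dtype=float)
    E = residual(W)
    cost = 0.5 * float(E @ E)
    for _ in range(iters):
        G = jac(W)
        step = np.linalg.solve(G.T @ G + xi * np.eye(W.size), G.T @ E)
        W_try = W - step
        E_try = residual(W_try)
        cost_try = 0.5 * float(E_try @ E_try)
        if cost_try < cost:      # accept the step and move toward Gauss-Newton
            W, E, cost = W_try, E_try, cost_try
            xi /= factor
        else:                    # reject the step and move toward steepest descent
            xi *= factor
    return W, cost

# Hypothetical demo: recover (w0, w1) in y = w0 * tanh(w1 * t) from noise-free samples.
t = np.linspace(-2.0, 2.0, 40)
y = 1.5 * np.tanh(0.8 * t)
res = lambda W: W[0] * np.tanh(W[1] * t) - y
jac = lambda W: np.column_stack([np.tanh(W[1] * t),
                                 W[0] * t * (1.0 - np.tanh(W[1] * t) ** 2)])
W_fit, final_cost = levenberg_marquardt(res, jac, np.array([1.0, 1.0]))
print(W_fit, final_cost)
```

Solving the damped normal equations with `np.linalg.solve` avoids forming the inverse in (25) explicitly; the result is the same step.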

3.3. Path planner

The stereovision module is responsible for locating the target and finding a collision-free path. Based on the viewed path and the physical limitations of the WMR, the path planner plans a feasible, smooth path and calculates the planned positions for the next two steps. The arc-line algorithm (Nelson, 1989) is used to smooth the path. As illustrated in Fig. 6, this algorithm replaces the line segments around the intersection of two straight lines with a smooth curve. First, the start point $S(x_s, y_s, \theta_s)$ on the first line, the end point $E(x_e, y_e, \theta_e)$ on the second line, the intersection point $I(x_i, y_i, \theta_i)$ of these two lines, and the angle $\phi_d = \theta_i - \theta_e$ between them are found. Then a value of the curvature $\gamma$ is assigned to find the transition point $T(x_t, y_t, \theta_t)$ on the first line, at distance $\gamma\tan(\phi_d/2)$ from the intersection point, and the center point $C(x_c, y_c)$. Finally, the original straight-line segments are replaced by the arc starting at point $T$.
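The arc-line construction reduces to a few trigonometric steps. This sketch assumes each line is given by its heading, that $\gamma$ acts as the arc radius (which is what the tangent length $\gamma\tan(\phi_d/2)$ implies), and that the signed deflection is computed with `atan2` of the heading difference rather than the paper's $\phi_d = \theta_i - \theta_e$; the function name is hypothetical. The construction is undefined for a full reversal ($|\phi_d| = \pi$).

```python
import math

def arc_line_fillet(x_i, y_i, theta1, theta2, gamma):
    """Replace the corner at I = (x_i, y_i) between two lines by a circular arc.

    theta1, theta2: headings of the first and second line (radians).
    gamma: arc radius; the tangent length is gamma * tan(|phi_d| / 2).
    Returns (T, T2, C, L): transition point on line 1, transition point on
    line 2, arc center, and the distance from each transition point to I.
    """
    # Signed deflection between the two headings, wrapped to (-pi, pi].
    phi_d = math.atan2(math.sin(theta2 - theta1), math.cos(theta2 - theta1))
    L = gamma * math.tan(abs(phi_d) / 2.0)
    T = (x_i - L * math.cos(theta1), y_i - L * math.sin(theta1))
    T2 = (x_i + L * math.cos(theta2), y_i + L * math.sin(theta2))
    # Left turns (phi_d > 0) put the center on the left of line 1, right turns on the right.
    side = 1.0 if phi_d > 0 else -1.0
    C = (T[0] - side * gamma * math.sin(theta1), T[1] + side * gamma * math.cos(theta1))
    return T, T2, C, L

# Usage: a right turn at I = (2, 0), heading east then south, with gamma = 1.1.
T, T2, C, L = arc_line_fillet(2.0, 0.0, 0.0, -math.pi / 2, 1.1)
print(T, T2, C, L)
```

Both transition points lie at distance $\gamma$ from the center, so the arc joins the two lines tangentially, which is what removes the heading jump at the corner.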

As shown in Fig. 7, denote the physical limitations of the WMR on maximum displacement and steering angle as $d_{max}$ and $\phi_{max}$. By constructing a displacement vector from the present position $(x_p, y_p, \theta_p)$ to a target position $(x_b, y_b)$ selected on the planned path, the desired displacement $d_p$ and steering angle $\phi_p$ can be determined. The planned linear and angular positions are then calculated as

$$x_p(n+1) = x_p(n) + d_p\cos(\phi_p + \theta_p)$$
$$y_p(n+1) = y_p(n) + d_p\sin(\phi_p + \theta_p)$$
$$\theta_p(n+1) = \theta_p(n) + \phi_p. \quad (26)$$

When $d_p$ or $\phi_p$ violates the physical limitations, the maximum allowable value ($d_{max}$ or $\phi_{max}$) is used instead.
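Eq. (26) with the saturation rule can be sketched as follows, assuming $d_p$ is the Euclidean distance to the target and $\phi_p$ is the bearing of the target relative to the current heading (the text only says both are determined from the displacement vector). The function name is hypothetical.

```python
import math

def plan_next_pose(xp, yp, thetap, xb, yb, d_max, phi_max):
    """One step of Eq. (26), with d_p and phi_p clipped to the physical limits."""
    dx, dy = xb - xp, yb - yp
    d_p = math.hypot(dx, dy)                                  # desired displacement
    phi_p = math.atan2(dy, dx) - thetap                        # desired steering angle
    phi_p = math.atan2(math.sin(phi_p), math.cos(phi_p))       # wrap to (-pi, pi]
    d_p = min(d_p, d_max)                                      # saturate displacement
    phi_p = max(-phi_max, min(phi_p, phi_max))                 # saturate steering
    x_next = xp + d_p * math.cos(phi_p + thetap)
    y_next = yp + d_p * math.sin(phi_p + thetap)
    return x_next, y_next, thetap + phi_p

# Usage: a target directly to the left of a robot heading east, with hypothetical limits.
print(plan_next_pose(0.0, 0.0, 0.0, 0.0, 1.0, d_max=2.0, phi_max=0.3))
```

In the usage line the requested 90° steer is clipped to $\phi_{max} = 0.3$ rad, so the planned pose turns only partway toward the target, as the saturation rule prescribes.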

4. Validation of DHP adaptive critic motion control design

In the following validation, VNC of the experimental WMR is implemented as follows. The action network has four inputs $[v(n), \omega(n), v_d(n), \omega_d(n)]^T$ and two outputs corresponding to the wheels' driving torques $[\tau_l(n), \tau_r(n)]^T$. The shadow critic network has four inputs $[v(n), \omega(n), v_d(n), \omega_d(n)]^T$ and two outputs $[\lambda_1(n), \lambda_2(n)]^T$. The critic network is a duplicate of the shadow critic network, except that its inputs and outputs are $[v(n+1), \omega(n+1), v_d(n), \omega_d(n)]^T$ and $[\lambda^{\circ}_1(n+1), \lambda^{\circ}_2(n+1)]^T$, respectively. The number of hidden neurons in each neural network is chosen empirically. The parameter values in the activation functions of (2) and (3) are $a = b = c = 1$. All neural weights in the action, critic and shadow critic networks are initialized with values chosen randomly in the range $[-0.1, 0.1]$.


Fig. 7. Planning a feasible position.

The validation begins with supervised drive, followed by autonomous drive. Supervised drive supplies 1000 sets of velocity commands calculated from the following equations:

$$v_d(i) = 0.5\left[\cos\left(\frac{6\pi(i-1)}{1000} + \pi\right) + 1\right]e^{-0.001i},\quad i = 1, 2, \ldots, 1000$$

$$\omega_d(i) = \left[\cos\left(\frac{6\pi(i-1)}{1000} + \pi\right) + 1\right]\sin\left(\frac{i-1}{40}\right),\quad i = 1, 2, \ldots, 1000. \quad (27)$$

These velocity commands are fed sequentially into VNC to train the neural networks for 500 cycles. Note that supervised drive is responsible only for supplying sufficiently rich, safe velocity commands; it does not supply the corresponding control torques. Here, sufficiently rich means that the velocity commands cover the working domain required in autonomous drive. Each training cycle is therefore a trial in which appropriate control torques are generated by optimizing the secondary utility function. During each trial, the neural weights in VNC and PNC are corrected. Hence, PNC is simply equivalent to the specialized inverse velocity model excited by supervised drive. After the 500 training cycles, the performance of VNC and PNC is examined. Finally, the trained WMR system is switched to autonomous drive and tested by tracking a right-turn path and a decaying sinusoidal path.
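The supervised command set (27) is easy to regenerate for inspection. A NumPy sketch, assuming the bracketing shown in (27) (the raised-cosine envelope $\cos(\cdot) + 1$ shared by both commands); the function name is hypothetical.

```python
import numpy as np

def supervised_commands():
    """Velocity commands v_d(i), omega_d(i) of Eq. (27) for i = 1..1000."""
    i = np.arange(1, 1001)
    envelope = np.cos(6 * np.pi * (i - 1) / 1000 + np.pi) + 1.0   # lies in [0, 2]
    v_d = 0.5 * envelope * np.exp(-0.001 * i)
    w_d = envelope * np.sin((i - 1) / 40)
    return v_d, w_d

v_d, w_d = supervised_commands()
print(v_d.min(), v_d.max(), w_d.min(), w_d.max())
```

Under this reading, the linear command stays non-negative and below 1 while the angular command swings in both directions, which is consistent with the requirement that supervised drive excite the working domain safely.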

4.1. Performance of VNC

Figs. 8a and 8b compare the actual linear velocity with the desired value in the 50th and 500th training cycles. The velocity error in the 500th cycle is significantly smaller than that in the 50th cycle. The angular velocity results are similar and are not presented. After the 500 training cycles, the WMR system is commanded to track the following velocity pattern:

$$v_{t2}(i) = \begin{cases} 0.002i & 1 \le i \le 300 \\ 0.6 & 300 < i \le 600 \\ 1 & 600 < i \le 700 \\ 3.1 - 0.003i & 700 < i \le 900 \\ 0.4 & 900 < i \le 1000 \end{cases} \quad (28)$$

$$\omega_{t2}(i) = \begin{cases} -0.002i & 1 \le i \le 300 \\ -0.6 & 300 < i \le 600 \\ -1 & 600 < i \le 700 \\ 0.003i - 3.1 & 700 < i \le 900 \\ -0.4 & 900 < i \le 1000 \end{cases}$$

Figs. 9a and 9b show that both linear and angular velocity tracking are accurate. This indicates that the DHP adaptive critic learning algorithm converges and that an appropriate VNC is obtained.
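The test pattern (28) is piecewise linear and continuous except for the step at $i = 600$; $\omega_{t2}$ is simply $-v_{t2}$ segment by segment. A NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def velocity_pattern_28(i):
    """Test velocity pattern of Eq. (28); returns (v_t2, omega_t2) with omega_t2 = -v_t2."""
    i = np.asarray(i, dtype=float)
    v = np.piecewise(
        i,
        [i <= 300, (i > 300) & (i <= 600), (i > 600) & (i <= 700),
         (i > 700) & (i <= 900), i > 900],
        [lambda k: 0.002 * k, 0.6, 1.0, lambda k: 3.1 - 0.003 * k, 0.4],
    )
    return v, -v

v_t2, w_t2 = velocity_pattern_28(np.arange(1, 1001))
print(v_t2[[0, 299, 650, 799, 999]])
```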

Fig. 8a. Result of linear velocity tracking in the 50th cycle.

Fig. 8b. Result of linear velocity tracking in the 500th cycle.

4.2. Performance of PNC

The trained WMR system is switched to autonomous drive and commanded to track a sequence of positions calculated with

$$y(i) = \cos\left(\frac{33\pi}{1000}x(i)\right),\quad i = 1, 2, \ldots, 1000. \quad (29)$$

During autonomous drive, the outputs of PNC are recorded. The recorded linear and angular velocity commands are presented as the dashed curves (actual) in Figs. 10a and 10b. The solid curves (desired) are obtained with the fuzzy posture controller designed by Lin, Huang et al. (2005). The two sets of curves are close to each other, indicating that PNC performs as well as the fuzzy posture controller. The difference is that the fuzzy posture controller was built by a domain expert, whereas PNC is obtained entirely by machine learning.

4.3. Performance of the trained WMR system

The trained WMR system is switched to autonomous drive and commanded to track a right-turn path and then a decaying sinusoidal path. The limitations on the posture control are $d_{max} = \ldots$

Fig. 9a. Result of linear velocity tracking.

Fig. 9b. Result of angular velocity tracking.

Fig. 10a. Comparing the linear PNC output (actual) with that of using the fuzzy posture controller (desired).

Case 1: Tracking a right-turn path.

Fig. 11 compares the results of tracking a right-turn path with and without the arc-line algorithm. Without the arc-line algorithm, the WMR system completes the right turn, as the dotted curve in Fig. 11 shows; however, because PNC has no knowledge of the right turn until it occurs, a large overshoot appears around the turn. In contrast, when the arc-line algorithm with curvature $\gamma = 1.1$ is applied, the dashed curve in Fig. 11 shows that the overshoot disappears. This shows that the stereovision module and the path planner make it possible to shape the tracking path.

Fig. 10b. Comparing the angular PNC output (actual) with that of using the fuzzy posture controller (desired).

Fig. 11. Results of tracking a right-turn path with and without using the arc-line algorithm.

Case 2: Tracking a decaying sinusoidal path.

Fig. 12 shows the result of tracking a decaying sinusoidal path described by

$$y(i) = 1.1\sin\left(\frac{7\pi}{26}x(i)\right)e^{-0.35x(i)},\quad i = 1, 2, \ldots, 1000. \quad (30)$$

The dashed curve shows that the path tracking is accurate. The dotted curve is obtained by using the fuzzy posture controller (Lin, Huang et al., 2005) instead of PNC. The comparison suggests that, owing to its optimization, the DHP adaptive critic motion control design performs better than the controller built by a domain expert.
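The reference paths (29) and (30) are given as functions of $x(i)$, but the sampling of $x(i)$ is not stated; the sketch below therefore takes $x$ as an input, and the grid in the usage line is a hypothetical choice for plotting only.

```python
import numpy as np

def cosine_path(x):
    """Reference path of Eq. (29): y = cos(33 * pi * x / 1000)."""
    return np.cos(33 * np.pi * np.asarray(x, dtype=float) / 1000)

def sinusoid_path(x):
    """Decaying sinusoidal path of Eq. (30): y = 1.1 sin(7 * pi * x / 26) exp(-0.35 x)."""
    x = np.asarray(x, dtype=float)
    return 1.1 * np.sin(7 * np.pi * x / 26) * np.exp(-0.35 * x)

# Usage with a hypothetical grid of 1000 samples.
x = np.linspace(0.0, 10.0, 1000)
y29, y30 = cosine_path(x), sinusoid_path(x)
print(y30.max())
```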

5. Conclusion

The DHP adaptive critic motion control design demonstrated the autonomous development of control ability, thereby minimizing the engineering effort of analyzing the system dynamics and synthesizing an appropriate controller. Detailed formulations of the DHP adaptive critic motion control design were presented and explained. VNC corrected its neural weights by incremental optimization, while PNC learned by approximating the specialized inverse velocity model. Only the primary utility function was required to define the control objective. Neither an existing controller, nor representative training samples, nor control rules built by domain experts were required. The proposed design was evaluated on the experimental WMR with successful results.

Fig. 12. Result of tracking a decaying sinusoidal path.

Acknowledgments

The authors would like to thank the editor, the associate editor and the reviewers for their valuable comments. The authors gratefully acknowledge National Science Council of Taiwan on grant NSC94-2213-E-002-049 and National Taiwan University on grant NTU 95R0036-07.

References

Bellman, R. E. (1957). Dynamic programming. Princeton Univ. Press.

Bertsekas, D. P. (2005). Dynamic programming and optimal control: Vol. I (3rd ed.). Belmont, MA: Athena Scientific, pp. 369–373.

Bloch, A. M., Reyhanoglu, M., & McClamroch, N. H. (1992). Control and stabilization of nonholonomic dynamic systems. IEEE Transactions on Automatic Control, 37, 1746–1757.

Chen, Q., & Redmill, K. (2004). Ohio State University at the 2004 DARPA Grand Challenge: Developing a completely autonomous vehicle. IEEE Intelligent Systems, (Sept./Oct.), 8–11.

Colbaugh, R., Barany, E., & Glass, K. (1998). Adaptive control of nonholonomic robotic systems. Journal of Robotic Systems, 15(16), 365–393.

Fierro, R., & Lewis, F. L. (1998). Control of a nonholonomic mobile robot using neural networks. IEEE Transactions on Neural Networks, 9(13), 589–600.

Greenwood, D. T. (1988). Principles of dynamics. Prentice Hall.

Gu, D., & Hu, H. (2002). Neural predictive control for a car-like mobile robot. International Journal of Robotics and Autonomous Systems, 39(2–3), 1–15.

Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Prentice Hall International, Inc, pp. 156–169.

Jang, J. S., & Sun, C. T. (1995). Neuro-fuzzy modeling and control. Proceedings of the IEEE, 83(3), 378–406.

Kanayama, Y., Kimura, Y., Miyazaki, F., & Noguchi, T. (1990). A stable tracking control method for an autonomous mobile robot. Proceedings of IEEE International Conference on Robotics and Automation, 1, 384–389.

Lee, S., Adams, T. M., & Ryoo, B. (1997). A fuzzy navigation system for mobile construction robots. Automation in Construction, 6, 97–107.

Lee, T. H., Leung, F. H. F., & Tam, P. K. S. (1999). Position control for wheeled mobile robots using a fuzzy logic controller. In Proceedings of the 25th annual conference of the IEEE industrial electronics society: Vol. 2 (pp. 525–528).

Lendaris, G. G., & Shannon, T. T. (1998). Application considerations for the DHP methodology. In Proceedings of the international joint conference on neural networks (pp. 1013–1018). Anchorage: IEEE Press.

Lin, W. S., Huang, C. L., Chuang, M. K., & Liu, G. C. (2004). Modeling a wheeled mobile robot for autonomous navigation design. In IASTED international conference on modeling, identification and control (pp. 275–280).

Transactions on Industrial Electronics, 36(12), 330–337.

Park, K. H., Cho, S. B., & Lee, Y. W. (2001). Optimal tracking control of a nonholonomic mobile robot. In Proceedings ISIE 2001 IEEE international symposium on industrial electronics: Vol. 3 (pp. 2073–2076).

Pawlowski, S., Kozlowski, K., & Wroblewski, W. (2001). Fuzzy logic implementation in mobile robot control. In Proceedings of the second international workshop on robot motion and control (pp. 65–70).

Prokhorov, D., & Wunsch, D. (1997). Adaptive critic designs. IEEE Transactions on Neural Networks, 8, 997–1007.

Prokhorov, D., Santiago, R., & Wunsch, D. (1995). Adaptive critic designs: A case study for neuro-control. Neural Networks, 8, 1367–1372.

Shannon, T. T. (1999). Partial, noisy and qualitative models for adaptive critic-based neuro-control. In Proceedings of international conference on neural networks (pp. 2271–2275).

Tsai, P. S., Wu, T. F., Chang, F. R., & Wang, L. S. (2002). Tracking control of nonholonomic mobile robot using hybrid structure. In The 6th world multiconference on systemics, cybernetics and informatics (presented).

Werbos, P. (1992). Approximate dynamic programming for real-time control and neural modeling. In White, & Sofge (Eds.), Handbook of intelligent control (pp. 493–525). New York: Van Nostrand Reinhold.

Werbos, P. (2004). ADP: Goals, opportunities and principles. In J. Si, A. G. Barto, W. B. Powell, & D. Wunsch, II (Eds.), IEEE press series on computational intelligence, Handbook of learning and approximate dynamic programming (pp. 3–44).

Wilamowski, B. M. (2003). Neural network architectures and learning. In Proceedings of IEEE international conference on industrial technology, 1.1 (pp. 10–12).

Yun, X., & Yamamoto, Y. (1993). Internal dynamics of a wheeled mobile robot. In Proceeding of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1288–1293).

Wei-Song Lin has received the National Science Council Award for exceptional achievement in research seventeen times. From 1996 to 2002, he led the sensor calibration team for the Ocean Color Imager aboard the Formosa-1 satellite, Taiwan's first scientific satellite, and received the Success Award from the National Space Program Office. In 2001, he received the Teaching Award from the Ministry of Education of Taiwan for his contributions to engineering education. As a consultant to Taipower Company, he contributed to the computerized instrumentation and control of Taiwan's fourth nuclear power plant. In collaboration with his colleagues, he won the Best Paper Award at the Ninth Conference on Image Processing and Pattern Recognition in 1996. He is listed in the 2006 Who's Who in Science and Engineering, the 2007 Who's Who in Asia, and the 2008 Who's Who in the World. He received the M.S. degree in electrical engineering from National Cheng Kung University in 1975 and the Ph.D. degree in electrical engineering from National Taiwan University in 1982. He began his career with Chunghwa Telecom Laboratories, developing a packet switching network. He pioneered microprocessor education at the Chunghwa Telecom Training Institute in 1979. He is currently a Professor in the Department of Electrical Engineering of National Taiwan University. His research interests include autonomous control; embedded computing controller design; neural-fuzzy systems; the use of approximate dynamic programming in control; active safety control of by-wire electric vehicles; energy management of fuel-cell-powered vehicles; the use of computational stereo in surveillance and navigation; and multi-spectral electromagnetic sensing.

Ping-Chieh Yang was born in Taipei City in 1981. He received the bachelor's degree in power mechanical engineering from National Tsing Hua University in 2003 and the M.S. degree in electrical engineering from National Taiwan University in 2005. His research interests are autonomous mobile robots and the use of approximate dynamic programming. He is currently an engineer with National Instruments.
