Adaptive critic anti-slip control of wheeled autonomous robot


W.-S. Lin, L.-H. Chang and P.-C. Yang

Abstract: When a wheeled autonomous robot drives with wheel slips, velocity and posture control becomes difficult. An ideal automatic driving control system should be able to comply with changes in slip conditions so as to optimise the control performance. Using dual heuristic programming and multi-layer perceptron neural networks, an adaptive critic anti-slip control design is developed to achieve this goal. The critic structure enables neural network learning by satisfying the Bellman equation, so that the inclination of the action performance can be assessed to improve the control parameters. A slip model of the robot vehicle is derived. The adaptive critic anti-slip control system is verified extensively by computer simulation. The results show that the performance is significantly better than that obtained with traditional fuzzy control.

1 Introduction

Wheeled autonomous robots may drive without prior knowledge of slip conditions. In this situation, an automatic driving controller with fixed parameters may perform poorly or even go out of control. Ideally, the controller should be able to learn the slip conditions, assess the robot’s states and then compensate for the slip effect. This article demonstrates an adaptive critic anti-slip control design that fulfils this requirement using neural networks based on dual heuristic programming (DHP). The adaptive critic method is a technological attempt to implement human processes of learning and applying control to achieve a future goal [1]. Humans are motivated by love, fortune, power and so on. The critic structure shapes the controller to satisfy the Bellman equation, which plays the equivalent role of motivation. The critic method essentially uses a computational entity to criticise actions. Then, in accordance with the inclination of the action performance, the control unit is improved to approximate optimal control.

Dynamic programming is a mathematical formalism to design an optimal controller for a nonlinear system. Bellman’s [2] principle of optimality allows us to take every step as a starting point to look for the optimal solution. However, its need for future information makes dynamic programming not directly feasible in practical applications. Alternatively, DHP has been developed as an approximator to implement dynamic programming [3, 4]. Besides DHP, heuristic dynamic programming (HDP) and global dual heuristic programming (GDHP) serve the same purpose. They are differentiated by the output of the critic entity. In the DHP method, the critic produces the derivative of the Bellman equation [5, 6]. In HDP, the critic’s output is the criterion function of the Bellman equation. In GDHP, both the Bellman equation and its derivative are calculated. However, DHP was demonstrated in Prokhorov et al. [7] and Venayagamoorthy et al. [8] to have a superior performance to HDP, and GDHP gave no observable further improvement.

This article embodies adaptive critic anti-slip control with DHP and multi-layer perceptron (MLP) neural networks. For anti-slip control, it is assumed that the frictional forces at the wheel–road contacts depend on surface condition and location. Therefore a driving wheel may purely roll, roll with slip, or spin, which challenges the control design. The adaptive critic anti-slip control is shown to be able to learn the action performance so as to modify the network parameters and approximate optimal control. Optimal anti-slip control optimises a utility function under variant slips. The adaptive critic anti-slip design is integrated with the velocity control as a single system. A slip model of the robot vehicle is derived. Extensive simulation studies verify the usefulness of the proposed design. Control performance is compared with that using the traditional fuzzy control method [9].

2 Slip model of the robot vehicle and the anti-slip control problem

The robot vehicle under consideration has two passive front wheels and two independent, motorised rear wheels. Fig. 1 shows a schematic top view. Manoeuvring the driving wheels (rear wheels) can change the linear and angular velocities of the vehicle. Such a robot vehicle equipped with stereovision guidance for autonomous navigation has been implemented and demonstrated by Lin et al. [10]. Without considering the slip condition, a mathematical model of the vehicle has been derived and verified experimentally in Lin et al. [11]. Lin et al. [9] have developed a hierarchical fuzzy control system for automatic drive. Using MLP neural networks and DHP, this article presents the adaptive critic anti-slip control system for automatic drive.

© The Institution of Engineering and Technology 2006

doi:10.1049/iet-cta:20050341

Paper first received 13th September 2005 and in revised form 23rd January 2006

The authors are with the Department of Electrical Engineering, National Taiwan University, Taiwan, Republic of China


2.1 General model of the robot vehicle

A wheeled robot vehicle is a typical non-holonomic mechanical system [12], and the literature [11, 13–15] has shown that vehicle dynamics can generally be described as

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = B(q)u + A^{T}(q)\lambda \tag{1}$$

and constrained by

$$A(q)\dot{q} = 0 \tag{2}$$

where $q \in \mathbb{R}^{n\times 1}$ is the generalised coordinate vector, $M(q) \in \mathbb{R}^{n\times n}$ is a symmetric, positive definite inertia matrix, $C(q,\dot{q}) \in \mathbb{R}^{n\times n}$ is the centripetal and Coriolis matrix, $F(\dot{q}) \in \mathbb{R}^{n\times 1}$ denotes the surface friction, $G(q) \in \mathbb{R}^{n\times 1}$ is the gravitational vector, $\tau_d \in \mathbb{R}^{n\times 1}$ denotes the bounded unknown disturbance including unstructured, unmodelled dynamics, $B(q) \in \mathbb{R}^{n\times r}$ represents the input transformation matrix, $u \in \mathbb{R}^{r\times 1}$ denotes the input torques, $A(q) \in \mathbb{R}^{m\times n}$ is a full rank matrix associated with the constraints and $\lambda \in \mathbb{R}^{m\times 1}$ is the Lagrange multiplier or the vector of constraint forces.

Assume $Z(q) \in \mathbb{R}^{n\times(n-m)}$ is a set of smooth linearly independent vector fields spanning the null space of $A(q)$. Then there exists an auxiliary vector time function $R(t) \in \mathbb{R}^{(n-m)\times 1}$, such that for all $t$

$$\dot{q} = Z(q)R(t) \tag{3}$$

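where $R$ does not necessarily have any physical significance. With (1), (2) and (3), the following reduced order dynamic model without the Lagrange multiplier is obtained

$$\bar{M}(q)\dot{R} + \bar{C}(q,\dot{q})R + \bar{F}(\dot{q}) + \bar{G}(q) + \bar{\tau}_d = \bar{B}(q)u \tag{4}$$

where

$$\begin{aligned}
\bar{M}(q) &= Z^{T}(q)M(q)Z(q) \in \mathbb{R}^{r\times r} \\
\bar{C}(q,\dot{q}) &= Z^{T}(q)\left(M(q)\dot{Z}(q) + C(q,\dot{q})Z(q)\right) \in \mathbb{R}^{r\times r} \\
\bar{F}(\dot{q}) &= Z^{T}(q)F(\dot{q}) \in \mathbb{R}^{r\times 1} \\
\bar{G}(q) &= Z^{T}(q)G(q) \in \mathbb{R}^{r\times 1} \\
\bar{\tau}_d &= Z^{T}(q)\tau_d \in \mathbb{R}^{r\times 1}
\end{aligned}$$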
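The step from (1)–(3) to (4) is not spelled out in the text. One standard route, sketched here under the assumption that the reduced input matrix is $\bar{B}(q) = Z^{T}(q)B(q)$ (this definition is not stated explicitly above), is to differentiate (3), substitute into (1) and premultiply by $Z^{T}(q)$:

```latex
\begin{aligned}
\ddot{q} &= \dot{Z}(q)R + Z(q)\dot{R}, \\
M\left(\dot{Z}R + Z\dot{R}\right) + CZR + F + G + \tau_d &= Bu + A^{T}\lambda, \\
Z^{T}MZ\,\dot{R} + Z^{T}\left(M\dot{Z} + CZ\right)R + Z^{T}F + Z^{T}G + Z^{T}\tau_d &= Z^{T}Bu + (AZ)^{T}\lambda .
\end{aligned}
```

Because the columns of $Z(q)$ span the null space of $A(q)$, $AZ = 0$ and the multiplier term vanishes, which leaves a model of the form (4).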

2.2 Slip model of the robot vehicle

Consider the vehicle shown in Fig. 1 and use the following notations: $p = [x\ y\ \theta]^{T}$ denotes a posture vector; $b$ is the half-width of the axle of the driving wheels; $d$ is the displacement from the point $P$ along the $X_c$ axis to the centre of mass; $r$ is the radius of the driving wheels; $m_c$ is the mass of the body (i.e. excluding the driving wheels and their associated rotors); $m_w$ is the mass of a single driving wheel (i.e. taking the associated rotor into account); $I_c$ is the moment of inertia of the body; $I_w$ is the moment of inertia of each driving wheel about the axle; and $I_m$ is the moment of inertia of each driving wheel about a wheel diameter. Assume the vehicle moves by satisfying the following conditions.

Condition 1: The vehicle only moves in the direction normal to the axle of the driving wheels (i.e. no lateral motion).

Condition 2: Each driving wheel rolls with slip only in the longitudinal direction.

Then the slip model of the vehicle can be described in the forms of (3) and (4) and the following parameters are obtained:

1. Kinematics of the slip model: Conditions 1 and 2 impose the following constraints

$$\dot{y}\cos\theta - \dot{x}\sin\theta = 0 \tag{5}$$

$$\dot{x}\cos\theta + \dot{y}\sin\theta - b\dot{\theta} = r\rho_l\dot{\varphi}_l \tag{6}$$

$$\dot{x}\cos\theta + \dot{y}\sin\theta + b\dot{\theta} = r\rho_r\dot{\varphi}_r \tag{7}$$

where $\rho_l$ and $\rho_r$, $0 < \rho_l, \rho_r \le 1$, represent the anti-slip factors associated with the left and right driving wheels, respectively. The anti-slip factor is defined as the percentage of a wheel’s driving force reflected effectively by the road friction. Generally, it depends on wheel and road conditions and may vary with location. When an anti-slip factor varies within [0, 1], the corresponding driving wheel may purely roll, roll with slip, or spin. However, in deriving the slip model, $0 < \rho_l, \rho_r \le 1$ is assumed.

The linear velocity $v$ and angular velocity $\omega$ are computed as

$$v = \dot{x}\cos\theta + \dot{y}\sin\theta, \quad \omega = \dot{\theta} \tag{8}$$

The relationship between $(\dot{\varphi}_l, \dot{\varphi}_r)$ and $(v, \omega)$ is obtained by substituting (8) into (6) and (7)

$$r\rho_l\dot{\varphi}_l = v - b\omega, \quad r\rho_r\dot{\varphi}_r = v + b\omega \tag{9}$$

Take the generalised coordinate vector $q = [x, y, \theta, \varphi_l, \varphi_r]^{T}$ and construct the auxiliary vector $R = [v\ \omega]^{T}$. The constraints (5)–(7) can be organised in the following matrix form

$$A(q) = \begin{bmatrix}
-\sin\theta & \cos\theta & 0 & 0 & 0 \\
\cos\theta & \sin\theta & -b & -r\rho_l & 0 \\
\cos\theta & \sin\theta & b & 0 & -r\rho_r
\end{bmatrix} \tag{10}$$

Then the parameter matrix in (3) of the slip model is

$$Z(q) = \begin{bmatrix}
\cos\theta & 0 \\
\sin\theta & 0 \\
0 & 1 \\
\dfrac{1}{r\rho_l} & -\dfrac{b}{r\rho_l} \\
\dfrac{1}{r\rho_r} & \dfrac{b}{r\rho_r}
\end{bmatrix} \tag{11}$$

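2. Dynamics of the slip model: Using the Lagrange formalism, the parametric matrices $M$, $C$ and $B$ in (1) of the slip model are obtained as follows (the derivations are not presented)

$$M(q) = \begin{bmatrix}
m & 0 & -m_c d\sin\theta & 0 & 0 \\
0 & m & m_c d\cos\theta & 0 & 0 \\
-m_c d\sin\theta & m_c d\cos\theta & I & 0 & 0 \\
0 & 0 & 0 & I_w & 0 \\
0 & 0 & 0 & 0 & I_w
\end{bmatrix}$$

$$C(q,\dot{q}) = \begin{bmatrix}
0 & 0 & -m_c d\dot{\theta}\cos\theta & 0 & 0 \\
0 & 0 & -m_c d\dot{\theta}\sin\theta & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

$$B(q) = \frac{1}{r}\begin{bmatrix}
\rho_l\cos\theta & \rho_r\cos\theta \\
\rho_l\sin\theta & \rho_r\sin\theta \\
-\rho_l b & \rho_r b \\
r & 0 \\
0 & r
\end{bmatrix} \tag{12}$$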

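The unmodelled dynamics and disturbance is

$$h(q,\dot{q}) = F(\dot{q}) + G(q) + \tau_d \tag{13}$$

Accordingly, the parametric matrices in (4) of the slip model are

$$\bar{M}(q) = \begin{bmatrix}
m + \dfrac{I_w}{r^2}\left(\dfrac{1}{\rho_l^2} + \dfrac{1}{\rho_r^2}\right) & \dfrac{bI_w}{r^2}\left(-\dfrac{1}{\rho_l^2} + \dfrac{1}{\rho_r^2}\right) \\
\dfrac{bI_w}{r^2}\left(-\dfrac{1}{\rho_l^2} + \dfrac{1}{\rho_r^2}\right) & I + \dfrac{b^2 I_w}{r^2}\left(\dfrac{1}{\rho_l^2} + \dfrac{1}{\rho_r^2}\right)
\end{bmatrix} \tag{14}$$

$$\bar{C}(q,\dot{q}) = \begin{bmatrix}
\dfrac{I_w}{r^2\rho_l}\dfrac{d}{dt}\left(\dfrac{1}{\rho_l}\right) + \dfrac{I_w}{r^2\rho_r}\dfrac{d}{dt}\left(\dfrac{1}{\rho_r}\right) &
-m_c d\dot{\theta} - \dfrac{bI_w}{r^2\rho_l}\dfrac{d}{dt}\left(\dfrac{1}{\rho_l}\right) + \dfrac{bI_w}{r^2\rho_r}\dfrac{d}{dt}\left(\dfrac{1}{\rho_r}\right) \\
m_c d\dot{\theta} - \dfrac{bI_w}{r^2\rho_l}\dfrac{d}{dt}\left(\dfrac{1}{\rho_l}\right) + \dfrac{bI_w}{r^2\rho_r}\dfrac{d}{dt}\left(\dfrac{1}{\rho_r}\right) &
\dfrac{b^2 I_w}{r^2\rho_l}\dfrac{d}{dt}\left(\dfrac{1}{\rho_l}\right) + \dfrac{b^2 I_w}{r^2\rho_r}\dfrac{d}{dt}\left(\dfrac{1}{\rho_r}\right)
\end{bmatrix} \tag{15}$$

$$\bar{B}(q) = \begin{bmatrix}
\dfrac{1}{r}\left(\rho_l + \dfrac{1}{\rho_l}\right) & \dfrac{1}{r}\left(\rho_r + \dfrac{1}{\rho_r}\right) \\
-\dfrac{b}{r}\left(\rho_l + \dfrac{1}{\rho_l}\right) & \dfrac{b}{r}\left(\rho_r + \dfrac{1}{\rho_r}\right)
\end{bmatrix} \tag{16}$$

The unmodelled term (13) becomes

$$\bar{h}(q,\dot{q}) = Z^{T}h(q,\dot{q}) = Z^{T}\left[F(\dot{q}) + G(q) + \tau_d\right] \tag{17}$$

The linear and angular velocities $(v, \omega)$ are estimated as below

$$\begin{bmatrix} v \\ \omega \end{bmatrix} = \begin{bmatrix}
\dfrac{r}{2} & \dfrac{r}{2} \\
-\dfrac{r}{2b} & \dfrac{r}{2b}
\end{bmatrix}\begin{bmatrix} \rho_l\omega_l \\ \rho_r\omega_r \end{bmatrix} \tag{18}$$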

where $\omega_l$ and $\omega_r$ denote the angular velocities of the left and right driving wheels, respectively. Equations (11), (14)–(16) and (18) show that the slip model is nonlinearly dependent on the anti-slip factors. Furthermore, the anti-slip factors are unknown, unmeasurable and generally vary with the vehicle location and road condition. The anti-slip control problem is to find an optimal velocity controller for a system described by the slip model in which the slip is unknown and unmeasurable. For the convenience of computer simulation, we assume that the anti-slip factor is a function $\rho(x, y)$ of the vehicle location only. Then $\rho_l$ and $\rho_r$ can be calculated as follows

$$\rho_l = \rho(x_l, y_l) = \rho(x + b\sin\theta,\ y - b\cos\theta) \tag{19}$$

$$\rho_r = \rho(x_r, y_r) = \rho(x - b\sin\theta,\ y + b\cos\theta) \tag{20}$$

Specifically, when $\rho(x, y)$ = constant, (15) becomes

$$\bar{C}(q,\dot{q}) = \begin{bmatrix} 0 & -m_c d\dot{\theta} \\ m_c d\dot{\theta} & 0 \end{bmatrix} \tag{21}$$
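The kinematic part of the slip model can be checked numerically. The following sketch builds $A(q)$ and $Z(q)$ from (10) and (11), converts between wheel rates and body velocities with (9) and (18), and samples a location-dependent anti-slip factor at the two wheel positions as in (19) and (20). The function names and the numerical values of $b$, $r$, $\theta$ and the anti-slip factors are illustrative assumptions, not values from the paper.

```python
import math


def constraint_matrix(theta, b, r, rho_l, rho_r):
    """A(q) of (10): one lateral constraint and two rolling constraints."""
    s, c = math.sin(theta), math.cos(theta)
    return [
        [-s, c, 0.0, 0.0, 0.0],
        [c, s, -b, -r * rho_l, 0.0],
        [c, s, b, 0.0, -r * rho_r],
    ]


def null_space_matrix(theta, b, r, rho_l, rho_r):
    """Z(q) of (11): maps R = [v, omega] to the generalised velocities."""
    s, c = math.sin(theta), math.cos(theta)
    return [
        [c, 0.0],
        [s, 0.0],
        [0.0, 1.0],
        [1.0 / (r * rho_l), -b / (r * rho_l)],
        [1.0 / (r * rho_r), b / (r * rho_r)],
    ]


def matmul(a, m):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * m[k][j] for k in range(len(m)))
             for j in range(len(m[0]))] for i in range(len(a))]


def wheel_rates(v, omega, b, r, rho_l, rho_r):
    """Wheel angular velocities (left, right) from (9)."""
    return (v - b * omega) / (r * rho_l), (v + b * omega) / (r * rho_r)


def body_velocities(w_l, w_r, b, r, rho_l, rho_r):
    """Linear and angular velocity of the vehicle from (18)."""
    v = 0.5 * r * (rho_l * w_l + rho_r * w_r)
    omega = r / (2.0 * b) * (rho_r * w_r - rho_l * w_l)
    return v, omega


def wheel_anti_slip(rho, x, y, theta, b):
    """(19)-(20): evaluate rho(x, y) at the left and right wheel positions."""
    s, c = math.sin(theta), math.cos(theta)
    return rho(x + b * s, y - b * c), rho(x - b * s, y + b * c)


# Illustrative values only (not taken from Table 1).
b, r = 0.3, 0.1
A = constraint_matrix(0.7, b, r, 0.6, 1.0)
Z = null_space_matrix(0.7, b, r, 0.6, 1.0)
residual = max(abs(e) for row in matmul(A, Z) for e in row)  # should be ~0
```

The residual of $A(q)Z(q)$ staying at rounding-error level is a quick consistency check on the signs in (10) and (11), and feeding the output of `wheel_rates` into `body_velocities` should return the original $(v, \omega)$.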

3 Adaptive critic anti-slip control design

The automatic drive of a wheeled autonomous robot basically needs a navigation system to learn the environment and plan the desired path, a posture controller to infer the desired linear and angular velocities, and a velocity controller to ensure that the vehicle drives with the desired linear and angular velocities. The navigation system could request the vehicle to accomplish a task such as avoiding an obstacle or tracking a trajectory, as illustrated in Fig. 2. However, unknown wheel slips severely challenge the velocity control system. They may reduce the control performance or even make the system fail to accomplish the desired task. Adaptive critic anti-slip design aims at optimising the velocity control under variant wheel slips.

In the optimal control context, the control objective is to optimise the primary utility function, and Bellman’s principle of optimality allows us to take every step as a starting point to look for the optimal solution. Whatever the wheel slips are, the objective of the adaptive critic anti-slip control system is to follow the velocity command as closely as possible. Therefore the primary utility function is chosen as

$$U(t) = \sigma_v\{v(t) - v_d(t)\}^2 + \sigma_\omega\{\omega(t) - \omega_d(t)\}^2 \tag{22}$$

where $(v, \omega)$ and $(v_d, \omega_d)$ are the actual and desired velocities, respectively, and $(\sigma_v, \sigma_\omega)$ are weights to balance the linear and angular velocity errors. The secondary utility function, given by the Bellman equation and written as the Bellman recursion, is accordingly

$$J(t) = \sum_{k=0}^{\infty}\gamma^{k}U(t+k) = U(t) + \gamma J(t+1) \tag{23}$$

where $\gamma$, $0 < \gamma \le 1$, discounts the significance of the utility in the future [2]. By satisfying the Bellman equation, the adaptive critic anti-slip control system approximates optimal velocity control.
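As a small numerical illustration of (22) and (23), the sketch below evaluates the primary utility and accumulates the secondary utility by the backward recursion $J(t) = U(t) + \gamma J(t+1)$. Truncating the infinite sum at the end of a finite record (taking $J = 0$ after the last sample) is an assumption made here so that the recursion can be run; the function names are illustrative.

```python
def primary_utility(v, omega, v_d, omega_d, sigma_v=0.25, sigma_w=0.25):
    """Primary utility U(t) of (22): weighted squared velocity errors."""
    return sigma_v * (v - v_d) ** 2 + sigma_w * (omega - omega_d) ** 2


def cost_to_go(utilities, gamma):
    """Secondary utility J(t) of (23) over a finite record.

    Assumption: J is taken as zero after the last recorded sample, so the
    infinite sum is truncated at the end of the list.
    """
    J = [0.0] * (len(utilities) + 1)
    for t in range(len(utilities) - 1, -1, -1):
        J[t] = utilities[t] + gamma * J[t + 1]  # Bellman recursion
    return J[:-1]


# Example: three samples of U with a discount factor of 0.5.
J = cost_to_go([1.0, 2.0, 3.0], gamma=0.5)
```

The default weights of 0.25 match the values quoted for simulation 1; any other positive weights could be passed instead.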


3.1 Neural networks and the learning structure

As shown in Fig. 3, the DHP algorithm is implemented with an on-line learning structure of neural networks. This structure consists of the action network, critic network, verification network and vehicle model (not the slip model). The MLP neural network and backpropagation algorithm [16] are chosen to implement the action, critic and verification networks. The architecture of the action network is designed as illustrated in Fig. 4. It has four inputs $(v(t), \omega(t), v_d(t), \omega_d(t))$ and two outputs $(\tau_l(t), \tau_r(t))$ corresponding to the torque commands. The hidden layer has three nodes (neurons), each with a hyperbolic-tangent activation function. The output activation function is a hyperbolic tangent with a gain value of 30. The computational formulas of the action network are listed below

$$r_1 = v, \quad r_2 = \omega, \quad r_3 = v_d, \quad r_4 = \omega_d \tag{24}$$

$$f_1(x) = \tanh(x), \quad f_2(x) = 30\tanh(x) \tag{25}$$

$$\mathrm{net}_j^{1} = \sum_{k=1}^{4} W_1(j,k)\,r_k + B_1(j,1), \quad T_j = f_1(\mathrm{net}_j^{1}), \quad j = 1, 2, 3 \tag{26}$$

$$\mathrm{net}_i^{2} = \sum_{k=1}^{3} W_2(i,k)\,T_k + B_2(i,1), \quad u_i = f_2(\mathrm{net}_i^{2}), \quad i = 1, 2 \tag{27}$$

$$u_1 = \tau_l, \quad u_2 = \tau_r \tag{28}$$
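The forward pass in (24)–(28) is small enough to write out directly. The sketch below is one possible plain-Python rendering; the function names, the example input values and the random seed are illustrative assumptions, while the layer sizes, the tanh hidden layer, the output gain of 30 and the initial weight range [−0.1, 0.1] follow the text.

```python
import math
import random


def action_network(inputs, W1, B1, W2, B2, gain=30.0):
    """Forward pass of the 4-3-2 action network, following (24)-(28).

    inputs : [v, omega, v_d, omega_d]
    returns: [tau_l, tau_r]
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(W1[j], inputs)) + B1[j])
              for j in range(len(W1))]
    return [gain * math.tanh(sum(w * h for w, h in zip(W2[i], hidden)) + B2[i])
            for i in range(len(W2))]


def random_weights(rows, cols, rng, scale=0.1):
    """Weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)  # fixed seed, chosen only for repeatability
W1, B1 = random_weights(3, 4, rng), [rng.uniform(-0.1, 0.1) for _ in range(3)]
W2, B2 = random_weights(2, 3, rng), [rng.uniform(-0.1, 0.1) for _ in range(2)]
tau_l, tau_r = action_network([0.5, 0.1, 1.0, 0.0], W1, B1, W2, B2)
```

Because the output activation is $30\tanh(\cdot)$, the torque commands produced this way are always bounded in magnitude by 30.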

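The architecture of the verification network is shown in Fig. 5. It has four inputs $(v(t), \omega(t), v_d(t), \omega_d(t))$, the same as those of the action network. There are three hidden layer nodes, each with a hyperbolic-tangent activation function and a bias. The output nodes have a linear activation function. The outputs of the verification network are $\lambda_1(t) \cong \partial J(t)/\partial v(t)$ and $\lambda_2(t) \cong \partial J(t)/\partial \omega(t)$. The computational formulas of the verification network are listed as

$$r_1 = v, \quad r_2 = \omega, \quad r_3 = v_d, \quad r_4 = \omega_d \tag{29}$$

$$f_1(x) = \tanh(x), \quad f_2(x) = x \tag{30}$$

$$\mathrm{net}_j^{1} = \sum_{k=1}^{4} W_1(j,k)\,r_k + B_1(j,1), \quad T_j = f_1(\mathrm{net}_j^{1}), \quad j = 1, 2, 3 \tag{31}$$

$$\lambda_i = \sum_{k=1}^{3} W_2(i,k)\,T_k + B_2(i,1), \quad i = 1, 2 \tag{32}$$

$$\lambda_1 = \frac{\partial J(t)}{\partial v(t)}, \quad \lambda_2 = \frac{\partial J(t)}{\partial \omega(t)} \tag{33}$$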

The critic network is identical to the verification network, except for the inputs and outputs. The critic network has $(v(t+1), \omega(t+1), v_d(t), \omega_d(t))$ as inputs, and produces the predictive quantities $\lambda_1(t+1) \cong \partial J(t+1)/\partial v(t+1)$ and $\lambda_2(t+1) \cong \partial J(t+1)/\partial \omega(t+1)$.

The vehicle model predicts the quantities $R(t+1)$, $\partial R(t+1)/\partial R(t)$ and $\partial R(t+1)/\partial u(t)$, where $R(t) = [v(t), \omega(t)]^{T}$. The vehicle model can be of mathematical or neural/fuzzy type. In this article, the mathematical model is used. The model equation of the vehicle (no wheel slip) is obtained by substituting $\rho_l = \rho_r = 1$ into the slip model. The linearised, sampled-data form of the model equation with on-line parameter update implements the vehicle model.

3.2 Weight update rules

The backpropagation algorithm and gradient-descent method are adopted to develop the weight update rules of the action and verification networks. The critic network does not have weight update rules but copies weight values from the verification network at every fifth sampling time. As we have the vehicle model to predict the states one step ahead, the Bellman recursion can substitute for the error measure required in the backpropagation algorithm [6]. The weight update rule of the action network is

$$\Delta w_a(t) = -\alpha\frac{\partial J(t)}{\partial w_a(t)} = -\alpha\sum_{j=1}^{2}\frac{\partial J(t)}{\partial u_j(t)}\frac{\partial u_j(t)}{\partial w_a(t)} \tag{34}$$

where $\alpha$ is the learning rate, $w_a$ denotes a weight of the action network,

$$\frac{\partial J(t)}{\partial u_j(t)} = \frac{\partial U(t)}{\partial u_j(t)} + \gamma\frac{\partial J(t+1)}{\partial u_j(t)}$$

and where

$$\frac{\partial J(t+1)}{\partial u_j(t)} = \sum_{s=1}^{2}\underbrace{\frac{\partial J(t+1)}{\partial R_s(t+1)}}_{\text{critic output}}\ \underbrace{\frac{\partial R_s(t+1)}{\partial u_j(t)}}_{\text{model output}}$$

Fig. 3 Adaptive critic anti-slip control system with the DHP learning algorithm; the action and verification networks each have a weight update rule; the critic network copies weight values from the verification network

Fig. 4 Architecture of the action network

The weight update rule of the verification network is obtained by supervised learning through backpropagation. Based on the Bellman recursion, the desired output $\lambda^{\circ}$ of the verification network is estimated as follows

$$\lambda_s^{\circ}(t) = \frac{\partial J(t)}{\partial R_s(t)} = \frac{\partial U(t)}{\partial R_s(t)} + \gamma\frac{\partial J(t+1)}{\partial R_s(t)}, \quad s = 1, 2 \tag{35}$$

where

$$\frac{\partial J(t+1)}{\partial R_s(t)} = \sum_{k=1}^{2}\underbrace{\frac{\partial J(t+1)}{\partial R_k(t+1)}}_{\text{critic output}}\ \underbrace{\frac{\partial R_k(t+1)}{\partial R_s(t)}}_{\text{model output}} + \sum_{k=1}^{2}\sum_{j=1}^{2}\underbrace{\frac{\partial J(t+1)}{\partial R_k(t+1)}}_{\text{critic output}}\ \underbrace{\frac{\partial R_k(t+1)}{\partial u_j(t)}}_{\text{model output}}\ \frac{\partial u_j(t)}{\partial R_s(t)}$$
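The chain-rule sums in (34) and (35) can be written out as a few nested loops. The sketch below assumes the critic output, the two model Jacobians and the action-network sensitivity $\partial u_j(t)/\partial R_s(t)$ are already available as nested lists; the function and argument names are illustrative. It also uses the observation that the utility (22) contains no torque term, so $\partial U(t)/\partial u_j(t) = 0$ in the action gradient.

```python
def dhp_targets(R, R_d, lam_next, dR_dR, dR_du, du_dR,
                sigma=(0.25, 0.25), gamma=1.0):
    """Desired verification outputs lambda°_s(t), s = 1, 2, following (35).

    R, R_d      : current and desired [v, omega]
    lam_next[k] : critic output, approximating dJ(t+1)/dR_k(t+1)
    dR_dR[k][s] : model output dR_k(t+1)/dR_s(t)
    dR_du[k][j] : model output dR_k(t+1)/du_j(t)
    du_dR[j][s] : action-network sensitivity du_j(t)/dR_s(t)
    """
    targets = []
    for s in range(2):
        dU_dR = 2.0 * sigma[s] * (R[s] - R_d[s])  # derivative of (22)
        direct = sum(lam_next[k] * dR_dR[k][s] for k in range(2))
        via_action = sum(lam_next[k] * dR_du[k][j] * du_dR[j][s]
                         for k in range(2) for j in range(2))
        targets.append(dU_dR + gamma * (direct + via_action))
    return targets


def action_gradient(lam_next, dR_du, gamma=1.0):
    """dJ(t)/du_j(t), j = 1, 2, as used in (34).

    dU(t)/du_j(t) is zero because (22) depends only on the velocities.
    """
    return [gamma * sum(lam_next[s] * dR_du[s][j] for s in range(2))
            for j in range(2)]


# Example with identity Jacobians and an action network that ignores R(t).
identity = [[1.0, 0.0], [0.0, 1.0]]
no_feedback = [[0.0, 0.0], [0.0, 0.0]]
targets = dhp_targets([1.0, 0.0], [0.0, 0.0], [2.0, 3.0],
                      identity, identity, no_feedback)
```

Multiplying the output of `action_gradient` by $\partial u_j(t)/\partial w_a(t)$ from backpropagation through the action network and by $-\alpha$ then gives the weight change of (34).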

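Then the error measure is taken as

$$e(t) = \sum_{s=1}^{2}\{\lambda_s^{\circ}(t) - \lambda_s(t)\}^2 \tag{36}$$

where $\lambda_1(t) = \partial J(t)/\partial v(t)$ and $\lambda_2(t) = \partial J(t)/\partial \omega(t)$ are the verification outputs. The weight update rule is

$$\Delta w_v(t) = -\frac{\eta}{2}\frac{\partial e(t)}{\partial w_v(t)} = -\eta\sum_{s=1}^{2}\{\underbrace{\lambda_s(t)}_{\text{verification output}} - \lambda_s^{\circ}(t)\}\frac{\partial \lambda_s(t)}{\partial w_v(t)} \tag{37}$$

where $\eta$ is the learning rate, and $w_v$ denotes a weight of the verification network. The training algorithm of the adaptive critic anti-slip control system is summarised in the following steps: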

Step 1. Obtain $(v(t), \omega(t), v_d(t), \omega_d(t))$ from the vehicle and posture controller, and apply them to the action network to produce the torque command $(\tau_l(t), \tau_r(t))$;

Step 2. Apply the torque command $(\tau_l(t), \tau_r(t))$ to the vehicle;

Step 3. Measure $(v(t), \omega(t))$ and apply $(\tau_l(t), \tau_r(t))$ to run the vehicle model to evaluate $R(t+1)$, $\partial R(t+1)/\partial R(t)$ and $\partial R(t+1)/\partial u(t)$;

Step 4. Apply $(v(t+1), \omega(t+1), v_d(t), \omega_d(t))$ to the critic network to obtain $\lambda(t+1)$, and apply $(v(t), \omega(t), v_d(t), \omega_d(t))$ to the verification network to obtain $\lambda(t)$;

Step 5. Calculate the desired output $\lambda^{\circ}(t)$ of the verification network;

Step 6. Calculate weight updates of the action network and update the weights according to (34);

Step 7. Calculate weight updates of the verification network and update the weights according to (37);

Step 8. The critic network copies the weight values of the verification network at every fifth sampling time.
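Steps 7 and 8 can be illustrated in isolation. The sketch below applies the gradient step of (37) to the linear output layer of the verification network only (the hidden-layer weights would be updated by ordinary backpropagation, omitted here for brevity) and copies the verification weights to the critic on a fixed period. Storing the weights in a dictionary, and copying whenever the step counter is a multiple of five, are assumptions of this sketch; the text only states that the copy happens at every fifth sampling time.

```python
import copy


def update_output_layer(W2, B2, hidden, lam, lam_target, eta):
    """One gradient step of (37) on the linear output layer (in place).

    hidden     : hidden-layer outputs T_k of the verification network
    lam        : current verification outputs lambda_s(t)
    lam_target : desired outputs lambda°_s(t) from (35)
    """
    for s in range(len(W2)):
        err = lam[s] - lam_target[s]
        for k in range(len(hidden)):
            W2[s][k] -= eta * err * hidden[k]  # d lambda_s / d W2(s,k) = T_k
        B2[s] -= eta * err                     # d lambda_s / d B2(s,1) = 1


def maybe_copy_to_critic(step, verification_weights, critic_weights, period=5):
    """Step 8: copy the verification weights into the critic every `period` steps."""
    if step % period == 0:
        critic_weights.clear()
        critic_weights.update(copy.deepcopy(verification_weights))
        return True
    return False


# Example: one update that moves the outputs towards their targets.
W2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
B2 = [0.0, 0.0]
update_output_layer(W2, B2, hidden=[0.5, -0.5, 1.0],
                    lam=[0.0, 0.0], lam_target=[1.0, -1.0], eta=0.1)
```

Using a deep copy keeps the critic frozen between copies, so later updates of the verification network do not leak into the critic until the next scheduled copy.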

4 Simulation results

The vehicle parameters in Lin et al. [10] are listed in Table 1 and adopted in the following computer simulations. The anti-slip factor is expressed as a known function $\rho(x, y)$. For comparison, the posture controller has a fuzzy logic design, the same as in Lin et al. [9]. It accepts the position error $(e_x, e_y)$ as inputs to produce the desired linear and angular velocities $(v_d, \omega_d)$.

4.1 Simulation 1: Self-learning from scratch

The ability to learn from scratch means that updating of the neural weights can begin and continue without human help. This is essential in an autonomous robot. In this simulation, all the neural weights are initialised randomly in the range [−0.1, 0.1]. The primary utility function is (22) with $(\sigma_v, \sigma_\omega) = (0.25, 0.25)$. The discount factor in (23) is $\gamma = 1$. The velocity commands are $v_d = 2\sin\{(\pi/600)kT\}$, $\omega_d = 1.5\cos\{(\pi/1200)kT\}$. The sampling interval is $T = 0.01$ s. The learning process takes 10 000 sampling times. The results show that the linear and angular velocity errors decrease quickly as the number of sampling times increases. The weight values in the action and verification networks converge after training for 3000 sampling times. Thereafter the system demonstrates good velocity tracking.
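The command signals of simulation 1 are easy to regenerate from the formulas quoted above. The sketch below simply evaluates them for each sample index $k$; the function and variable names are illustrative.

```python
import math

T = 0.01            # sampling interval in seconds, as stated in the text
N_SAMPLES = 10_000  # length of the learning process in sampling times


def velocity_command(k, T=T):
    """Desired (v_d, omega_d) at sample index k, per the simulation 1 formulas."""
    t = k * T
    v_d = 2.0 * math.sin(math.pi / 600.0 * t)
    omega_d = 1.5 * math.cos(math.pi / 1200.0 * t)
    return v_d, omega_d


commands = [velocity_command(k) for k in range(N_SAMPLES)]
```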

After the self-learning from scratch, the vehicle is commanded to drive on roads with variant slip conditions. The road is divided into five zones and each zone has a specified anti-slip factor as shown in Figs. 6 and 7. Three roads are studied in the following simulations:

Road 1: Anti-slip factors in all zones equal 1 (no slip).

Road 2: In zones 2 and 4, the anti-slip factor equals 0.6 and in other zones equals 1.

Road 3: In zones 1, 3 and 5, the anti-slip factor equals 0.6 and in other zones equals 1.
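The three road definitions amount to a small lookup table. The sketch below encodes them by zone index only; the geometric boundaries of the zones are given in the figures and are not reproduced here, so mapping a position $(x, y)$ to a zone number is left out. The names `ROADS` and `anti_slip_factor` are illustrative.

```python
# Zones whose anti-slip factor differs from 1, for each road.
ROADS = {
    1: {},
    2: {2: 0.6, 4: 0.6},
    3: {1: 0.6, 3: 0.6, 5: 0.6},
}


def anti_slip_factor(road, zone):
    """Anti-slip factor of a zone (1 to 5) on a given road (1 to 3)."""
    if zone not in range(1, 6):
        raise ValueError("zone must be between 1 and 5")
    return ROADS[road].get(zone, 1.0)


road2_profile = [anti_slip_factor(2, z) for z in range(1, 6)]
```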

The desired trajectory leads the vehicle to make a left turn. The results obtained from the adaptive critic anti-slip control are compared with those of the hierarchical fuzzy control [9].

Table 1: Mechanical figures of the vehicle

b (m) | d (m) | r (m) | w_c (m) | m_c (kg) | m_w (kg) | I_c (kg m^2) | I_w (kg m^2) | I_m (kg m^2)

4.2 Simulation 2: Trajectory tracking under variant wheel slips

Fig. 6 shows the left-turn trajectories obtained by applying the hierarchical fuzzy control to drive the vehicle on roads 1, 2 and 3, respectively. Although stable motion is maintained, the tracking errors vary between roads and the maximum distance between two trajectories is as large as 1.08 m. This reveals that fuzzy control can handle unknown, nonlinear dynamics to obtain stable control but lacks the ability to maintain tracking performance when the slip conditions change.

In the adaptive critic anti-slip control, the neural weights obtained in simulation 1 are used as initial values and then the learning and control begin. Fig. 7 shows the left-turn trajectories of the vehicle driving on roads 1, 2 and 3, respectively. The results show that the resulting trajectories on all three roads are very close. The maximum distance between two trajectories is 0.23 m, much smaller than that of the hierarchical fuzzy control. This confirms that the adaptive critic anti-slip control can comply with changes in the wheel slip to approximate optimal control.

4.3 Simulation 3: Velocity response under variant wheel slips

According to the slip conditions, each trajectory in Figs. 6 and 7 is divided into five segments. For the road 2 case, the five segments from the beginning to the end of a trajectory have slip values $(\rho_l, \rho_r)$ of ρ1 = (0.6, 1.0), ρ2 = (0.6, 0.6), ρ3 = (1.0, 1.0), ρ4 = (0.6, 0.6) and ρ5 = (1.0, 1.0), respectively. In segment ρ1, the left and right driving wheels have unequal anti-slip factors. This difference disturbs mainly the angular velocity control. Examining segment ρ1 in Fig. 8, it is found that the fuzzy velocity control leaves large errors uncorrected, as no slip is assumed. In contrast, segment ρ1 in Fig. 9 shows that the adaptive critic anti-slip control can adapt the action parameters to correct the error quickly.

In segments ρ2 to ρ5, the anti-slip factors at the left and right driving wheels are equal, but the values change from segment to segment. When the vehicle goes from one segment to another, changes in the anti-slip factors disturb mainly the linear velocity control. Fig. 10 shows that at the beginning of each of the segments ρ2 to ρ5, a large linear velocity error appears. But the error dies out very soon after the action parameters are improved by the DHP learning algorithm. In other words, the adaptive critic anti-slip control system can comply with changes in the wheel slip.

Fig. 6 Left-turn trajectories with the fuzzy velocity control in simulation 2

Fig. 7 Left-turn trajectories with the adaptive critic anti-slip control in simulation 2

Fig. 8 Angular velocity of road 2 case with the fuzzy velocity control in simulation 3

Fig. 9 Angular velocity of road 2 case with the adaptive critic anti-slip control in simulation 3

5 Conclusion

A wheeled autonomous robot may encounter road conditions resulting in wheel slips. The wheel slip modifies the commanded forces unpredictably and challenges the accuracy and stability of the motion control. Without appropriate anti-slip control, the robot may fail to track the desired trajectory. In this context, it has been shown that the DHP adaptive critic design enabled the robot control system to adjust its control parameters automatically by learning to satisfy the Bellman equation. The DHP adaptive critic design was implemented with an MLP neural network structure to achieve anti-slip velocity control. The resulting system was demonstrated to be able to improve the control performance significantly under variant wheel slips. In contrast, although it provided stable control, traditional fuzzy velocity control was shown to be unable to maintain high performance under such conditions. Ideally, the DHP adaptive critic design aimed at learning and control to satisfy the Bellman equation. In practice, however, the parameters in the action network were updated by only a small step at each sampling time to obtain learning convergence. Therefore optimal control was obtained only in the steady-state situation, in which the control environment did not change and the action parameters reached stable values. Otherwise, the DHP adaptive critic design simply kept on improving the action parameters and no optimal control was guaranteed in the immediate sampling time.

6 Acknowledgments

The financial support for this research from the National Science Council of Taiwan, Republic of China under grants NSC93-2218-E002-106 and NSC94-2218-E002-049 is gratefully acknowledged.

7 References

1 Werbos, P.J.: ‘Approximate dynamic programming for real-time control and neural modeling’, in White, D.A., and Sofge, D.A. (Eds.): ‘Handbook of intelligent control: neural, fuzzy and adaptive approaches’ (Van Nostrand Reinhold, New York, NY, USA, 1992), pp. 493–525

2 Bellman, R.E.: ‘Dynamic programming’ (Princeton University Press, 1957)

3 Prokhorov, D., and Wunsch, D.: ‘Adaptive critic designs’, IEEE Trans. Neural Netw., 1997, 8, pp. 997–1007

4 Prokhorov, D.L., and Feldkamp, L.: ‘Analyzing for Lyapunov stability with adaptive critics’, IEEE Trans. Syst. Man Cybern., 1998, 2, pp. 1658–1661

5 Lendaris, G., Paintz, C., and Shannon, T.: ‘More on training strategies for critic and action neural nets in dual heuristic programming method’. Proc. IEEE Conf. on Systems, Man and Cybernetics, Orlando, October 1997, pp. 3067–3072

6 Lendaris, G., and Shannon, T.: ‘Application considerations for the DHP methodology’. Proc. IEEE Int. Joint Conf. on Neural Networks, Anchorage, AK, 1998, pp. 1013–1018

7 Prokhorov, D., Santiago, R., and Wunsch, D.: ‘Adaptive critic designs: a case study for neurocontrol’, Neural Netw., 1995, 8, pp. 1367–1372

8 Venayagamoorthy, G.K., Harley, R.G., and Wunsch, D.C.: ‘Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator’, IEEE Trans. Neural Netw., 2002, 13, (3), pp. 764–773

9 Lin, W.-S.L., Huang, C.-L., and Chuang, M.-K.: ‘Hierarchical fuzzy control for autonomous navigation of wheeled robots’, IEE Proc., Control Theory Appl., 2005, 152, (5), pp. 598–606

10 Lin, W.-S.L., Chuang, M.-K., and Tien, G.: ‘Autonomous mobile robot navigation using stereovision’. Proc. IEEE Int. Conf. on Mechatronics, Taipei, Taiwan, 2005, pp. 410–415

11 Lin, W.-S.L., Huang, C.-L., Chuang, M.-K., and Liu, G.-C.: ‘Modeling a wheeled mobile robot for autonomous navigation design’. Proc. IASTED Int. Conf. on Modeling, Identification and Control, Grindelwald, Switzerland, February 2004, pp. 275–280

12 Greenwood, D.T.: ‘Principles of dynamics’ (Prentice-Hall, 1988)

13 Tsai, P.-S., Wu, T.-F., Chang, F.-R., and Wang, L.-S.W.: ‘Tracking control of nonholonomic mobile robot using hybrid structure’. Presented at 6th World Multiconf. on Systemics, Cybernetics and Informatics, Orlando, Florida, 2002

14 Yun, X., and Yamamoto, Y.: ‘Internal dynamics of a wheeled mobile robot’. Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 1993, pp. 1288–1293

15 Lewis, F.L., Abdallah, C.T., and Dawson, D.M.: ‘Control of robot manipulators’ (MacMillan, New York, 1993)

16 Haykin, S.: ‘Neural networks: a comprehensive foundation’ (Prentice-Hall International, Inc., 1990), Ch. 4

Fig. 10 Linear velocity of road 2 case with the adaptive critic anti-slip control in simulation 3