
Chapter 5 Control Illustration

5.2 Tandem Pendulum Control System

Although the ISRL-SAEAs outperform other methods on the inverted pendulum control system, in that classic pendulum-balancing setup the task is easy enough that solutions can be found quickly through random search. For this reason, a variety of extensions to the basic cart-pendulum balancing problem have been suggested. In [58]-[60], the authors proposed several variations of the inverted pendulum control system. The most challenging of these extensions ([58]-[60]) is the tandem pendulum control system, in which two pendulums of different lengths must be balanced simultaneously. Therefore, a tandem pendulum control system is used to evaluate the proposed ISRL-SAEAs. As shown in Fig. 5.16, the tandem pendulum control problem is that of learning how to balance two pendulums. There are four state variables in the system: $\theta_i$, the angle of the ith pendulum, and $\dot{\theta}_i$, the angular velocity of the ith pendulum ($i = 1, 2$). The only control action is $u$, the amount of force applied to the cart to move it toward the left or right. The system fails when a pendulum falls past a certain angle ($\pm 36^\circ$ is used here).

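Figure 5.16: The tandem pendulum control system. (Labels in the figure: pendulum lengths l1 and l2, angles θ1 and θ2, the gravitational force on each pendulum, and the applied force F.)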

The motion equations of the tandem pendulum control system ([58]-[60]) are as follows:

$$J_i \ddot{\theta}_i = m_i g l_i \sin(\theta_i) - m_i l_i u \cos(\theta_i), \quad i = 1, 2 \tag{5.12}$$

where $\theta_i$ denotes the angle between the ith pendulum and the vertical, $J_i$ is the moment of inertia with respect to the pivot point, $m_i$ is the mass of the ith pendulum, $l_i$ is the distance between the center of mass and the pivot, $g$ is the gravitational acceleration, and $u$ is the acceleration of the cart, which is used as the control input.
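As a minimal sketch of how Eq. 5.12 could be stepped forward in time, the following Python code integrates both pendulums with a semi-implicit Euler scheme. Several choices here are assumptions that the text does not state: the inertia model (a uniform rod pivoted at one end, so $J = \frac{4}{3} m l^2$ for half-length $l$), the value $g = 9.81$, the step size `dt`, and all function names. The masses and half-lengths are those listed in Table 5.13.

```python
import math

G_ACC = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def rod_inertia(m, l):
    """Assumed model: uniform rod of half-length l pivoted at one end."""
    return (4.0 / 3.0) * m * l * l

def angular_acceleration(theta, u, m, l, J, g=G_ACC):
    """Eq. 5.12 solved for the angular acceleration of one pendulum."""
    return (m * g * l * math.sin(theta) - m * l * u * math.cos(theta)) / J

def step(state, u, params, dt=0.01):
    """Advance both pendulums by one semi-implicit Euler step.

    state  : [(theta_1, theta_dot_1), (theta_2, theta_dot_2)] in rad and rad/s
    params : [(m_1, l_1), (m_2, l_2)]
    """
    new_state = []
    for (theta, theta_dot), (m, l) in zip(state, params):
        J = rod_inertia(m, l)
        theta_dot = theta_dot + dt * angular_acceleration(theta, u, m, l, J)
        theta = theta + dt * theta_dot
        new_state.append((theta, theta_dot))
    return new_state

def has_failed(state, limit_deg=36.0):
    """The system fails when either pendulum passes the +/-36 degree limit."""
    limit = math.radians(limit_deg)
    return any(abs(theta) > limit for theta, _ in state)

# (mass in kg, half-length in m) for the first and second pendulum.
PARAMS = [(0.1, 0.5), (0.01, 0.05)]
```

With no control input, the shorter pendulum falls much faster than the longer one, which is one reason why balancing both at once is harder than balancing a single pendulum.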

By setting the potential energy of the ith pendulum at the vertical to 0, its energy, which consists of the kinetic energy of rotation about the pivot and the potential energy, is expressed as

$$E_i(\theta_i, \dot{\theta}_i) = \frac{1}{2} J_i \dot{\theta}_i^2 + m_i g l_i (\cos\theta_i - 1) \tag{5.13}$$

A control strategy that attempts to drive $E_1$ and $E_2$ to zero is obtained from the Lyapunov function shown below:

$$V = \frac{1}{2} E_i^2, \quad i = 1, 2. \tag{5.14}$$

Using the following equation:

$$\dot{E}_i = -m_i g l_i u \dot{\theta}_i \cos(\theta_i), \tag{5.15}$$

we can obtain

$$\dot{V} = -G u, \tag{5.16}$$

where

$$G = \sum_{i=1}^{2} m_i g l_i \dot{\theta}_i \cos(\theta_i). \tag{5.17}$$

The parameters used for the tandem pendulum control system are shown in Table 5.13.

Table 5.13: The parameters for the tandem pendulum control system.

Parameter    Description                    Value
θ            Angle of the pendulum          [-36, 36] deg
u            Force applied to cart          [-10, 10] N
l1           Half length of 1st pendulum    0.5 m
l2           Half length of 2nd pendulum    0.05 m
m1           Mass of the 1st pendulum       0.1 kg
m2           Mass of the 2nd pendulum       0.01 kg
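The quantities in Eqs. 5.13, 5.16, and 5.17 can be evaluated numerically with the values from Table 5.13. The sketch below transcribes those equations as printed. The inertia model (uniform rod pivoted at one end), the value of `G_ACC`, and the function names are assumptions introduced only for illustration.

```python
import math

G_ACC = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def rod_inertia(m, l):
    """Assumed model: uniform rod of half-length l pivoted at one end."""
    return (4.0 / 3.0) * m * l * l

def energy(theta, theta_dot, m, l, J, g=G_ACC):
    """Eq. 5.13: rotational kinetic energy plus potential energy,
    with the potential energy set to zero at the upright position."""
    return 0.5 * J * theta_dot ** 2 + m * g * l * (math.cos(theta) - 1.0)

def g_term(state, params, g=G_ACC):
    """Eq. 5.17 as printed: sum over both pendulums of
    m_i * g * l_i * theta_dot_i * cos(theta_i)."""
    return sum(
        m * g * l * theta_dot * math.cos(theta)
        for (theta, theta_dot), (m, l) in zip(state, params)
    )

def v_dot(state, u, params):
    """Eq. 5.16: time derivative of the Lyapunov function, -G * u."""
    return -g_term(state, params) * u

# (mass in kg, half-length in m) from Table 5.13.
PARAMS = [(0.1, 0.5), (0.01, 0.05)]
```

For example, `energy(0.0, 0.0, 0.1, 0.5, rod_inertia(0.1, 0.5))` is zero at the upright rest position, and the energy of a pendulum hanging straight down at rest is `-2 * m * g * l`.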


The purpose of this control task is to determine a sequence of forces applied to the cart that balances the pendulums upright and keeps the cart as stationary as possible. Hence, we define a goal set comprising near-upright and near-stationary states as

{ }

agent based on Lyapunov analysis is proposed as follows:

( ) ( )

The control law satisfies Theorem 4.1 defined in [32]. Denote the state of the environment at time step $t$ as $s_t = (q_t, \dot{q}_t)$. Theorem 4.1 states that the agent will bring the environment to $G_1$ within $L(s_0)/\Delta$ steps and keep the environment in the set $\{s : L(s_{t+1}) \le L(s_t)\}$. In our simulation, the descent step size cannot be ensured, so Theorem 4.1 reduces to the statement that the agent will bring the environment to $G_1$ and keep it there until it eventually reaches $G_2$, provided the control time is long enough.
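The two properties attributed to Theorem 4.1 can be checked on a recorded trajectory of Lyapunov values. The following sketch assumes that $L$ decreases by at least $\Delta$ per step outside the goal set, which is how the bound $L(s_0)/\Delta$ is read here. The theorem itself is defined in [32], and the function names are hypothetical.

```python
import math

def max_steps_to_goal(l_initial, delta):
    """Upper bound L(s_0) / Delta on the number of steps, rounded up.

    Assumes L drops by at least `delta` on every step outside the goal set.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return math.ceil(l_initial / delta)

def is_descending(l_values):
    """True if L(s_{t+1}) <= L(s_t) holds for every consecutive pair."""
    return all(b <= a for a, b in zip(l_values, l_values[1:]))
```

When the descent step size cannot be guaranteed, as in the simulation described here, only the second check remains meaningful.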

In the simulation of the tandem pendulum control system, the original successful region of the variables is $-36^\circ \le \theta_i \le 36^\circ$. The strict successful region of $\theta$ is described in Eq. 5.13.

The constraints on the variables are $-36^\circ \le \theta \le 36^\circ$ and $-10\,\mathrm{N} \le u \le 10\,\mathrm{N}$. A control strategy is deemed successful if it can meet the control goal ($\theta$ and $\dot{\theta}$ decay to zero). The four input variables ($\theta_1$, $\dot{\theta}_1$, $\theta_2$, $\dot{\theta}_2$) and the output $u(t)$ are normalized between 0 and 1, and the range of $u(t)$ is [-10, 10]. The four normalized state variables are used as inputs to the TNFC. The values are floating-point numbers assigned to the SAEAs initially. The fitness function is defined according to Eqs. 4.1-4.5.
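The normalization step can be illustrated with a simple min-max mapping. This is only a sketch: the text gives the ranges for the angle and the force, but not for the angular velocity, so `THETA_DOT_RANGE` below is a placeholder assumption, as are the function names.

```python
THETA_RANGE = (-36.0, 36.0)         # degrees, from Table 5.13
U_RANGE = (-10.0, 10.0)             # newtons, from Table 5.13
THETA_DOT_RANGE = (-100.0, 100.0)   # degrees/second, assumed for illustration

def normalize(value, low, high):
    """Map a value from [low, high] to [0, 1]."""
    return (value - low) / (high - low)

def denormalize(unit_value, low, high):
    """Map a value from [0, 1] back to [low, high]."""
    return low + unit_value * (high - low)

def normalize_state(theta1, theta_dot1, theta2, theta_dot2):
    """Build the four normalized controller inputs."""
    return [
        normalize(theta1, *THETA_RANGE),
        normalize(theta_dot1, *THETA_DOT_RANGE),
        normalize(theta2, *THETA_RANGE),
        normalize(theta_dot2, *THETA_DOT_RANGE),
    ]

def controller_output_to_force(unit_output):
    """Convert a controller output in [0, 1] to a force in [-10, 10] N."""
    return denormalize(unit_output, *U_RANGE)
```

Under this mapping, an upright and stationary pendulum corresponds to the input 0.5, and a controller output of 0.5 corresponds to zero force.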

In this example, the performance evaluation of the SAEAs covers the HEA, SACG-SE, and SAG-SEFA. The performance of these three methods is discussed in the following sections.

5.2.1 Evaluating performance of the HEA

The initial parameters of the proposed ISRL-HEA in this example are determined by parameter exploration ([104]). The parameters set for the ISRL-HEA are shown in Table 5.14.

Table 5.14: The initial parameters of ISRL-HEA before training.

Parameters        Value        Parameters               Value
Nc                80           Stable_TimeSteps         5000
Crossover Rate    0.4          Thres_TimeStep           1000
Mutation Rate     0.25         ERSTimes                 50
[σmin, σmax]      [0, 2]       A                        10
[mmin, mmax]      [0, 2]       λ                        0.01
[wmin, wmax]      [-20, 20]    η                        7
[Rmin, Rmax]      [3, 12]      Generations              500
                               Thres_StableTimeSteps    500
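For readers who want to reproduce the setup, the values in Table 5.14 could be collected in a single configuration object. The sketch below uses a Python dataclass. The field names are hypothetical transliterations of the table entries, and the meaning of each parameter is defined in the earlier chapters, not here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HeaConfig:
    """Initial ISRL-HEA parameters from Table 5.14 (field names assumed)."""
    n_c: int = 80
    crossover_rate: float = 0.4
    mutation_rate: float = 0.25
    sigma_range: tuple = (0.0, 2.0)
    m_range: tuple = (0.0, 2.0)
    w_range: tuple = (-20.0, 20.0)
    r_range: tuple = (3, 12)
    stable_time_steps: int = 5000
    thres_time_step: int = 1000
    ers_times: int = 50
    a: float = 10.0
    lam: float = 0.01
    eta: float = 7.0
    generations: int = 500
    thres_stable_time_steps: int = 500

    @property
    def total_control_steps(self) -> int:
        """Thres_TimeStep + Stable_TimeSteps, the length of one control run."""
        return self.thres_time_step + self.stable_time_steps
```

With the default values, `HeaConfig().total_control_steps` gives the 6,000 control time steps used in the testing phase.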

In this example, a rule is coded in a chromosome in the form shown in Fig. 3.1 in Section 3.1. A total of thirty runs were performed. Each run started from a different initial state ($\dot{\theta}_i$ is set to 0 and $\theta_i$ is set randomly within a predefined range). The learning curves of the ISRL-HEA are shown in Fig. 5.17. The figure contains thirty runs, and each curve shows how soon the TNFC meets the goal state. The fitness value is defined according to Eqs. 4.1-4.5. A higher fitness value at the end of a run means that the plant reaches the goal set sooner. When the ISRL-HEA stops, the best string in the population of the final generation is selected and applied in the testing phase of the tandem pendulum control system.

After performing the MCGA, the final average number of rules is 6.

The testing results, which consist of the pendulum angles and the pendulum angular velocities (in degrees/second), are shown in Fig. 5.18. A total of thirty runs were executed in the testing phase. Each line in Fig. 5.18 represents a single run that starts from a different initial state. The results shown in this figure are the first 1,000 of 6,000 control time steps (Thres_TimeStep + Stable_TimeSteps). As shown in Fig. 5.18, the ISRL-HEA successfully controlled the tandem pendulum control system in all thirty runs (the pendulum angles and angular velocities decrease to 0).

Figure 5.17: The learning curves of the ISRL-HEA.

Figure 5.18: Control results of the tandem pendulum control system using the ISRL-HEA. (a) Angle of the first pendulum. (b) Angle of the second pendulum. (c) Angular velocity of the first pendulum. (d) Angular velocity of the second pendulum.

As in Section 5.1.1, the R-SE ([29]) and R-GA ([26]) were applied to this example for comparison with the ISRL-HEA. Their parameters are found using the method given in [104]. In the R-SE ([29]) and R-GA ([26]), the reinforcement signal is designed based on the time-step reinforcement architecture ([18]-[20]). The fitness function used in the R-SE and R-GA to train the TNFC is defined as

$$\text{Fitness\_Value} = \text{TIME\_STEP} \tag{5.19}$$

where TIME_STEP represents how long the experiment is a "success" in one generation. In this example, Eq. 5.19 represents the number of time steps before a pendulum falls past a certain angle ($\pm 36^\circ$ is used here). A control strategy is deemed successful if it can balance the pendulums for 6,000 time steps.
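The time-step fitness of Eq. 5.19 amounts to counting the steps a controller survives before the failure angle is exceeded, capped at the 6,000-step success threshold. A minimal sketch follows. The `controller` and `step_fn` callables and the state layout are assumptions made for illustration and are not the interfaces used in [26] or [29].

```python
import math

def time_step_fitness(controller, step_fn, initial_state,
                      max_steps=6000, limit_deg=36.0):
    """Return the number of time steps survived (Eq. 5.19).

    controller : maps a state to a control value u (hypothetical interface)
    step_fn    : maps (state, u) to the next state (hypothetical interface)
    state      : [(theta_1, theta_dot_1), (theta_2, theta_dot_2)] in radians
    """
    limit = math.radians(limit_deg)
    state = initial_state
    for t in range(max_steps):
        # Stop as soon as either pendulum is past the failure angle.
        if any(abs(theta) > limit for theta, _ in state):
            return t
        state = step_fn(state, controller(state))
    return max_steps
```

A fitness of this kind rewards only survival time. It does not distinguish a controller that settles near the upright position from one that keeps swinging just inside the boundary.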

The simulation was carried out for 30 runs. The testing results of the R-SE and R-GA are shown in Figs. 5.19 and 5.20. The results shown in these figures are the first 1,000 of 6,000 control time steps. As shown in Figs. 5.19 and 5.20, not every line meets the control goal ($\theta_i$ and $\dot{\theta}_i$ decay to zero). The ISRL-HEA obtains better results than [29] and [26], since $\theta_i$ and $\dot{\theta}_i$ of the ISRL swing within a smaller range near zero.

Figure 5.19: Control results of the tandem pendulum control system using the R-SE. (a) Angle of the first pendulum. (b) Angle of the second pendulum. (c) Angular velocity of the first pendulum. (d) Angular velocity of the second pendulum.

Figure 5.20: Control results of the tandem pendulum control system using the R-GA. (a) Angle of the first pendulum. (b) Angle of the second pendulum. (c) Angular velocity of the first pendulum. (d) Angular velocity of the second pendulum.

In a further simulation, we select the best-trained individuals of the proposed ISRL-HEA, R-GA, and R-SE from the training phase and extend the control time to 100,000 steps in the testing phase. The simulation results, which consist of the pendulum angles and angular velocities, are shown in Fig. 5.21. Each line in Fig. 5.21 represents the last 1,000 time steps of a run that starts from a different initial state. As shown in Fig. 5.21 (e)-(l), not every line meets the control goal $G_1$ for the R-SE and R-GA. Moreover, the pendulum angles may swing outside the boundary during the last 1,000 time steps. In the proposed ISRL-HEA, by contrast, every line meets the control goal $G_1$ and the pendulums are kept upright during the last 1,000 time steps. The R-SE and the R-GA bring the plant to $G_1$ in 53% of the runs (14 runs in which the plant does not reach $G_1$, and 4 out of 13 runs in which the pendulum swings outside the boundary) and 50% of the runs (15 runs in which the plant does not reach $G_1$, and 5 out of 14 runs in which the pendulum swings outside the boundary), respectively. In the ISRL-HEA, the plant remains in $G_1$ during the last 1,000 time steps in 100% of the runs. The ISRL thus keeps the pendulum angles, the pendulum angular velocities, and the cart velocity within a small range near zero and stabilizes the control system.
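The reported percentages follow from the number of runs that fail to reach $G_1$ out of the thirty runs. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def success_percentage(total_runs, failed_runs):
    """Percentage of runs in which the plant reaches the goal set."""
    return 100.0 * (total_runs - failed_runs) / total_runs

# Failed-run counts as reported in the text, out of thirty runs each.
r_se = success_percentage(30, 14)      # about 53%
r_ga = success_percentage(30, 15)      # 50%
isrl_hea = success_percentage(30, 0)   # 100%
```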

Figure 5.21: Control results of the tandem pendulum control system. (a) Angle of the first pendulum of ISRL-HEA. (b) Angle of the second pendulum of ISRL-HEA. (c) Angular velocity of the first pendulum of ISRL-HEA. (d) Angular velocity of the second pendulum of ISRL-HEA. (e) Angle of the first pendulum of R-SE. (f) Angle of the second pendulum of R-SE. (g) Angular velocity of the first pendulum of R-SE. (h) Angular velocity of the second pendulum of R-SE. (i) Angle of the first pendulum of R-GA. (j) Angle of the second pendulum of R-GA. (k) Angular velocity of the first pendulum of R-GA. (l) Angular velocity of the second pendulum of R-GA.

Table 5.15 compares the accuracy and CPU time of the ISRL-HEA, R-SE, and R-GA. The ISRL-HEA needs less CPU time than the R-SE and R-GA. The reason is that the ISRL applies a strict restriction in the earlier time steps and evaluates the control system by how soon the plant meets the control goal. In the ISRL-HEA, an individual with better performance is one that brings the plant to the goal set sooner.

The GENITOR ([57]), SANE ([96]), TDGAR ([20]), CQGAF ([43]), ERDSE ([44]), and enforced sub-population (ESP) ([40]) methods have been applied to the same control task, and the simulation results are listed in Table 5.15. The table reports the accuracy with which each controller meets the control goal and keeps the pendulums balanced for 100,000 time steps, together with the CPU time.

A total of thirty runs were executed. Each run started from a different initial state. The initial parameters of these methods ([57], [96], [20], [43], [44], and [40]) are determined according to [104]. For these methods, the size of the network structure is determined by executing the algorithm with a fixed string length for each candidate network size and then computing the average of the generations. As shown in Table 5.15, the proposed ISRL-HEA is more feasible and effective than the other existing models ([26], [29], [57], [96], [20], [43], [44], and [40]).

Table 5.15: Performance comparison of various existing models.


5.2.2 Evaluating performance of the SACG-SE

The initial parameters of the proposed ISRL-SACG-SE in this example are determined by parameter exploration ([104]). The parameters set for the ISRL-SACG-SE are shown in Table 5.16.

Table 5.16: The initial parameters of ISRL-SACG-SE before training.

Parameters        Value        Parameters               Value
Nc                30           Stable_TimeSteps         5000
Crossover Rate    0.35         Thres_TimeStep           1000
Mutation Rate     0.25         TSSATimes                50
[σmin, σmax]      [0, 2]       A                        10
[mmin, mmax]      [0, 2]       λ                        0.01
[wmin, wmax]      [-20, 20]    η                        7
[Rmin, Rmax]      [3, 12]      Generations              500
Psize             20           Thres_StableTimeSteps    500

A total of thirty runs were performed. Each run started from a different initial state ($\dot{\theta}_i$ is set to 0 and $\theta_i$ is set randomly according to the predefined ranges). After performing the TSSA, the final average number of rules is 6. The learning curves of the SACG-SE over the thirty runs are shown in Fig. 5.22. Each curve shows how soon the TNFC meets the goal state. When the SACG-SE stops, the best combination of strings from the groups in the final generation is selected and tested on the tandem pendulum control system.

The simulation was carried out for thirty runs. The simulation results, which consist of the pendulum angles and angular velocities, are shown in Fig. 5.23. Each line in Fig. 5.23 represents one run with a different initial state. The results shown in this figure are the first 1,000 of 6,000 control time steps (Thres_TimeStep + Stable_TimeSteps). As shown in Fig. 5.23, the ISRL-SACG-SE successfully controlled the tandem pendulum control system in all thirty runs (the pendulum angles and angular velocities decrease to 0).


As with the ISRL-HEA, we select the best-trained individual of the proposed ISRL-SACG-SE from the training phase and extend the control time to 100,000 steps in the testing phase. The simulation results, which consist of the pendulums angle and the