
Chapter 5 Control Illustration

5.1 Inverted Pendulum Control System

5.1.1 Evaluating performance of the HEA

The initial parameters of the proposed ISRL-HEA in this example are determined by parameter exploration ([103]). The first study of parameter exploration was carried out by De Jong ([103]). As shown in [103], a small population size favors initial performance, while a large population size favors long-term performance. Moreover, a low mutation rate favors on-line performance, and a high mutation rate favors off-line performance. In [104], the author found from simulation that the best population size and mutation rate were 30 and 0.01, respectively. The parameters affect the methods in this study as follows:

1) the population size affects both the final performance and the efficiency of GAs; 2) the crossover rate determines the frequency with which the crossover step is applied; 3) the mutation rate governs the secondary search step, which increases the variability of the population. In this study, the parameters are found using the method given in [104]. Therefore, the number of fuzzy rules ranges from 2 to 20 in increments of 1, the group size ranges from 10 to 100 in increments of 10, the crossover rate ranges from 0.25 to 1 in increments of 0.05, and the mutation rate ranges from 0 to 0.3 in exponential increments. The other parameters listed in the ISRL-HEA are defined in the same way.
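As an illustration, this sweep can be implemented as an exhaustive grid search, as in the following minimal sketch. The doubling scheme used for the "exponential increments" of the mutation rate and the placeholder training function run_isrl_hea are assumptions, since the exact spacing and training interface are not specified here.

```python
import itertools
import random

# Candidate values for the swept parameters, following the ranges above:
# rules 2-20 step 1, group size 10-100 step 10, crossover 0.25-1 step 0.05.
rule_counts     = list(range(2, 21))
group_sizes     = list(range(10, 101, 10))
crossover_rates = [0.25 + 0.05 * i for i in range(16)]

# "Exponential increments" is interpreted here as a doubling sequence
# capped at 0.3 -- an assumption, as the exact spacing is not given.
mutation_rates, r = [0.0], 0.3 / 64
while r <= 0.3:
    mutation_rates.append(r)
    r *= 2

def run_isrl_hea(n_rules, group, pc, pm):
    """Placeholder for one ISRL-HEA training run; returns its fitness.
    A random stand-in is used so the sketch executes end to end."""
    return random.random()

# Exhaustive sweep: keep the setting with the best training fitness.
best_cfg = max(
    itertools.product(rule_counts, group_sizes, crossover_rates, mutation_rates),
    key=lambda cfg: run_isrl_hea(*cfg),
)
```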

The parameters set for the proposed ISRL-HEA are shown in Table 5.1.

Table 5.1: The initial parameters of the ISRL-HEA before training.

Parameter               Value        Parameter               Value
Nc                      100          Stable_TimeSteps        5000
Crossover Rate          0.5          Thres_TimeStep          1000
Mutation Rate           0.3          ERSTimes                50
[σmin, σmax]            [0, 2]       A                       10
[mmin, mmax]            [0, 2]       λ                       0.01
[wmin, wmax]            [-20, 20]    η                       7
[Rmin, Rmax]            [3, 12]      Generations             300
Thres_StableTimeSteps   500

In this example, the coding of a rule in a chromosome takes the form of Fig. 3.31 in Section 3.1. A total of thirty runs were performed. Each run started at a different initial state (θ̇ and ẋ were set to 0, while θ and x were set randomly within a predefined range). The learning curves of the ISRL-HEA are shown in Fig. 5.2. Each of the thirty curves in this figure shows how soon the TNFC reaches the goal state in one run. The fitness value is defined according to Eqs. 4.1-4.5; a higher fitness value at the end of a run means the plant reached the goal set sooner. When the ISRL-HEA stops, the best string from the population in the final generation is selected and applied in the testing phase of the inverted pendulum control system. The resulting probability vectors of the MCGA are shown in Fig. 5.3; the final average number of rules is 5.

The testing results, which consist of the pendulum angle, pendulum angular velocity (in degrees/second), and cart velocity (in meters/second), are shown in Fig. 5.4. A total of thirty runs were executed in the testing phase. Each line in Fig. 5.4 represents a single run that starts from a different initial state. The results shown in this figure are the first 1,000 of 6,000 control time steps (Thres_TimeStep + Stable_TimeSteps). As shown in Fig. 5.4, the ISRL-HEA successfully controlled the inverted pendulum control system in all thirty runs (the pendulum angle, pendulum angular velocity, and cart velocity decrease to 0).

Figure 5.2: The learning curves of the ISRL-HEA.

Figure 5.3: The probability vectors of the ERS step in the ISRL-HEA.


Figure 5.4: Control results of the inverted pendulum control system using the ISRL-HEA in Example 1. (a) Angle of the pendulum. (b) Angular velocity of the pendulum. (c) Velocity of the cart.

The reinforcement symbiotic evolution (R-SE) ([29]) and the reinforcement genetic algorithm (R-GA) ([26]) were applied to the same control task for comparison with the ISRL-HEA. In the simulations of [29] and [26], the learning parameters were found using the method given in [104]. Therefore, four rules were set for the R-SE and R-GA, the population size ranges from 10 to 250 in increments of 10, the crossover rate ranges from 0.25 to 1 in increments of 0.05, and the mutation rate ranges from 0 to 0.3 in exponential increments. The resulting parameters for these methods (the R-SE and the R-GA) are as follows: 1) the population sizes of the R-SE and the R-GA were 170 and 70, respectively; 2) the crossover rates of the R-SE and the R-GA were 0.55 and 0.6, respectively;

3) the mutation rates of the R-SE and the R-GA were 0.08 and 0.02, respectively. In the R-SE ([29]) and R-GA ([26]), the reinforcement signal is designed based on the time-step reinforcement architecture ([18]-[20]). The fitness function in the R-SE and R-GA is defined according to

Fitness_Value = TIME_STEP,                                        (5.11)

where TIME_STEP represents how long the experiment remains a "success" in one generation. In this example, Eq. 5.11 measures how long it takes before the pendulum falls past a certain angle (|θ| > 12°) or the cart runs into the bounds of its track (|x| > 2.4 m).
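A minimal sketch of the Eq. 5.11 evaluation follows. The plant interface step(state, force), the state ordering, and the controller signature are hypothetical stand-ins; only the 12° and 2.4 m failure bounds come from the text.

```python
import math

FAIL_ANGLE = math.radians(12.0)   # failure when |theta| > 12 degrees
FAIL_POS   = 2.4                  # failure when |x| > 2.4 m
MAX_STEPS  = 100_000

def fitness_value(controller, state, step):
    """Eq. 5.11: fitness = number of time steps survived before the
    pendulum falls past 12 degrees or the cart leaves the track.
    `step(state, force)` advances the plant one step (assumed API);
    the state is assumed to be ordered (theta, theta_dot, x, x_dot)."""
    for t in range(MAX_STEPS):
        force = controller(state)
        state = step(state, force)
        theta, x = state[0], state[2]
        if abs(theta) > FAIL_ANGLE or abs(x) > FAIL_POS:
            return t              # TIME_STEP at the moment of failure
    return MAX_STEPS              # survived the whole run
```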

The simulation was carried out for 30 runs. The testing results of the R-SE and R-GA are shown in Figs. 5.5 and 5.6. The results shown in these figures are the first 1,000 of 6,000 control time steps. As shown in Figs. 5.5 and 5.6, not every line meets the control goal (ẋ, θ, and θ̇ decay to zero). The ISRL-HEA clearly obtains better results than [29] and [26], since the ẋ, θ, and θ̇ of the ISRL swing in a smaller range near zero.


Figure 5.5: Control results of the inverted pendulum control system using the R-SE in Example 1. (a) Angle of the pendulum. (b) Angular velocity of the pendulum. (c) Velocity of the cart.


Figure 5.6: Control results of the inverted pendulum control system using the R-GA in Example 1. (a) Angle of the pendulum. (b) Angular velocity of the pendulum. (c) Velocity of the cart.

In a further simulation, we selected the best-trained individual of the ISRL-HEA, R-GA, and R-SE from the training phase and extended the control time steps to 100,000 in the testing phase. The simulation results, which consist of the pendulum angle, angular velocity of the pendulum, and cart velocity, are shown in Fig. 5.7. Each line in Fig. 5.7 represents the last 1,000 time steps of a run that starts from a different initial state. As shown in Fig. 5.7 (d)-(i), not every line meets the control goal G1 in the R-SE and R-GA. Moreover, the pendulum angle may swing outside the boundary during the last 1,000 time steps. In the proposed ISRL-HEA, however, every line meets the control goal G1 and the pendulum is kept upright during the last 1,000 time steps. The percentages of runs in which the R-SE and the R-GA control the plant to G1 are 56% (13 runs did not reach G1, and in 4 of those 13 the pendulum swung outside the boundary) and 54% (14 runs did not reach G1, and in 5 of those 14 the pendulum swung outside the boundary), respectively. The reason is that the fitness function used in the R-SE and R-GA only evaluates how long it takes before the pendulum falls past a certain angle (|θ| > 12°) or the cart runs into the bounds of its track (|x| > 2.4 m). Therefore, the system may never reach G1, and when the control time steps are extended to 100,000 in the testing phase, the pendulum may swing outside the boundary.
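The G1 percentages above can be computed with a check like the following sketch; the in_goal_set predicate for G1 and the per-sample layout are assumptions, with only the 12° boundary taken from the text.

```python
def summarize_runs(trajectories, in_goal_set, fail_angle_deg=12.0):
    """trajectories: one list per run of (theta_deg, theta_dot, x_dot)
    samples over the last 1,000 time steps. Returns the percentage of
    runs that stayed in G1, plus the failure breakdown."""
    n_fail = n_swing_out = 0
    for traj in trajectories:
        if all(in_goal_set(sample) for sample in traj):
            continue                                  # run stayed in G1
        n_fail += 1
        if any(abs(sample[0]) > fail_angle_deg for sample in traj):
            n_swing_out += 1                          # pendulum left the boundary
    pct_in_g1 = 100.0 * (len(trajectories) - n_fail) / len(trajectories)
    return pct_in_g1, n_fail, n_swing_out
```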

In the ISRL-HEA, however, the percentage of runs in which the plant remains in G1 during the last 1,000 time steps is 100%. The ISRL clearly keeps the pendulum angle, angular velocity of the pendulum, and cart velocity within a small range near zero and stabilizes the control system.


Figure 5.7: Control results of the inverted pendulum control system in Example 1. (a) Angle of the pendulum of ISRL-HEA. (b) Angular velocity of the pendulum of ISRL-HEA. (c) Velocity of the cart of ISRL-HEA. (d) Angle of the pendulum of R-SE. (e) Angular velocity of the pendulum of R-SE. (f) Velocity of the cart of R-SE. (g) Angle of the pendulum of R-GA. (h) Angular velocity of the pendulum of R-GA. (i) Velocity of the cart of R-GA.

The accuracy and CPU time comparison of the ISRL-HEA, R-SE, and R-GA is shown in Table 5.2. The ISRL adopts a strict restriction in the earlier time steps and evaluates the control system by how soon the plant can meet the control goal; an individual in the ISRL-HEA with better performance is one that controls the plant to the goal set sooner. As a result, the CPU time of the ISRL-HEA is dramatically less than that of the R-SE and R-GA. For example, in the R-SE and R-GA, if one individual fails around time step 5000, this individual is assigned a high fitness value and causes other individuals in the population to approach it. Once most individuals fail around time step 5000, the evolution of a single generation becomes very time-consuming. As shown in Table 5.2, compared with the traditional reinforcement signal design, the proposed ISRL reduces the CPU time and always controls the plant to the goal set.
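The CPU-time argument can be made concrete with the following sketch of an ISRL-style evaluation, in which simulation stops as soon as the strict early restriction is violated or the goal set is reached. The predicates strict_ok and in_goal_set and the plant interface are hypothetical; only Thres_TimeStep = 1000 and the 6,000-step horizon come from the text.

```python
def isrl_steps_to_goal(controller, state, step, strict_ok, in_goal_set,
                       thres_timestep=1000, max_steps=6000):
    """Sketch of an ISRL-style evaluation: during the first
    `thres_timestep` steps a strict restriction must hold, and the run
    is scored by how soon the plant enters the goal set (lower is
    better; Eqs. 4.1-4.5 would map this to a fitness value)."""
    for t in range(max_steps):
        state = step(state, controller(state))
        if t < thres_timestep and not strict_ok(state):
            return max_steps      # early violation: evaluation ends cheaply
        if in_goal_set(state):
            return t              # goal set reached after t steps
    return max_steps
```

Under the time-step fitness of Eq. 5.11, by contrast, every individual that fails around step 5000 costs roughly 5000 plant steps per evaluation, which is where the CPU-time gap in Table 5.2 comes from.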

Moreover, the HEA can determine the fuzzy rules automatically, without trial-and-error testing.

The genetic reinforcement learning for neuro control (GENITOR) ([57]), symbiotic adaptive neuro-evolution (SANE) ([96]), temporal difference and genetic algorithm-based reinforcement learning (TDGAR) ([20]), combination of on-line clustering and Q-value based GA for reinforcement fuzzy system (CQGAF) ([43]), efficient reinforcement learning through dynamical symbiotic evolution (ERDSE) ([44]), and enforced sub-populations (ESP) ([40]) were applied to the same control task, and the simulation results are listed in Table 5.2.

The accuracy (whether the controller meets the control goal and keeps the pendulum upright for 100,000 time steps) and the CPU time are shown in Table 5.2. A total of thirty runs were executed, each starting at a different initial state. The initial parameters of these methods ([57], [96], [20], [43], [44], and [40]) are determined according to [104]. In these methods, the network size ranges from Rmin to Rmax in increments of 1. This dissertation determines the network sizes by executing an evolutionary algorithm with a fixed string length for each candidate network size (Rmin to Rmax in increments of 1) and then computing the average over the generations. In [57], the normal evolutionary algorithm is used to evolve the weights of a two-layer neural network with sigmoid functions in the hidden layer and the output layer. After trial-and-error tests, the network size is ten. In [96], the symbiotic evolutionary algorithm is used to evolve a two-layer neural network; the network size is ten. The TDGAR ([20]) applies a critic network and an action network to the learning system. The critic network is a standard three-layer feedforward network using sigmoid functions in the hidden layer and output layer. The action network is a fuzzy neural network with five layers of nodes, each layer performing one stage of the fuzzy inference process. There are five hidden nodes in the critic network and five fuzzy rules in the action network. In CQGAF ([43]), a fuzzy controller with a Q-value based genetic algorithm is proposed to solve control problems. After trial-and-error tests, the final average number of rules in CQGAF over thirty runs is 8, obtained by the on-line clustering algorithm. In ERDSE ([44]), a TSK-type neuro-fuzzy controller is adopted to solve control problems. After trial-and-error tests, the number of rules in ERDSE is 7. In ESP ([40]), the author proposed enforced sub-populations to evaluate solutions locally; there are five sub-populations in ESP.

The other parameters set for the six methods ([57], [96], [20], [43], [44], and [40]) are as follows:

1) the population sizes of the six methods are 130, 170, 100, 130, 80, and 40, respectively; 2) the crossover rates of the six methods are 0.45, 0.55, 0.35, 0.45, 0.8, and 0.5, respectively; 3) the mutation rates of the six methods are 0.21, 0.17, 0.16, 0.24, 0.1, and 0.18, respectively.

When each training step stops, the best combination of strings from the population in the final generation is selected and tested thirty times with different initial states.

As shown in Table 5.2, the proposed ISRL-HEA is more feasible and effective than the other existing models ([26], [29], [57], [96], [20], [43], [44], and [40]). The advantages of the ISRL-HEA are as follows:

1. Using the concept of statistics, the ISRL-HEA computes a suitable number of fuzzy rules by probability, avoiding the flaw that the number of fuzzy rules has to be assigned in advance under different environments (see the sketch following this list).

2. The ISRL designs the reinforcement signal based on Lyapunov-based safe reinforcement learning. It has a better capability to stabilize the plant under different initial states.
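The first advantage can be illustrated with a PBIL-style sketch: a probability vector over the candidate rule counts in [Rmin, Rmax] is sampled to build each individual and nudged toward the counts that perform best, so the final number of rules (an average of 5 in Fig. 5.3) emerges statistically instead of being fixed in advance. The update rule below is an assumption; the actual ERS step of the MCGA may differ.

```python
import random

R_MIN, R_MAX = 3, 12                       # [Rmin, Rmax] from Table 5.1
counts = list(range(R_MIN, R_MAX + 1))
prob = [1.0 / len(counts)] * len(counts)   # start uniform over rule counts

def sample_rule_count():
    """Draw a number of fuzzy rules according to the probability vector."""
    return random.choices(counts, weights=prob)[0]

def reinforce(best_count, rate=0.1):
    """Shift probability mass toward the rule count of the best
    individual (a PBIL-style update; the real ERS step may differ)."""
    for i, c in enumerate(counts):
        target = 1.0 if c == best_count else 0.0
        prob[i] += rate * (target - prob[i])
```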

Table 5.2: Performance comparison of various existing models.

Method            CPU time                                Accuracy
                  Mean     Best     Worst     Std.
GENITOR ([57])    120.95   61.34    320.36    92.95       50%
SANE ([96])       97.56    48.54    254.84    83.56       61%
R-GA ([26])       89.83    34.85    192.93    69.94       54%
R-SE ([29])       73.14    28.66    149.43    57.87       56%
TDGAR ([20])      69.13    26.54    112.73    41.58       53%
ESP ([40])        58.32    22.08    95.57     35.27       56%
ERDSE ([44])      51.19    20.77    88.53     30.74       67%
CQGAF ([43])      48.82    18.79    84.39     26.31       59%
ISRL-HEA          39.97    15.10    71.01     18.23       100%

To demonstrate the advantage of the proposed ISRL, the safe reinforcement learning (SRL) ([32]) is used in this example; the resulting SRL-HEA is compared in performance with the proposed ISRL-HEA. The simulation was carried out for 30 runs. The goal sets are defined the same as in the ISRL-HEA. The testing results of the SRL-HEA are shown in Fig. 5.8. The results shown in this figure are the first 1,000 of 6,000 control time steps.