Evaluating performance of the SACG-SE - Inverted Pendulum Control System

Chapter 5 Control Illustration

5.1 Inverted Pendulum Control System

5.1.2 Evaluating performance of the SACG-SE

In this section, the inverted pendulum control system is used to evaluate the performance of the SACG-SE. The initial parameters of the proposed ISRL-SACG-SE in this example are determined by parameter exploration ([104]). The parameters set for the ISRL-SACG-SE are shown in Table 5.5.

Table 5.5: The initial parameters of the ISRL-SACG-SE before training.

Parameters        Value        Parameters               Value
Nc                20           Stable_TimeSteps         5000
Crossover Rate    0.4          Thres_TimeStep           1000
Mutation Rate     0.15         TSSATimes                50
[σ_min, σ_max]    [0, 2]       A                        10
[m_min, m_max]    [0, 2]       λ                        0.01
[w_min, w_max]    [-20, 20]    η                        7
[R_min, R_max]    [3, 12]      Generations              300
Psize             18           Thres_StableTimeSteps    500

A total of thirty runs were performed. Each run started from a different initial state (θ̇ and ẋ are set to 0; θ and x are set randomly within the predefined ranges). Figure 5.9 shows the probability vectors in the TSSA for one run. In this figure, the final optimal number of rules is 4. Table 5.6 shows the mean, best, and worst of the optimal number of rules over the thirty runs. The learning curves of the ISRL-SACG-SE for the thirty runs are shown in Fig. 5.10; each curve shows how soon the TNFC meets the goal state in one run. When the ISRL-SACG-SE stops, the best combination of strings from the groups in the final generation is selected and tested on the inverted pendulum control system.
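The initialization of the thirty runs can be sketched as follows. This is only an illustration: the function name, the dictionary layout, and in particular the numeric ranges passed in the example call are assumptions, since the predefined ranges for θ and x are not stated in this section.

```python
import random


def sample_initial_states(n_runs, theta_range, x_range, seed=None):
    """Draw one initial state per run.

    The angle (theta) and cart position (x) are drawn uniformly from the
    given ranges; both velocities start at zero, as described in the text.
    """
    rng = random.Random(seed)
    states = []
    for _ in range(n_runs):
        states.append({
            "theta": rng.uniform(*theta_range),
            "theta_dot": 0.0,
            "x": rng.uniform(*x_range),
            "x_dot": 0.0,
        })
    return states


# Placeholder ranges (assumed, not taken from the thesis).
states = sample_initial_states(
    n_runs=30, theta_range=(-12.0, 12.0), x_range=(-2.4, 2.4), seed=0
)
```

Fixing the seed makes the thirty initial states reproducible, which helps when several methods are to be compared on the same starting conditions.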


Figure 5.9: The results of the probability vectors in the TSSA.

Table 5.6: The number of rules from thirty runs of the TSSA.

Method          Mean    Best    Worst
ISRL-SACG-SE    4       3       10

The simulation was carried out for thirty runs. The successful results, which consist of the pendulum angle, the angular velocity of the pendulum (in degrees/second), and the velocity of the cart (in meters/second), are shown in Fig. 5.11. Each line in Fig. 5.11 represents one run with a different initial state. The results shown in this figure are the first 1,000 of the 6,000 control time steps (Thres_TimeStep + Stable_TimeSteps). As shown in Fig. 5.11, the ISRL-SACG-SE successfully controlled the inverted pendulum control system in all thirty runs (the pendulum angle, pendulum angular velocity, and cart velocity decrease to 0).

As with the ISRL-HEA, we select the best-trained individual of the proposed ISRL-SACG-SE from the training phase and extend the control time steps to 100,000 in the testing phase. The simulation results, which consist of the pendulum angle, the pendulum angular velocity, and the cart velocity, are shown in Fig. 5.12. Each line in Fig. 5.12 represents the last 1,000 time steps of a run that starts from a different initial state. As shown in Fig. 5.12, each line meets the control goal G₁ and the pendulum is kept upright during the last 1,000 time steps. In the ISRL-SACG-SE, the percentage of the last 1,000 time steps during which the plant remains in G₁ is 100%. The ISRL thus allows the pendulum angle, the pendulum angular velocity, and the cart velocity to swing within a small range near zero and stabilizes the control system.
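The percentage of the last 1,000 time steps spent in the goal set can be computed as in the sketch below. It assumes, purely for illustration, that G₁ is a symmetric box around zero on each plotted quantity; the bounds in `G1_BOUNDS`, the function names, and the synthetic trajectory are all hypothetical and do not come from the thesis.

```python
import math


def in_goal_set(state, bounds):
    """Return True when every component lies within its symmetric bound."""
    return all(abs(value) <= bound for value, bound in zip(state, bounds))


def fraction_in_goal(trajectory, bounds, last_n=1000):
    """Fraction of the last `last_n` states that lie inside the goal set."""
    tail = trajectory[-last_n:]
    if not tail:
        raise ValueError("trajectory is empty")
    hits = sum(1 for state in tail if in_goal_set(state, bounds))
    return hits / len(tail)


# Synthetic decaying response standing in for one simulated run:
# (pendulum angle, pendulum angular velocity, cart velocity) per step.
trajectory = [
    (
        10.0 * math.exp(-t / 200.0) * math.cos(0.1 * t),
        -1.0 * math.exp(-t / 200.0) * math.sin(0.1 * t),
        0.5 * math.exp(-t / 200.0),
    )
    for t in range(5000)
]

G1_BOUNDS = (0.5, 0.5, 0.5)  # assumed bounds, for illustration only
percentage = 100.0 * fraction_in_goal(trajectory, G1_BOUNDS)
```

For a run that has settled, every state in the tail satisfies the bounds and the percentage is 100, which is the situation reported for the ISRL-SACG-SE.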

Figure 5.10: The learning curve of the SACG-SE.

Figure 5.11: Control results of the inverted pendulum control system using the ISRL-SACG-SE in Example 1 (first 1000 time steps). (a) Angle of the pendulum. (b) Angular velocity of the pendulum. (c) Velocity of the cart.

Figure 5.12: Control results of the inverted pendulum control system using the ISRL-SACG-SE in Example 1 (last 1000 time steps). (a) Angle of the pendulum. (b) Angular velocity of the pendulum. (c) Velocity of the cart.

In this example, to demonstrate the effectiveness and efficiency of the proposed ISRL-SACG-SE, the R-SE ([29]) and R-GA ([26]) are compared with the ISRL-SACG-SE.

As shown in Fig. 5.7 (d)-(i), the accuracy of the TNFC with the R-SE and R-GA (the pendulum does not swing outside the boundary after 6,000 time steps) is 56% and 54%, respectively.

However, with the ISRL-SACG-SE, the accuracy with which the TNFC meets the control goal and keeps the pendulum balanced over 100,000 time steps is 100%. As shown in Figs. 5.12 and 5.7, the ISRL-SACG-SE performs better than the R-SE and R-GA.

The accuracy and CPU time of the ISRL-SACG-SE, R-SE, and R-GA are compared in Table 5.7. As shown in Table 5.7, when compared with the traditional reinforcement signal design, the proposed ISRL reduces the CPU time and always controls the plant to the goal set. Moreover, the SACG-SE not only determines the fuzzy rules automatically without trial-and-error testing but also lets the chromosomes that perform well cooperate to generate better solutions.

Compared with the HEA, the SACG-SE requires less CPU time because the SACG-SE considers both cooperation and specialization. As shown in Figs. 5.2 and 5.10, the learning curves of the SACG-SE converge more quickly than those of the HEA. The worst, mean, best, and standard deviation of the CPU time of the HEA and SACG-SE are shown in Table 5.7. As shown in this table, the SACG-SE obtains better performance than the HEA.

The GENITOR ([57]), SANE ([96]), TDGAR ([20]), CQGAF ([43]), ERDSE ([44]), and ESP ([40]) methods have been applied to the same control problem. Their accuracy and CPU time are shown in Table 5.7. A total of thirty runs were performed, each starting from a different initial state. The initial parameters of these methods ([57], [69], [20], [43], [44], and [40]) are determined according to Section 5.1.1. The control time steps for testing are extended to 100,000 time steps. As shown in Table 5.7, the proposed ISRL-SACG-SE is more feasible and effective than the other existing models ([26], [29], [57], [96], [20], [43], [44], and [40]). The advantages of the ISRL-SACG-SE can be listed as follows:

1. Using the TSSA, the ISRL-SACG-SE determines the suitable number of fuzzy rules by probability, which avoids having to assign the number of fuzzy rules in advance for each environment.

2. The ISRL enhances the stability of the control system through the design of Lyapunov-based safe reinforcement learning. It has a better capability to stabilize the plant under different initial states.

3. The ECCS lets well-performing chromosomes cooperate to generate better solutions over the generations.
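The first advantage can be illustrated with a small sketch of how a rule count might be read from, or sampled from, a probability vector over the candidate numbers of rules. This assumes that each entry of the vector corresponds to one candidate count in [R_min, R_max] from Table 5.5; the function names and the example probabilities are hypothetical, and the rule by which the TSSA updates its probability vectors is not reproduced here.

```python
import random

R_MIN, R_MAX = 3, 12  # [R_min, R_max] from Table 5.5
CANDIDATE_COUNTS = list(range(R_MIN, R_MAX + 1))


def sample_rule_count(prob_vector, rng):
    """Draw a candidate number of rules according to its probability."""
    if len(prob_vector) != len(CANDIDATE_COUNTS):
        raise ValueError("one probability is needed per candidate count")
    return rng.choices(CANDIDATE_COUNTS, weights=prob_vector, k=1)[0]


def most_probable_rule_count(prob_vector):
    """Return the candidate number of rules with the highest probability."""
    if len(prob_vector) != len(CANDIDATE_COUNTS):
        raise ValueError("one probability is needed per candidate count")
    best_index = max(range(len(prob_vector)), key=prob_vector.__getitem__)
    return CANDIDATE_COUNTS[best_index]


# Made-up probabilities for counts 3..12 (illustration only).
example_probs = [0.05, 0.55, 0.10, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03]
chosen = most_probable_rule_count(example_probs)
drawn = sample_rule_count(example_probs, random.Random(0))
```

Sampling keeps some exploration of other rule counts during the search, while the most probable entry gives the count that is finally reported.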


Table 5.7: Performance comparison of various existing models in Example 1.

                            CPU time
Method            Mean      Best     Worst     Std.     Accuracy
GENITOR ([57])    120.95    61.34    320.36    92.95    50%
SANE ([96])       97.56     48.54    254.84    83.56    61%
R-GA ([26])       89.83     34.85    192.93    69.94    54%
R-SE ([29])       73.14     28.66    149.43    57.87    56%
TDGAR ([20])      69.13     26.54    112.73    41.58    53%
ESP ([40])        58.32     22.08    95.57     35.27    56%
ERDSE ([44])      51.19     20.77    88.53     30.74    67%
CQGAF ([43])      48.82     18.79    84.39     26.31    59%
ISRL-HEA          39.97     15.10    71.01     18.23    100%
ISRL-SACG-SE      30.54     10.23    49.21     11.12    100%
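The comparison in Table 5.7 can be checked with a few lines of arithmetic. The sketch below copies the table values, verifies that the ISRL-SACG-SE has the lowest value in every CPU-time column, and computes the relative reduction in mean CPU time against each other method. The names `TABLE_5_7` and `relative_reduction` are illustrative only.

```python
# Rows of Table 5.7: (mean, best, worst, std. of CPU time, accuracy in %).
TABLE_5_7 = {
    "GENITOR": (120.95, 61.34, 320.36, 92.95, 50),
    "SANE": (97.56, 48.54, 254.84, 83.56, 61),
    "R-GA": (89.83, 34.85, 192.93, 69.94, 54),
    "R-SE": (73.14, 28.66, 149.43, 57.87, 56),
    "TDGAR": (69.13, 26.54, 112.73, 41.58, 53),
    "ESP": (58.32, 22.08, 95.57, 35.27, 56),
    "ERDSE": (51.19, 20.77, 88.53, 30.74, 67),
    "CQGAF": (48.82, 18.79, 84.39, 26.31, 59),
    "ISRL-HEA": (39.97, 15.10, 71.01, 18.23, 100),
    "ISRL-SACG-SE": (30.54, 10.23, 49.21, 11.12, 100),
}


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


proposed = TABLE_5_7["ISRL-SACG-SE"]

# True if the proposed method has the minimum in all four CPU-time columns.
lowest_in_every_column = all(
    proposed[col] == min(row[col] for row in TABLE_5_7.values())
    for col in range(4)
)

# Relative reduction in mean CPU time versus every other method.
mean_reduction = {
    name: relative_reduction(row[0], proposed[0])
    for name, row in TABLE_5_7.items()
    if name != "ISRL-SACG-SE"
}
```

Against the ISRL-HEA, for instance, the mean CPU time drops from 39.97 to 30.54, a reduction of roughly 24%.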

To demonstrate the efficiency of each component of the proposed SACG-SE (the group-based symbiotic evolution (GSE), the TSSA, and the ECCS), five different methods are used in this example: the proposed ISRL-SACG-SE without the TSSA (Type I), the ISRL-SACG-SE without the ECCS (Type II), the ISRL-SACG-SE without the TSSA and ECCS (Type III), the SE method (Type IV), and the proposed ISRL-SACG-SE (Type V). In the Type II method, each group performs the two-point crossover strategy independently. In the Type III method, only the GSE is used. In the Type IV method, the SE ([29]) with the ISRL is adopted. In the Type V method, the ISRL-SACG-SE uses the TSSA to determine the fuzzy rules automatically, and the proposed ECCS is adopted to perform the crossover strategy. In the Type I, III, and IV methods, the parameters are set according to [104], and the number of fuzzy rules is determined by trial-and-error testing: each method is executed with a fixed string length for each specification of the number of fuzzy rules, and the average of the generations is then computed. All five methods are designed based on the ISRL. The performance of the five methods is shown in Table 5.8.
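The two-point crossover that each group performs independently in the Type II method can be sketched as follows. This is the standard textbook operator, shown on plain Python lists; the actual chromosome encoding used in the thesis is not described in this section, and the ECCS itself is not reproduced here.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points of equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 3:
        raise ValueError("need at least 3 genes for two distinct cut points")
    # Two distinct interior cut points with 1 <= i < j <= len - 1.
    i, j = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


# Example with easily distinguishable parents.
rng = random.Random(1)
parent_a = [0] * 8
parent_b = [1] * 8
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
```

Because the cut points are interior and distinct, each child keeps the first and last gene of one parent and receives a non-empty middle segment from the other.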

In Table 5.8, comparing Type IV with Type III shows that the GSE outperforms the SE, because chromosomes that are used to evaluate the solution locally obtain better performance than systems in which only one population is used to evaluate the solution. Comparing Type I with Type V shows that the Type V method needs less CPU time to balance the control system. The reason is that the TSSA determines the suitable number of fuzzy rules automatically, whereas in the Type I method the number of fuzzy rules is determined by trial-and-error testing. Therefore, the average of the generations of the Type I method is larger than that of the Type V method.

Comparing Type II with Type V shows that the SACG-SE (Type V) performs better than the Type II method, which indicates that the ECCS can reduce CPU time. As shown in Table 5.8, the proposed ISRL-SACG-SE (Type V) performs better than the other four types of methods.

Table 5.8: Performance comparison of different methods.

                      CPU time
Method      Mean     Best     Worst     Std.
Type I      51.54    18.23    83.21     31.12
Type II     45.93    16.41    76.87     27.39
Type III    58.43    22.18    98.91     38.55
Type IV     68.26    26.63    131.25    51.43
Type V      30.54    10.23    49.21     11.12
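The pairwise comparisons discussed for Table 5.8 amount to differences between mean CPU times. The sketch below reproduces that arithmetic from the table values; the dictionary and label names are illustrative, and the differences are descriptive only, with no claim about statistical significance.

```python
# Mean CPU time per method, copied from Table 5.8.
MEAN_CPU_TIME = {
    "Type I": 51.54,    # ISRL-SACG-SE without TSSA
    "Type II": 45.93,   # ISRL-SACG-SE without ECCS
    "Type III": 58.43,  # GSE only
    "Type IV": 68.26,   # SE with ISRL
    "Type V": 30.54,    # full ISRL-SACG-SE
}

# (label, method without the component, method with the component)
COMPARISONS = [
    ("GSE instead of SE", "Type IV", "Type III"),
    ("adding TSSA", "Type I", "Type V"),
    ("adding ECCS", "Type II", "Type V"),
]

# Reduction in mean CPU time attributable to each comparison.
savings = {
    label: round(MEAN_CPU_TIME[without] - MEAN_CPU_TIME[with_], 2)
    for label, without, with_ in COMPARISONS
}

for label, saved in savings.items():
    print(f"{label}: mean CPU time lower by {saved:.2f}")
```

On these numbers the largest single difference is between Type I and Type V, that is, between fixing the number of rules by trial and error and letting the TSSA determine it.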

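In document: A Study of Self-Adaptive Evolutionary Algorithms Based on Improved Safe Reinforcement Learning for Neuro-Fuzzy Controller Design (pages 109-116)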