
Chapter 4 Improved Safe Reinforcement Learning

4.2 Structure of the ISRL

Although safe reinforcement learning can drive the control system to reach and remain in the goal set, it cannot evaluate how soon the system meets the control goal: safe reinforcement learning only guarantees that the control system reaches and remains in the goal set, and how soon the goal set is reached is not considered. Indicating how soon the control system reaches the goal set is important. To solve this problem, the improved safe reinforcement learning (ISRL) is proposed in this dissertation. In this section, the other part of the proposed ISRL-SAEAs, that is, the ISRL, is discussed. In this dissertation, the self-adaptive evolution algorithms (SAEAs) are trained by using the ISRL.

As shown for safe reinforcement learning, once the system's Lyapunov function is identified, Lyapunov-based manipulations of the control laws allow the architecture to drive the plant to reach and remain in a predefined desired set of states with probability 1. Building on this, in the proposed ISRL, the time step at which the plant enters the desired set of states is used to indicate how soon the system becomes stable.

In the proposed ISRL, a reinforcement signal is designed based on the Lyapunov function.

The purpose of ISRL is to guide the system to reach and remain in a set of goal states. Several properties defined in [32] are listed above to express a safety constraint that we want the reinforcement learning controller to satisfy. Therefore, the improved safe reinforcement learning with self-adaptive evolution algorithms (ISRL-SAEAs), which is constructed on a TNFC, is based on Lyapunov analysis. The schematic diagram of the ISRL-SAEAs is shown in Fig. 4.1. The TNFC acts as a control network that determines a proper action according to the current input vector (the environment state). The feedback signal in Fig. 4.1 is the reinforcement fitness value, which serves as a performance measurement evaluating how soon the plant can reach the desired set of states. The reinforcement fitness value is also used as the fitness function of the SAEAs, so a string with a higher fitness value represents a better-fitted individual in the population. It will be observed that an advantage of the proposed ISRL-SAEAs is its capability of reaching the global optimum.
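The sketch below illustrates, under assumed interfaces, how such a reinforcement fitness value could be evaluated: a candidate TNFC controller drives the plant forward in time, and the fitness grows as the plant reaches the goal set sooner. The names in_goal_set, plant_step, and the tolerance are hypothetical placeholders, not the actual models of this dissertation.

```python
import numpy as np

def in_goal_set(state, tolerance=0.05):
    """Assumed goal-set test: all state variables within a small tolerance."""
    return np.all(np.abs(state) < tolerance)

def evaluate_fitness(controller, plant_step, initial_state, max_steps=1000):
    """Run the controller on the plant and return a reinforcement fitness value
    that is larger the sooner the plant enters the goal set."""
    state = np.asarray(initial_state, dtype=float)
    for t in range(max_steps):
        if in_goal_set(state):
            # Reward reaching the goal set early: fewer steps -> higher fitness.
            return (max_steps - t) / max_steps
        action = controller(state)          # TNFC-like control network
        state = plant_step(state, action)   # one simulation step of the plant
    return 0.0  # the plant never reached the goal set within max_steps
```

In this reading, the SAEAs would call evaluate_fitness() for each individual (each candidate set of TNFC parameters) and retain the better-fitted individuals in the population.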

The flowchart of the ISRL-SAEAs is shown in Fig. 4.2. The proposed ISRL-SAEAs run in a feedforward fashion to control the environment (plant) until the controller guides the plant into a predefined goal set. The concept of a “goal set” used in this dissertation is referenced from [32], in which the authors proposed a Lyapunov-based design for reinforcement learning whose purpose is to guide the system to reach and remain in a goal set comprising goal states.

Figure 4.1: Schematic diagram of the ISRL-SAEAs for the TNFC.


Figure 4.2: Flowchart of the ISRL-SAEAs.

In the simulations of the ISRL-SAEAs, when T = G, Properties 4.1, 4.3, and 4.4 represent an achievement of stability. Lyapunov's direct method is mainly used to study the stability of systems of differential equations; the authors in [32] extend this method to the reinforcement learning problem. The proposed ISRL adopts a Lyapunov-style theorem proposed in [32], which provides a criterion for designing the reinforcement learning agent. The theorem is stated below. Let L: S → ℝ denote a function that is positive on Tᶜ = S − T, and let ∆ denote a fixed positive real number.

Theorem 4.1: If for all s ∉ T and all actions a ∈ A(s), every possible next state s′ satisfies either s′ ∈ T or L(s) − L(s′) ≥ ∆, then from any state sₜ ∉ T, the environment enters T within L(sₜ)/∆ time steps.
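As a quick sanity check of the time bound in Theorem 4.1, the following sketch simulates an assumed scalar plant whose Lyapunov value L(s) = |s| decreases by ∆ at every step; the goal-set radius and the dynamics are illustrative assumptions only.

```python
DELTA = 0.05           # the fixed decrease required by Theorem 4.1
GOAL_RADIUS = 0.1      # assumed goal set T = {s : |s| <= GOAL_RADIUS}

def L(s):
    return abs(s)      # assumed Lyapunov function, positive outside T

s = 1.0                # initial state s_t with L(s_t) = 1.0
bound = L(s) / DELTA   # Theorem 4.1: at most L(s_t)/DELTA steps to enter T

steps = 0
while L(s) > GOAL_RADIUS:
    s -= DELTA if s > 0 else -DELTA   # each step reduces L by DELTA
    steps += 1

print(steps, "<=", bound)   # the observed number of steps respects the bound
```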

The proof of Theorem 4.1 can be found in [32]. Theorem 4.1 guarantees that the plant reaches the goal set, provided the controller is designed such that it reduces the Lyapunov function of the plant at each time step. Therefore, the main concept of the proposed ISRL is to identify a Lyapunov function of the control plant and then design the action choices so that the reinforcement learning satisfies the above theorem.
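A minimal sketch of this design principle is given below: among a finite set of candidate actions, only those whose successor state either lies in T or decreases the Lyapunov function by at least ∆ are made available to the learning agent. The helper names and the plant interface are hypothetical.

```python
DELTA = 0.05  # assumed minimum Lyapunov decrease per step

def safe_actions(state, candidate_actions, plant_step, L, in_goal_set):
    """Return the subset of candidate actions allowed by the Lyapunov criterion."""
    allowed = []
    for a in candidate_actions:
        next_state = plant_step(state, a)
        # Keep the action if it enters the goal set T or decreases L by at least DELTA.
        if in_goal_set(next_state) or (L(state) - L(next_state) >= DELTA):
            allowed.append(a)
    return allowed
```

If the agent (here, the TNFC tuned by the SAEAs) chooses only among safe_actions(...), Theorem 4.1 guarantees entry into T within L(sₜ)/∆ time steps.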