
Keywords: fuzzy systems, radial basis function neural network, reinforcement learning, supervised learning

Ying-Kuei Yang, Jin-Yu Lin, Wei-Li Fang and Jung-Kuei Pan

Dept. of Electrical Engineering, National Taiwan University of Science & Technology, Taipei, TAIWAN

e-mail: [email protected]

Abstract

The performance of a rule-based fuzzy system relies primarily on its rules and membership functions. Some researchers have proposed self-constructing rule-based fuzzy systems based on reinforcement learning, which often result in a five-layer neural network. The Fuzzy-reasoning Radial Basis Function Neural Network (FRBFN) proposed in this paper has only three layers, which reduces its forward computation time. The reasoning time is reduced by eliminating fuzzification and defuzzification in the learning system. The membership functions of a rule are finely tuned through the learning procedure. The Radial Basis Function Neural Network (RBFN) is employed in FRBFN to provide the generality and smoothness of the network. Experimental results show that the proposed network can train a rule-based fuzzy system well through reinforcement learning, with good performance.


1. Motivation

Rule-based fuzzy systems have been used successfully in many real-world control applications [1]. Many researchers have studied how a system can construct its own rules and membership functions through learning [2][3][4][5]. Supervised learning requires a sufficiently large set of sample data. Reinforcement learning is used to eliminate this requirement.

Reinforcement learning is based on trial and error, and therefore requires no precise training pairs [6][7][8][9][10].

Most researchers have treated a fuzzy system as a five-layer neural network. This choice maps the input variables, input membership functions, rule base, output membership functions and output variables to the respective layers of the network. The use of five layers creates a complex neural network with a high computation cost. A neural network with fewer layers that performs the same function is proposed in this paper. The idea comes from the similarity between a Radial Basis Function Neural Network (RBFN) and a rule-based fuzzy system [11][12]. The proposed network has only three layers: an input layer, a hidden layer and an output layer. The layer reduction can decrease the computation cost and increase the learning speed of the network.

2. The Fuzzy-Reasoning Radial Basis Function Neural Network (FRBFN)

2.1 Network Architecture

The logical network of the proposed FRBFN is shown in Figure 1.

Figure 1. Logical Network Architecture of FRBFN

Figure 1 contains four functional blocks. The Action Critic Network (ACN), Action Selection Network (ASN) and Stochastic Action Modifier (SAM) are the basic components of the FRBFN. The environment block on the right is the target to be controlled.

2.2 Function of ASN

The ASN acts as a rule-based fuzzy logic controller, which comprises the three basic components of a fuzzy system: fuzzification, inference engine and defuzzification [1]. Because of the equivalence between a rule-based fuzzy system and an RBFN [11][12], an RBFN is used to implement the ASN. The RBFN performs the task of a fuzzy system while eliminating the fuzzification and defuzzification procedures. The network architecture of the ASN implemented by an RBFN is shown in Figure 2. The ASN receives the state variables s(t) from the environment, such as a controlled plant, and the internal reinforcement signal $\hat{r}(t)$ generated by the ACN. The state variables provide information about the plant's current status. In the associated reinforcement learning system, both the state variables and the internal reinforcement signal are used as criteria to update the learning parameters in the ASN. The values of these parameters are learned from the state variables s(t) and the internal reinforcement signal $\hat{r}(t)$.

Figure 2. Block diagram of Action Selection Network (ASN)
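The forward pass of an RBFN-based ASN can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian form of the hidden nodes, the function name `rbfn_forward` and the example parameter values are all assumptions made here for illustration.

```python
import numpy as np

def rbfn_forward(s, centers, widths, weights):
    """Three-layer RBFN: input -> Gaussian hidden nodes -> linear output.

    Assumed form (not given in the source): each hidden node computes
    exp(-||s - c||^2 / (2 w^2)) and the output is a weighted sum.
    """
    # Squared distance from the state vector to each hidden-node center.
    sq_dist = np.sum((centers - s) ** 2, axis=1)
    # Gaussian activation; it plays the role of a rule's firing strength.
    hidden = np.exp(-sq_dist / (2.0 * widths ** 2))
    # The weighted sum of hidden activations gives the crisp action y(t).
    return float(hidden @ weights), hidden

# Hypothetical parameters: two hidden nodes over a two-dimensional state.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([1.0, 1.0])
weights = np.array([2.0, -1.0])

y, hidden = rbfn_forward(np.array([0.0, 0.0]), centers, widths, weights)
```

The state goes straight from the input layer to the hidden nodes and then to a crisp output, so no separate fuzzification or defuzzification step appears in the computation.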

The output of the ASN is not applied directly to the controlled plant. Instead, the output y(t) of the ASN is first sent to the Stochastic Action Modifier (SAM). The SAM then tries to optimize y(t), generating a modified output $\hat{y}(t)$ based on the predicted reinforcement signal p(t) at that time and its stochastic algorithm:

$$\hat{y}(t) = N\big(y(t), \sigma(t)\big) \tag{1}$$

where N is a normal (Gaussian) distribution function with mean y(t) and variance σ(t). A small value of σ(t) indicates that the system is closer to a stable state. For our learning algorithm, we choose the probability function $\sigma_p(t)$ as

$$\sigma_p(t) = \frac{1}{1 + e^{2p(t)}} \tag{2}$$

where p(t) is generated by the ACN and is the signal used to predict the reinforcement signal r(t).
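The behaviour of the SAM can be sketched in a few lines. The sketch assumes that the variance is $\sigma(t) = 1 / (1 + e^{2p(t)})$ and that the modified action is drawn from a normal distribution centered on y(t). The function names and the random seed are hypothetical.

```python
import numpy as np

def sam_sigma(p):
    """Assumed variance schedule: decreases as the prediction p(t) grows."""
    return 1.0 / (1.0 + np.exp(2.0 * p))

def stochastic_action(y, p, rng):
    """Draw a modified action around the ASN output y(t)."""
    sigma = sam_sigma(p)
    # The text calls sigma(t) a variance, so the standard deviation
    # passed to the sampler is its square root.
    return rng.normal(loc=y, scale=np.sqrt(sigma))

rng = np.random.default_rng(0)       # hypothetical seed
low_confidence = sam_sigma(-5.0)     # poor predicted reinforcement
high_confidence = sam_sigma(5.0)     # good predicted reinforcement
y_hat = stochastic_action(1.0, 0.0, rng)
```

With this choice, a high predicted reinforcement narrows the distribution so the controller stays close to y(t), while a low prediction widens it and encourages exploration.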

2.3 Function of ACN

The function of the ACN shown in Figure 1 is to generate the signals p(t) and $\hat{r}(t)$ from the state variables and external reinforcement signals it receives. The output signal p(t) is a prediction signal and $\hat{r}(t)$ is an internal reinforcement signal. The purpose of p(t) is to predict the infinite discounted prediction signal z(t), and $\hat{r}(t)$ is the difference between p(t) and z(t). The objective of the ACN is to model the environment so that it can perform a multi-step prediction of the reinforcement signal for the current action chosen by the ASN. An RBFN is proposed to implement the ACN because an RBFN can approximate real-valued mappings of continuous or piecewise continuous functions. Another reason to use an RBFN concerns the overall network architecture of the FRBFN: if both the ASN and the ACN use the same neural network architecture, some layers of the RBFN can be shared. The network diagram of the ACN is shown in Figure 3.

Figure 3. Block diagram of Action Critic Network (ACN)
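The relation between p(t), z(t) and the internal reinforcement signal can be illustrated with a small sketch. It rests on several assumptions that the text does not state: z(t) is taken as a discounted sum of future reinforcement signals with a hypothetical discount factor `gamma`, the infinite sum is truncated to a finite list, and the difference is taken as z(t) minus p(t).

```python
def discounted_return(rewards, gamma):
    """Finite-horizon stand-in for the infinite discounted signal z(t).

    rewards[k] is the reinforcement received k steps after time t.
    """
    z = 0.0
    # Accumulate from the last step backwards: z = r + gamma * z_next.
    for r in reversed(rewards):
        z = r + gamma * z
    return z

def internal_reinforcement(z, p):
    """Assumed sign convention: positive when the outcome beats the prediction."""
    return z - p

z = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
r_hat = internal_reinforcement(z, p=1.5)
```

In an online controller z(t) is not known in advance, so a practical critic has to estimate this difference incrementally. The sketch only shows what quantity p(t) is meant to approximate.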

The input space s(t) of the ASN and the ACN should be identical because both receive the same information s(t) from the same environment. The input and hidden layers of the ASN and the ACN can therefore be shared. Sharing the first two layers makes the architecture of the proposed FRBFN more compact. The network architecture shown in Figure 4 combines one ASN and one ACN to form the final version of the FRBFN.

Figure 4. FRBFN implementation diagram

The network diagrams of ASN and ACN shown in Figure 2 and Figure 3 are merged in Figure 4.

There are three reasons to combine these two networks: (1) Both the ASN and the ACN have the same network architecture. (2) The input state space of the two networks is identical. (3) The numbers of hidden nodes in the ASN and the ACN are identical because of the same input state space. Sharing layer 1 and layer 2 between the ACN and the ASN reduces the memory required for storage and eliminates the need to recalculate the outputs of the hidden nodes. In the learning phase, the widths and centers of the hidden nodes need to be learned only once, because the ACN and the ASN use the same hidden nodes. Each of the two networks has its own weight parameters between layer 2 and layer 3. The network architecture of the FRBFN is simpler, and it has fewer learning parameters, than those in [3][4][5][6].
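The sharing of layers 1 and 2 can be sketched as one set of hidden nodes feeding two output weight vectors. This is an illustrative sketch only: the class name `SharedRBFN`, the Gaussian hidden-node form and the example values are assumptions, not details taken from the paper.

```python
import numpy as np

class SharedRBFN:
    """One hidden layer shared by an action head (ASN) and a critic head (ACN)."""

    def __init__(self, centers, widths, asn_weights, acn_weights):
        self.centers = np.asarray(centers, dtype=float)          # shared
        self.widths = np.asarray(widths, dtype=float)            # shared
        self.asn_weights = np.asarray(asn_weights, dtype=float)  # ASN only
        self.acn_weights = np.asarray(acn_weights, dtype=float)  # ACN only

    def forward(self, s):
        s = np.asarray(s, dtype=float)
        # Hidden activations are computed once and reused by both heads.
        sq_dist = np.sum((self.centers - s) ** 2, axis=1)
        hidden = np.exp(-sq_dist / (2.0 * self.widths ** 2))
        y = float(hidden @ self.asn_weights)  # action output y(t)
        p = float(hidden @ self.acn_weights)  # prediction output p(t)
        return y, p

    def parameter_count(self):
        # Centers and widths are counted once because they are shared.
        return (self.centers.size + self.widths.size
                + self.asn_weights.size + self.acn_weights.size)

net = SharedRBFN(centers=[[0.0, 0.0], [1.0, 1.0]], widths=[1.0, 1.0],
                 asn_weights=[2.0, -1.0], acn_weights=[0.5, 0.5])
y, p = net.forward([0.0, 0.0])
```

In this toy configuration two separate networks would store the centers and widths twice (16 parameters), whereas the shared version stores 10.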


2.4 Learning in Action Selection Network

The parameters to be adjusted by the learning mechanism are the centers and widths of the hidden-layer nodes and the weights between the hidden and output layers. One of the goals of the ASN is to produce an output that maximizes the reinforcement signal r(t). The learning relation can be written as:

where $m_i$ stands for a parameter in the ASN. The relation indicates that the amount of adjustment to a parameter $m_i$ is proportional to the strength of its associated reinforcement signal. Using the chain rule to expand the right-hand side of equation (3), we have

where $\partial y / \partial m_i$ denotes the amount by which y changes when the adjustment parameter $m_i$ changes. Equation (4) shows
