Cocktail Learning for Neural Networks and Its Application to Model Reference Control


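National Science Council, Executive Yuan: Research Project Results Report

Project type: Individual project

Project number: NSC92-2213-E-011-033-

Project period: August 1, 2003 to July 31, 2004 (ROC years 92 to 93)

Executing institution: Department of Electrical Engineering, National Taiwan University of Science and Technology

Principal investigator: 蘇順豐

Participants: 鄭至鈞, 葉志豪, 謝詠昇, 林家強, 黃文宗

Report type: Condensed report

Report attachments: report on attending an international conference and the paper presented there

Handling: this project is open to public inquiry

November 1, 2004 (ROC year 93)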


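Cocktail Learning for Neural Networks and Its Application to Model Reference Control

Project number: NSC 92-2213-E-011-033

Project period: 92/08/01 - 93/07/31

Principal investigator: Prof. 蘇順豐

Participants: 鄭至鈞, 葉志豪, 謝詠昇, 林家強, 黃文宗

Department of Electrical Engineering, National Taiwan University of Science and Technology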

1. Abstract

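Backpropagation learning is commonly used in training neural networks. However, it often becomes trapped in local minima or plateau regions. Many methods have been proposed in the literature, but almost none has been very successful. In this project, we examine this problem and try to combine different methods to achieve better results. The methods considered so far include heuristic stochastic perturbation, saturation measures, uncertainty, and interpolation/extrapolation indices. Because the approach incorporates several different methods, we call it cocktail backpropagation learning. The other research topic is the application of the above learning rules to model reference control. Because there are different structures and different learning methods for building the inverse of a model, we also analyze and discuss these methods and structures in this project. We further apply the above interpolation/extrapolation index, among others, to model reference control and propose better learning and control structures.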

Abstract

The backpropagation learning algorithm is widely used in neural networks. However, it is often trapped in local minima or plateau areas. Various approaches have been proposed in the literature to deal with these problems, but none has been fully successful.

In this report, we combine various approaches to cope with this problem. The approaches used include heuristic stochastic perturbation, a saturation measure, an interpolation/extrapolation index, etc. Since the method combines various approaches, we call it cocktail BP learning.

Recently, neural networks and fuzzy systems have been widely used for modeling nonlinear systems. In our study, we use neural networks to model plants, determine their inverses, and design controllers. There are various ways of determining the inverse of a model, and we study and compare them. After the plant model and its inverse have been determined, a neuro-controller is often employed.

2. Project Background and Objectives

The backpropagation (BP) learning algorithm is widely used in training various networks. However, it is often trapped in local minima or plateau areas. Various studies on coping with those problems have been reported in the literature [3-6]. Leung et al. [1,2] introduced genetic algorithms (GA) to evolve the network weights in a controlled manner so as to jump to regions of smaller mean squared error whenever BP is stuck at a local minimum. GA can quickly locate a near-optimal region in the solution space but is rather slow to fine-tune the solution, whereas BP does well only in local optimization. To combine the strengths of the two methods, their weight evolution algorithm evolves the network weights connected to the output neurons that have above-average errors during the gradient adaptation of BP; that is, the weights are mutated by perturbing the weights of the parent chromosome with small random real numbers.

Recently, Castillo et al. [3,4] explored the problem of escaping local minima using GA. They proposed a method that attempts to find appropriate initial weights and learning parameters for multilayer neural networks by combining an evolutionary algorithm and BP. This approach differs from that in [1,2] in that it allows a less restricted form of GA-based perturbation in weight space than the probabilistic perturbations, and it can address both the local minima problem and network structure design in a unified manner. Genetic algorithms furnish a global search technique based on emulating the evolutionary behavior of biological systems. However, a genetic algorithm does not always ensure efficient learning. In many practical problems with large data sets in particular, a genetic algorithm is inappropriate because it learns slowly.

Besides, those approaches conduct a “blind” search to escape from local minima. In fact, there may be notable phenomena that reflect certain situations or problems existing in the network. In this project, we propose several ideas that can be used to overcome the inefficient learning of BP. Since our approach may combine various algorithms to improve learning performance, we call it the cocktail BP learning algorithm.

3. Research Methods and Current Results

As mentioned above, the topics we intend to study in this project are the development of smart learning algorithms for neural networks and SONFIN-like fuzzy systems and the use of these systems in model reference control.

Regarding the smart learning algorithms, the first idea is to employ heuristic stochastic perturbation in the learning process to escape local minima and also to determine an appropriate network structure based on the learning performance. The algorithm begins in a BP mode. As the BP learning progresses, the network may be trapped in a local minimum. In this case, the stochastic perturbation mode is applied to reinitialize the weights associated with hidden units. Instead of perturbing units at random, our approach selects the units to be reinitialized probabilistically according to their significance. The significance of a unit i is defined as

$S_i = E(w_i) - E(w)$,

where $E(w_i)$ represents the error of the network with unit i eliminated and $E(w)$ is the error of the original network. BP learning then continues with the new set of weights. This process is repeated until a termination condition is satisfied. The addition of new hidden units can also be viewed as a special case of stochastic perturbation. With heuristic stochastic perturbation, the algorithm is able to escape local minima, as shown in Fig. 1 for the horseshoes problem.
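The significance measure and the selection step can be illustrated with a minimal NumPy sketch. It assumes a one-hidden-layer tanh network with mean squared error, and it assumes that less significant units are the ones more likely to be reinitialized; the report does not state the exact probability rule, so `pick_units_to_reinit` and all function names here are hypothetical.

```python
import numpy as np

def forward(X, W1, b1, W2, b2, mask=None):
    """One-hidden-layer tanh network; mask zeroes out eliminated hidden units."""
    H = np.tanh(X @ W1 + b1)
    if mask is not None:
        H = H * mask
    return H @ W2 + b2

def mse(Y_pred, Y):
    return float(np.mean((Y_pred - Y) ** 2))

def unit_significance(X, Y, W1, b1, W2, b2):
    """S_i = E(w_i) - E(w): error increase when hidden unit i is eliminated."""
    base = mse(forward(X, W1, b1, W2, b2), Y)
    n_hidden = W1.shape[1]
    S = np.empty(n_hidden)
    for i in range(n_hidden):
        mask = np.ones(n_hidden)
        mask[i] = 0.0
        S[i] = mse(forward(X, W1, b1, W2, b2, mask), Y) - base
    return S

def pick_units_to_reinit(S, n_pick, rng):
    """Assumed rule: the lower the significance, the higher the selection probability."""
    scores = S.max() - S + 1e-12
    p = scores / scores.sum()
    return rng.choice(len(S), size=n_pick, replace=False, p=p)

def reinit_units(W1, b1, W2, units, rng, scale=0.5):
    """Draw fresh random weights for the selected hidden units (in place)."""
    W1[:, units] = rng.uniform(-scale, scale, size=(W1.shape[0], len(units)))
    b1[units] = rng.uniform(-scale, scale, size=len(units))
    W2[units, :] = rng.uniform(-scale, scale, size=(len(units), W2.shape[1]))
```

In a training loop, these functions would be called when BP stalls, after which BP resumes from the perturbed weights.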

Another idea of this research is to split certain saturated neurons in the hidden layers so as to improve the performance of BP learning. Saturated neurons make learning inefficient because the derivative values used in BP are very small where the activation function is flat. Splitting such neurons may decrease the error. Two methods are considered. The first is proposed in [5], where the error decay rate is used as a measure for adding new neurons. The algorithm contains two stages of learning: neuron addition and neuron deletion. The error is used as an index for neuron addition and is checked every 100 epochs. If it has not decreased by at least 2% of its previous value, a new neuron is added. The second method is to split the hidden neurons that saturate, i.e., those for which most of the outputs lie at the two extremes. The weights are then split into two parts, so that the split network has more room for weight correction.
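The two triggers described above can be sketched as follows. The report does not give the exact saturation measure or split rule, so this sketch assumes tanh hidden units, a saturation threshold of 0.9, and a split in which both halves keep the incoming weights while the outgoing weights are divided by a ratio. Under that assumed rule the network output is unchanged immediately after the split, which may differ from the rule used in the reported experiments. All names are hypothetical.

```python
import numpy as np

def error_stalled(prev_error, curr_error, min_decay=0.02):
    """True if the error fell by less than min_decay (2%) of its previous value."""
    return (prev_error - curr_error) < min_decay * prev_error

def saturation_fraction(H, limit=0.9):
    """Per-neuron fraction of tanh outputs beyond +/-limit (assumed measure)."""
    return np.mean(np.abs(H) > limit, axis=0)

def split_neuron(W_in, b, W_out, idx, ratio=0.5):
    """Split hidden neuron idx into two neurons.

    Assumed rule: the new neuron copies the incoming weights and bias,
    and the outgoing weights are divided as ratio : (1 - ratio).
    """
    W_in_new = np.hstack([W_in, W_in[:, [idx]]])
    b_new = np.append(b, b[idx])
    W_out_new = np.vstack([W_out, (1.0 - ratio) * W_out[[idx], :]])
    W_out_new[idx, :] = ratio * W_out[idx, :]
    return W_in_new, b_new, W_out_new
```

A training loop would call `error_stalled` every 100 epochs and, when it returns true, split the neuron with the largest `saturation_fraction`.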

The simulation results are listed in Table 1. The initial number of hidden neurons varies from 1 to 30, and each case is run 5 times. The error rises after every split but falls quickly to a lower value if no other split happens. A possible reason for this phenomenon is that an improper split ratio shifts the output to a higher value. The weight correction may also move in an undesirable direction. The table shows that more neurons indeed produce better results. If the initial number of hidden neurons is large, fewer splits happen, but if it is small, this method helps considerably in decreasing the error.

Another topic is the use of an index for interpolation or extrapolation. The original idea comes from the firing strength of fuzzy rules: whether an input produces interpolation or extrapolation depends on whether the input falls within those rules. Neural networks have no rules; a neural network uses neurons to represent its stored knowledge. Thus, we may use neurons to define the so-called interpolation degree. The idea is to find out what values the training patterns generate for a neuron. If an input produces a value within that range, it should yield interpolation; otherwise, it should yield extrapolation. This interpolation degree can be defined for each hidden neuron.

To obtain a more informative measure, we use a fuzzy membership function to define the interpolation degree. The membership function is a trapezoid, as shown in Fig. 2. In the proposed algorithm, the training data define the membership function, that is, the four key points A, B, C, and D: the minimum (A), the lower boundary (B), the upper boundary (C), and the maximum (D). First, we sort the outputs of the considered neuron for the given training data. The minimum point A is the smallest value in the sorted output sequence and the maximum point D is the largest. B and C are the locations of the lower 20% and the upper 20% of the values, respectively. Thus the lower 20% of the outputs define the line AB, the upper 20% define the line CD, and the main portion (60%) in the middle forms the flat center part of the membership function. In our research, we may vary these percentages to see which set of values gives better performance.
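The construction of A, B, C, and D can be sketched for a single neuron as follows. This is a minimal illustration that assumes the 20% boundaries are taken as the 20th and 80th percentiles of the neuron's training outputs and that A < B < C < D; the function names are hypothetical.

```python
import numpy as np

def fit_trapezoid(outputs, lower_pct=20.0, upper_pct=80.0):
    """Return (A, B, C, D) from one neuron's outputs on the training data."""
    outputs = np.asarray(outputs, dtype=float)
    A = outputs.min()                       # minimum
    B = np.percentile(outputs, lower_pct)   # lower boundary
    C = np.percentile(outputs, upper_pct)   # upper boundary
    D = outputs.max()                       # maximum
    return A, B, C, D

def interpolation_degree(x, A, B, C, D):
    """Trapezoidal membership: 0 outside [A, D], 1 on [B, C], linear between.

    Assumes A < B < C < D.
    """
    return np.interp(x, [A, B, C, D], [0.0, 1.0, 1.0, 0.0], left=0.0, right=0.0)
```

Changing `lower_pct` and `upper_pct` corresponds to varying the percentages mentioned in the text.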

To further examine the defined interpolation degree, we also compute the mean of the neurons’ interpolation degrees. Table 2 shows the mean interpolation degree for four different networks with a sine wave as the input function. When the test data produce more extrapolation, the error is larger and the obtained mean interpolation degree is correspondingly smaller. In conclusion, the proposed interpolation degree can indeed be used to examine whether a given test pattern will produce interpolation or extrapolation effects. Thus, we may also use this index to evaluate whether the training patterns used are sufficient. A possible further application is to use the interpolation degree to define the credibility of an obtained output. With this credibility, it may be possible to address the garbage-in, garbage-out problem that arises when neural networks are used in real applications.
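As an illustration of using the mean interpolation degree as a credibility score, the following sketch computes it per test pattern over all hidden neurons. It assumes a tanh hidden layer, 20%/80% boundaries, and non-degenerate bounds (A < B and C < D for every neuron); the weights and function names are hypothetical.

```python
import numpy as np

def hidden_outputs(X, W, b):
    """Hidden-layer outputs of an assumed tanh layer."""
    return np.tanh(X @ W + b)

def fit_bounds(H_train, lower_pct=20.0, upper_pct=80.0):
    """Per-neuron A, B, C, D (one row each) from training-time hidden outputs."""
    return np.percentile(H_train, [0.0, lower_pct, upper_pct, 100.0], axis=0)

def mean_interpolation_degree(H, bounds):
    """Mean trapezoidal membership over the hidden neurons, one value per pattern."""
    A, B, C, D = bounds
    rise = (H - A) / (B - A)          # climbs from 0 at A to 1 at B
    fall = (D - H) / (D - C)          # drops from 1 at C to 0 at D
    degree = np.clip(np.minimum(rise, fall), 0.0, 1.0)
    return degree.mean(axis=1)

# Hypothetical single-input network with three hidden neurons.
W = np.array([[1.5, -0.8, 0.5]])
b = np.array([0.1, -0.3, 0.0])
x_train = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
bounds = fit_bounds(hidden_outputs(x_train, W, b))

x_inside = np.linspace(-0.5, 0.5, 11).reshape(-1, 1)   # well covered by training
x_outside = np.linspace(2.0, 3.0, 11).reshape(-1, 1)   # outside the training range
cred_inside = mean_interpolation_degree(hidden_outputs(x_inside, W, b), bounds)
cred_outside = mean_interpolation_degree(hidden_outputs(x_outside, W, b), bounds)
```

Inputs well inside the training range receive a high score and inputs far outside it receive a low one, which is the behavior the credibility index is meant to capture.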

In this study, we also examine the modeling effects of constructing the inverse model directly from the input-output training patterns. If such an approach works well, the separate step of constructing a plant model is unnecessary. Moreover, if the inverse is constructed directly, plant model errors do not accumulate in the inverse training process, so the result should be more accurate than that of the schemes that rely on a plant model. Nevertheless, this kind of inverse modeling has other problems. First, we do not know whether the plant is controllable or whether it is stable. Second, the most important issue in learning is whether the training data set is sufficient, and with this approach it is very difficult to ensure sufficient coverage by the training data set.

Another idea is to switch the locations of the new neural network and the learned neural network, that is, to exchange the locations of neural network 1 and neural network 2. According to the theory presented in [7], this approach is also feasible. The main difference from the original training is the formation of the training patterns: in the new method, u(k) is the natural target for training neural network 2. In this study, we also analyze the performance of this kind of approach.
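The formation of training patterns for a directly constructed inverse can be sketched as follows. The plant here is a hypothetical first-order linear system chosen only for illustration, and ordinary least squares stands in for the neural network so that the result can be checked exactly; neither is taken from the report.

```python
import numpy as np

def plant(y, u):
    """Hypothetical plant: y(k+1) = 0.8 y(k) + 0.5 u(k)."""
    return 0.8 * y + 0.5 * u

def collect_io(u_seq, y0=0.0):
    """Drive the plant with u_seq and record y(0), ..., y(N)."""
    y = np.empty(len(u_seq) + 1)
    y[0] = y0
    for k, u in enumerate(u_seq):
        y[k + 1] = plant(y[k], u)
    return y

def inverse_patterns(u_seq, y):
    """Inverse-model patterns: input [y(k+1), y(k)], target u(k)."""
    inputs = np.column_stack([y[1:], y[:-1]])
    targets = np.asarray(u_seq, dtype=float)
    return inputs, targets

rng = np.random.default_rng(0)
u_train = rng.uniform(-1.0, 1.0, size=200)
y_train = collect_io(u_train)
X_inv, t_inv = inverse_patterns(u_train, y_train)

# Least squares as a stand-in for training the inverse network.
coef, *_ = np.linalg.lstsq(X_inv, t_inv, rcond=None)
```

For this plant the exact inverse is u(k) = 2 y(k+1) - 1.6 y(k), so the fitted coefficients should be close to [2, -1.6]. With a real plant, the coverage of `u_train` determines how much of the input space the inverse has actually seen.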

As mentioned above, it is very important to select a proper set of training patterns so as to have sufficient learning. If the training patterns do not sufficiently cover the input space of interest, the learned network may have to extrapolate from the learned patterns.

In this study, four cases are considered. Case 1 is the original method, in which the outputs for the training and testing patterns are the same. Case 2 switches the locations of neural network 1 and neural network 2 in case 1. Case 3 constructs the inverse directly. Case 4 uses a different set of training patterns with the method of case 1, whereas case 1 uses the original patterns. The structures of case 2 and case 3 are almost the same, so they use the same training patterns. In case 4, because the plant model is limited by the range over which it was trained, the training patterns are restricted to a specific range. In this comparison, a key concern is “Can these methods really construct the inverse of the considered plant?” For testing, all cases are connected to the real plant and fed the input signal

$u(k) = \sin[2\pi k/50] + \sin[2 \times (2\pi k/50)]$.

The simulation results are shown in Table 3.

Next, let us observe the effects of adding noise to the system. The control scheme used is not equipped with robust control. In this analysis, we study whether neural-network-based inverse models can tolerate noise in the modeling process. This tolerance is very important because of the cancellation effects required in inverse control. Here, 5% noise is added to the output of the system. The simulation results are shown in Table 4.

Finally, we use a set of testing patterns that differ from the training patterns. In this study, we simply use sinusoidal functions of other frequencies. Here, no noise is added, to avoid confusion. First, the same frequency (2πk/50) is used to train the networks. Two other frequencies, 2πk/25 and 2πk/30, are then used as the test input functions. The results are shown in Table 5.

Fig. 3 shows an adaptive inverse control scheme. In this control system, an adaptive controller is used in place of the reference model. The controller adapts through errors that are taken to be the modeling errors. Fig. 4 shows the tracking performance of this control system.

4. Conclusions and Discussion

Regarding the smart learning algorithms, the first idea is to employ heuristic stochastic perturbation in the learning process to escape local minima and to determine an appropriate network structure based on the learning performance. The second idea is to split certain saturated neurons in the hidden layers so as to improve the performance of BP learning. Another topic is the use of an index for interpolation or extrapolation, which can serve to define the credibility of an obtained output. With this credibility, it may be possible to address the garbage-in, garbage-out problem that arises when neural networks are used in real applications. We also study the modeling effects of constructing the inverse model directly from the input-output training patterns, and various structures are examined.

Finally, an adaptive inverse control scheme is employed. The controller adapts through errors that are taken to be the modeling errors. The tracking performance of this control system is satisfactory.

5. References

[1] S. H. Leung, A. Luk, and S. C. Ng, “Fast convergent genetic-type search for multi-layered network,” IEICE Trans. Fundamentals, vol. E77-A, no. 9, pp. 1484-1492, 1994.

[2] S. C. Ng, S. H. Leung, and A. Luk, “Evolution of connection weights combined with local search for multi-layered neural networks,” IEEE, pp. 726-731, 1996.

[3] P. A. Castillo, J. J. Merelo, J. González, A. Prieto, V. Rivas, and G. Romero, “G-Prop-III: Global optimization of multilayer perceptrons using an evolutionary algorithm,” Proc. of the Congress on Evolutionary Computation, vol. I, pp. 942-947, 1996.

[4] P. A. Castillo, J. Carpio, J. J. Merelo, A. Prieto, V. Rivas, and G. Romero, “Evolving multilayer perceptrons,” Neural Processing Letters, vol. 12, no. 2, pp. 115-128, 2000.

[5] Y. Hirose, K. Yamashita, and S. Hijiya, “Back-propagation algorithm which varies the number of hidden units,” Neural Networks, vol. 4, pp. 61-66, 1991.

[6] S. E. Fahlman and C. Lebiere, “The cascade-correlation learning architecture,” CMU-CS-90-100, Technical report, Dept. of CS, Carnegie Mellon University, 1991.

C. F. Juang and C. T. Lin, “An on-line self-constructing neural fuzzy inference network and its applications,” IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 13-32, 1998.

[7] B. Widrow and E. Walach, Adaptive Inverse Control. Prentice Hall, 1996.

Fig. 1 The learning history of using heuristic stochastic perturbation

Fig. 2 The membership function of interpolation degree

[Block diagram. Blocks: Controller P^-1 (neural network 2), Plant, P^-1 (neural network 2), Reference model. Signals: command input U₁(k), U₂(k), output y(k).]

Fig. 3 The adaptive inverse control scheme

Fig. 4 The tracking performance of adaptive inverse control

Table 1. Two hidden layers; neurons are split when the error decay is less than 2% over the previous 100 epochs.

| Initial neuron no. | Final neuron no. (avg.) | Training error, abs. (avg.) | Testing error, RMSE (avg.) |
| 1 | 5 | 0.0332722 | 0.0472938 |
| 2 | 8.4 | 0.0392448 | 0.0619456 |
| 5 | 10 | 0.0303448 | 0.0327016 |
| 10 | 14 | 0.0277070 | 0.0310348 |
| 15 | 17.2 | 0.0264026 | 0.0323580 |
| 20 | 20.4 | 0.0237386 | 0.0297500 |
| 25 | 26.2 | 0.0238214 | 0.0288534 |
| 30 | 30 | 0.0211104 | 0.0250978 |

Table 2. Mean interpolation degree for the four cases with a sine wave input.

| Case | First | Second | Third | Fourth | Error |
| 1st case | 0.7828 | 0.7877 | 0.7721 | 0.7694 | 0.015071 |
| 2nd case | 0.5988 | 0.4360 | 0.5403 | 0.4579 | 0.018273 |
| 3rd case | 0.7493 | 0.7594 | | | 0.017258 |
| 4th case | 0.8938 | 0.9373 | 0.9565 | 0.9489 | 0.012070 |

Table 3. Results without noise for the different inverse schemes.

| Type | Training error | Testing error | Epochs (learning rate η) | Combining with system |
| Modeling | 0.005856 | 0.013122 | 5000 (η=0.2) | |
| 1st case | 0.007704 | 0.010218 | 3000 (η=0.03) | 0.015071 |
| 2nd case | 0.033847 | 0.021738 | 5000 (η=0.03) | 0.018273 |
| 3rd case | 0.026816 | 0.014719 | 5000 (η=0.03) | 0.017258 |
| 4th case | 0.051909 | 0.007771 | 5000 (η=0.03) | 0.012070 |

Table 4. Results with noise for the different inverse schemes.

| Type | Training error | Testing error | Epochs (learning rate η) | Combining with system |
| Modeling | 0.024231 | 0.010470 | 5000 (η=0.03) | |
| 1st case | 0.008082 | 0.010788 | 3000 (η=0.03) | 0.016505 |
| 2nd case | 0.030821 | 0.016428 | 5000 (η=0.03) | 0.016069 |
| 3rd case | 0.048865 | 0.024356 | 5000 (η=0.03) | 0.049921 |
| 4th case | 0.077389 | 0.008526 | 5000 (η=0.02) | 0.013720 |

Table 5. Testing the obtained inverse with different inputs.

| Type | 2πk/25 error (RMSE) | 2πk/30 error (RMSE) |
| First case | 0.029504 | 0.038172 |
| Second case | 0.020612 | 0.020948 |
| Third case | 0.018855 | 0.019017 |
| Fourth case | 0.013504 | 0.015172 |