Annealing Robust Radial Basis Function Networks for Modeling with Outliers

Chen-Chia Chuang*, Jin-Tsong Jeng* and Pao-Tsun Lin*
* Department of Electronic Engineering, Hwa-Hsia College of Technology and Commerce
111, Hwa-Shin Street, Chung-Ho City, Taipei County, TAIWAN 235
TEL: (886)-(2)-29426424-309 FAX: (886)-(2)-29426424-203
E-mail: [email protected]

Abstract

In this paper, annealing robust radial basis function networks (ARRBFNs) are proposed to overcome the problems of robust RBFNs in function approximation with outliers. First, the support vector regression (SVR) approach is used to obtain the initial structure of the ARRBFNs. Because the SVR approach is equivalent to solving a linearly constrained quadratic programming problem under a fixed SVR structure, the number of hidden nodes and the adjustable parameters (i.e. the initial structure) of the ARRBFNs are easily obtained. Second, the results of SVR are used to initialize the ARRBFNs. Then the annealing robust backpropagation (ARBP) learning algorithm is applied to adjust the parameters of the ARRBFNs. The ARBP learning algorithm was proposed to overcome the problems of initialization and cut-off point selection in robust learning algorithms. Owing to the SVR-based initialization, the ARRBFNs converge quickly and are robust against outliers. Simulation results are provided to show the validity and applicability of the proposed ARRBFNs.

Key words: Outliers, Annealing robust backpropagation learning algorithm, Radial basis function networks, Support vector regression.

1. Introduction

Radial basis function networks (RBFNs) are often used for system modeling because of their simplicity (only one layer of weights is required) and fast convergence [1]. In such approaches, the task is to obtain networks that behave as closely as possible to the system being modeled. Since RBFNs approximate functions without requiring a mathematical description of how the outputs functionally depend on the inputs, they are often referred to as model-free estimators [2]. The basic modeling philosophy of model-free estimators is that they build systems directly from input-output patterns or, more abstractly, that they learn from examples without any knowledge of the model type. This kind of learning scheme for neural networks can also be called data learning. Such learning schemes seek functions that match all training data as closely as possible, regardless of whether the data are trustworthy. In fact,
RBFNs with sufficiently many nodes in the hidden layer are universal approximators [3]. However, if the training data are corrupted by noise or outliers [4], those data learning schemes may not always achieve acceptable performance.

When outliers exist, traditional RBFN approaches are easily affected. Hence, robust RBFN approaches have been proposed to overcome the weakness of traditional RBFN approaches in the presence of outliers. In [5], the parameters of the RBFNs (i.e. the parameters of the Gaussian kernel functions and the synaptic weights), which form the initial structure of the robust RBFNs, are determined by the singular value decomposition (SVD) method. However, the initial structure obtained by the SVD method still does not give satisfactory performance. Hence, robust learning algorithms similar to the robust backpropagation (BP) learning algorithm [7] are applied to adjust the parameters of the RBFNs and improve the learning performance. Nevertheless, the use of robust learning algorithms raises the problems of initialization and of the selection of cut-off points [8]. Moreover, the number of nodes of the RBFNs must be pre-determined. In [6], the number of nodes and the parameters of the robust RBFNs are obtained by adaptive growing methods and randomization. The adaptive growing method can only grow the network up to a certain number of nodes, beyond which a desired number cannot be reached. In this approach, it is difficult to determine when to switch from the adaptive growing method based on the least squares (LS) criterion to the one based on the robust criterion.

In this paper, in order to overcome the problems of robust RBFN approaches with outliers, a novel approach, called annealing robust radial basis function networks (ARRBFNs), is proposed. In this approach, we use the support vector regression (SVR) approach [9] to obtain the initial structure of the ARRBFNs (i.e. the proper number of nodes, the parameters of the Gaussian functions and the synaptic weights). The SVR approach with the $\varepsilon$-insensitive function provides an estimated function within the $\varepsilon$ zone that is hardly affected by outliers, and thus provides a better initialization for the robust learning algorithm. Then we use the annealing robust backpropagation (ARBP) learning algorithm [8] to adjust the parameters of the Gaussian functions and the synaptic weights. The ARBP learning algorithm is used because it was proposed to overcome the problems of initialization and cut-off point selection in robust backpropagation learning algorithms [8].

This paper is organized as follows. After this introduction, the problems of robust RBFN approaches are discussed in Section 2. In Section 3, the ARRBFNs are proposed and discussed; in that section, the SVR approach and the ARBP learning algorithm are briefly described. The computer simulations are presented in Section 4. Finally, Section 5 concludes this paper.

2. The Problems of Robust RBFNs

The structure of RBFNs consists of an input layer, a hidden layer of radial basis functions and a linear output layer. The overall structure, assuming an input dimensionality $p$, implements a nonlinear mapping $F: R^p \to R$ expanded on a finite basis of nonlinear functions. When the radial basis functions are chosen as Gaussian functions, the mapping can be expressed in the form

$$f(x) = \sum_{i=1}^{L} w_i G_i = \sum_{i=1}^{L} w_i \exp\left(-\frac{\|x - m_i\|^2}{2\sigma_i^2}\right), \tag{1}$$

where $x \in R^p$ is the input vector, $w_i$ are the synaptic weights, $m_i$ are the centers of the Gaussian functions, $\sigma_i$ are the widths of the Gaussian functions, and $L$ is the number of Gaussian functions.
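As an illustration of the Gaussian RBFN mapping in (1), the following minimal Python sketch evaluates the network output for one input vector. The function name `rbfn_output` and the two-node parameter values are assumptions made for this example only; they do not come from the paper.

```python
import math

def rbfn_output(x, weights, centers, widths):
    """Evaluate f(x) = sum_i w_i * exp(-||x - m_i||^2 / (2 * sigma_i^2)), as in (1)."""
    total = 0.0
    for w, m, sigma in zip(weights, centers, widths):
        # Squared Euclidean distance between the input and the i-th center.
        sq_dist = sum((xk - mk) ** 2 for xk, mk in zip(x, m))
        total += w * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total

# Hypothetical two-node network on a 1-D input (illustrative values only).
weights = [1.0, -0.5]
centers = [[0.0], [2.0]]
widths = [1.0, 1.0]

# At x = 0 the first node contributes w_1 and the second contributes w_2 * exp(-2).
value_at_center = rbfn_output([0.0], weights, centers, widths)
```

Each hidden node responds most strongly near its own center, so the output at a given input is dominated by the nodes whose centers lie within a few widths of that input.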

The network structure corresponding to (1), which has $L$ Gaussian hidden nodes, is shown in Figure 1.

In general, the structure of RBFNs is constructed by one of two categories of methods. In the first, the parameters of the Gaussian functions (i.e. their centers and widths) and the synaptic weights are selected by clustering algorithms [10] and at random, respectively. The number of nodes is often obtained by cross-validation or generalized cross-validation [10], or is pre-determined. In the second, the structure of the RBFNs is obtained iteratively by adaptive growing methods [6] or pruning methods. The result of this process can be regarded as the initial structure of the RBFNs. Then the parameters of the Gaussian functions and the synaptic weights are adjusted by traditional learning algorithms to improve the approximation performance.

However, most traditional RBFN approaches are based on the least squares (LS) criterion, which is easily affected by outliers [5,6]. Hence, robust RBFN approaches have been proposed to overcome the problems of traditional RBFN approaches with outliers. Those robust RBFN approaches mainly focus on using robust learning algorithms to adjust the parameters of the Gaussian functions and the synaptic weights. The robust learning algorithms, called robust BP learning algorithms, introduce the concept of M-estimators into backpropagation learning algorithms [12]. The basic idea of such algorithms is to use the loss function of the M-estimators to reduce the effects of the outliers. The cost function of a robust learning algorithm is defined as

$$E_R = \sum_{i=1}^{N} \sigma(e_i; \beta), \tag{2}$$

where $\sigma(\cdot)$ is the so-called loss function, which is a symmetric function with a unique minimum at zero, $\beta$ denotes the cut-off points, serving as an index for discriminating against outliers, $e_i$ is the estimated error for the $i$-th training pattern and $N$ is the number of training data.

Robust RBFN approaches with robust learning algorithms can indeed improve the learning performance to some extent when the training data contain outliers [5,6]. Nevertheless, these approaches still have some problems. First, the initial structure of the robust RBFNs is very important, because it provides the initialization for the robust learning algorithm. In fact, this initialization problem also occurs for nonlinear regression approaches in statistics. In [5], the parameters of the RBFNs are determined by the singular value decomposition (SVD) method. However, this approach still does not give a good initial structure for the robust RBFNs when outliers exist. Moreover, the number of nodes must be pre-determined. In [6], the number of nodes and the parameters of the robust RBFNs are obtained by adaptive growing methods and randomization. The adaptive growing method can only grow the network up to a certain number of nodes, beyond which a desired number cannot be reached. In the growing process, the initial structure of the robust RBFNs is obtained by the adaptive growing method based on the LS criterion for a period of training. Then the adaptive growing method based on the robust criterion (i.e. the criterion of the robust back-propagation learning algorithm) is used for the rest of the training. The growing process is thus similar to the robust back-propagation learning algorithm. In this approach, it is difficult to determine when to switch from the adaptive growing method based on the LS criterion to the one based on the robust criterion.

Second, the effects of outliers also appear in traditional learning algorithms based on the LS criterion. Hence, various robust learning algorithms [5-7,13] have been proposed to overcome the effects of outliers
in traditional learning algorithms. Chen and Jain [7] first adopted Hampel's M-estimator in the cost function to reduce the effects of outliers. Liano [13] proposed another robust cost function by assuming that the errors follow a Cauchy distribution. The use of robust learning approaches also raises some problems [8]. An important one concerns initialization: in those robust learning algorithms, selecting a suitable initialization is extremely important. In [7], the authors suggested that their robust learning algorithm be applied after a period of training by the traditional back-propagation learning algorithm. However, with this approach it may be difficult to determine when to switch from the backpropagation learning algorithm to the robust learning algorithm. Another problem arising in those robust approaches concerns the selection of a parameter, namely the cut-off points of the M-estimator in the cost function. The cut-off points are used as a threshold for the rejection of outliers. In [7], for example, the cut-off points are dynamically adjusted based on a fixed percentage of the errors. Such an approach requires that the percentage of errors to be considered as outliers be defined first. Therefore, we propose a novel robust RBFN approach to overcome the above problems.

3. The annealing robust RBFNs (ARRBFNs)

In this paper, we propose annealing robust RBFNs (ARRBFNs) to improve robust RBFNs for modeling with outliers. In this approach, the initial structure of the ARRBFNs is obtained by the SVR approach. Then the annealing robust back-propagation (ARBP) learning algorithm is applied to adjust the parameters of the Gaussian functions and the synaptic weights.

3.1 The initial structure of ARRBFNs by SVR approach

The SVR approach approximates the given observations in an $m$-dimensional space by a linear function in another feature space $F$. The function in SVR is of the form

$$f(\vec{x}, \vec{\theta}) = \langle \vec{\theta}, \Phi(\vec{x}) \rangle + b, \tag{3}$$

where $\langle \cdot, \cdot \rangle$ is an inner product defined on $F$, $\Phi(\cdot)$ is a nonlinear mapping function from $R^m$ to $F$ (i.e. $\Phi: R^m \to F$), $\vec{\theta} \in F$ is a parameter vector to be identified and $b$ is a threshold. Suppose that the observations are generated from an unknown probability distribution $G(\vec{x}, y)$. Then the solution of the problem is to find the $f$ that minimizes the following risk function [9]:

$$R[f] = \int L(y - f(\vec{x}, \vec{\theta})) \, dG(\vec{x}, y), \tag{4}$$

where $L(y - f(\vec{x}, \vec{\theta}))$ is the loss function measuring the difference between the desired output $y$ and the estimated output $f(\vec{x}, \vec{\theta})$ for a given input $\vec{x}$. The loss function is often chosen as the $\varepsilon$-insensitive function, which is defined as

$$L(e) = \begin{cases} 0, & \text{for } |e| \le \varepsilon, \\ |e| - \varepsilon, & \text{otherwise}, \end{cases} \tag{5}$$

for some previously chosen nonnegative number $\varepsilon$. However, since $G(\vec{x}, y)$ is unknown, $R[f]$ cannot be evaluated directly from (4). Usually, the following empirical risk function is used instead:

$$R_{emp}[f] = \frac{1}{P} \sum_{i=1}^{P} L(y_i - f(\vec{x}_i; \vec{\theta})) = \frac{1}{P} \sum_{i=1}^{P} L(e_i), \tag{6}$$

where $P$ is the number of training data. Although the empirical risk is relatively easy to compute and is uniformly consistent for hypothesis classes with bounded complexity, minimizing $R_{emp}$ directly may lead to overfitting, and thus to poor generalization, when the model capacity of $f$ is high.
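To make the $\varepsilon$-insensitive function (5) and the empirical risk (6) concrete, here is a minimal Python sketch. The function names and the three-point data set are hypothetical and chosen only to show how a single large error enters the risk linearly, whereas it would enter a squared-error criterion quadratically.

```python
def eps_insensitive(e, eps):
    """Eq. (5): zero inside the eps zone, |e| - eps outside it."""
    return max(0.0, abs(e) - eps)

def empirical_risk(targets, predictions, eps):
    """Eq. (6): mean eps-insensitive loss over the P training pairs."""
    losses = [eps_insensitive(y - f, eps) for y, f in zip(targets, predictions)]
    return sum(losses) / len(losses)

# Hypothetical data: two points inside the eps zone and one outlier-like point.
targets = [0.0, 0.05, 3.0]
predictions = [0.0, 0.0, 0.0]

risk = empirical_risk(targets, predictions, eps=0.1)

# Mean squared error on the same errors, for comparison only.
mean_squared = sum((y - f) ** 2 for y, f in zip(targets, predictions)) / len(targets)
```

In this toy case the error of 3.0 contributes 2.9 to the sum in (6) but 9.0 to the sum of squares, which is one way to see why the SVR fit is less sensitive to outliers than an LS fit.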

To reduce the overfitting effects, a regularization term is added to $R_{emp}[f]$, and (6) is modified as

$$R_{SV}[f] = R_{emp}[f] + C \cdot \|\vec{\theta}\|^2, \tag{7}$$

where $C > 0$ is a regularization constant. The regularization term in (7) controls the tradeoff between model complexity and approximation accuracy in order to ensure good generalization performance.

It has been shown that the solution of the SVR approach can be expressed in terms of support vectors, $\vec{\theta} = \sum_{i=1}^{P} \beta_i \Phi(\vec{x}_i)$, and therefore the function $f$ can be written as

$$f(\vec{x}, \vec{\theta}) = \sum_{i=1}^{P} \beta_i \langle \Phi(\vec{x}_i), \Phi(\vec{x}) \rangle + b = \sum_{i=1}^{P} \beta_i K(\vec{x}_i, \vec{x}; \vec{\theta}) + b. \tag{8}$$

In the above equation, the inner product $\langle \Phi(\vec{x}_i), \Phi(\vec{x}) \rangle$ in the feature space is usually taken to be a kernel function $K(\vec{x}_i, \vec{x})$. The kernel function determines the smoothness properties of the solutions and should reflect prior knowledge of the data. In this paper, the Gaussian function is used as the kernel function. The coefficients $\beta_i$ in (8) can be solved by quadratic programming methods after suitably transforming the above problem into a constrained optimization problem and rearranging the equations into matrix form. Note that only some of the $\beta_i$ are nonzero; the corresponding vectors $\vec{x}_i$ are called the support vectors (SVs). Hence, (8) can be rewritten as

$$f(\vec{x}, \vec{\theta}) = \sum_{i=1}^{SV} w_i K(\vec{x}_i, \vec{x}; \vec{\theta}) + b, \tag{9}$$

where $SV$ is the number of SVs, $\vec{x}_i$ are the support vectors and $w_i = \beta_i$ for those $\beta_i \neq 0$. If the kernel function is chosen as the Gaussian function, then (9) is equivalent to (1). That is, $SV$, $w_i$ and $\vec{\theta} \in \{m_i, \sigma_i\}$ correspond to the number of Gaussian functions $L$, the synaptic weights and the parameters of the Gaussian functions, respectively.

3.2 The Learning Algorithm of ARRBFNs

The annealing robust back-propagation (ARBP) learning algorithm [8] is used as the learning algorithm of the ARRBFNs. An important feature of the ARBP learning algorithm is that it introduces the annealing concept into the cost function of the robust back-propagation learning algorithm. Based on the same idea, the cost function of the ARBP learning algorithm is defined here as

$$E_{ARBP}(t) = \frac{1}{P} \sum_{j=1}^{P} \rho[e_j(t); \beta(t)], \tag{10}$$

where $t$ is the epoch number, $e_j(t)$ is the error between the $j$-th desired output and the $j$-th output of the ARRBFNs at epoch $t$, $\beta(t)$ is a deterministic annealing schedule acting like the cut-off points, and $\rho(\cdot)$ is a logistic loss function defined as

$$\rho[e_j; \beta] = \frac{\beta}{2} \ln\left(1 + \frac{e_j^2}{\beta}\right). \tag{11}$$

Following gradient-descent learning, the synaptic weights $w_i$, the centers $m_i$ and the widths $\sigma_i$ of the Gaussian functions are updated as

$$\Delta w_i = -\eta \frac{\partial E_{ARBP}}{\partial w_i} = -\eta \sum_{j=1}^{P} \varphi(e_j; \beta) \frac{\partial e_j}{\partial w_i}, \tag{12}$$

$$\Delta m_i = -\eta \frac{\partial E_{ARBP}}{\partial m_i} = -\eta \sum_{j=1}^{P} \varphi(e_j; \beta) \frac{\partial e_j}{\partial m_i}, \tag{13}$$

$$\Delta \sigma_i = -\eta \frac{\partial E_{ARBP}}{\partial \sigma_i} = -\eta \sum_{j=1}^{P} \varphi(e_j; \beta) \frac{\partial e_j}{\partial \sigma_i}, \tag{14}$$

where $\eta$ is a learning constant and $\varphi(e_j; \beta) = \partial \rho(e_j; \beta) / \partial e_j$ is usually called the influence function. When outliers exist, they have a great impact on the approximation results. This impact can be understood through the analysis of the influence function. The loss function (11) used in this paper and its influence function are shown in Figure 2. In the ARBP learning algorithm, the annealing schedule $\beta(t)$ has the following properties [8]: (A) $\beta_{initial}$, the value of $\beta(t)$ at the first epoch, is large; (B) $\beta(t) \to 0^+$ as $t \to \infty$; (C) $\beta(t) = k/t$ for any epoch $t$, where $k$ is a constant.
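The ARBP update can be sketched in a few lines of Python. Differentiating the logistic loss $\rho(e; \beta) = (\beta/2) \ln(1 + e^2/\beta)$ gives the influence function $\varphi(e; \beta) = e / (1 + e^2/\beta)$, which behaves like the LS influence $e$ when $\beta$ is large and suppresses large errors as $\beta$ shrinks. The code below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes scalar inputs, batch updates that follow (12)-(14) with $e_j = y_j - f(x_j)$, the schedule $\beta(t) = k/t$ from [8], and hypothetical names (`arbp_epoch`, `train`) and a hypothetical value of `k`.

```python
import math

def logistic_loss(e, beta):
    """Logistic loss: rho(e; beta) = (beta / 2) * ln(1 + e^2 / beta)."""
    return 0.5 * beta * math.log(1.0 + e * e / beta)

def influence(e, beta):
    """Influence function: phi(e; beta) = d rho / d e = e / (1 + e^2 / beta)."""
    return e / (1.0 + e * e / beta)

def arbp_epoch(xs, ys, w, m, s, eta, beta):
    """One batch update of weights w, centers m and widths s for 1-D inputs."""
    dw = [0.0] * len(w)
    dm = [0.0] * len(w)
    ds = [0.0] * len(w)
    for x, y in zip(xs, ys):
        g = [math.exp(-(x - mi) ** 2 / (2.0 * si ** 2)) for mi, si in zip(m, s)]
        e = y - sum(wi * gi for wi, gi in zip(w, g))
        phi = influence(e, beta)
        for i in range(len(w)):
            # Since e = y - f(x), de/dparam = -df/dparam, so -eta * phi * de/dparam
            # becomes +eta * phi * df/dparam for each parameter.
            dw[i] += eta * phi * g[i]
            dm[i] += eta * phi * w[i] * g[i] * (x - m[i]) / s[i] ** 2
            ds[i] += eta * phi * w[i] * g[i] * (x - m[i]) ** 2 / s[i] ** 3
    return ([a + d for a, d in zip(w, dw)],
            [a + d for a, d in zip(m, dm)],
            [a + d for a, d in zip(s, ds)])

def train(xs, ys, w, m, s, eta, k, epochs):
    """Run ARBP epochs with the annealing schedule beta(t) = k / t."""
    for t in range(1, epochs + 1):
        w, m, s = arbp_epoch(xs, ys, w, m, s, eta, beta=k / t)
    return w, m, s

# Hypothetical single-node, single-sample check of the first epoch (t = 1).
k = 1.0  # assumed annealing constant, not taken from the paper
w1, m1, s1 = arbp_epoch([0.0], [1.0], [0.0], [0.0], [1.0], eta=0.1, beta=k / 1)
```

In this toy case the error is 1 and $\beta = 1$, so the influence is 0.5 and the weight moves by $\eta \cdot 0.5 = 0.05$, half of what a plain LS update would give. In practice the initial `w`, `m` and `s` would come from the SVR solution rather than from arbitrary values.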

4. Simulation Results

In this section, a simple example is used to verify the proposed ARRBFN approach. The simulations were conducted in the Matlab environment. The support vector machine toolbox provided by Steve Gunn, obtained over the network, is used here. The root mean square error (RMSE) of the testing data is used to measure the performance (generalization capability) of the learned networks. The RMSE is defined as

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}}, \tag{15}$$

where $y_i$ is the desired value at $x_i$ and $\hat{y}_i$ is the output of the ARRBFNs given $x_i$ as input. The learning constant $\eta$ is chosen as 0.01 in the simulation. The sinc function is considered [9]:

$$y = \frac{\sin(x)}{x} \quad \text{with } x \in [-10, 10]. \tag{16}$$

A training set of 51 data points is generated from (16), and three artificial outliers are added. After training, another 201 data points are used as testing data to evaluate the performance of the ARRBFNs.

The initial structure of the ARRBFNs is first obtained by the SVR approach. In the SVR approach, the required parameters are set as $C = 3$, a Gaussian kernel function with $\sigma = 3$, and $\varepsilon = 0.1, 0.15$. The two resulting initial structures of the ARRBFNs have 12 and 11 hidden nodes (i.e. SVs) for $\varepsilon = 0.1$ and 0.15, respectively. These initial SVR results for the ARRBFNs are shown in Figure 3. From Figure 3, it is clear that the number of hidden nodes and the quality of the initial structure (i.e. the initial testing RMSE) of the ARRBFNs are controlled by $\varepsilon$ in the SVR approach with the $\varepsilon$-insensitive loss function. With the initial structure alone, the testing RMSE of the ARRBFNs is 0.0683 and 0.1012 for $\varepsilon = 0.1$ and 0.15, respectively. Then the parameters of the ARRBFNs are adjusted by the ARBP learning algorithm, whose annealing parameter follows $\beta(t) = k/t$ for a constant $k$ and therefore tends to $0^+$ as the epoch number $t$ grows [8]; 156 and 323 epochs are needed to reach a testing RMSE < 0.01 for $\varepsilon = 0.1$ and 0.15, respectively. The final result of the ARRBFNs at a testing RMSE < 0.01 is shown in Figure 4, and the two error convergence curves are shown in Figure 5. In this example, the initial structure obtained by the SVR approach also provides a good initialization for the ARBP learning algorithm. Hence, the proposed ARRBFNs converge quickly.

5. Conclusions

In this paper, we propose the ARRBFN approach to improve RBFNs for modeling with outliers. In the proposed approach, the SVR approach is used to obtain the initial structure of the ARRBFNs. Then the ARBP learning algorithm is applied to improve the performance of the ARRBFNs. Owing to the initial structure obtained by the SVR approach, the ARRBFNs converge quickly. Simulation results are provided to show the validity and applicability of the proposed ARRBFNs.

References

[1] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 2, pp. 281-294, 1989.
[2] S. F. Su and S. R. Huang, "Analysis of Model-Free Estimators - Applications on Stock Market with the use of Technical Indices," Master thesis, NTUST, 1999.
[3] J. Park and I. W. Sandberg, "Approximation and Radial Basis Function Networks," Neural Computation, vol. 5, pp. 305-316, 1993.
[4] D. M. Hawkins, Identification of Outliers,
Chapman and Hall, 1980.
[5] V. David Sanchez A., "Robustization of learning method for RBF networks," Neurocomputing 9, pp. 85-94, 1995.
[6] C. C. Lee, P. C. Chung, J. R. Tsai and C. I. Chang, "Robust Radial Basis Function Neural Networks," IEEE Trans. on Systems, Man, and Cybernetics, vol. 29, no. 6, pp. 674-685, 1999.
[7] D. S. Chen and R. C. Jain, "A Robust Back Propagation Learning Algorithm for Function Approximation," IEEE Trans. Neural Networks, vol. 5, no. 3, pp. 467-479, 1994.
[8] C. C. Chuang, S. F. Su and C. C. Hsiao, "The Annealing Robust Backpropagation (BP) Learning Algorithm," IEEE Trans. Neural Networks, vol. 11, no. 5, pp. 1067-1077, 2000.
[9] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice Hall, 1988.
[11] M. J. L. Orr, Introduction to Radial Basis Function Networks, University of Edinburgh, 1996.
[12] P. J. Rousseeuw and M. A. Leroy, Robust Regression and Outlier Detection, Wiley, 1987.
[13] K. Liano, "Robust Error Measure for Supervised Neural Network Learning with Outliers," IEEE Trans. Neural Networks, vol. 7, no. 1, pp. 246-250, 1996.

Figure 1: The structure of RBFNs. (Diagram: the input x feeds the Gaussian nodes G1, G2, ..., GL, whose outputs are weighted by w1, w2, ..., wL and summed to give f(x).)

Figure 2: The logistic loss function and its influence function. (Plot of both curves over the range -3 to 3.)

Figure 3: The training data points and the two initial results of the proposed ARRBFNs using the SVR approach, represented as "+" and "-", respectively. (Plot of y versus x on [-10, 10], with curves labeled ε = 0.1 and ε = 0.15.)
Figure 4: The final results of the ARRBFNs at a testing RMSE < 0.01. (Plot over the range -10 to 10.)

Figure 5: Error convergence curves of the ARRBFNs. (Plot of RMSE versus epoch over 0 to 400 epochs, with curves labeled ε = 0.1 and ε = 0.15.)
