
Fuzzy Perceptron Neural Networks for Classifiers

with Numerical Data and Linguistic Rules as Inputs

Jia-Lin Chen and Jyh-Yeong Chang

Abstract—This paper presents a novel learning algorithm of fuzzy perceptron neural networks (FPNNs) for classifiers that utilize expert knowledge represented by fuzzy IF-THEN rules as well as numerical data as inputs. The conventional linear perceptron network is extended to a second-order one, which is much more flexible for defining a discriminant function. In order to handle fuzzy numbers in neural networks, level sets of fuzzy input vectors are incorporated into perceptron neural learning. At different levels of the input fuzzy numbers, updating the weight vector depends on the minimum of the output of the fuzzy perceptron neural network and the corresponding nonfuzzy target output that indicates the correct class of the fuzzy input vector. This minimum is computed efficiently by employing the modified vertex method to lessen the computational load and the training time required. Moreover, the pocket algorithm, called the fuzzy pocket algorithm, is introduced into our fuzzy perceptron learning scheme to solve nonseparable problems. Simulation results demonstrate the effectiveness of the proposed FPNN model.

Index Terms—Fuzzy classifiers, fuzzy functions, perceptron learning.

I. INTRODUCTION

In solving a problem, most scientific algorithms adopt a crisp, or nonfuzzy, discipline and make use of only numerical data, because numerical data are easily processed by computers. In this conventional approach, the exclusive processing domain is purely numerical. The number-based approach is usually significant when the numerical data are precise enough and representative of the system behavior. However, this approach usually lacks the ability to model the uncertain or ambiguous information existing among data, which is so often encountered in the real world. On the other hand, humans make many successful decisions and/or judgments primarily on the basis of approximate and/or conceptual information, which is usually uncertain, imprecise, and frequently stated in terms of linguistic terms or rules. Fuzzy set theory has been introduced [1]–[3] to model the uncertain and/or ambiguous characteristics inherent among the data; these characteristics, once defined by suitable fuzzy sets and rules, are then used in inference to reason about the result. Since its inception, fuzzy logic research has been the focus of various fields and has demonstrated many fruitful results in both theory and application [4]. It can be easily observed that

Manuscript received April 15, 1998; revised October 20, 1999. This work was supported in part by the National Science Council under Grant NSC 85-2213-E-009-119, Taiwan, R.O.C.

The authors are with the Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan 300, R.O.C. (e-mail: jychang@cc.nctu.edu.tw).

Publisher Item Identifier S 1063-6706(00)10692-7.

these two paradigms, the number-based nonfuzzy approach and the fuzzy-logic-based approach, solve problems from different, in a sense almost complementary, viewpoints. To benefit from both, a combined routine that integrates numerical computation and fuzzy techniques would be more effective than merely using either one of them. Due to their complementary natures in the way of solving a problem, the integrated scheme will affect the algorithmic routine cooperatively and efficiently and, hence, will further enhance system performance. As a result, the hybrid paradigm of neuro-fuzzy integration has been a growing area of research in both the academic and industrial communities and has become prevalent in the context of pattern recognition, decision support systems, control system applications, and many others [5].

In particular, the realm of designing a classifier system still parallels the above lines of thought. Conventionally, classifier design through numerical data learning is the general approach that has been used directly and naturally. For instance, the backpropagation (BP) approach [6], [7] and genetic algorithms [8], [9] are widely used in synthesizing a classifier under the framework of neural models. But these datum-by-datum learning activities sequentially count each pattern instance equally, without regard to the inherent differences present among the patterns. Fuzzy-logic-based classification paradigms, on the other hand, have emerged recently because they can manipulate the various types of uncertain or ambiguous nature exhibited among the data and can tackle real-world problems in a manner more like humans. Broadly, the number-based approach proceeds with learning for classification primarily from numerical patterns, but ignores the differences between numerical data, i.e., a collective attribute of the data. Fuzzy set theoretic design conveys the conceptual layout of classifying the given numerical data, but considers little of the information in each datum singly. The weakness of a number-based traditional approach, its lack of the collective aspect of the data, is the strength of a fuzzy-logic-based approach, which captures the collective nature of the data set. Conversely, the weakness of a fuzzy-logic-based approach, its lack of the individual nature of each datum, is the strength of a number-based approach, which includes the character of each training pattern. To circumvent the defects of using either of the above approaches singly, it is advantageous to integrate the two paradigms, because the weakness of one approach can be counterbalanced by the strength of the other and vice versa. Using such a hybrid scheme, the number-based and fuzzy-logic-based approaches can enrich the very basic ideas in the framework of classification and can constitute a fundamental ingredient of an advanced and successful classifier topology.

(2)

As was noted before, for the connectionist model-based classification systems presented in the literature, most learning procedures utilize and process numerical data only, i.e., each pattern instance is trained sequentially and equally, but the mutual differences existing among them are ignored during the course of training. If, however, other pieces of classification knowledge, especially concerning the nature of patterns in a set, can be included as a part of the inputs and then learned by the training procedure, the defect of discarding the pattern differences in learning can be minimized. For instance, fuzzy IF-THEN rules that describe the relation between pattern feature attributes and numerical data in a set or category could be one of many useful pieces of classification domain knowledge that could be added to describe the system. Linguistic values such as "small," "medium," and "large" are typical and helpful linguistic terms to be defined for specifying patterns in a category in the form of rules. Under the integrated formalism, the included fuzzy IF-THEN rule inputs will reflect the conceptual layout of the classification problem at a higher level and/or from a holistic viewpoint. Such conceptual extension definitely enlarges the range of the classification problems and removes the weakness found in instance training itself. Through integrating fuzzy notions into a traditional number-based classifier, the combined learning scheme will be trained by two kinds of inputs, numerical data of training patterns and structured data of fuzzy classification rules, which are complementary in nature. Consequently, these two kinds of inputs will affect the learning routine cooperatively and efficiently, and the overall classification performance will be enhanced. Such judicious integration matches the increasing trend of deriving a new formulation that can embrace classification schemes involving hybrid numerical and linguistic computation, as noted in a recent literature review [10].

In the literature to date, two approaches are available for a classifier dealing with linguistic rules and crisp data together. One paradigm extracts fuzzy IF-THEN rules from numerical data, and then these deduced rules together with the given linguistic knowledge of rules are combined to execute the classification by fuzzy inference. Corresponding to this paradigm, Wei et al. [11] proposed the additive fuzzy logic classifier (AFLC). The AFLC, a direct design scheme without a training phase, does not require the significant amount of learning time needed for a neural-based classifier. However, this approach has some limitations. If only fuzzy IF-THEN rules are used as inputs, the classification results depend on the membership functions of the linguistic labels defined in the if part. Hence, if there exist some regions that are not covered by any linguistic labels of the IF-THEN rules, then there is no information to determine which class an input point in such a region belongs to. The other paradigm extends fuzzy notions into neural network learning for the linguistic rules and then trains all the numerical data and rules by the neural model in a conventional manner [12]–[14]. This approach appears to be more attractive because the neural learning is fused into fuzzy data processing and can learn and generalize from training patterns and fuzzy IF-THEN rules. Following this formalism, Ishibuchi et al. [12] proposed a multilayer feedforward neural network to explore neural learning including fuzzy sets. The learning algorithm is the fuzzy extension of the

backpropagation algorithm, referred to hereafter as the FBP algorithm. Based on the extension principle, the learning formulas for the $h$-levels of the input fuzzy sets are explicitly derived. However, drawbacks of the BP algorithm, such as convergence to local minima and/or slow learning convergence, still persist in this BP-based scheme. The tendency to converge to a local minimum causes the FBP algorithm to frequently converge to an inaccurate solution. Also, slow learning convergence leads to a long required training time.

Also in the context of the neuro-fuzzy hybrid computing paradigm, a fuzzy neural classifier [15] based on the multilayer perceptron structure and the backpropagation learning algorithm has been described. By converting the numerical/linguistic inputs into larger overlapping linguistic partitions, this model likewise exhibits the feature of being capable of handling input vectors presented in quantitative and/or linguistic form, but it demonstrates a different output form, providing outputs of soft belongingness in terms of degrees of confidence among the belonging classes. In this method, the components of the input vector consist of the membership values to the overlapping partitions of the linguistic properties "low," "medium," and "high" corresponding to each input feature. When an input feature is linguistic, its corresponding membership values for the three linguistic terms are quantified as fixed values. The desired output is a membership value denoting the degree of belonging of the input vector to that class. This procedure of assigning fuzzy output membership values, instead of the conventional crisp binary output values, enables this model to be more efficient in classifying ambiguous data with overlapping class boundaries. An extended application of the above scheme has been further considered to design a connectionist expert system [16]. In this expert system, the user can be queried for the more essential feature information in the case of partial inputs. This expert system also provides justification, in the form of rules, for any inferred decision.

A most general neuro-fuzzy computing scheme, still embedded in a multilayer perceptron structure, was proposed by Hayashi et al. [17]. In this fuzzy neural model, the input/output signals and the weights are all fuzzy sets. They presented a fuzzified delta rule for learning; however, a method to implement this learning algorithm is still not known. They also argued that a learning algorithm based on $h$-levels of the error measure is too complicated and may sometimes fail. The difficulty of deriving such a general fuzzy functional algorithm through $h$-levels is obvious.

To provide an efficient and reliable solution, proposed in this article is a new fuzzy neural classification model, which is instead restricted to crisp outputs and weight parameters (and thus is not as general as the model of Hayashi) and allows inputs in numerical and/or fuzzy forms. The proposed model is a neural-based learnable classifier, called the fuzzy perceptron neural network (FPNN), and its learning scheme is successfully derived based on the $h$-level concept.

The perceptron algorithm [18], [19], a conventional iterative training algorithm, is guaranteed to determine a linear decision boundary separating the patterns of two classes in a finite number of steps if these patterns are linearly separable. For linearly nonseparable patterns, Gallant [20], [21] introduced the pocket algorithm to optimally dichotomize the given patterns in


the sense of minimizing the erroneous classification rate. The pocket algorithm structurally resembles conventional perceptron learning, except that a checking amendment to stop the algorithm has been added. In light of this concern, this paper incorporates fuzzy sets into a perceptron learning algorithm to enhance the perceptron neural network, which, in addition to handling numerical data, can also handle linguistic knowledge. To avert the limitation of the linear boundary produced by the conventional perceptron, this work introduces a more flexible yet simple boundary (under the constraint of a limited increase in parameters) by extending the linear discriminant function to a higher order one, hence allowing a nonlinear separating surface to be generated to tackle nonlinear separability. To this end, we propose a second-order fuzzy perceptron neural network that can handle fuzzy vectors, in the form of fuzzy IF-THEN rules, as well as numerical samples as inputs. Based on the level sets of fuzzy numbers, the learning procedure of the fuzzy perceptron network is analyzed. Moreover, the vertex method is modified and applied to find the minimum of the fuzzy discriminant function, whose value indicates whether or not a learning update of the perceptron weight vector should be executed. It is to be noted that in an earlier paper [14], we proposed a scheme based on the level concept and an optimization technique, but it requires much more computational effort to obtain the extreme points iteratively. The intensive computational effort needed in the previous paper is greatly reduced by introducing the vertex method in this paper. The pocket algorithm is finally generalized to the fuzzy domain so that the proposed fuzzy perceptron model can cope with nonseparable cases.

It should be remarked that perceptron learning with fuzzy membership functions can also be found in the literature. Keller and Hunt [22] introduced fuzzy set techniques into the single-layer perceptron algorithm for the two-class classification problem. Their algorithm assigns fuzzy membership values to the input data to reflect their geometric proximity to the means of class 1 and class 2 before training the perceptron classifier. This fuzzy perceptron learning scheme can improve convergence significantly, especially when the crisp data are overlapping. The concept and content realized in [22] are quite different from the FPNN because that scheme deals with crisp input data only, and these data are artificially endowed with membership values for faster convergence.

The rest of this paper is organized as follows. Section II reviews the concepts of the fuzzy function and the extension principle that is employed for analyzing fuzzy functions. Section III introduces the fuzzy perceptron neural networks; their learning schemes are thoroughly described as well. In Section IV, several numerical examples and the two-spiral benchmark data are simulated. Performance comparisons of the proposed model with other related approaches are summarized by statistical performance evaluation indexes computed from the simulation results. Concluding remarks are finally made in Section V.

II. FUZZY FUNCTION AND THE EXTENSION PRINCIPLE

Since our proposed fuzzy perceptron neural network relies heavily on the evaluation of fuzzy functions, it is instructive to describe the derivation of the fuzzy function briefly. This section will start by introducing the extension principle, which is the rationale behind evaluating the fuzzy function.

The extension principle [3] is the most important result of fuzzy set theory that provides the generalization procedure for mapping between fuzzy sets. In light of this principle, algebraic operations on real numbers can be extended to fuzzy numbers, i.e., convex and normal fuzzy sets. According to the extension principle, for a fuzzy multivariable function $f$ of fuzzy variables $A_1, A_2, \ldots, A_n$, i.e.,

$$B = f(A_1, A_2, \ldots, A_n) \tag{1}$$

the membership function of $B$ can be expressed as

$$\mu_B(y) = \sup_{y = f(x_1, \ldots, x_n)} \min\{\mu_{A_1}(x_1), \ldots, \mu_{A_n}(x_n)\}. \tag{2}$$

The computation and algorithm involved in implementing (2) are not trivial for a fuzzy set with a continuous universe. A simple and intuitive way is to use the discretization technique [23] in the variable domain. However, if the discretization size is not properly selected, this technique can fail and lead to irregular and inaccurate results [24], [25]. Consequently, previous investigations [26]–[28] proposed methods for computing fuzzy functions based on the $h$-level concept. The $h$-level set is much more effective as a representation form of fuzzy sets since it is a discretization technique on the membership-value domain of the variables, instead of on the variable domain itself. The abnormality of using the conventional discretization on variable domains can be averted by evaluating the fuzzy function on $h$-levels. The fuzzy function using the $h$-level concept is illustrated in the following.

For any $h \in (0, 1]$, the $h$-level sets, i.e., $h$-cuts, of the fuzzy set $A$ are defined as follows:

$$[A]_h = \{x : \mu_A(x) \ge h\} \quad \text{for } 0 < h \le 1 \tag{3}$$

where $[A]_h$ denotes an $h$-level set of a fuzzy set. Furthermore, if $f$ is a continuous function and the fuzzy sets $A_1, \ldots, A_n$ are upper semicontinuous, then the following holds¹ [29, p. 39]:

$$[f(A_1, \ldots, A_n)]_h = f([A_1]_h, \ldots, [A_n]_h) \quad \text{for } 0 < h \le 1. \tag{4}$$

The relation above paves a simpler way to compute the value of a fuzzy function compared with applying the extension principle of (2) directly. In the following, the fuzzy functions encountered in the FPNN learning will be computed by (4) because the above assumptions required by the function and fuzzy sets involved in the FPNN classification tasks are generally satisfied.
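To make the $h$-level evaluation of (4) concrete, the following Python sketch (our illustration, not code from the paper; the triangular membership shapes and the function f are assumptions) propagates interval end points through a simple monotone function instead of computing the sup–min of (2) directly.

```python
def tri_cut(c, s, h):
    """h-level cut of a symmetric triangular fuzzy number with center c, spread s."""
    return (c - (1.0 - h) * s, c + (1.0 - h) * s)

def f(x1, x2):
    """A crisp continuous function, increasing in both arguments."""
    return x1 + 2.0 * x2

# h-level evaluation per (4): for a monotone f, the image interval is obtained
# by applying f to the matching interval end points.
for h in (0.0, 0.5, 1.0):
    a1, b1 = tri_cut(2.0, 1.0, h)   # fuzzy number A1: center 2, spread 1
    a2, b2 = tri_cut(1.0, 0.5, h)   # fuzzy number A2: center 1, spread 0.5
    print(f"h={h}: [f(A1, A2)]_h = [{f(a1, a2):.2f}, {f(b1, b2):.2f}]")
```

At $h = 1$ the interval collapses to the crisp value $f(2, 1) = 4$, while lower levels yield progressively wider intervals, mirroring the nesting of level sets.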

III. FUZZY PERCEPTRON NEURAL NETWORK

This section first describes the fuzzy IF-THEN rules for classification problems. The proposed neural-based network, i.e., the fuzzy perceptron neural network (FPNN), is then introduced. The FPNN can perform a classification task using not only numerical patterns but also fuzzy IF-THEN rules as inputs.

¹If $[A]_0$ denotes specifically the support, i.e., the set of points with nonzero membership degrees of a fuzzy set, then (4) holds for $h = 0$ also.


Fig. 1. The architecture of a fuzzy perceptron neural network.

Next, the vertex method [30] is modified to obtain the minimum of a fuzzy function; this function is the discriminant function of the FPNN. The extremum of the function determines whether or not the coefficients of the discriminant function should be modified in a training iteration. Finally, the fuzzy pocket algorithm is developed to provide a stopping criterion for the fuzzy perceptron neural learning.

A. Structure of the Fuzzy Perceptron Neural Network

Based on the perceptron neural network structure, we shall construct a two-class classification system that can also accept fuzzy IF-THEN rules as inputs besides numerical pattern data. Fuzzy IF-THEN rules utilized for the classification problem [12] are given as follows:

If $x_1$ is $A_{j1}$ and $\cdots$ and $x_n$ is $A_{jn}$, then $(x_1, \ldots, x_n)$ belongs to $C_j$, $\quad j = 1, 2, \ldots, N$ (5)

where
$A_{ji}$: linguistic label;
$C_j$: either class 1 or class 2;
$N$: number of rules given.

To prevent the discriminant function of the FPNN from passing through the origin only, we can augment the input vectors by including the threshold constant $x_0 = 1$. Because $x_0$ is a constant of one, it is considered a fuzzy singleton in the input vector. With this augmentation, (5) can be generalized and simplified as

$\mathbf{X}_j$ belongs to $C_j$, $\quad j = 1, 2, \ldots, N$ (6)

where $\mathbf{X}_j$ denotes a fuzzy vector.

With crisp data regarded as singleton fuzzy numbers, numerical input data to be classified are considered a special form of the linguistic knowledge represented by (6), in which the components of $\mathbf{X}_j$ are all fuzzy singletons. Therefore, (6) can accommodate the given crisp input data set as well. By this setting, the FPNN is designed, as shown in Fig. 1, so that the augmented $(n+1)$-dimensional fuzzy vectors $\mathbf{X}_j = (X_{j0}, X_{j1}, \ldots, X_{jn})$ are classified. Each input $X_{ji}$ can be either a linguistic term represented by a fuzzy number or the fuzzy singleton of a crisp datum. It follows from (4) that the level sets of the output of a fuzzy function can be propagated through the neural network. A fuzzy set $X$ is represented by its level sets $[X]_{h_1}, [X]_{h_2}, \ldots, [X]_{h_M}$, where $0 \le h_1 < h_2 < \cdots < h_M \le 1$ and $M$ represents the number of level sets sampled. At these levels, the input–output relations of the neural network are derived as follows. At the $h$-level of the fuzzy input vector $\mathbf{X}_j$, let the level set $[\mathbf{X}_j]_h$ denote the interval input vector of $\mathbf{X}_j$. Then, the fuzzy perceptron neural network for the $j$th fuzzy IF-THEN rule is defined by

Input units:

$$[o_{ji}]_h = [X_{ji}]_h = [x_{ji}^L, x_{ji}^U], \quad i = 0, 1, \ldots, n \tag{7, 8}$$

Output unit:

$$y_j = \operatorname{sgn}(\mathrm{Net}) \tag{9}$$

where $\operatorname{sgn}(\cdot)$ and $\mathrm{Net}$, respectively, represent the signum activation function [5, p. 208] and the input of the output neuron, as defined in the following.

Owing to the quest for a nonlinear discriminant boundary rather than just a linear one, this work uses a second-order perceptron neural network [31], [32]. Subject to a negligible increase in the number of parameters used for perceptron learning, a second-order discriminant function can produce various quadratic curves, paraboloids, ellipsoids, and hyperboloids by varying the coefficients to meet the needed curvature. This flexibility can hopefully accommodate most of the nonlinear boundaries needed to discriminate the hybrid data sets. Without loss of generality, the second-order FPNN is illustrated by a two-dimensional (2-D) input vector, which is augmented to $\mathbf{X} = (X_0, X_1, X_2)$, in which $X_0$ denotes the fuzzy singleton equal to one. At the $h$-level of the given 2-D input vector $\mathbf{X}$, let $[\mathbf{X}]_h$ represent the interval vector of $\mathbf{X}$, i.e., $[\mathbf{X}]_h = ([1, 1], [x_1^L, x_1^U], [x_2^L, x_2^U])$.

The weighted sum of a second-order perceptron neuron for the input vector is given by

$$\mathrm{Net}(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_2^2 + w_5 x_1 x_2. \tag{10}$$

For the interval input vectors with the corresponding target output $t$, which is either $1$ for class 1 or $-1$ for class 2, the classifier is required to find the perceptron weight vector $\mathbf{w} = (w_0, w_1, \ldots, w_5)$ so that

$$t \cdot \mathrm{Net}(x_1, x_2) > 0 \quad \text{for all } (x_1, x_2) \in [X_1]_h \times [X_2]_h. \tag{11}$$

As mentioned in Section II, the fuzzy function calculation by the use of the extension principle is equivalent to the $h$-level-based evaluation by (4), which involves interval arithmetic. Some fundamental properties of interval arithmetic [30] are therefore summarized as follows.

Assuming that $A$, $B$, and $C$ are interval numbers, we have the following.

Associativity:
$$(A + B) + C = A + (B + C) \tag{12}$$

Commutativity:
$$A + B = B + A \tag{13}$$

However, distributivity does not always hold. Instead, we have the following.

Subdistributivity:
$$A(B + C) \subseteq AB + AC \tag{14}$$

The relation above hints that after the distributing operation, the resultant interval range is enlarged. Distributivity fails because the two occurrences of the identical interval number $A$ on the right-hand side of (14) are treated as two independent interval numbers. With this subdistributivity property in mind, we can see that deriving the interval of (10) is complex, since the input variables $x_1$ and $x_2$ of the second-order fuzzy perceptron neural network appear more than once and should be treated as dependent. Hence, evaluating $\mathrm{Net}$ of (10) by direct interval arithmetic computation, as on the right-hand side of (14), is inappropriate.
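The dependency effect behind (14) is easy to reproduce numerically. The sketch below (ours; the interval helpers imul and iadd are hypothetical names) shows that evaluating A(B + C) directly yields a tighter interval than evaluating AB + AC, where the same A is counted twice.

```python
def imul(a, b):
    """Interval multiplication: [min of end-point products, max of them]."""
    p = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(p), max(p))

def iadd(a, b):
    """Interval addition."""
    return (a[0] + b[0], a[1] + b[1])

A, B, C = (-1.0, 1.0), (1.0, 2.0), (-2.0, -1.0)

left  = imul(A, iadd(B, C))            # A(B + C)      -> (-1.0, 1.0)
right = iadd(imul(A, B), imul(A, C))   # AB + AC       -> (-4.0, 4.0)
print(left, right)                     # left is strictly contained in right
```

This is exactly why (10), where $x_1$ and $x_2$ each occur several times, cannot be bounded tightly by naive interval evaluation.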

To satisfy inequality (11) at each $h$-level of the fuzzy input vector, we have to devise a scheme that can search for the extremum of the function $\mathrm{Net}$ for a given weight vector $\mathbf{w}$. If the target output is class 1, then we set $t = 1$ and we will find the minimum of $\mathrm{Net}$. On the contrary, we set $t = -1$ and we will find the maximum of $\mathrm{Net}$. In fact, only the constrained minimization must be solved because, for the case $t = -1$, the maximization is transformed into a minimization by negating the function. At the $h$-level of the fuzzy sets involved in the $j$th IF-THEN rule, the problem becomes finding the minimum of $\mathrm{Net}(x_1, x_2)$ with the constraints $x_1^L \le x_1 \le x_1^U$ and $x_2^L \le x_2 \le x_2^U$, which define the subspace of the 2-D parameters and are called the feasible region of $x_1$ and $x_2$. Notably, $\mathrm{Net}$ and the interval end points are $j$ and $h$ dependent; however, for simplicity, these two scripts are not explicitly shown. Although our previous work [14] derived an optimization scheme to solve the above problem, it is computationally expensive. In the sequel, a new computationally efficient technique, named the modified vertex method, is introduced to solve this constrained minimization problem.

B. Fuzzy Perceptron Learning by the Modified Vertex Method

As mentioned earlier, solving a constrained minimization problem is necessary for the fuzzy perceptron neural learning.

According to our observation, this minimization problem can be solved more efficiently by modifying and applying the vertex method. Details of this approach are as follows.

1) Vertex Method: Let $f$ be an $n$-dimensional interval function given by

$$Y = f(X_1, X_2, \ldots, X_n) \tag{15}$$

where

$$X_i = [x_i^L, x_i^U], \quad i = 1, 2, \ldots, n. \tag{16}$$

The function value $Y$ is also an interval number. These interval variables form an $n$-dimensional hypercube with $N$, i.e., $N = 2^n$, vertices. The vertices' coordinates of the interval function are the combinations of the end points of the interval numbers. According to the vertex method [30], these vertices in the $n$-dimensional space are critical for calculating the interval of a function of interval variables. Essential properties of the vertex method are given as follows.

For a continuous and differentiable function $f$ in the $n$-dimensional hypercube, if $f$ has no extreme point, i.e., a point with all derivatives equal to zero, in the feasible region, the interval of the function in the defined domain (including the boundaries) can be obtained by

$$Y = \Bigl[\min_j f(c_j),\; \max_k f(c_k)\Bigr], \quad j, k = 1, 2, \ldots, N \tag{17}$$

where $c_j$ and $c_k$ denote the $j$th and $k$th of the $N$ vertices of the hypercube.

The vertex method is effective only when the conditions of continuity and of no extreme point existing in the region are satisfied. Furthermore, if extreme points of the function exist in the feasible region, these extreme points must also be checked to obtain the minimal value. That is, suppose that function $f$ has $m$ extreme points $E_1, \ldots, E_m$; then the interval calculation of (17) is extended to

$$Y = \Bigl[\min\bigl(\min_j f(c_j), \min_k f(E_k)\bigr),\; \max\bigl(\max_j f(c_j), \max_k f(E_k)\bigr)\Bigr] \tag{18}$$

where $E_j$ and $E_k$ denote the $j$th and $k$th extreme points.
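As an illustration of (17) and (18) applied to the second-order Net of (10), the following sketch (our own, not the paper's implementation) evaluates Net at the four vertices of the feasible rectangle and, when the stationary point of the quadratic lies inside the rectangle, includes it as well. Note that the paper's modified method additionally searches the four boundary edges, as described next.

```python
import itertools
import numpy as np

def net(w, x1, x2):
    """Second-order discriminant Net of (10); w = (w0, ..., w5)."""
    return (w[0] + w[1] * x1 + w[2] * x2
            + w[3] * x1**2 + w[4] * x2**2 + w[5] * x1 * x2)

def net_interval(w, x1_iv, x2_iv):
    """Interval of Net over the rectangle: vertices (17) plus, if present,
    the interior stationary point (18)."""
    vals = [net(w, x1, x2) for x1, x2 in itertools.product(x1_iv, x2_iv)]
    # Stationary point of the quadratic: solve grad Net = 0 (a 2x2 linear system).
    H = np.array([[2 * w[3], w[5]], [w[5], 2 * w[4]]])
    if abs(np.linalg.det(H)) > 1e-12:
        e1, e2 = np.linalg.solve(H, np.array([-w[1], -w[2]]))
        if x1_iv[0] <= e1 <= x1_iv[1] and x2_iv[0] <= e2 <= x2_iv[1]:
            vals.append(net(w, e1, e2))
    return min(vals), max(vals)

# Net = 0.5 - x1 - x2 + x1^2 + x2^2 on [0,1]^2: minimum 0 at the interior (0.5, 0.5).
print(net_interval((0.5, -1, -1, 1, 1, 0), (0.0, 1.0), (0.0, 1.0)))
```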

In light of the results above, how to determine the minimum of $\mathrm{Net}(x_1, x_2)$ with the constraints $x_1^L \le x_1 \le x_1^U$ and $x_2^L \le x_2 \le x_2^U$ is considered again. Fig. 2 depicts the case where the extreme point, denoted as $(x_1^E, x_2^E)$, lies in the feasible region, while Fig. 3 shows the case where the extreme point is outside the feasible region. According to our study, the vertex method should be modified when it is exploited in our FPNN model. It can be observed from Figs. 2 and 3 that the minimal point of a $\mathrm{Net}$ function could also be located on the boundary of the feasible region, in addition to the vertices and the extreme point. In response to this modification, an iterative searching process by the bisection method [33, ch. 2]


Fig. 2. Searching for the minimum of $\mathrm{Net}(x_1, x_2)$, $x_1^L \le x_1 \le x_1^U$, $x_2^L \le x_2 \le x_2^U$. The extreme point $(x_1^E, x_2^E)$ is in the feasible region.

Fig. 3. Searching for the minimum of $\mathrm{Net}(x_1, x_2)$, $x_1^L \le x_1 \le x_1^U$, $x_2^L \le x_2 \le x_2^U$. The extreme point $(x_1^E, x_2^E)$ is not in the feasible region.

is introduced. For the region defined by the $x_1$ and $x_2$ intervals, four boundaries should be checked for the minimal point. On any one of the four boundaries, the function $\mathrm{Net}$ is reduced to a one-dimensional function $g(x_i)$, where $i$ is either 1 or 2, with the other variable fixed at its corresponding lower or upper value. For convenience, let the interval of the changing variable be denoted $[a, b]$, i.e., $[a, b]$ can be either $[x_1^L, x_1^U]$ or $[x_2^L, x_2^U]$.

2) Bisection Method: To search for the minimal value of the quadratic function $g$ above, with the given interval $[a, b]$ and the maximum number of iterations $K$, we proceed with the following steps.

Step 1) Set $k = 1$.
Step 2) While $k \le K$, perform Steps 3)–5).
Step 3) Set $c = (a + b)/2$.
Step 4) $k = k + 1$.
Step 5) If $g'(c) > 0$, then $b = c$; else $a = c$.
Step 6) The minimum of $g$ is $g((a + b)/2)$.

A code sketch of these steps is given below.
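Under our reading of the steps above, the bisection acts on the sign of the derivative $g'$ of the one-dimensional quadratic, so $K$ iterations narrow the minimizer to within $(b - a)/2^K$. The sketch below is an assumption-laden rendering (the derivative-sign test and the final end-point comparison are ours), not the authors' exact procedure.

```python
def bisect_min(g, dg, a, b, K=6):
    """Bisection (Steps 1-6): locate the minimizer of a 1-D quadratic g on [a, b].

    dg is g's derivative; K iterations give an accuracy of (b - a) / 2**K.
    """
    a0, b0 = a, b
    for _ in range(K):                 # Steps 1)-2), 4)
        c = 0.5 * (a + b)              # Step 3)
        if dg(c) > 0.0:                # Step 5): minimum lies to the left
            b = c
        else:                          # minimum lies to the right
            a = c
    x = 0.5 * (a + b)                  # Step 6)
    # Also compare the original end points, so monotone or concave edge
    # restrictions (minimum at an end point) are handled as well.
    return min(((a0, g(a0)), (b0, g(b0)), (x, g(x))), key=lambda t: t[1])

# Example edge: x2 fixed at 0, so Net reduces to g(x1) = 0.5 - x1 + x1**2.
print(bisect_min(lambda x: 0.5 - x + x * x, lambda x: 2 * x - 1, 0.0, 1.0))
```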

The above bisection process is operated on these four boundaries sequentially to obtain their respective minima, and then the smallest value, denoted $\mathrm{Net}_b$, is selected from the four minima obtained. Finally, the minimal $\mathrm{Net}$ value equals $\mathrm{Net}_b$ if there is no extreme point in the feasible region; otherwise, it is the minimum of $\mathrm{Net}_b$ and the $\mathrm{Net}$ values evaluated at the extreme points in the feasible region. In addition, the corresponding minimal point is denoted as $(x_1^*, x_2^*)$. The parameter $K$ determines how accurately the minimal point is located. For instance, in this paper, six iterations are selected to locate the optimal point, leading to an accuracy of $(b - a)/2^6$. For the $h$-level of the $j$th fuzzy input vector, if $t_j \cdot \mathrm{Net}_{\min} < 0$, then the second-order fuzzy perceptron learning algorithm with a learning constant $\eta$ updates the weight vector by

$$\mathbf{w}^{\mathrm{new}} = \mathbf{w}^{\mathrm{old}} + \eta\, h\, t_j\, \mathbf{z}^* \tag{19}$$

where

$$\mathbf{z}^* = \bigl(1,\; x_1^*,\; x_2^*,\; (x_1^*)^2,\; (x_2^*)^2,\; x_1^* x_2^*\bigr). \tag{20}$$

The above learning procedure does not stop until inequality (11) holds for all $h$-levels of the rules and for all the crisp training patterns as well. The learning step size of the above fuzzy perceptron algorithm is proportional to the current $h$-level: a larger membership degree implies a larger learning step size. This learning procedure can accept not only fuzzy IF-THEN rules but also crisp data, since real numbers can be regarded as fuzzy singletons whose corresponding $h$-level is assumed to equal one. If the training data are crisp, the proposed second-order fuzzy perceptron network reduces to the conventional second-order perceptron network. Furthermore, the proposed scheme is quite general, since it can easily be extended to a third- or higher order fuzzy perceptron model in the same manner. From this perspective, the proposed second-order fuzzy perceptron algorithm can be viewed as an extension of conventional perceptron learning to the case of fuzzy rules as inputs.
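Putting (19) and (20) together with the level-set loop, a schematic epoch of second-order fuzzy perceptron learning might look as follows (our sketch: the six h-levels, the small constant standing in for h = 0, and the coarse grid minimizer are assumptions; the paper uses the modified vertex method where grid_min appears).

```python
import numpy as np

H_LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)  # six h-levels, as in the simulations
EPS = 1e-3                                  # small value standing in for h = 0

def net(w, x1, x2):
    """Second-order discriminant Net of (10)."""
    return w[0] + w[1]*x1 + w[2]*x2 + w[3]*x1**2 + w[4]*x2**2 + w[5]*x1*x2

def phi(x1, x2):
    """Second-order feature vector z* of (20), evaluated at the minimal point."""
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

def grid_min(f, x1_iv, x2_iv, n=41):
    """Stand-in minimizer over the feasible rectangle (the paper uses the
    modified vertex method here, not a grid search)."""
    xs, ys = np.linspace(*x1_iv, n), np.linspace(*x2_iv, n)
    return min(((f(a, b), (a, b)) for a in xs for b in ys), key=lambda v: v[0])

def train_epoch(w, rules, eta=0.1):
    """One epoch of second-order fuzzy perceptron learning, per (19)."""
    for cuts, t in rules:          # cuts(h) -> ((x1L, x1U), (x2L, x2U)); t = +/-1
        for h in H_LEVELS:
            x1_iv, x2_iv = cuts(h)
            m, (x1s, x2s) = grid_min(lambda a, b: t * net(w, a, b), x1_iv, x2_iv)
            if m <= 0.0:           # inequality (11) violated at this h-level
                w = w + eta * max(h, EPS) * t * phi(x1s, x2s)
    return w

# Toy usage: one triangular fuzzy rule per class, well separated.
cut = lambda c, s: lambda h: ((c - (1 - h) * s, c + (1 - h) * s),) * 2
rules = [(cut(0.2, 0.1), +1), (cut(0.8, 0.1), -1)]
w = np.zeros(6)
for _ in range(50):
    w = train_epoch(w, rules)
print(w)
```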

C. Fuzzy Pocket Algorithm

Perceptron learning is quite appropriate for separable problems, i.e., problems for which some set of weights exists that correctly classifies all training patterns after a finite number of mistakes. Nonseparable problems are a different story. The fact that no set of weights can correctly classify all training patterns implies that a set of weights that correctly classifies as large a fraction of the training patterns as possible is preferred. The pocket algorithm was developed to determine a set of weights in this optimal sense [20]. In line with such an optimal sense, a fuzzy pocket algorithm is developed here to resolve the nonseparable problems encountered in fuzzy perceptron learning.

This work modifies the pocket algorithm to effectively address the nonseparable cases for our fuzzy perceptron algorithm, such as overlapping fuzzy numbers or input data that cannot be dichotomized by a second-order discriminant function. The pocket algorithm adds additional steps to monitor the


performance of the perceptron network. For training patterns of crisp data, the performance is measured on the basis of the correct classifications among the training patterns. The pocket algorithm must be modified so that it can be applied to fuzzy input vectors. This modified pocket algorithm, called the fuzzy pocket algorithm, should optimally dichotomize the fuzzy input vectors of linguistic terms as follows.

1) The total misclassified membership values of the two classes should be as small as possible.

2) The difference between the maximal misclassified membership values (the highest misclassified membership value of overlapping fuzzy input vectors for each class) of these two classes should be as small as possible.

At the $h$-level of fuzzy numbers, the following index, mem_index, should be minimized to find an optimal weight vector that realizes the above two statements:

$$\mathrm{mem\_index} = \mathrm{mem\_mis} + c\,\gamma^n\, \mathrm{mem\_dif} \tag{21}$$

where mem_mis denotes the sum of the membership levels of those training vectors that cannot be correctly classified by the discriminant function, and mem_dif represents the absolute value of the difference between the maximal misclassified membership levels of the two classes. To be fair to both misclassified classes occurring in overlapping fuzzy input vectors, the decision boundary should be located at a point where the maximal misclassification membership values of the misclassified classes are as nearly equal as possible; this justifies the requirement of a minimal mem_dif value. Parameter $n$ is the iteration index in the training procedure, whose value goes from one up to the epoch number we selected. Constant $\gamma$, a value greater than one but very close to one, increasingly emphasizes mem_dif as the number of iterations grows. Constant $c$, which can be chosen by the user, is introduced for the relative weighting of mem_mis and mem_dif. In our experience, one, two, or three are suitable choices for constant $c$, and the same setting was used for all the illustrative examples in the numerical simulations presented later. In this manner, the fuzzy pocket algorithm initially identifies a weight vector that mainly minimizes the cumulative misclassified membership values of the fuzzy numbers, and it then searches for a weight vector that minimizes not only the misclassification error but also the difference between the maximal misclassification membership values. Notice that in the case of nonoverlapping fuzzy numbers, mem_index is calculated from mem_mis only, while mem_dif is neglected. In a manner resembling the pocket algorithm, we save "in the pocket" the weight vector with the smallest mem_index in the fuzzy perceptron algorithm. This simple modification keeps fuzzy perceptron learning well behaved.

The relative weighting between crisp data and linguistic rules in calculating mem_mis is worth mentioning. This relative weighting is subjective and, naturally, more reliable information (either crisp data or linguistic rules) should be more emphasized. In this paper, six $h$-levels, i.e., $h = 0, 0.2, 0.4, 0.6, 0.8$, and $1$, are used in the fuzzy perceptron learning algorithm, whereas for crisp data, only the $h$-level of one is assigned. In the mem_mis calculation, the relative weighting for a crisp datum is chosen to be three times that of a fuzzy IF-THEN rule. This relative weight compensates the crisp data for using only the $h$-level of one instead of the six $h$-levels of rules. By this setting, we count equally on numerical data and linguistic rules, because the sum of these six levels equals three.

To summarize, the second-order fuzzy perceptron learning with the fuzzy pocket algorithm is as follows.

1) Set $\mathbf{w}$ to a small, random vector.
2) Let $\mathbf{w}$ be the current weight vector. Select a training fuzzy input vector $\mathbf{X}$.
3) If $\mathbf{w}$ correctly classifies $\mathbf{X}$, then
   a) if the current run of the mem_index is smaller than the run of the mem_index in your pocket, then put $\mathbf{w}$ in your pocket and remember the mem_index of its run;
   else form a new set of weights by fuzzy perceptron learning.
4) If the specified number of iterations has not been taken or the specified mem_index has not been reached, then go to 2); otherwise, stop the iteration (see the sketch below).
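The pocket bookkeeping above is small; a schematic version (ours, with run-length accounting simplified to epoch-wise evaluation and a hypothetical mem_index_fn implementing (21)) is:

```python
import numpy as np

def fuzzy_pocket(train_epoch_fn, mem_index_fn, w0, epochs=100):
    """Keep 'in the pocket' the weight vector with the smallest mem_index (21)."""
    w = np.asarray(w0, dtype=float)
    pocket_w, pocket_idx = w.copy(), mem_index_fn(w, 0)
    for n in range(1, epochs + 1):
        w = train_epoch_fn(w)          # one pass of fuzzy perceptron learning
        idx = mem_index_fn(w, n)       # mem_mis + c * gamma**n * mem_dif
        if idx < pocket_idx:           # better weights found: update the pocket
            pocket_w, pocket_idx = w.copy(), idx
        if pocket_idx == 0.0:          # separable case: everything classified
            break
    return pocket_w, pocket_idx

# E.g., with train_epoch and rules from the previous sketch:
#   w, idx = fuzzy_pocket(lambda w: train_epoch(w, rules), my_mem_index, np.zeros(6))
```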

D. Multiclass Classification

To be more general than just dealing with two-class classification [14], the proposed FPNN model can be extended to solve multiclass classification problems by increasing the number of discriminant functions to equal the number of classes to be classified. For a $K$-class classification problem, we define $K$ discriminant functions $\mathrm{Net}_1, \ldots, \mathrm{Net}_K$ with weights $\mathbf{w}_1, \ldots, \mathbf{w}_K$.

At the $h$-level of the $j$th fuzzy input vector $\mathbf{X}_j$, if $\mathbf{X}_j$ belongs to class $k$, then the value $\mathrm{Net}_k$ should be the largest among the $K$ discriminant functions. If, however, for some $i \ne k$ we have $\mathrm{Net}_i \ge \mathrm{Net}_k$, then the updating rules for the weight vectors are given by

$$\mathbf{w}_k^{\mathrm{new}} = \mathbf{w}_k^{\mathrm{old}} + \eta\, h\, \mathbf{z}^*, \qquad \mathbf{w}_i^{\mathrm{new}} = \mathbf{w}_i^{\mathrm{old}} - \eta\, h\, \mathbf{z}^* \quad \text{for } i \ne k \tag{22}$$

where $\mathbf{z}^*$ is of the same form as (20).
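Under our reconstruction of (22), the winning and losing weight vectors move in opposite directions along z*, exactly as in the crisp multiclass perceptron. A schematic fragment (names are ours):

```python
import numpy as np

def multiclass_update(W, k, i, z_star, eta, h):
    """Per (22): a rule of class k beaten by class i at level h (Net_i >= Net_k);
    move w_k toward z* and w_i away from it."""
    W[k] = W[k] + eta * h * z_star
    W[i] = W[i] - eta * h * z_star
    return W

# W: (K, 6) array of weight vectors, one second-order discriminant per class.
W = np.zeros((3, 6))
W = multiclass_update(W, k=0, i=2,
                      z_star=np.array([1, .5, .5, .25, .25, .25]),
                      eta=0.1, h=0.8)
```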

For a multiclass nonseparable problem, the modified pocket algorithm should be amended to suit the concept mentioned in the above subsection. The mem_index should be revised to the following form:

$$\mathrm{mem\_index}_{ki} = \mathrm{mem\_mis}_k + c\,\gamma^n\, \mathrm{mem\_dif}_{ki} \tag{23}$$

where $\mathrm{mem\_mis}_k$ is the sum of the erroneous membership levels of the $k$th class patterns over all $h$-levels of those rules defined for the $k$th class but erroneously classified to the $i$th class ($i \ne k$). The $\mathrm{mem\_dif}_{ki}$ in (23) is the absolute value of the difference between the maximal misclassification membership levels of classes $k$ and $i$ in the class-overlapping region. Via this modification, the fuzzy pocket


algorithm can be extended to multiclass classification problems in a manner analogous to two-class problems.

IV. SIMULATION

Simulations were performed not only to verify the effectiveness of the proposed fuzzy perceptron neural network, but also to compare it with the FBP and AFLC algorithms. In this section, two simulations are presented. In Simulation 1, all the fuzzy IF-THEN rules for the classification problems were given initially. Four testing examples were provided; the first three examples were two-class classification problems and the final one addressed three-class classification. Notice that the results of Examples 2 and 4 by AFLC are not presented, because these two examples are not solvable by AFLC since undefined regions of fuzzy inputs exist. In Simulation 2, we tested the proposed algorithm on the neural network benchmark problem of two-spiral classification proposed by Lang et al. [34]. In this simulation, we generated IF-THEN rules from the two-spiral data to underline the existing regularity among samples. The FPNN classification boundaries were improved by incorporating these generated rules and the crisp data as inputs.

A. Simulation 1

Due to the nondeterministic learning nature of FBP and FPNN, each of these four examples was run for 200 design trials. Based on these results, classification performance in terms of several statistical indexes will be provided in the subsequent subsection. For these two algorithms, a learning cycle in which each pattern and every fuzzy IF-THEN rule is presented once constitutes an epoch of learning.

In running the four examples with the FBP algorithm, the feedforward neural network was structured with one hidden layer of five hidden units; these network structures and the numbers of epochs were chosen as recommended (except for Example 3) by [12]. In our experience, the FBP algorithm cannot easily converge to satisfactory solutions. As a consequence, few satisfactory decision boundaries were obtained in the 200 trials of each example. Therefore, in the following illustrative examples, the best decision boundary (in the sense of the minimal error of FBP [12]) of each example was selected from the 200 design trials and then plotted.

For FPNN, the following examples were simulated by employing six levels of $h$, i.e., $h = 0, 0.2, 0.4, 0.6, 0.8$, and $1$, for the linguistic values in the fuzzy perceptron learning algorithm. Note that making the zero level of the fuzzy numbers effective hinges on replacing $h = 0$ with a small positive value whenever updating by (19) or (22), as well as when computing the mem_index of (21) or (23). In running FPNN with 200 design trials for each example, almost all decision boundaries obtained are similar to the best one (still in the sense of minimal mem_index). We also plotted the best decision boundary of the FPNN algorithm for each of these four examples.

Example 1: In line with Ishibuchi et al. [12], this work designed a two-class classifier on the pattern space $[0, 1] \times [0, 1]$.

The numerical data are

class 1 (24)

class 2 (25)

Fig. 4. The membership functions of the linguistic values “small” and “very large” in Example 1.

Fig. 5. A simulation result of learning with only numerical data by the proposed second-order fuzzy perceptron learning algorithm.

The following two fuzzy IF-THEN rules, given by human experts, are

If $x_1$ is small and $x_2$ is small, then $(x_1, x_2)$ belongs to class 1 (26)

If $x_1$ is very large or $x_2$ is very large, then $(x_1, x_2)$ belongs to class 2. (27)

Fig. 4 displays the membership functions, which are adopted from [12] and coincide with our intuition, of the fuzzy numbers "small" and "very large." The fact that the pattern space is $[0, 1] \times [0, 1]$ accounts for why the fuzzy IF-THEN rule (27) with the "or" connective can be converted into the following two rules with the "and" connective:

If $x_1$ is very large and $x_2$ is in $[0, 1]$, then $(x_1, x_2)$ belongs to class 2 (28)

If $x_1$ is in $[0, 1]$ and $x_2$ is very large, then $(x_1, x_2)$ belongs to class 2. (29)

Initially, we trained the second-order fuzzy perceptron using only numerical data. Fig. 5 plots the simulation result with


Fig. 6. The simulation result by the FBP algorithm with only numerical data.

Fig. 7. The simulation result of learning with both numerical data and fuzzy IF-THEN rules by the proposed method.

after 100 epochs of training. According to this figure, all the given patterns are correctly classified by the second-order fuzzy perceptron neural network. Using only the numerical data and iterating for 1000 epochs, Fig. 6 displays the result obtained by the FBP algorithm. The boundary curve in Fig. 6 is drawn by plotting all the points in the pattern space where the output value is 0.5. This algorithm classifies a test pattern with very large feature values as class 1 [12]. The proposed scheme resolves this weakness because the parameters of a second-order perceptron network are flexible enough that the neural network can correlate well with the crisp data.

Fig. 8. The simulation result of Example 1 using the FBP algorithm.

Fig. 9. A simulation result of Example 1 using the AFLC approach.

Next, we trained the second-order fuzzy perceptron neural network using not only numerical data but also the fuzzy IF-THEN rules. Fig. 7 depicts the best simulation result after 100 epochs. As this figure reveals, the discriminant boundary classifies all the given patterns and precisely conveys the effects of the two linguistic rules. Among the 200 simulation results by the FBP approach, it is rare to obtain a satisfactory decision boundary. Fig. 8 presents the best decision boundary chosen from the 200 FBP trials after 1000 epochs. For the AFLC approach, Fig. 9 summarizes the result obtained under the chosen parameter settings of the AFLC


Fig. 10. A symmetric triangular fuzzy number.

Fig. 11. The simulation result of learning for fuzzy input data vectors by the proposed method. The rectangles are supports of fuzzy vectors.

[11]. However, the decision boundaries obtained with other parameter settings are not as successful as that shown in Fig. 9.

Example 2: The classification power of the proposed method was examined as a nonlinear classification machine for fuzzy input vectors. The proposed method was applied to the fuzzy data [12]

class 1 (30)

class 2 (31)

where each input denotes a symmetric triangular fuzzy number, as shown in Fig. 10, with center $c$ and spread $s$ defined by the membership function

$$\mu(x) = \max\{0,\; 1 - |x - c|/s\}. \tag{32}$$
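With the symmetric triangular number of (32) taken as $\mu(x) = \max\{0, 1 - |x - c|/s\}$ ($c$ the center, $s$ the spread; the exact normalization is our assumption), the $h$-level cut used by the learning algorithm has the closed form $[c - (1 - h)s,\; c + (1 - h)s]$:

```python
def tri_membership(x, c, s):
    """Symmetric triangular fuzzy number of (32): center c, spread s."""
    return max(0.0, 1.0 - abs(x - c) / s)

def tri_cut(c, s, h):
    """h-level interval {x : mu(x) >= h} of the triangular number."""
    return (c - (1.0 - h) * s, c + (1.0 - h) * s)

lo, hi = tri_cut(2.0, 1.0, 0.5)                 # -> (1.5, 2.5)
assert tri_membership(lo, 2.0, 1.0) == 0.5 == tri_membership(hi, 2.0, 1.0)
```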

After 200 epochs, Fig. 11 summarizes the best simulation result of the second-order fuzzy perceptron neural network, in which the nonlinear boundary curve is highly successful. In this figure, the hatched area and the white area denote the supports of the two classes of fuzzy vectors, respectively. The same figure reveals that the given nonoverlapping fuzzy vectors are all correctly classified. In addition, the boundary curve crossing the overlapping area of the given fuzzy vectors is rather fair to both classes. After 1000 epochs of training, Fig. 12 shows the best classified simulation result of this example selected from the FBP runs. Unfortunately, this satisfactory separating boundary was seldom observed during the 200 design trials.

Fig. 12. The simulation result of Example 2 using the FBP approach.

Example 3: Based on an example from [11], we designed a two-class classifier. The numerical training data consist of ten points from each of the two classes:

class 1 (33)

class 2 (34)

In addition, the following three linguistic rules are used:

If is large then belongs to class 1 (35)

If is large then belongs to class 1 (36)

If is small then belongs to class 1 (37)

The membership functions for “large” and “small,” adopted from [11], are displayed in Fig. 13.

After 2500 epochs of training on this example, FPNN leads to the best result, shown in Fig. 14. Fig. 15 depicts the least-error solution chosen from iterating FBP for 12 000 epochs. Obviously, our approach yields a better boundary than that from the FBP. A good parameter setting for the AFLC scheme [11] produced the decision boundary of Fig. 16. Note that the outlier training pattern is so atypical that none of the three methods can classify it correctly.

Example 4: In the following, we considered the three-class classification problem [12] of fuzzy vectors:

class 1

(38) class 2


Fig. 13. The membership functions of the linguistic values “small” and “large” in Example 3.

Fig. 14. The simulation result of Example 3 using the second-order fuzzy perceptron learning algorithm.

class 3

(40)

In this example, the FPNN parameters were set similarly. Fig. 17 presents the separating boundary chosen from those produced by the FPNN algorithm after 1000 learning epochs. The boundary curves are drawn to denote the points at which the two largest discriminant function values are identical. Namely, if $\mathrm{Net}_1$ assumes the maximum value at a point, then this point is labeled class 1. The same figure also reveals that the

Fig. 15. The simulation result of Example 3 using the FBP algorithm.

Fig. 16. A simulation result of Example 3 using the AFLC approach.

given nonoverlapping fuzzy vectors are correctly classified and that the boundary curves crossing the overlapping areas are quite fair in minimizing the misclassified area of the overlapping fuzzy vectors. Fig. 18 displays the chosen separating boundary generated by the FBP after 1000 epochs.

The above four examples demonstrate the effectiveness and flexibility of the FPNN. The insolvability of Examples 2 and 4 by the AFLC reveals that it is not as general as FPNN and FBP for such problems. The decision boundary produced by the AFLC approach is deterministic and its design time is shorter


Fig. 17. The simulation result of Example 4 using the second-order fuzzy perceptron learning algorithm.

Fig. 18. The simulation result of Example 4 using the FBP approach.

than that of a neural-based classifier, since its design phase does not include a training procedure. On the other hand, the separating boundaries produced by FPNN or FBP are not deterministic in nature; their classification quality is best assessed by statistical measures. The consistency and quality of the solutions generated by the FPNN and FBP algorithms are evaluated through learning performance statistics on these four examples. For comparison, the subsequent subsection provides statistical performance in terms of the quality of the discriminant boundary and the training times required by FPNN and FBP.

1) Recognition Rate and Speed Improvement of the FPNN Approach: As mentioned earlier, the FBP algorithm extends the backpropagation algorithm to cases involving inputs of fuzzy rules. The drawbacks of the BP algorithm, such as convergence to local minima and slow learning convergence, still persist in this approach because it is a gradient-based technique. Previous investigations, although making some progress with respect to these defects [35]–[37], could not completely resolve these drawbacks. Moreover, determining the structure, i.e., the number of layers and the number of hidden units in each layer, of a multilayer network is difficult and critical to a trial's success [35], [38]. To our knowledge, no previous work has successfully determined exactly how many layers and nodes a network should have, thereby avoiding situations of overlearning and overfitting. Hence, in the following comparison, we set up the FBP structure according to the recommendations of [12] for Examples 1, 2, and 4.

In the statistical tests of Examples 1–4 using FBP, a satisfactory solution was not frequently obtained. As noted before, Figs. 8, 12, 15, and 18, respectively, are the best classification boundaries chosen from the 200 design trials of the FBP. In our experience, whenever the final total error of a trained FBP network is roughly three times or more that of a satisfactory solution, the separating boundary of the classifier differs markedly from those of satisfactory ones. In the sequel, the performance of the FPNN is compared with that of the FBP approach in terms of recognition rate and required training time. In this comparison, both algorithms were run on an HP model 712 workstation. For Examples 1, 2, and 4, FBP was performed for 1000 epochs, while for Example 3, it iterated for 12 000 epochs.

For the FPNN and FBP approaches, Table I lists the average recognition rates and the average training times over 200 design test trials with random and small initial weight vectors on these four examples. The recognition-rate entries contain two terms: the first records the average crisp-data recognition rate, called the crisp recognition rate hereafter, and the second records the average percentage of correctly classified area of the fuzzy input data, referred to as the fuzzy recognition rate hereafter. In Examples 2 and 4, the entries of the crisp part are missing since the training patterns are only fuzzy input vectors. The average fuzzy recognition rates on these four examples by FBP are 60.5%, 92.9%, 36.7%, and 77.2%, respectively, leading to a recognition rate of 66.8% on average over these four examples. For the first three two-class problems, the fuzzy recognition rates obtained by our fuzzy perceptron neural network are all superior to those obtained by the FBP algorithm. Particularly for Example 3, the fuzzy recognition rate of FPNN substantially outperforms that of FBP. Regarding the multiclass task of Example 4, the proposed method also excels in the amount of improvement. Satisfactory results are obtained by FPNN not only for the two-class tasks but also for the multiclass problem. The same outcome can be found in Table I for the crisp recognition rate comparison. For the AFLC algorithm, the best fuzzy recognition rates of Examples 1 and 3 (Figs. 9 and 16) are 61.9% and 99.7%, respectively. The training times needed for both networks were also recorded. As Table I indicates, the CPU training time required


TABLE I

THE AVERAGE TRAINING TIME RATIOS AND RECOGNITION RATES ON 200 TRIALS OF EXAMPLES 1–4

TABLE II

MEANS AND STANDARD DEVIATIONS OF MISCLASSIFIED AREAS FOR EXAMPLES 2 AND 4 ON 200 TRIALS

by the FBP algorithm is significantly longer than ours. The ratios of the central processing unit (CPU) training time needed by FPNN to that needed by FBP for the four examples are 7.5%, 8.4%, 16.5%, and 28.7%, respectively, leading to an average ratio of 15.3% over these four examples.

Moreover, to quantitatively assess each scheme's solution quality, the area misclassified by the separating boundary was computed for each design test trial. The misclassified area can be employed as an evaluation index for the solution consistency of these two networks. Table II lists the means and the standard deviations (SDs) of the misclassified areas of Examples 2 and 4 over the 200 trials. This table reveals that the mean and SD obtained from FPNN are all smaller than those from FBP. In addition, averaging these two examples indicates that the mean of the misclassified areas obtained from FBP is 2.78 times larger than that achieved with FPNN, and that the SD of the misclassified areas of FBP is 32 times greater than that of FPNN. Such large means and SDs of the misclassified areas explain the solution inconsistency observed over the numerous FBP trials above and the infrequency of satisfactory decision boundaries after these design trials. The superiority of FPNN over FBP in misclassified area partially accounts for why the recognition rate of FPNN markedly exceeds that of FBP.

To assess the reliability of the qualified convergence of FPNN and FBP, we define a successful classification trial in the following manner. For those examples with crisp and fuzzy input data, i.e., Examples 1 and 3, a test design trial is deemed a successful classification if all the crisp data are correctly classified. Note that the outlier training pattern in Example 3 is neglected regardless of whether it is correctly classified or not. For those examples with only fuzzy rule input data, i.e., Examples 2 and 4, if all the nonoverlapping vectors are classified correctly, then the test design trial is considered a successful classification. If a trial satisfies the condition described

TABLE III

THE AVERAGE SUCCESSFUL CLASSIFICATION RATIOS ON 200 TRIALS OF EXAMPLES 1–4

above, then this trial is labeled a successful one. The number of successful trials over the total number of trials, i.e., 200, gives the successful classification ratio. The results in Table III indicate that the successful classification ratios on these four examples by FBP are 89%, 34%, 32%, and 21%, respectively, leading to a successful classification ratio of 44% on average over these four examples. With our fuzzy perceptron neural network, a 100% successful classification ratio was obtained on average over these four examples.

Based on the comparison above, we can conclude that FPNN consistently leads to a much more reliable discriminant boundary than the FBP algorithm. The proposed approach can not only produce a very high classification rate, but also take a much shorter learning time than the FBP approach.

B. Simulation 2

The well-known two-spiral data set is a neural network benchmark problem for classification [34]. The training set consists of 194 points, half for each class. These training points are arranged in two interlocking spirals that go around the origin three times, as shown in Fig. 19(a), where the two marker types denote class 1 and class 2, respectively. Note that our FPNN is a classifier that extends the single-layer structure and is capable of providing second-order discriminant functions in a distributed manner. By a divide-and-conquer strategy, we divided the two-spiral data into subregions, and these subregions can be suitably dichotomized by a set of elementary forms, such as paraboloids and ellipsoids, provided by the FPNN models. Namely, the concept of using a number of FPNNs to cover the divided subregions was adopted in this simulation. Accordingly, we divided the two-spiral patterns into a few subsets of regions. The size of a subregion of patterns is


Fig. 19. (a) The training points for the two-spiral problem and the regulation of dividing the patterns into a few subsets. (b) The process of the coordinate transformation.

somewhat inversely proportional to the pattern density in the subregion. In this setting, a subregion on the outer turn contains a smaller number of patterns, while a subregion on the inner turn contains a larger number of patterns [see Fig. 19(a)]. There were 25 data subsets generated by this division, and each of the 25 data subsets is solved by an FPNN for this benchmark problem.

First, we extract 25 sets of fuzzy IF-THEN rules, i.e., one set of rules for each data subset in a subregion. By translation and rotation techniques, a new $x$–$y$ coordinate system was assigned to each subregion so that the patterns contained in each subregion are more easily manageable for rule extraction. Observing, for example, the first outer data subset in this new coordinate system, we can cluster its nine samples, as shown in Fig. 19(b), into three clusters by characterizing the pattern subset using IF-THEN rules concerning the $y$-coordinate feature. For instance, the three samples in the middle can be specified by a rule such as "If $y$ is medium, then it belongs to class 1." Similarly, the upper and lower three samples can be specified by the $y$-coordinate being large and small, respectively. In this way, we use the three linguistic terms "small," "medium," and "large" of the $y$-coordinate for constructing the fuzzy classification rules. For the definition of the membership functions, the arithmetic means of the $y$-coordinates of these three clusters were calculated and then used as the centers of the corresponding symmetric triangular membership functions of (32).

(15)

The overlapping of the membership functions depends on the spread, i.e., the parameter of (32), chosen; in this example, the spreads of the "small," "medium," and "large" fuzzy sets of the y′-coordinate were chosen to be three times the standard deviations of the corresponding patterns, respectively. As to the membership function of the x′-coordinate, because the y′-coordinate can singly specify the characteristics of the data very effectively, the membership function of the x′-coordinate is chosen to be almost crisp: the minimal and maximal values of the x′-coordinates of all patterns in the subregion, widened by a small tolerance on each side, were chosen as the interval of full membership, and the membership is zero elsewhere.
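A minimal sketch of this membership construction is given below; the triangular form max(0, 1 − |u − c|/s) is assumed to match (32), and the cluster data shown are hypothetical.

```python
import numpy as np

def triangular(center, spread):
    """Symmetric triangular membership, max(0, 1 - |u - center|/spread);
    assumed to be the form of (32)."""
    return lambda u: np.maximum(0.0, 1.0 - np.abs(u - center) / spread)

def almost_crisp_interval(lo, hi, tol):
    """Full membership on [lo - tol, hi + tol], zero elsewhere."""
    return lambda u: np.where((u >= lo - tol) & (u <= hi + tol), 1.0, 0.0)

# Hypothetical y' values of the three clusters of one subregion.
clusters = {"small":  np.array([-2.1, -1.9, -2.0]),
            "medium": np.array([-0.1,  0.0,  0.1]),
            "large":  np.array([ 1.9,  2.0,  2.2])}

# Centers are the cluster means; spreads are three times the standard
# deviations of the corresponding patterns, as described in the text.
y_sets = {name: triangular(ys.mean(), 3.0 * ys.std())
          for name, ys in clusters.items()}

x_all = np.array([-1.0, -0.5, 0.0, 0.4, 0.9])   # hypothetical x' values
x_set = almost_crisp_interval(x_all.min(), x_all.max(), tol=0.1)
```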

After the rule generation process above, the transformed pattern subset can be qualitatively described by the following three IF-THEN rules:

If x′ is A and y′ is large, then it belongs to class 2.
If x′ is A and y′ is medium, then it belongs to class 1.
If x′ is A and y′ is small, then it belongs to class 2. (41)

Here, A denotes the almost crisp interval fuzzy set of the x′-coordinate defined above. The FPNN was then used to classify the transformed samples, together with these three IF-THEN rules. In a similar manner, each of the other 24 data subsets was transformed to its new and suitable coordinate system, and three IF-THEN rules in the same format as (41) were also extracted. Each data subset and its corresponding fuzzy rules were trained using an FPNN model. All the parameter settings for the FPNNs were the same as in Simulation 1, and 1000 learning epochs were taken. After all the 25 data subsets had been respectively processed by FPNNs, the 25 final decision boundaries derived are shown in Fig. 20. These decision boundaries were finally combined by the "OR" operator: the output is assigned to class 1 if at least one decision boundary indicates that it belongs to class 1; otherwise, it belongs to class 2. In this figure, 25 FPNNs were used to accomplish the whole classification task, and a 100% recognition rate was produced.
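The "OR" fusion described above amounts to a simple vote across the 25 subregion classifiers, as in the following sketch (the boolean prediction array is a hypothetical interface):

```python
import numpy as np

def combine_by_or(fpnn_predictions):
    """Fuse the subregion classifiers: a pattern is assigned class 1 if
    at least one FPNN decision boundary claims it, class 2 otherwise.
    `fpnn_predictions` is a (25, n_patterns) boolean array, True where
    the corresponding FPNN votes class 1."""
    votes_class1 = np.any(fpnn_predictions, axis=0)
    return np.where(votes_class1, 1, 2)
```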

For comparison, the same 25 data subsets and their corresponding rules were also respectively applied to the FBP algorithm. The structure and learning epochs used for the FBP were the same as those used for Example 1 of Simulation 1, and the FBP achieved a 70.48% recognition rate. The AFLC was also tested on this benchmark problem in a similar manner. Using these 25 subsets along with their corresponding IF-THEN rules as inputs, the AFLC produced a high recognition rate of 97.43% under 25 sets of best-tuned parameters obtained by trial and error. Comparing the classification accuracy obtained for the two-spiral benchmark data, the best performance still goes to the proposed FPNN approach.

Fig. 20. The simulation result of the two-spiral problem using the FPNN approach.

V. CONCLUSION

To address classification problems, this paper presents a fuzzy perceptron neural network that is capable of accepting two kinds of input data: fuzzy IF-THEN rules and numerical data. Incorporating the α-level sets into the perceptron neural network effectively represents the linguistic values in fuzzy rules; thereby, the fuzzy input handling ability of the proposed neural network is enhanced. At each α-level of the fuzzy numbers, a fuzzy perceptron learning procedure is derived. The minimum of the fuzzy discriminant function, obtained from the modified vertex method, determines whether a fuzzy perceptron learning update step is executed or not. The derived learning algorithm extends the conventional perceptron algorithm to fuzzy input vectors. Moreover, the fuzzy pocket algorithm is derived and then further incorporated into the fuzzy perceptron learning scheme to tackle nonseparable cases. Simulation results demonstrate that the proposed algorithm not only consistently yields an accurate and efficient solution, but also resolves the limitations of inaccuracy and slow learning convergence encountered in the FBP approach.
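To make the learning criterion concrete, the sketch below illustrates the flavor of this minimum test for a second-order net without cross terms, so that the minimum over an α-cut box separates per coordinate; the update applied at the box midpoint is our simplification, not the paper's exact rule.

```python
import numpy as np

def min_over_box(w2, w1, b, lo, hi):
    """Minimum of Net(x) = sum_i (w2[i]*x_i^2 + w1[i]*x_i) + b over the
    alpha-cut box lo <= x <= hi.  With no cross terms each coordinate is
    minimized independently by checking the two interval endpoints plus
    the parabola's stationary point when it lies inside the interval."""
    total = b
    for w2i, w1i, l, h in zip(w2, w1, lo, hi):
        candidates = [l, h]
        if w2i != 0.0:
            v = -w1i / (2.0 * w2i)            # stationary point of the parabola
            if l <= v <= h:
                candidates.append(v)
        total += min(w2i * c * c + w1i * c for c in candidates)
    return total

def perceptron_step(w2, w1, b, lo, hi, target, eta=0.1):
    """Schematic fuzzy-perceptron check for a +1-class pattern: update
    only when the minimum of the net output fails the class target."""
    if target * min_over_box(w2, w1, b, lo, hi) <= 0.0:
        mid = (lo + hi) / 2.0                 # assumed representative point
        w2 = w2 + eta * target * mid ** 2
        w1 = w1 + eta * target * mid
        b = b + eta * target
    return w2, w1, b
```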

ACKNOWLEDGMENT

The authors would like to thank the referees for their valuable comments and suggestions, which helped us to improve this paper.

REFERENCES

[1] L. A. Zadeh, "Fuzzy sets," Inform. Contr., vol. 8, pp. 338–353, 1965.
[2] L. A. Zadeh, "Outline of a new approach to the analysis of complex systems and decision processes," IEEE Trans. Syst., Man, Cybern., vol. 3, pp. 28–44, Jan. 1973.
[3] L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning—Parts I, II, III," Inform. Sci., vol. 8, pp. 199–249, 1975; also pp. 301–357 and vol. 9, pp. 43–80.
[4] H.-J. Zimmermann, Fuzzy Set Theory and Its Applications. Norwell, MA: Kluwer, 1996.
[5] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[6] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[7] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 1. Cambridge, MA: MIT Press, 1986.
[8] D. J. Montana and L. Davis, "Training feedforward neural networks using genetic algorithms," in Proc. 11th Int. Joint Conf. Artificial Intell., Detroit, MI, Aug. 1989, pp. 762–767.
[9] Y. Ichikawa and Y. Ishii, "Retaining diversity of genetic algorithms for multivariable optimization and neural network learning," in Proc. IEEE Int. Conf. Neural Networks, vol. 2, Nagoya, Japan, Mar. 1993, pp. 1110–1114.
[10] W. Pedrycz, "Fuzzy sets in pattern recognition: Accomplishments and challenges," Fuzzy Sets Syst., vol. 90, no. 2, pp. 171–176, 1997.
[11] W. Wei and J. M. Mendel, "A fuzzy classifier that uses both crisp samples and linguistic knowledge," in Proc. 3rd IEEE Conf. Fuzzy Syst., vol. 2, Orlando, FL, June 1994, pp. 792–797.
[12] H. Ishibuchi, R. Fujioka, and H. Tanaka, "Neural networks that learn from fuzzy IF-THEN rules," IEEE Trans. Fuzzy Syst., vol. 1, pp. 85–97, Feb. 1993.
[13] H. M. Lee and W. T. Wang, "A neural network architecture for classification of fuzzy inputs," Fuzzy Sets Syst., vol. 63, no. 2, pp. 159–173, 1994.
[14] J. L. Chen and J. Y. Chang, "Fuzzy perceptron learning and its application to classifier with numerical data and linguistic knowledge," in Proc. IEEE Int. Conf. Neural Networks, vol. 6, Perth, Australia, Nov. 1995, pp. 3129–3133.
[15] S. K. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets, and classification," IEEE Trans. Neural Networks, vol. 3, pp. 683–697, 1992.
[16] S. Mitra and S. K. Pal, "Fuzzy multilayer perceptron, inferencing and rule generation," IEEE Trans. Neural Networks, vol. 6, pp. 51–63, 1995.
[17] Y. Hayashi, J. J. Buckley, and E. Czogala, "Fuzzy neural network with fuzzy signals and weights," Int. J. Intell. Syst., vol. 8, pp. 527–537, 1993.
[18] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychol. Rev., vol. 65, pp. 386–408, 1958.
[19] M. Minsky and S. Papert, Perceptrons. Cambridge, MA: MIT Press, 1969.
[20] S. I. Gallant, "Perceptron-based learning algorithms," IEEE Trans. Neural Networks, vol. 1, pp. 179–191, 1990.
[21] S. I. Gallant, "Optimal linear discriminants," in Proc. 8th Int. Conf. Pattern Recogn., 1986, pp. 849–852.
[22] J. M. Keller and D. J. Hunt, "Incorporating fuzzy membership functions into the perceptron algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, no. 6, pp. 693–699, 1985.
[23] K. J. Schmucker, Fuzzy Sets, Natural Language Computations, and Risk Analysis. Rockville, MD: Comput. Sci., 1984.
[24] D. Dubois and H. Prade, "Operations on fuzzy numbers," Int. J. Syst. Sci., vol. 9, no. 6, pp. 613–626, 1978.
[25] W. M. Dong and F. S. Wong, "Fuzzy weighted averages and implementation of the extension principle," Fuzzy Sets Syst., vol. 21, no. 2, pp. 183–199, 1987.
[26] G. Alefeld and J. Herzberger, Introduction to Interval Computations. New York: Academic, 1983.
[27] R. E. Moore, Interval Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1966.
[28] J. J. Buckley and Y. Qu, "On using α-cuts to evaluate fuzzy equations," Fuzzy Sets Syst., vol. 38, no. 3, pp. 309–312, 1990.
[29] R. Kruse, J. Gebhardt, and F. Klawonn, Foundations of Fuzzy Systems. New York: Wiley, 1994.
[30] W. M. Dong and H. C. Shah, "Vertex method for computing functions of fuzzy variables," Fuzzy Sets Syst., vol. 24, no. 1, pp. 65–78, 1987.
[31] A. Roy, L. S. Kim, and S. Mukhopadhyay, "A polynomial time algorithm for the construction and training of a class of multilayer perceptrons," Neural Networks, vol. 6, no. 4, pp. 535–545, 1993.
[32] G. S. Lim, M. Alder, and P. Hadingham, "Adaptive quadratic neural nets," Pattern Recogn. Lett., vol. 13, no. 5, pp. 325–329, 1992.
[33] R. L. Burden and J. D. Faires, Numerical Analysis. Boston, MA: PWS, 1993.
[34] K. J. Lang and M. J. Witbrock, "Learning to tell two spirals apart," in Proc. 1988 Conf. Connectionist Models Summer School, 1988, pp. 52–59.
[35] P. Baldi and K. Hornik, "Neural networks and principal component analysis: Learning from examples without local minima," Neural Networks, vol. 2, no. 1, pp. 53–58, 1989.
[36] M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 76–86, 1992.
[37] R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, no. 4, pp. 295–307, 1988.
[38] E. D. Sontag, "Feedback stabilization using two-hidden-layer nets," IEEE Trans. Neural Networks, vol. 3, pp. 981–990, 1992.

Jia-Lin Chen received the B.S. degree in electrical engineering from Feng-Chia University, Taichung, Taiwan, R.O.C., in 1994, and the M.S. degree in control engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1996. He is currently working toward the Ph.D. degree in the Department of Electrical and Control Engineering at National Chiao Tung University, Hsinchu, Taiwan.

His current research interests include pattern recognition, fuzzy theory, and neural networks.

Jyh-Yeong Chang received the B.S. degree in control engineering in 1976 and the M.S. degree in electronic engineering in 1980, both from National Chiao Tung University, Taiwan, and the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, in 1987.

From 1976 to 1978 and 1980 to 1982, he was a Research Fellow at the Chung Shan Institute of Science and Technology (CSIST), Taiwan. Since 1987, he has been an Associate Professor in the Department of Electrical and Control Engineering, National Chiao Tung University. His research interests include fuzzy sets and systems, image processing, pattern recognition, and neural network applications.
