An On-Line ICA-Mixture-Model-Based

Self-Constructing Fuzzy Neural Network

Chin-Teng Lin, Fellow, IEEE, Wen-Chang Cheng, and Sheng-Fu Liang

Abstract—This paper proposes a new fuzzy neural network (FNN) capable of parameter self-adapting and structure self-constructing to acquire a small number of fuzzy rules for interpreting the embedded knowledge of a system from a given training data set. The proposed FNN is inherently a modified Takagi–Sugeno–Kang (TSK)-type fuzzy-rule-based model with the neural network's learning ability. There are no rules initially; they are created and adapted through an on-line learning process that performs simultaneous structure and parameter identification. In the structure identification of the precondition part, the input space is partitioned in a flexible way according to the newly proposed on-line independent component analysis (ICA) mixture model. The input space is thus represented by linear combinations of independent, non-Gaussian densities. The first input training pattern is initially assigned to the first rule by the on-line ICA mixture model. Afterwards, significant terms (input variables) selected by the on-line ICA mixture model are added to the consequent part (forming a linear equation of input variables) incrementally, or a new rule is created, during the learning process. The combined precondition and consequent structure identification scheme can make the network grow dynamically and efficiently. In the parameter identification, the consequent parameters are tuned by the backpropagation rule and the precondition parameters are tuned by the on-line ICA mixture model. Both the structure and parameter identifications are done simultaneously to form a fast learning scheme. The derived on-line ICA mixture model also provides a natural linear transformation for each input variable to enhance the knowledge representation ability of the proposed FNN, reduce the required number of rules, and achieve higher accuracy efficiently. To demonstrate the performance of the proposed FNN, several experiments covering the areas of system identification, classification, and image segmentation are carried out. Our experiments show that the proposed FNN can achieve significant improvements in convergence speed and prediction accuracy with a simpler network structure.

Index Terms—Backpropagation rule, Gaussian mixture model, non-Gaussian mixture model, principal component analysis, Takagi–Sugeno–Kang (TSK) fuzzy rules.

Manuscript received January 9, 2004; revised June 16, 2004. This work was supported in part by the Ministry of Education, Taiwan, R.O.C., under Grant EX-91-E-FAOE-4–4 and Ministry of Economic Affairs, Taiwan, R.O.C., under Grant 93-17-A-02-S1-032. This paper was recommended by Associate Editor Y. Nishio.

C.-T. Lin and S.-F. Liang are with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C., and also with the Brain Research Center, University System of Taiwan, Taipei 112, Taiwan, R.O.C. (e-mail: ctlin@mail.nctu.edu.tw; sfliang@mail.nctu.edu.tw).

W.-C. Cheng is with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C. (e-mail: wccheng@mail.hit.edu.tw).

Digital Object Identifier 10.1109/TCSI.2004.840110

I. INTRODUCTION

IN RECENT years, the fuzzy neural network (FNN) has been widely used in industrial, commercial, and image processing applications that require the analysis of uncertain and imprecise information, owing to its natural merging of the fuzzy inference system (FIS) and the neural network (NN), which are complementary technologies in the design of adaptive intelligent systems. The FIS is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. With crisp inputs and outputs, an FIS implements a nonlinear mapping from its input space to its output space through a number of if-then rules. To build an FIS, we have to specify the fuzzy sets, the fuzzy operators, and the knowledge (rule) base. The selection of fuzzy if-then rules often relies on a substantial amount of heuristic observation to express a proper strategy's knowledge. However, it is difficult for human experts to examine all the input-output data from a complex system to find a proper set of rules for the FIS.

An artificial neural network (ANN) learns from scratch by adjusting the interconnections between layers. A valuable property of an ANN is generalization, whereby a trained network is able to provide correct outputs for previously unseen input data. To construct an ANN, the user needs to specify the architecture and the learning algorithm. The learning mechanism of an ANN does not rely on human expertise. However, due to the homogeneous structure of an ANN, it is difficult to extract structured knowledge from either the weights or the configuration of the network. For many practical problems, a priori knowledge is usually obtained from human experts, and it is more appropriate to express this knowledge as a set of fuzzy if-then rules. However, it is not easy to encode such prior knowledge into an ANN.

To cope with the respective difficulties of the ANN and the FIS, integrating them into a functional system, i.e., the FNN, has attracted growing interest from researchers due to the growing need for adaptive intelligent systems that meet real-world requirements. The key advantage of the FNN approach over traditional ones lies in the fact that the former does not require a mathematical description of the system while modeling. Moreover, in contrast to pure ANN or FIS methods, the FNN possesses both of their advantages: it brings the low-level learning and computational power of the ANN into the FIS and provides the high-level human-like thinking and reasoning of the FIS to the ANN [1]–[5]. The FNN has been successfully applied to problems encountered in many areas such as control, communications, and pattern recognition [6]–[9].

Fig. 1. Fuzzy partitions of two-dimensional input space. (a) Grid-type partitioning. (b) If-then rules based on grid-type partitioning. (c) Clustering-type partitioning. (d) If-then rules based on clustering-type partitioning.

One important task in the structure identification of an FNN is the partition of the input-output space, which influences the number of generated fuzzy rules. Efficient partitioning of the input-output data may result in faster convergence and better performance for the FNN. The most direct way is to partition the input space into grids, with each grid representing a fuzzy if-then rule [see Fig. 1(a)]; this is called grid-based partitioning. The major problem of this kind of partition is that the number of fuzzy rules increases exponentially as the number of input variables or the number of partitions increases. This is the so-called curse of dimensionality. To cope with this problem, clustering-based partitioning is employed, which does reduce the number of generated rules [10]–[13]. A cluster-based algorithm provides a more flexible way of partitioning the space so as to avoid a drastic increase in the number of fuzzy rules, and thus generates a rule base with an appropriate number of rules. For example, although the number of membership functions in Fig. 1(d) is greater than that in Fig. 1(b), there are only five rules in Fig. 1(d) whereas there are nine rules in Fig. 1(b). By observing the projected membership functions in Fig. 1(c), we find that some membership functions projected from different clusters have high degrees of similarity. These highly similar membership functions should be combined to reduce the number of membership functions.

There are several methods for input space partitioning that cluster the training vectors in the input space, such as the Kohonen learning rule, the hyperbox method, product-space partitioning, the fuzzy c-means method, the EM algorithm, etc. [15]–[18]. These methods are based on Gaussian membership functions. In general, the observed data can be categorized into several mutually exclusive classes [20], and the data in each class can be modeled as multivariate Gaussian; this is called the Gaussian mixture model (GMM). GMMs are widely used throughout the fields of machine learning and statistics. Despite their popularity, GMMs suffer from several serious drawbacks [14]. One major drawback is that, as the dimension of the problem space increases, the size of each covariance matrix becomes prohibitively large. This problem was solved by Tipping and Bishop [21], who replaced each Gaussian with a probabilistic principal component analysis (PCA) model. This allowed the dimensionality of each covariance to be effectively reduced while maintaining the richness of the model class. However, some recent research approaches try to reduce the information redundancy by capturing statistical structure in the observed data that is beyond second-order information. Independent component analysis (ICA) is a technique that exploits the higher order statistical structure of the data. This method has recently gained attention due to its successful applications to signal processing problems including speech enhancement, discrete signal processing, image processing, etc. The goal of ICA is to linearly transform the data such that the transformed variables are as statistically independent from each other as possible [22]–[26]. This means that the value of any one of the components gives no information on the values of the other components. Basically, it finds directions in the input space which lead to independent components instead of merely uncorrelated ones, as PCA does, so it can dynamically reduce not only the number of rules but also the number of membership functions under a prespecified accuracy requirement.

Another drawback of GMMs is that they are based on Gaussian densities; in some situations the classes cannot be well separated from each other. The model is generalized by assuming that the data in each class are generated by a linear combination of independent non-Gaussian sources [12], [14], [19], [27]. This model is called the ICA mixture model. It allows modeling of classes with non-Gaussian structure; e.g., platykurtic or leptokurtic probability density functions are used for learning, and the gradient ascent method is used to maximize the log-likelihood function. In previous applications, this approach showed improved performance in data classification problems [28] and in learning efficient codes for representing different types of images [12], [19]. The advantage of this model is that it provides greater flexibility in modeling structure and in finding more features compared with GMMs or standard ICA algorithms. Although the ICA mixture model has many advantages in data clustering, the proper number of clusters must be given beforehand. Once the cluster number is determined, we have to stick to it until the independent axes are obtained. In reality, the correct or proper number of clusters is usually unknown, and an improper assignment of the cluster number greatly affects the representation of the learned independent axes. Moreover, the existing ICA mixture model scheme is suitable only for off-line rather than on-line operation. Hence, to adopt this scheme, a large amount of representative data must be collected in advance, and the learning of the ICA mixture model usually takes a lot of time through trial and error.

To attack the aforementioned problems, in this paper we derive an on-line ICA mixture model to provide better, on-line partitioning of the input-output space for the FNN, and we propose a novel FNN model called the ICA-mixture-model-based self-constructing FNN. This FNN can grow its structure and tune its parameters on the fly efficiently based on the derived on-line ICA mixture model. Several experiments covering the areas of system identification, classification, and image segmentation have been carried out based on the proposed FNN. These experiments show that the proposed FNN can achieve significant improvements in convergence speed and prediction accuracy.

II. PROPOSED ON-LINE ICA MIXTURE MODEL

The ICA mixture model is an unsupervised classification algorithm derived by modeling observed data as a mixture of several mutually exclusive classes that are described by linear combinations of independent, non-Gaussian densities [27]. It is used for learning a complete set of basis functions, and these basis functions can be learned simultaneously.

Assume that the data $X = \{\mathbf{x}_1, \dots, \mathbf{x}_T\}$ are drawn independently and are to be clustered into a total of $K$ classes, where $K$ is assumed to be known in advance, $T$ is the total number of data vectors, and each data vector is $N$-dimensional. The component densities are non-Gaussian, and the data within each class $k$ are represented by

$\mathbf{x}_t = \mathbf{A}_k \mathbf{s}_k + \mathbf{b}_k \qquad (1)$

where $\mathbf{A}_k$ is an $N \times M$ scalar matrix (the basis, or mixing, matrix), $\mathbf{b}_k$ is the bias vector for class $k$, and $\mathbf{s}_k$ is called the source vector, i.e., the coefficients for each basis function, where $N$ and $M$ are the dimensions of the input vector $\mathbf{x}_t$ and the source vector $\mathbf{s}_k$, respectively. For simplicity, we consider the case where the number of sources is equal to the number of linear combinations ($M = N$). According to the values of $\mathbf{A}_k$, $\mathbf{b}_k$, and $\mathbf{s}_k$, there are many ways of representing $\mathbf{x}_t$. However, we assume mutually exclusive classes, and maximum-likelihood estimation results in the one model that fits the data best [16].
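To make the generative model in (1) concrete, the following short sketch (our own illustration, not code from the paper; the two-class parameters are arbitrary) draws samples $\mathbf{x} = \mathbf{A}_k \mathbf{s} + \mathbf{b}_k$ with independent Laplacian, i.e., super-Gaussian, sources:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters of a two-class ICA mixture model in 2-D:
# each class k has a basis (mixing) matrix A_k and a bias (mean) b_k.
A = [np.array([[1.0, 0.5], [0.0, 1.0]]),
     np.array([[0.7, -0.6], [0.6, 0.7]])]
b = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
prior = [0.5, 0.5]

def sample_ica_mixture(n):
    """Draw n observations from the model x = A_k s + b_k of (1),
    with independent Laplacian (super-Gaussian) sources s."""
    X, labels = [], []
    for _ in range(n):
        k = rng.choice(len(A), p=prior)          # pick a class
        s = rng.laplace(size=2)                  # independent non-Gaussian sources
        X.append(A[k] @ s + b[k])                # linear combination plus bias, eq. (1)
        labels.append(k)
    return np.array(X), np.array(labels)

X, y = sample_ica_mixture(500)
print(X.shape, np.bincount(y))
```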

The likelihood of the data is given by the joint density

$p(X \mid \Theta) = \prod_{t=1}^{T} p(\mathbf{x}_t \mid \Theta) \qquad (2)$

where the mixture density is

$p(\mathbf{x}_t \mid \Theta) = \sum_{k=1}^{K} p(\mathbf{x}_t \mid C_k, \theta_k)\, p(C_k) \qquad (3)$

where $C_k$ denotes the class $k$ and $\Theta = (\theta_1, \dots, \theta_K)$ collects the unknown parameters $\theta_k = \{\mathbf{A}_k, \mathbf{b}_k\}$ of each class. The goal of the ICA mixture model algorithm is to determine the parameters for each class by using the maximum log-likelihood method. Therefore, the rule to update the basis functions for each class can be written as

(4)

where the log-likelihood of the data for each class is

$\log p(\mathbf{x}_t \mid C_k, \theta_k) = \log p(\mathbf{s}_k) - \log\lvert \det \mathbf{A}_k \rvert \qquad (5)$

and the probability for each class given the data vector is

$p(C_k \mid \mathbf{x}_t, \Theta) = \frac{p(\mathbf{x}_t \mid C_k, \theta_k)\, p(C_k)}{\sum_{j=1}^{K} p(\mathbf{x}_t \mid C_j, \theta_j)\, p(C_j)} \qquad (6)$

The updating rule for the basis terms is

(7)

where $t$ is the data index, $t = 1, \dots, T$.

Furthermore, for the automatic switching between super-Gaussian and sub-Gaussian models, a switching matrix shown in (9) can be used; i.e., it indicates whether the source distributions are more peaked or less peaked than the Gaussian

(8)

where $M$ is the dimension of the source, $s_{k,i}$ is the $i$th dimension of the source in the $k$th class, and $k_{k,i}$ is an indicator which allows for automatic switching between super-Gaussian and sub-Gaussian models

(9)

The above ICA mixture model is good for clustering, but it requires that a correct or proper cluster number be given in advance for a set of training data, which is usually unknown in reality. To make the choice of a proper cluster number automatic and to make the ICA mixture model useful for on-line clustering, an on-line ICA mixture model is first derived in this section. In this model, there is no cluster initially. When the first data vector is fed in, the first cluster is generated. Then, for each following incoming data vector (pattern), the on-line ICA mixture model determines whether this pattern belongs to the first (or an existing) cluster or whether a new cluster should be generated to accommodate it. To make this decision, we let the log-likelihood value calculated in (5) represent the degree to which the newly incoming pattern belongs to the $k$th cluster, i.e., $\log p(\mathbf{x}_t \mid C_k, \theta_k)$. Then we define

(10)

where the superscript denotes the index of the cluster attaining the maximum log-likelihood value among all existing clusters at time $t$. If this maximum value is not smaller than a given threshold, the corresponding new incoming pattern is added to the existing cluster with that index and the parameters of this cluster are updated accordingly; in this case, no new cluster is generated. If the maximum value is smaller than the threshold, a new cluster is generated to accommodate the new pattern. The threshold value is obtained empirically, and it is a negative value.
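As a rough illustration of this decision rule, the sketch below (a simplification under an assumed Laplacian source prior, not the authors' implementation) scores an incoming pattern against every existing cluster with the per-class log-likelihood of (5) and opens a new cluster when the best score falls below the threshold:

```python
import numpy as np

def log_likelihood(x, A, b):
    """Per-class log-likelihood of (5): log p(s) - log|det A|,
    with an assumed Laplacian source prior log p(s) ~ -sum|s_i|."""
    s = np.linalg.solve(A, x - b)                # recover sources s = A^{-1}(x - b)
    return -np.sum(np.abs(s)) - np.log(np.abs(np.linalg.det(A)))

def assign_or_create(x, clusters, threshold=-10.0):
    """Return the index of the cluster that absorbs x, creating a new one if
    the best log-likelihood falls below the (negative) threshold."""
    if not clusters:                             # first pattern -> first cluster
        clusters.append({"A": np.eye(len(x)), "b": x.copy()})
        return 0
    scores = [log_likelihood(x, c["A"], c["b"]) for c in clusters]
    best = int(np.argmax(scores))
    if scores[best] >= threshold:                # close enough: keep the existing cluster
        return best
    clusters.append({"A": np.eye(len(x)), "b": x.copy()})  # otherwise open a new cluster
    return len(clusters) - 1

clusters = []
for x in np.random.default_rng(1).normal(size=(20, 2)):
    assign_or_create(x, clusters)
print("clusters generated:", len(clusters))
```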


In the rest of this section, we shall derive the details of the updating rules for the proposed on-line ICA mixture model. Assume the number of clusters at time $t$ is $c(t)$. Then, the mixture probability at time $t$ is

(11)

Therefore, the posterior probability is

(12)

where the prior probability at the preceding time step can be obtained from the previous calculation result of the $k$th cluster. Hence, the prior probability at the current moment can be calculated by the following:

(13)

Using the above results, we can obtain the following updating rules for the parameters of each cluster, including the basis matrix $\mathbf{A}_k$, the mean $\mathbf{b}_k$, and the criterion of data distribution that determines whether the distribution of the data is super-Gaussian or sub-Gaussian, all based on the previous calculation results. They are defined as follows:

(14)

By substituting the above term into (14), we can rewrite (14) as follows:

(15)

Fig. 2. Structure of the proposed on-line ICA-mixture-model-based FNN.

Let the function of the criterion which allows for automatic switching between super-Gaussian and sub-Gaussian models be defined accordingly; then (9) can be further derived as

(16)

where

(17)

Finally, the independent axes of the $k$th cluster can be obtained by the following updating rule:

(18)
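As a rough illustration of the kind of on-line update performed by (11)–(18), the following sketch combines a running estimate of the cluster mean with a natural-gradient step on the basis matrix in the style of the extended-infomax rule of [14], plus a crude per-dimension super-/sub-Gaussian switching indicator. The step size, the instantaneous switching proxy, and the posterior weighting are assumptions and may differ from the paper's exact update rules.

```python
import numpy as np

def update_cluster(c, x, posterior, lr=0.01):
    """One hedged on-line update of a cluster's parameters given a pattern x
    and its class posterior p(C_k | x). Not the paper's exact (11)-(18)."""
    c["count"] += posterior
    c["b"] += (posterior / c["count"]) * (x - c["b"])        # running mean (bias vector)

    s = np.linalg.solve(c["A"], x - c["b"])                  # sources s = A^{-1}(x - b)
    # Crude instantaneous proxy for the extended-infomax switching of [14]
    # (per source dimension): +1 for super-Gaussian, -1 for sub-Gaussian.
    k = np.sign((1.0 / np.cosh(s) ** 2) * (s ** 2) - s * np.tanh(s))
    K = np.diag(k)
    I = np.eye(len(s))
    # Natural-gradient-style step on the basis matrix, weighted by the posterior.
    grad = I - K @ np.outer(np.tanh(s), s) - np.outer(s, s)
    c["A"] += lr * posterior * c["A"] @ grad
    return c

c = {"A": np.eye(2), "b": np.zeros(2), "count": 1e-3}
for x in np.random.default_rng(2).laplace(size=(200, 2)):
    update_cluster(c, x, posterior=1.0)
print(np.round(c["A"], 2), np.round(c["b"], 2))
```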

III. STRUCTURE OF THE ON-LINE ICA MIXTURE-MODEL-BASED FNN

In this section, a novel self-constructing FNN is developed based on the on-line ICA mixture model derived in the last section. The structure of the proposed FNN is shown in Fig. 2. This five-layered network realizes an FIS of the following form:

Rule $j$: IF $x_1$ is $A_{j1}$ and $\cdots$ and $x_N$ is $A_{jN}$ THEN $y$ is $m_{j0} + a_{j1}x_1 + \cdots + a_{jN}x_N$  (19)

where the current input data vector is $\mathbf{x} = [x_1, \dots, x_N]^T$, $N$ is the number of dimensions, $A_{ji}$ is a fuzzy set, $m_{j0}$ is the center of a symmetric membership function on $y$, and $a_{ji}$ is a consequent parameter. It is noted that, unlike the traditional TSK model where all the input variables are used in the output linear equation, only the significant ones are used in the proposed FNN; i.e., some of the $a_{ji}$'s in the above fuzzy rules are zero.

The FNN consists of nodes, each of which has some finite "fan-in" of connections, represented by weight values, from other nodes and a "fan-out" of connections to other nodes. Associated with the fan-in of a node is an integration function, which serves to combine information, activation, or evidence from other nodes. This function provides the net input for this node

(20)

where the $u_i$'s are the inputs to this node and the $w_i$'s are the associated link weights. The superscript in (20) indicates the layer number. This notation will also be used in the following equations. A second action of each node is to output an activation value as a function of its node input

(21)

where $a(\cdot)$ denotes the activation function. We shall next describe the functions of the nodes in each of the five layers of the proposed FNN.

Layer 1: No computation is done in this layer. Each node in this layer, which corresponds to one input variable, only transmits input values to the next layer directly. That is,

(22)

From the above equation, the link weight in Layer 1 is unity.

Layer 2: Each node in this layer corresponds to one linguistic value ("small", "large", etc.) of one of the input variables in Layer 1. In other words, the membership value which specifies the degree to which an input value belongs to a fuzzy set is calculated in Layer 2. In contrast to the types of membership functions normally used, such as triangular, trapezoidal, or Gaussian functions, the membership functions here are determined by the on-line ICA mixture model in the proposed FNN.

In this layer, the output from Layer 1 is projected onto the independent axes obtained by the on-line ICA mixture model (as shown in Fig. 3) such that

$\mathbf{s}^{(j)} = \mathbf{A}_j^{-1}(\mathbf{x} - \mathbf{b}_j), \quad j = 1, \dots, c(t) \qquad (23)$

where $\mathbf{A}_j$ and $\mathbf{b}_j$ are the basis matrix and mean vector, respectively, determined by the on-line ICA mixture model, and $c(t)$ is the number of clusters at time $t$. That is, if the input data are classified into $c(t)$ clusters, the number of learned fuzzy rules will be $c(t)$.

With the choice of non-Gaussian membership functions, the operation performed in this layer is

(24)

where the membership function takes one form for the super-Gaussian case and another for the sub-Gaussian case

(25)

and $s_{ji}$ is the transformed value of the $i$th term of the input variable for the $j$th rule. The transformation can be regarded as a linear combination of the original variables. With the transformation of the input coordinates, the rule format in (19) should be modified as

Rule $j$: IF $s_{j1}$ is $A_{j1}$ and $\cdots$ and $s_{jN}$ is $A_{jN}$ THEN $y$ is $m_{j0} + a_{j1}x_1 + \cdots + a_{jN}x_N$  (26)

where the elements of $\mathbf{A}_j^{-1}$ form the transformation matrix for rule $j$, and the $s_{ji}$'s are the newly generated input variables, called the sources in ICA. The linguistic implication is now carried by the new variable $s_{ji}$, which is a linear combination of the original variables. After the transformation, the region that the membership functions cover is shown in Fig. 3(b). It is observed that the membership functions cover the distribution of the transformed data well, and thus a single fuzzy rule can associate this region with its proper output region (consequent).

Fig. 3. Input space transformation by the on-line ICA mixture model in the structure learning of the proposed FNN. (a) The regions covered by the original axes. (b) The regions covered by the independent axes obtained by the on-line ICA mixture model.

Layer 3: A node in this layer represents one fuzzy rule and performs precondition matching of a rule. Here, we use the following AND operation for each Layer-2 node

(27)

The link weight in Layer 3 is unity. The output of a Layer-3 node represents the firing strength of the corresponding fuzzy rule.

Layer 4: This layer is called the consequent layer. Two types of nodes are used in this layer, and they are denoted as blank and shaded circles in Fig. 2, respectively. The node denoted by a blank circle (blank node) is the essential node representing a fuzzy set (described by a membership function) of the output variable. Different nodes in Layer 3 may be connected to the same blank node in Layer 4, meaning that the same consequent fuzzy set is specified for different rules. As to the shaded node, each node in Layer 3 has its own corresponding shaded node in Layer 4. One of the inputs to a shaded node is the output delivered from Layer 3, and the other inputs (terms) are the input variables from Layer 1. Combining these two types of nodes in Layer 4, we obtain the whole function performed by this layer as

(28)

where $m_{j0}$ is the center of the output membership function and $a_{ji}$ is the corresponding consequent parameter.

Layer 5: Each node in this layer corresponds to one output variable. The node integrates all the actions recommended by Layer 4 and acts as a defuzzifier with

(29)
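To summarize the data flow through the five layers, the sketch below gives a simplified forward pass for one input vector. The Laplacian-shaped Layer-2 membership, the product AND in Layer 3, and the weighted-average defuzzification in Layer 5 are assumptions standing in for (24), (25), and (29), whose exact forms may differ:

```python
import numpy as np

def fnn_forward(x, rules):
    """Simplified forward pass through the five layers for one input x.
    Each rule j carries: A (basis matrix), b (mean), m0 (output center),
    a (consequent coefficients). Membership shapes are assumptions."""
    num, den = 0.0, 0.0
    for r in rules:
        s = np.linalg.solve(r["A"], x - r["b"])   # Layer 2: project onto independent axes, eq. (23)
        mu = np.exp(-np.abs(s))                   # assumed Laplacian-shaped memberships
        firing = np.prod(mu)                      # Layer 3: AND operation (product)
        consequent = r["m0"] + r["a"] @ x         # Layer 4: center plus linear terms of (28)
        num += firing * consequent
        den += firing
    return num / max(den, 1e-12)                  # Layer 5: weighted-average defuzzification

rules = [
    {"A": np.eye(2), "b": np.zeros(2), "m0": 0.0, "a": np.array([1.0, 0.0])},
    {"A": np.eye(2), "b": np.ones(2) * 2, "m0": 1.0, "a": np.array([0.0, 1.0])},
]
print(fnn_forward(np.array([0.5, 1.5]), rules))
```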

IV. LEARNING RULES OF THE ON-LINE ICA MIXTURE-MODEL-BASED FNN

Two types of learning, structure and parameter learning, are used concurrently for constructing the proposed on-line ICA-mixture-model-based FNN. The structure learning includes both the precondition and consequent structure identification of a fuzzy if-then rule. Here, precondition structure identification corresponds to the input-space partitioning and can be formulated as a combinational optimization problem with two objectives: to reduce the number of rules generated and to reduce the number of fuzzy sets on the universe of discourse of each input variable. As to the consequent structure identification, the main task is to decide when to generate a new membership function for the output variable and which significant terms (input variables) should be added to the consequent part (a linear equation) when necessary. In our system, we use the on-line ICA mixture model to realize the precondition and consequent structure identification of the proposed FNN.

For the parameter learning, based on unsupervised and supervised learning algorithms, the parameters of the linear equations in the consequent parts are adjusted by the backpropagation rule to minimize a given cost function. The parameters in the precondition part are adjusted by the on-line ICA mixture model. The FNN can be used for normal operation at any time during the learning process without repeated training on the input-output patterns when on-line operation is required.

Fig. 4. Flowchart of the learning algorithm for the proposed FNN.

There are no rules (i.e., no nodes in the network except the input-output nodes) in this network initially. They are created dynamically as learning proceeds upon receiving on-line incoming training data by performing the following learning processes simultaneously, as shown in Fig. 4. In this figure, learning processes (1) and (2) belong to the structure learning phase and process (3) belongs to the parameter learning phase. In the rest of this section, the details of these learning processes are described.

A. Structure Learning by the On-Line ICA Mixture Model Algorithm

The way the input space is partitioned determines the number of rules extracted from the training data as well as the number of fuzzy sets on the universe of discourse of each input variable. For each incoming pattern, the firing strength of a rule (i.e., the output of each Layer-2 node of the proposed FNN) can be interpreted as the degree to which the incoming pattern belongs to the corresponding cluster. In other words, we can use the log-likelihood value calculated in (5) to represent the degree to which the newly incoming pattern belongs to the $k$th cluster, i.e., $\log p(\mathbf{x}_t \mid C_k, \theta_k)$. Then, according to the on-line ICA mixture model [see (10)] derived in Section II, we can determine whether a new cluster (i.e., a new rule) should be generated (i.e., whether to grow the network). This process is applied to both the input space and output space partitioning (clustering) simultaneously but individually. The detailed algorithms, called input space partitioning and output space partitioning, are given below.

Algorithm of Input Space Partitioning

IF the incoming pattern is the first incoming pattern THEN do
PART 1. Generate a new rule with its center set to the incoming pattern, and initialize the parameters of the new rule (cluster).
ELSE for each newly incoming pattern, do
PART 2. Find the existing rule (cluster) with the maximum log-likelihood value [see (10)].
IF this maximum value is not smaller than the input threshold,
do the parameter-updating steps of the on-line ICA mixture model derived in Section II.
ELSE
generate a new fuzzy rule and set the parameters of the new rule (cluster) as in PART 1.

Algorithm of Output Space Partitioning

IF there is no output cluster
do PART 1 of the Input Space Partitioning Algorithm, with the input pattern replaced by the desired output.
ELSE
do Find the existing output cluster with the maximum log-likelihood value.
IF this maximum value is not smaller than the output threshold,
connect the input cluster (rule) to this existing output cluster.
ELSE
generate a new output cluster as in the ELSE part of PART 2 of the Input Space Partitioning Algorithm, and connect the input cluster to this new output cluster.
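The following sketch illustrates the coupled input/output partitioning described by the two algorithms above. The membership score is only a stand-in for the log-likelihood test of (10), and the threshold values are arbitrary:

```python
import numpy as np

def partition(x, y, in_clusters, out_clusters, rule_links,
              f_in=-10.0, f_out=-10.0):
    """Sketch of the coupled input/output partitioning: a new input cluster
    means a new rule, and the rule is linked to whichever output cluster
    best explains the desired output y (or to a newly created one)."""
    def score(v, c):                         # degree of membership of v in cluster c
        return -np.sum(np.abs(v - c["b"]))   # placeholder for log p(v | C_k)

    def assign(v, clusters, thresh):
        if clusters:
            best = max(range(len(clusters)), key=lambda k: score(v, clusters[k]))
            if score(v, clusters[best]) >= thresh:
                return best, False
        clusters.append({"b": np.atleast_1d(v).astype(float)})
        return len(clusters) - 1, True

    i, new_rule = assign(x, in_clusters, f_in)
    if new_rule:                             # a new rule needs a consequent
        j, _ = assign(y, out_clusters, f_out)
        rule_links[i] = j                    # different rules may share one output cluster
    return i, rule_links[i]

in_c, out_c, links = [], [], {}
rng = np.random.default_rng(3)
for x, y in zip(rng.normal(size=(30, 2)), rng.normal(size=30)):
    partition(x, y, in_c, out_c, links)
print(len(in_c), "rules,", len(out_c), "output clusters")
```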

In the above algorithms, the thresholds determine how many rules (clusters) will be generated in the input (output) space; both thresholds should be negative since they are applied to natural-log likelihood values. For a larger threshold value, more rules will be generated. The generation of a new input cluster corresponds to the generation of a new fuzzy rule, with its precondition part constructed by the input space partitioning algorithm above. At the same time, the above output space partitioning algorithm decides the consequent part of the generated rule. The algorithm is based on the fact that different preconditions of different rules may be mapped to the same consequent fuzzy set. Since only the center of each output membership function is used for defuzzification, the consequent part of each rule may simply be regarded as a singleton. Compared to general fuzzy rule-based models with singleton outputs, where each rule has its own individual singleton value, fewer parameters are needed in the consequent part of the proposed FNN, especially for the case with a large number of rules.

B. Parameter Learning by the On-Line ICA Mixture Model and Backpropagation Algorithms

After the network structure is adjusted according to the current training pattern, the network then enters the parameter identification phase to adjust the parameters of the network optimally based on the same training pattern. Notice that the following parameter learning is performed on the whole network after structure learning, no matter whether the nodes (links) are newly added or existent originally. The idea of backpropagation is used for this supervised learning. Considering the single-output case for clarity, our goal is to minimize the error function

$E = \frac{1}{2}\,(y_d - y)^2 \qquad (30)$

where $y_d$ is the desired output and $y$ is the current output. For each training data set, starting at the input nodes, a forward pass is used to compute the activity levels of all the nodes in the network to obtain the current output $y$. Then, starting at the output nodes, a backward pass is used to compute $\partial E / \partial w$ for all the hidden nodes. Assuming that $w$ is an adjustable parameter in a node (e.g., $a_{ji}$ in the FNN), the general updating rule used is

$w(t+1) = w(t) + \eta\left(-\frac{\partial E}{\partial w}\right) \qquad (31)$

where $\eta$ is the learning rate and

(32)

To show the updating rules, we show the computation of the error gradients and the update of the consequent parameters. First, we start the derivation from the output nodes. The error signal, which needs to be computed and propagated, is derived by

(33)

The updating rule for the consequent parameter is

(34)

(35)

Hence, the parameter is updated by

(36)

(37)

For the parameters in Layer 2 of the proposed FNN (the basis matrices and the mean vectors), their updating rules can be determined by the proposed on-line ICA mixture model based on statistical independence, under the constraint of minimizing the error function in (30). The on-line ICA mixture model with this constraint is formulated as follows:

(38)

The problem in (38) is expressed as a constrained optimization problem which can be solved through the use of an augmented Lagrangian function. Hence, (18) can be rewritten as


(39)

where $\eta$ is the learning rate for maximizing the log-likelihood, $\lambda$ is the Lagrangian parameter, and the corresponding matrix is

(40)

For each element of the above matrix, we compute

(41)

(42)

Substituting (42) into (41), we get the final updating rule for the basis matrix. A similar approach can be used to derive the updating rule for the mean vector.

Fig. 5. (a) Input training patterns of the target dynamic system. (b) The training procedure is performed for 1000 time steps. (c) Error function. (d) Simulation results of the FNN after 50 000 time steps. The dotted line denotes the output of the FNN and the solid line denotes the actual output.
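For the consequent side of the parameter learning, the sketch below applies plain gradient descent on the error function of (30) to the center and linear coefficients of each rule, with firing strengths computed as in the earlier forward-pass sketch. The exact expressions of (33)–(37) in the paper may differ from this simplification:

```python
import numpy as np

def update_consequents(x, y_d, rules, eta=0.05):
    """Gradient-descent update of the consequent center m0 and coefficients a
    for one training pair (x, y_d), minimizing E = 0.5*(y_d - y)^2 of (30)."""
    firings = []
    for r in rules:
        s = np.linalg.solve(r["A"], x - r["b"])
        firings.append(np.prod(np.exp(-np.abs(s))))
    firings = np.array(firings)
    weights = firings / max(firings.sum(), 1e-12)        # normalized firing strengths
    y = sum(w * (r["m0"] + r["a"] @ x) for w, r in zip(weights, rules))
    err = y_d - y                                        # equals -dE/dy
    for w, r in zip(weights, rules):
        r["m0"] += eta * err * w                         # since dy/dm0 = w
        r["a"] += eta * err * w * x                      # since dy/da_i = w * x_i
    return y

rng = np.random.default_rng(4)
rules = [{"A": np.eye(2), "b": np.zeros(2), "m0": 0.0, "a": np.zeros(2)},
         {"A": np.eye(2), "b": np.ones(2), "m0": 0.0, "a": np.zeros(2)}]
for _ in range(500):
    x = rng.uniform(size=2)
    update_consequents(x, x.sum(), rules)                # learn y = x1 + x2
print(round(update_consequents(np.array([0.3, 0.4]), 0.7, rules), 3))
```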

V. EXPERIMENTS

To verify the performance of the proposed FNN, several experiments are presented in this section. The experiments cover the areas of system identification, classification, and image segmentation, and they show that the proposed FNN can achieve significant improvements in convergence speed and prediction accuracy.

A. Identification of Dynamic Systems

In this experiment, the proposed FNN is used to identify a dynamic system:

(43)

Since both the unknown plant and the FNN are driven by the same input, the FNN adjusts itself with the goal of causing the output of the identification model to match that of the unknown plant. Upon convergence, their input-output relationships should match.

Example 1: The plant to be identified is described by the difference equation

(44)

The output of the plant depends nonlinearly on both its past values and inputs, but the effects of the input and output values are additive. In applying the FNN to this identification problem, the learning parameters used include the threshold parameters for the input and output clustering processes. The training patterns are generated by driving the plant with the chosen input signal.
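Since (44) is not reproduced above, the sketch below uses a commonly cited benchmark plant of the same additive form, $y(k+1) = y(k)/(1+y^2(k)) + u^3(k)$ driven by $u(k) = \sin(2\pi k/100)$, as an assumed stand-in to show how the training patterns could be generated:

```python
import numpy as np

def generate_patterns(steps=1000):
    """Generate identification data from an assumed benchmark plant of the
    additive form described in Example 1 (a stand-in, not necessarily (44))."""
    y = np.zeros(steps + 1)
    data = []
    for k in range(steps):
        u = np.sin(2 * np.pi * k / 100)              # assumed input signal
        y[k + 1] = y[k] / (1 + y[k] ** 2) + u ** 3   # past output and input enter additively
        data.append(([u, y[k]], y[k + 1]))           # FNN input -> desired output
    return data

patterns = generate_patterns()
print(len(patterns), patterns[0])
```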


Fig. 6. (a) Prediction results of the FNN after training. The dotted line denotes the output of the FNN and the solid line denotes the actual output. (b) Prediction errors for testing.

TABLE I
INFLUENCE OF THE THRESHOLD PARAMETERS ON THE PERFORMANCE OF THE PROPOSED FNN AND THE RESULTING (RMS ERROR, NUMBER OF RULES)

The training is performed for 1000 time steps [see Fig. 5(b)]. After training, three input and three output clusters are generated. Fig. 5(d) shows the outputs of the plant and the identification model after 50 000 time steps. In this figure, the outputs of the FNN are presented as the dotted curve while the plant outputs are presented as the solid curve. Since a perfect identification result is achieved with our network, no additional terms need to be added to the consequent part.

In the above simulation, the threshold parameters need to be selected in advance. To give a clear understanding of the influence of these parameters on the structure and performance, different values were tested. For convenience, the input and output thresholds are assigned the same value. The generated network structure and the corresponding root-mean-square (rms) errors and numbers of rules are listed in Table I. From Table I, we can see that within certain ranges of the parameters, the rms error does not change much. According to our experiments, a higher threshold value will increase the number of rules, but it does not necessarily reduce the rms error.

Example 2—Mackey–Glass Chaotic Time Series Prediction: We apply the proposed FNN to the Mackey–Glass time series prediction problem, which has been used in many studies in the FNN or NN communities. The Mackey–Glass time-delay differential equation is defined by

(45)

where the delay parameter and the initial condition were chosen appropriately in our experiment. With this setting, we have a nonperiodic and nonconvergent time series as shown in Fig. 6(a). We want to build an FNN that can predict a future value of this time series from several of its past values; the input data format therefore consists of delayed samples of the series and the output is the value to be predicted. From the collected series, we obtain 1000 data pairs. The first 500 data pairs are used for training while the others are used for testing. In applying the proposed FNN to this prediction problem, the learning parameters (clustering thresholds and learning rate) were chosen appropriately. Fig. 6(a) shows the testing results of the FNN after training. The dotted line denotes the output of the FNN and the solid line denotes the actual output. The prediction errors of the proposed FNN are shown in Fig. 6(b). The average prediction error of the proposed FNN over 30 runs was 0.032, which was smaller than the prediction error (0.034) of the cooperative neural network ensembles presented in [29].
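The parameter values of (45) are not reproduced above; the sketch below integrates the standard Mackey–Glass equation $\dot{x}(t) = 0.2x(t-\tau)/(1+x^{10}(t-\tau)) - 0.1x(t)$ with the commonly used delay $\tau = 17$ and builds the usual four-delayed-sample prediction pairs. Both choices are assumptions for illustration and may differ from the paper's exact setup:

```python
import numpy as np

def mackey_glass(n=1200, tau=17, dt=1.0, x0=1.2):
    """Coarse Euler integration of the standard Mackey-Glass equation
    (assumed parameters; the paper's exact values are not given here)."""
    x = np.full(n + tau, x0)
    for t in range(tau, n + tau - 1):
        x[t + 1] = x[t] + dt * (0.2 * x[t - tau] / (1 + x[t - tau] ** 10) - 0.1 * x[t])
    return x[tau:]

series = mackey_glass()
# Usual benchmark setup: predict x(t+6) from x(t-18), x(t-12), x(t-6), x(t).
t = np.arange(18, len(series) - 6)
inputs = np.stack([series[t - 18], series[t - 12], series[t - 6], series[t]], axis=1)
targets = series[t + 6]
train_x, train_y = inputs[:500], targets[:500]
test_x, test_y = inputs[500:1000], targets[500:1000]
print(train_x.shape, test_x.shape)
```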

B. Experiments on Data Classification

In this section, four well-known benchmark classification data sets (the iris data set, the Wisconsin breast cancer data set, the wine classification data set, and the Australian credit approval data set) are used to evaluate the performance of the proposed FNN. These data sets are available from the University of California, Irvine, via an anonymous ftp site: ftp.ics.uci.edu/pub/machine-learning-databases [30].


Fig. 7. The resultant membership functions of the proposed FNN with respect to the iris data. (a) The input vectors in the x domain. (b) The independent vectors in the s domain and the membership functions of rule 1. (c) The independent vectors in the s domain and the membership functions of rule 2.

Example 1—Iris Data: The Fisher–Anderson iris data consist of four input measurements, sepal length (sl), sepal width (sw), petal length (pl), and petal width (pw), on 150 specimens of the iris plant. Three species of iris are involved, Iris Setosa, Iris Versicolor, and Iris Virginica, and each species contains 50 instances. To evaluate the effectiveness of the proposed FNN, 25 instances from each species were randomly selected as the training set and the remaining instances were used as the testing set. To perform classification, the output of our system was used with the following classification rule:

Iris is Setosa, Versicolor, or Virginica according to the range into which the output value falls.  (46)

We set the threshold and the learning rate for the clustering algorithm. After learning, three clusters were revealed, so our structure consisted of three fuzzy rules, and there are three fuzzy term sets for each input variable. Next, the parameter learning algorithm proceeds to fine-tune the network to achieve better performance. Fig. 7 shows the resultant membership functions for illustration. We plot only two of the three classes of iris data and two of the four dimensions for simplicity. Fig. 7(a) shows the input vectors defined in the x space. Fig. 7(b) and (c) present the independent space obtained by the on-line ICA mixture model, together with the membership functions of Rule 1 and Rule 2, respectively. According to Fig. 7(b), we can find that the membership functions of Rule 1 match Class "x" well after the transformation through ICA; in the meantime, Class "o" is pushed away from the membership functions of Rule 1, which makes the firing strength of Class "o" for Rule 1 almost zero. Similarly, according to Fig. 7(c), the membership functions of Rule 2 match Class "o" well after the transformation through ICA; in the meantime, Class "x" is pushed away from the membership functions of Rule 2. Consequently, we can find that the membership functions of these rules do not overlap with each other.

We can also use this simplified example (two classes and two dimensions), as shown in Fig. 7, to discuss the semantics of the obtained fuzzy rules in the proposed model. The two resultant rules in the independent space can be represented as (47). After the transformation by the inverse of the transformation matrix, according to Fig. 7(a), the fuzzy rules shown in (47) can be modified and presented in the input space with linguistic implications as (48). It is observed that the major difference between (47) and (48) is the space of the input vectors in the preconditions of the fuzzy rules, but the consequents of the fuzzy rules are the same. The firing strengths of the fuzzy rules in (48) can be represented by the firing strengths of the fuzzy rules in (47), because the transformation is a one-to-one mapping.

Table II shows the comparison of the proposed FNN with different iteration numbers and Table III shows the comparison of the proposed FNN with different threshold values for iris classification; the results are the averages over ten different training and testing sets. In this example, the iteration number is set to 100. A higher threshold value results in a larger rule number; that is, the number of rules increases or decreases depending on this parameter. Because the iris data consist of three-cluster patterns, the average testing classification rate of the FNN with three fuzzy rules is the highest. Table IV shows the comparison of the classification results of our FNN and other fuzzy classifiers on the iris data.

TABLE II
PERFORMANCE OF THE PROPOSED FNN WITH DIFFERENT ITERATIONS ON THE IRIS DATA CLASSIFICATION PROBLEM

TABLE III
PERFORMANCE COMPARISONS OF THE PROPOSED FNN WITH DIFFERENT THRESHOLD VALUES F ON THE IRIS DATA CLASSIFICATION PROBLEM

TABLE IV
PERFORMANCE COMPARISONS OF VARIOUS CLASSIFIERS ON THE IRIS DATA CLASSIFICATION PROBLEM

Example 2—Wisconsin Breast Cancer Diagnostic Data: The Wisconsin Breast Cancer Diagnostic data set contains 699 patterns distributed into two output classes, "benign" and "malignant." Each pattern consists of nine input features: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. In this data set, 458 patterns are in the benign class and the other 241 patterns are in the malignant class. Since there are 16 patterns containing missing values, we used 683 patterns to evaluate the performance of the proposed FNN. To compare the performance with other classifiers, half of the 683 patterns were used as the training set and the remaining patterns were used as the testing set. The data set was normalized to the range [0, 1]. We classified the output of the structure using the following classification rule:

Breast cancer is Benign or Malignant according to the range into which the output value falls.  (49)

TABLE V
PERFORMANCE OF THE PROPOSED FNN WITH DIFFERENT ITERATIONS ON THE WISCONSIN BREAST CANCER DATA CLASSIFICATION PROBLEM

TABLE VI
PERFORMANCE COMPARISONS OF THE PROPOSED FNN WITH DIFFERENT THRESHOLD VALUES F ON THE WISCONSIN BREAST CANCER DATA CLASSIFICATION PROBLEM

TABLE VII
PERFORMANCE COMPARISONS OF VARIOUS CLASSIFIERS ON THE WISCONSIN BREAST CANCER DATA CLASSIFICATION PROBLEM

We set the threshold and the learning rate for training. Two clusters were revealed in the final learning process. The learned structure consisted of two fuzzy rules and two fuzzy terms for each input feature when the number of iterations was set to 10. In the same situation, when we set the number of iterations to 100, the learned structure changes into four fuzzy rules and four fuzzy terms for each input feature. We repeated the experiment on 10 different training sets (see Table V). In Table V, we can find that when the iteration number increases, the number of rules increases. This situation may occur in updating the independent axes; with a different transformation of the input coordinates, the number of rules may change. Table VI shows the comparison of the proposed FNN with different thresholds for the breast cancer data classification. Table VII shows the comparison between the learned structure models and other fuzzy, neural-network, and neuro-fuzzy classifiers on the same target problem. It shows that the recognition rate of our proposed FNN is higher than those of the listed classifiers.

Example 3—Wine Classification Data: The wine classification data set contains 178 wines that are brewed in the same region of Italy but derived from three different cultivars. Each pattern consists of 13 continuous features: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavonoids, nonflavonoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline. Corcoran et al. [41] applied a real-coded genetic-based method to learn 60 nonfuzzy if-then rules from the 178 patterns and used a population of 1500 individuals for 300 generations with full replacement. Ishibuchi et al. [40] proposed an integer-coded GA and grid partitioning to design a fuzzy classifier with 60 fuzzy rules from the 178 patterns. They used a population of 100 individuals and applied it for 1000 generations with full replacement. Setnes et al. [39] applied a real-coded GA and the c-means clustering algorithm on all the available 178 patterns to design a TSK model as a classifier. Nine features were selected during their proposed simplification and optimization process. In [35], the MCA clustering algorithm was proposed to solve this problem. With our scheme, three clusters were revealed in the final learning process. After applying the parameter learning process for five epochs, the classification error was reduced to zero. The comparison between our classifier and the above-mentioned fuzzy classifiers is shown in Table VIII.

TABLE VIII
PERFORMANCE COMPARISONS OF VARIOUS CLASSIFIERS ON THE WINE DATA CLASSIFICATION PROBLEM

Example 4—Australian Credit Approval Data: This dataset contains 690 patterns distributed into two output classes. Each pattern consists of 14 (6 continuous and 8 categorical) input features. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. This dataset is interesting because there is a good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. There were originally a few missing values, but these have all been replaced by the overall median. We classified the output of the structure using the following classification rule:

The pattern is assigned to one of the two credit classes according to the output value.  (50)

We set the threshold and the learning rate for training. After structure learning, our structure consisted of two fuzzy rules. The results with tenfold cross-validation are shown in Table IX. According to the experimental results, the proposed FNN with only two fuzzy rules can reach higher accuracy than the other methods.

Since the tenfold cross-validation testing model produces more reliable results, the testing results of the proposed FNN on the Iris, Wine, Wisconsin, and Australian datasets with the tenfold cross-validation testing model are presented in Table X. These experimental results show that, given a reasonable number of seed clusters, the proposed FNN is capable of automatically identifying the true cluster configuration. Hence, the proposed recursive on-line ICA mixture model can further reduce the number of required rules and achieve better system performance.

TABLE IX
PERFORMANCE COMPARISONS OF VARIOUS CLASSIFIERS ON THE AUSTRALIAN DATA CLASSIFICATION PROBLEM WITH THE TESTING MODEL OF TENFOLD CROSS-VALIDATION

TABLE X
TESTING RESULTS OF THE PROPOSED FNN ON THE IRIS, WINE, WISCONSIN, AND AUSTRALIAN DATASETS WITH THE TESTING MODEL OF TENFOLD CROSS-VALIDATION

Fig. 8. Texture segmentation. (a) Texture of four different materials: (top-left) herringbone weave, (top-right) woolen cloth, (bottom-left) denim, (bottom-right) raffia. (b) The labels found by the proposed FNN are shown in different grey levels. The misclassified patches of size 10 × 10 pixels are shown from the square region of the texture.

C. Unsupervised Image Classification and Segmentation

In this section, we applied our FNN to learn multiple classes in a single image. The learned classes are mutually exclusive, and the whole image is divided into small image patches for classification. Three experiments were performed to illustrate how the algorithm can identify textures in an image. In the first experiment, four texture images were taken and merged into one image. Fig. 8(a) shows the textures of four different materials: (top-left) herringbone weave, (top-right) woolen cloth, (bottom-left) denim, (bottom-right) raffia. Each texture image is 200 × 200 pixels in size. Four classes of training patterns were adopted by randomly sampling 10 × 10 pixel patches from each texture image; i.e., no label information was taken into account. We classify the output using the following classification rule:

Class is herringbone weave, woolen cloth, denim, or raffia according to the output value.  (51)
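A minimal sketch of how such unlabeled training patches could be sampled and scaled is given below; the patch size and scaling follow the description above, while the stand-in image is synthetic:

```python
import numpy as np

def sample_patches(image, patch=10, count=50, rng=None):
    """Randomly sample `count` patch x patch pixel blocks from a 2-D grayscale
    image array and flatten them into training vectors (values scaled to [0, 1])."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    rows = rng.integers(0, h - patch, size=count)
    cols = rng.integers(0, w - patch, size=count)
    return np.stack([image[r:r + patch, c:c + patch].ravel() / 255.0
                     for r, c in zip(rows, cols)])

texture = np.random.default_rng(1).integers(0, 256, size=(200, 200))  # stand-in texture
X = sample_patches(texture)
print(X.shape)   # (50, 100): 50 training vectors of 10*10 pixel values
```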

The automatic classification of the image shown in Fig. 8(b) was done by dividing the image into adjacent nonoverlapping 10 × 10 pixel patches. The misclassified patches are shown with different grey levels from the square region of the texture.

Fig. 9. Example of text extraction: The 5 × 5 pixel image patches were randomly sampled from the image and used as training patterns for the proposed FNN. (a) Original image. (b) Text image patches. (c) Picture image patches. (d) Background image patches.

In the second experiment, we used a text image of scanned newspaper articles. The training data set consisted of 5 × 5 pixel patches selected randomly from the images of two different types; 50 patches were randomly selected for each type. We classify the output using the following classification rule:

Class is Text or Background according to the output value.  (52)

Fig. 9(a) shows the original text image and Fig. 9(b) shows the classification result of the FNN. The 5 × 5 pixel patches are shown in Fig. 9(c) and (d), respectively. In Fig. 9(b), the text region is denoted by white while the background region is denoted by black. Obviously, the text region and the background region are successfully separated into two classes using the proposed on-line ICA-mixture-model-based FNN.

The final experiment shows the segmentation of a text/picture mixture image of scanned newspaper articles. This image contains text and a picture, and the goal is to separate the text, picture, and background regions of the image. The training data set consists of 10 × 10 pixel patches selected randomly from the text, picture, and background regions. In the training set, each class includes 50 image patches. The output is classified by the following classification rule:

Class is Text, Picture, or Background according to the output value.  (53)

Fig. 10(a) shows the scanned image. Fig. 10(b)–(d) illustrate examples of the image patches from the text, picture, and background regions. Before learning, the values of the image patches were normalized to the range [0, 1]. Fig. 11 shows the classification result of the proposed FNN for this scanned image using image patches of 10 × 10 pixel size. When the iteration number is set to 100, the error is reduced to 0.0023. When the iteration number is increased to 500, the error rate almost reaches zero. Finally, the segmentation result of the whole scanned image is shown in Fig. 11(b).

Fig. 10. Example of the scanned page. The 10 × 10 pixel image patches were randomly sampled from the images as training patterns for the proposed FNN. (a) Original image. (b) Text image patches. (c) Picture image patches. (d) Background image patches.

Fig. 11. Segmentation of an image scanned from a magazine. (a) Original image. (b) The segmentation result of the whole scanned image.

VI. CONCLUSION

In this paper, a novel FNN was proposed based on a newly derived on-line ICA mixture model. It is a general connectionist model of a fuzzy logic system, which can find its optimal structure and parameters automatically. Both the structure and parameter identifications are done simultaneously during on-line learning, so the network can be used for normal operation at any time as learning proceeds, without any assignment of fuzzy rules in advance. For structure learning, the proposed on-line ICA mixture model algorithm was able to identify the optimal number of clusters (i.e., rules) and simultaneously estimate the centers and variances of the clusters for constructing the FNN structure in a single pass, without a priori knowledge of the distribution of the training data set. A novel network construction method for solving the dilemma between the number of rules and the number of consequent terms was developed. The number of generated rules and membership functions is small even for modeling a sophisticated system. In summary, the proposed FNN can always find an economical network size, and both the learning speed and the modeling ability are satisfactory. Several experiments covering the areas of system identification, classification, and image segmentation were carried out to demonstrate the performance of the proposed FNN. These experiments showed that the proposed FNN can achieve significant improvements in convergence speed and prediction accuracy.

REFERENCES

[1] B. Kosko, Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[2] C. T. Lin, Neural Fuzzy Control Systems with Structure and Parameter

Learning. New York: World Scientific, 1994.

[3] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neural-Fuzzy

Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.

[4] R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A

Computational Approach to Learning and Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall, 1997.

[5] D. Nauck, F. Klawonn, and R. Kruse, Foundations of Neuro-Fuzzy

Systems. New York: Wiley, 1997.

[6] S. Horikawa, T. Furuhashi, and Y. Uchikawa, “On fuzzy modeling using fuzzy neural networks with the backpropagation algorithm,” IEEE

Trans. Neural Netw., vol. 3, pp. 801–806, Sep. 1992.

[7] K. Tanaka, M. Sano, and H. Watanabe, “Modeling and control of carbon monoxide concentration using a neuro-fuzzy technique,” IEEE Trans.

Fuzzy Syst., vol. 3, pp. 271–279, Aug. 1995.

[8] Y. Lin and G. A. Cunningham, “A new approach to fuzzy-neural system modeling,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 190–197, May 1995. [9] M. Sugeno and T. Yasukawa, “A fuzzy-logic-based approach to

qualitative modeling,” IEEE Trans. Fuzzy Syst., vol. 1, pp. 7–31, Feb. 1993. [10] L. Wang and R. Langari, “Building Sugeno-type models using fuzzy

discretization and orthogonal parameter estimation techniques,” IEEE

Trans. Fuzzy Syst., vol. 3, pp. 454–458, Nov. 1995.

[11] E. H. Ruspini, “Recent development in fuzzy clustering,” Fuzzy Set and

Possibility Theory, pp. 113–147, 1982.

[12] T. W. Lee, M. S. Lewicki, and T. J. Sejnowski, “Unsupervised classification with non-Gaussian mixture models using ICA,” Adv. Neural Inf.

Process. Syst., vol. 11, pp. 508–514, 1999.

[13] C. F. Juang and C. T. Lin, “An on-line self constructing neural fuzzy inference network and its applications,” IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb. 1998.

[14] T. W. Lee, M. Girolami, and T. J. Sejnowski, “Independent component analysis using an extended infomax algorithm for mixed sub-gaussian and super-gaussian sources,” Neural Comput., vol. 11, no. 2, pp. 417–441, 1999.

[15] J. C. Bezdek, J. Keller, R. Krisnapuram, and N. R. Pal, Fuzzy Models and

Algorithms for Pattern Recognition and Image Processing. Boston, MA: Kluwer, 1999.

[16] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.

[17] F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster

Analysis. New York: Wiley, 1999.

[18] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley, 1997.

[19] T.-W. Lee and M. S. Lewicki, “Image processing methods using ICA mixture models,” in Independent Component Analysis: Principles and

Practice, S. Roberts and R. Everson, Eds. New York: Cambridge Univ. Press, 2001.

[20] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[21] M. E. Tipping and C. M. Bishop, “Mixtures of probabilistic principal component analyzers,” Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.

[22] M. S. Lewicki and T. J. Sejnowski, “Learning overcomplete representations,” Neural Computation, vol. 12, no. 2, pp. 337–365, 2000. [23] D. MacKay, Maximum Likelihood and Covariant Algorithms for Independent Component Analysis, Draft 3.7, 1996.

[24] U.-M. Bae and T.-W. Lee, “Blind signal separation in teleconferencing using the ICA mixture model,” Electron. Lett., vol. 36, no. 7, pp. 680–682, 2000.

[25] J. F. Cardoso, “High-order contrasts for independent component analysis,” Neural Comput., vol. 11, pp. 157–192, 1999.

[26] R. O. Duda and P. E. Hart, Pattern Classification and Scene

Analysis. New York: Wiley, 1973.

[27] T. W. Lee, M. S. Lewicki, and T. J. Sejnowski, “ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation,” IEEE Trans. Pattern Anal.

Mach. Intell., vol. 22, no. 10, Oct. 2000.

[28] T. W. Lee, M. S. Lewicki, and T. J. Sejnowski, “ICA mixture models for unsupervised and automatic context switching,” in Proc. Int. Workshop

ICA, 1999, pp. 209–214.

[29] Md. M. Islam, X. Yao, and K. Murase, “A constructive algorithm for training cooperative neural network ensembles,” IEEE Trans. Neural

Netw., vol. 14, no. 4, Jul. 2003.

[30] C. L. Blake and C. J. Merz. (1998) UCI Repository of Machine Learning

Databases [Online]. Available:

http://www.ics.uci.edu/~mlearn/ML-Repository.html

[31] P. K. Simpson, “Fuzzy min-max neural networks—Part I: Classification,” IEEE Trans. Neural Netw., vol. 3, pp. 776–786, Sep. 1992. [32] H. M. Lee, “A neural network classifier with disjunctive fuzzy

information,” Neural Netw., vol. 11, no. 6, pp. 1113–1125, 1998.

[33] H. M. Lee, C. M. Chen, J. M. Chen, and Y. L. Jou, “An efficient fuzzy classifier with feature selection based on fuzzy entropy,” IEEE Trans.

Syst., Man, Cybern. B, vol. 31, pp. 426–432, Jun. 2001.

[34] T. P. Wu and S. M. Chen, “A new method for constructing membership functions and fuzzy rules from training examples,” IEEE Trans. on Syst.

Man, Cybern. B, Cybern., vol. 29, pp. 25–40, Feb. 1999.

[35] J. S. Wang and C. S. George Lee, “Self-adaptive neuro-fuzzy inference systems for classification applications,” IEEE Trans. Fuzzy Syst., vol. 10, no. 6, Dec. 2002.

[36] B. C. Lovell and A. P. Bradley, “The multiscale classifier,” IEEE Trans.

Pattern Anal. Mach. Intell., vol. 18, no. 2, pp. 124–137, Feb. 1996.

[37] D. Nauck and R. Kruse, “A neuro-fuzzy method to learn fuzzy classification rules from data,” Fuzzy Sets Syst., vol. 89, no. 3, pp. 277–288,

[38] R. Setiono and H. Liu, “Neural-network feature selector,” IEEE Trans.

Neural Netw., vol. 8, no. 3, pp. 654–662, Jun. 1997.

[39] M. Setnes and H. Roubos, “GA-fuzzy modeling and classification: Complexity and performance,” IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 509–522, Oct. 2000.

(15)

[40] H. Ishibuchi, T. Nakashima, and T. Murata, “Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 29, pp. 601–618, Oct. 1999.

[41] A. L. Corcoran and S. Sen, “Using real-valued genetic algorithms to evolve rule sets for classification,” in Proc. 1st IEEE Conf. Evolutionary

Computation, Orlando, FL, Jun. 1994, pp. 120–124.

[42] P. Brazdil and J. Gama, LIACC, Univ. of Porto Rua Campo Alegre 823 4150 Porto, Portugal.

Chin-Teng Lin (S’88–M’91–SM’99–F’04) received the B.S. degree in control engineering from the National Chiao-Tung University (NCTU), Hsinchu, Taiwan, R.O.C., in 1986, and the M.S.E.E. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1989 and 1992, respectively.

Since August 1992, he has been with the College of Electrical Engineering and Computer Science, National Chiao-Tung University, where he is currently the Associate Dean of the college and a Professor in the Electrical and Control Engineering Department. He has also served as the Director of the Brain Research Center, NCTU Branch, University System of Taiwan, since September 2003. He served as the Director of the Research and Development Office of the National Chiao-Tung University from 1998 to 2000, and the Chairman of the Electrical and Control Engineering Department from 2000 to 2003. His current research interests are neural networks, fuzzy systems, cellular neural networks (CNN), fuzzy neural networks (FNN), neural engineering, algorithms and VLSI design for pattern recognition, intelligent control, multimedia (including image/video and speech/audio) signal processing, and intelligent transportation systems (ITS). He is the book co-author of Neural Fuzzy Systems—A Neuro-Fuzzy Synergism to Intelligent Systems (Prentice Hall), and the author of Neural Fuzzy Control Systems with Structure and Parameter Learning (New York: World Scientific, 1994). He has also published over 80 journal papers in the areas of neural networks, fuzzy systems, multimedia hardware/software, and soft computing, including 60 IEEE journal papers.

Dr. Lin is a member of the Tau Beta Pi, Eta Kappa Nu, and Phi Kappa Phi honorary societies. He is also a member of the IEEE Circuits and Systems Society (CASS), the IEEE Neural Networks Society, the IEEE Computer Society, the IEEE Robotics and Automation Society, and the IEEE Systems, Man, and Cybernetics Society. Dr. Lin is the Distinguished Lecturer representing the NSATC of IEEE CASS from 2003 to 2005. He has been very active in the IEEE International Symposium on Circuits and Systems (ISCAS), serving as an Organizing Committee member, as the International Liaison of ISCAS 2005 in Japan, and as an Organizing Committee member and Special Session Co-Chair of ISCAS 2006 in Greece. He has been an Executive Council member (Supervisor) of the Chinese Automation Association since 1998. He was an Executive Council member of the Chinese Fuzzy System Association Taiwan (CFSAT) from 1994 to 2001, and has been the Society President of CFSAT since 2002. He has won the Outstanding Research Award granted by the National Science Council (NSC), Taiwan, from 1997 to the present, the Outstanding Electrical Engineering Professor Award granted by the Chinese Institute of Electrical Engineering (CIEE) in 1997, the Outstanding Engineering Professor Award granted by the Chinese Institute of Engineering (CIE) in 2000, and the 2002 Taiwan Outstanding Information-Technology Expert Award. He was also elected one of the 38th Ten Outstanding Rising Stars in Taiwan, R.O.C. (2000). He currently serves as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, the International Journal of Speech Technology, and the Journal of Automatica.

Wen-Chang Cheng received the B.S. degree in electronics engineering from National Cheng-Kung University, Tainan, Taiwan, R.O.C., the M.S. degree in electronics engineering from National Chung-Cheng University, Chiayi, Taiwan, R.O.C., in 1997 and 1999, respectively. He is currently working toward the Ph.D. degree in electrical and control engineering at National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

He is also a Lecturer in information management at Hsiuping Institute of Technology, Taichung, Taiwan, R.O.C. His current research interests include neuro-fuzzy systems, neural networks, image processing, machine learning, and artificial intelligence.

Sheng-Fu Liang was born in Tainan, Taiwan, R.O.C., in 1971. He received the B.S. and M.S. degrees in control engineering from the National Chiao-Tung University (NCTU), Taiwan, R.O.C., in 1994 and 1996, respectively. He received the Ph.D. degree in electrical and control engineering from NCTU in 2000.

Currently, he is a Research Assistant Professor in Electrical and Control Engineering, NCTU. Dr. Liang has also served as the Chief Executive of Brain Research Center, NCTU Branch, University System of Taiwan since September 2003. His current research interests are neural networks, fuzzy neural networks (FNN), brain-computer interface (BCI), and multimedia signal processing.

