A recurrent fuzzy cellular neural network system with automatic structure and template learning

Chin-Teng Lin, Senior Member, IEEE, Chun-Lung Chang, and Wen-Chang Cheng

Abstract—It is widely accepted that using a set of cellular neural networks (CNNs) in parallel can achieve higher level information processing and reasoning functions from both application and biological points of view. Such an integrated CNN system can solve more complex intelligent problems. In this paper, we propose a novel framework for automatically constructing a multiple-CNN integrated neural system in the form of a recurrent fuzzy neural network. This system, called the recurrent fuzzy CNN (RFCNN), can automatically learn its proper network structure and parameters simultaneously. The structure learning includes the fuzzy division of the problem domain and the creation of fuzzy rules and CNNs. The parameter learning includes the tuning of fuzzy membership functions and CNN templates. In the RFCNN, each learned fuzzy rule corresponds to a CNN. Hence, each CNN takes care of a fuzzily separated problem region, and the functions of all CNNs are integrated through the fuzzy inference mechanism. A new online adaptive independent component analysis mixture-model technique is proposed for the structure learning of the RFCNN, and the ordered-derivative calculus is applied to derive the recurrent learning rules of CNN templates in the parameter-learning phase. The proposed RFCNN provides a solution to the current dilemma of how to decide the templates and/or fuzzy rules in existing integrated (fuzzy) CNN systems. The capability of the proposed RFCNN is demonstrated on real-world defect inspection problems. Experimental results show that the proposed scheme is effective and promising.

Index Terms—Cellular neural network (CNN) template design, defect inspection, fuzzy clustering, fuzzy neural network (FNN), independent component analysis (ICA), ordered derivative, recurrent neural network.

I. INTRODUCTION

A cellular neural network (CNN) [1], [2] is a locally interconnected analog processor array arranged on a regular two-dimensional (2-D) grid. Its 2-D inputs and outputs make it well suited to image processing. It offers efficient real-time processing and is feasible to implement in very large-scale integration (VLSI). A CNN has a space-invariant local interconnection structure associated with 19 free parameters (neighborhood within a 3 × 3 window). This parameter set, called the template, exclusively determines its dynamic behavior. The CNN has been used to mimic the local function of biological neural circuits, especially the human visual pathway system [3]. According to a recent biological study [4], mammalian visual systems process the world through a set of separate parallel channels. Each subchannel can be regarded as a unique CNN. The outputs of these subchannels are then combined to form the new channel responses. As a result, it is widely accepted that using a set of CNNs in parallel can achieve higher level information processing and reasoning functions from both biological and application points of view. Such an integrated CNN system can solve more complex intelligent problems.

Manuscript received July 30, 2003; revised January 10, 2004. This work was supported by the Brain Research Center, University System of Taiwan, under Grant 92B-711. This paper was recommended by Guest Editor A. Zarándy.

The authors are with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]; [email protected]; wcc.ece88g@nctu.edu.tw).

Digital Object Identifier 10.1109/TCSI.2004.827622

For designing an integrated CNN system, in addition to determining a set of templates, another key problem is the method of integration. To solve this problem, the fuzzy inference system (FIS) is gaining attention. The FIS is a popular computing framework based on the concepts of fuzzy set theory, fuzzy IF-THEN rules, and fuzzy reasoning. With crisp inputs and outputs, an FIS implements a nonlinear mapping from its input space to its output space through a number of IF-THEN rules. It is very useful in image processing when it is difficult to specify, in a crisp mathematical form, the operation needed to yield a satisfying result from a complex image. For example, the boundary detection of different regions strongly depends on a subjective decision, especially in medical images. What is an edge-like pattern and what is a noise-like pattern cannot be clearly defined. In many cases, both statements might be true, so a fuzzy-type linguistic description of all patterns is better than a crisp-set approach. Therefore, an FIS can play an important role in integrating a set of CNNs into a system.

To give a CNN or a set of CNNs reasoning ability, several fuzzy-based CNN models have been proposed [5]–[9], including the fuzzy CNN (FCNN) proposed by Yang et al. [5] and Yang and Yang [6], and fuzzy reasoning implemented on CNNs proposed by Balsi and Voci [7], [8]. To make a set of CNNs in parallel achieve higher level information processing, several integrated CNN systems have been proposed [4], [9]–[11], including the cellular neuro-fuzzy networks (CNFNs) proposed by Colodro and Torralba [9], and the fuzzy-type CNNs proposed by Rekeczky et al. [10], [11] and Szatmári et al. [4]. The common drawbacks of these approaches are that the corresponding templates cannot be learned and the fuzzy rules must be obtained from domain experts. Although, according to Nossek’s survey [12], the template coefficients of a CNN can be found by design [12], [13] or by learning [12], [14], these techniques cannot be applied directly to the design or learning of an integrated CNN system.

Observing the works of Colodro et al. [9], Rekeczky et al. [10], and Szatmári et al. [4], we find two common characteristics. First, they all use many CNNs in parallel to solve a complex problem, such as edge detection with impulse noise, detection of fuzzy boundaries, and feature extraction.

Fig. 1. Structure of the proposed RFCNN.

Second, they all use an FIS to make a decision. To build an FIS, we have to specify the fuzzy sets, the fuzzy operators, and the knowledge base. However, the existing methods [4], [9], [11] all require the fuzzy rules to be specified manually by domain experts, and it is difficult, even for domain experts, to examine all the input–output data from a complex system to find a number of proper fuzzy rules. In addition, they all need to assign the corresponding templates of the CNNs in advance (i.e., templates cannot be learned). To cope with these drawbacks, we propose a novel framework for automatically constructing a multiple-CNN integrated neural system in the form of a recurrent fuzzy neural network (FNN). This system, called the recurrent fuzzy CNN (RFCNN), can automatically learn its proper network structure and parameters simultaneously. The structure learning includes the fuzzy division of the problem domain and the creation of fuzzy rules and CNNs. The parameter learning includes the tuning of fuzzy membership functions and CNN templates. In the RFCNN, each learned fuzzy rule corresponds to a CNN. Hence, each CNN takes care of a fuzzily separated problem region, and the functions of all CNNs are integrated through the fuzzy inference mechanism.

The RFCNN is constructed in the form of a recurrent FNN. Two important learning tasks of an FNN are structure identification and parameter identification [15]–[19]. Structure identification is the partition of the input–output space [20]–[23], which influences the number of generated fuzzy rules, each corresponding to a CNN. Efficient partition of input–output data will result in faster convergence and better performance for the FNN. In this paper, a new online adaptive independent component analysis (ICA) mixture-model technique is proposed for the structure learning of the RFCNN. Basically, ICA finds directions in the input space that lead to independent components instead of just uncorrelated ones, as principal component analysis (PCA) does [24], [25], so it dynamically reduces not only the number of rules (i.e., CNNs) but also the number of membership functions under a pre-specified accuracy requirement. In the parameter learning of the RFCNN, the ordered derivative calculus is applied to derive the recurrent learning rules, which are needed because of the recurrent structure that the RFCNN inherits from CNNs [1], [2]. The derived rules can learn the CNN templates and other parameters in the RFCNN efficiently. The proposed RFCNN provides a solution to the current dilemma of how to decide the templates and/or fuzzy rules in existing integrated (fuzzy) CNN systems. It has been applied to real-world defect inspection problems, which contain multiple types of defects (faults) with different features on a single image. Experimental results demonstrate that the proposed scheme is effective and promising.

The paper is organized as follows. Section II describes the structure and functions of the proposed RFCNN. Section III describes the online structure and parameter learning algorithm for the RFCNN. Section IV gives experimental results and discussions. Finally, conclusions are summarized in the last section.

II. STRUCTURE OF THE RFCNN

In this section, the structure of the proposed RFCNN shown in Fig. 1 is introduced. For clarity, we consider a CNN with time constant , time step , and neighborhood within a radius , which is characterized by the following templates:

(1)

where , , and are the feedback template, control template, and bias of the th CNN, respectively. By defining a CNN as above, the six-layered RFCNN network will realize a fuzzy model of the following form:

Rule is and is and is

is (2)

or

Rule is and is and is

is

(3) where the current input vector is

is is

is a sigmoid function, is a fuzzy set, and , , and are consequent parameters representing the feedback template, control template, and bias of the th CNN, respectively. The input dimension of the RFCNN is 9 if the neighborhood of a CNN cell is within a 3 × 3 window. As shown in (3), we focus on uncoupled CNN cells in this paper. With this six-layered network structure of the RFCNN, we define the function of each node and use the proposed online ICA mixture model, described in the next section, to construct the structure of the RFCNN.
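To make the role of the three template quantities concrete, the following Python sketch advances a single uncoupled CNN cell by one time step. It is only an illustration under stated assumptions: it uses the standard CNN state equation with a forward-Euler step and the standard piecewise-linear output, whereas the RFCNN's own state and output equations are (12) and (13). The names `uncoupled_cell_step`, `a_center`, `h`, and `tau` are hypothetical and chosen here for readability.

```python
from typing import List

Template = List[List[float]]  # 3 x 3 grid of coefficients


def pwl_output(x: float) -> float:
    """Standard CNN piecewise-linear output: clips the state to [-1, 1]."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def uncoupled_cell_step(
    x: float,
    u_patch: Template,
    a_center: float,
    B: Template,
    bias: float,
    h: float = 0.1,
    tau: float = 1.0,
) -> float:
    """One forward-Euler step of a single uncoupled CNN cell (assumed dynamics).

    x        : current cell state
    u_patch  : 3 x 3 input neighborhood centered on the cell
    a_center : center entry of the feedback template; in an uncoupled cell
               it is the only feedback entry that acts on the cell
    B        : 3 x 3 control template
    bias     : template bias
    """
    # Control-template contribution: weighted sum over the input neighborhood.
    feed_forward = sum(B[r][c] * u_patch[r][c] for r in range(3) for c in range(3))
    # State derivative: leak term, self-feedback, control term, and bias.
    dx = -x + a_center * pwl_output(x) + feed_forward + bias
    return x + (h / tau) * dx
```

With a control template that passes only the center pixel, an all-ones input patch, `a_center = 2.0`, and zero bias, repeated calls drive the state from 0 toward 3, so the output saturates at +1. The feedback template thus shapes the dynamics, while the control template and bias set the operating point.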

The RFCNN consists of nodes, each of which has some finite “fan-in” of connections, represented by weight values, from other nodes and a “fan-out” of connections to other nodes. Associated with the fan-in of a node is an integration function $f$, which serves to combine information, activation, or evidence from other nodes. This function provides the net input for this node

$$\text{node input} = f\left(u_1^{(k)}, u_2^{(k)}, \ldots, u_p^{(k)};\; w_1^{(k)}, w_2^{(k)}, \ldots, w_p^{(k)}\right) \tag{4}$$

where $u_1^{(k)}, u_2^{(k)}, \ldots, u_p^{(k)}$ are inputs to this node and $w_1^{(k)}, w_2^{(k)}, \ldots, w_p^{(k)}$ are the associated link weights. The superscript $(k)$ in (4) indicates the layer number. This notation will also be used in the following equations. A second action of each node is to output an activation value as a function of its net input

$$\text{node output} = a(\text{node input}) = a(f) \tag{5}$$

where $a(\cdot)$ denotes the activation function. We shall next describe the functions of the nodes in each of the six layers of the RFCNN, which include five feedforward layers and one feedback layer.

Fig. 2. Transformation by the online ICA mixture model for the proposed RFCNN. (a) Regions covered by the original axes. (b) Covered regions by the independent axes obtained by the online ICA mixture model transformation.

Layer 1: No computation is done in this layer. Each node in this layer, which corresponds to one input variable, only transmits input values to the next layer directly. That is

and

(6) From the above equation, the link weight in layer one is unity.

Layer 2: Each node in this layer corresponds to one linguistic value (small, large, etc.) of one of the input variables in Layer 1. In other words, the membership value, which specifies the degree to which an input value belongs to a fuzzy set, is calculated in Layer 2. There are many choices for the type of membership function, such as triangular, trapezoidal, or Gaussian ones. In this paper, the membership functions are determined by the online ICA mixture model and are either super-Gaussian or sub-Gaussian functions. It is noted that the output from Layer 1 is projected onto the independent axes obtained by the online ICA mixture model (as shown in Fig. 2) such that

(7)

where is the basis matrix determined by the online ICA mixture model, , and is the number of clusters. That is, if the input data are classified into clusters, the number of rules will be .

With the choice of non-Gaussian membership function, the operation performed in this layer is

where

for super-Gaussian

for sub-Gaussian and

(8) where is the transformed value of the th term of the th input variable . The transformation can be regarded as a change of input coordinates, where the parameters of each membership function are kept unchanged, i.e., the center and the width of each membership function on the new coordinate axes are the same as the old ones.

Layer 3: A node in this layer represents one fuzzy logic rule and performs precondition matching of a rule. Here, we use the following AND operation for each Layer-3 node

and

(9)

The link weight in Layer 3 is unity. The output of a Layer-3 node represents the firing strength of the corresponding fuzzy rule.

Layer 4: This layer is called the consequent layer. Different nodes in Layer 3 may be connected to the same node in Layer 4, meaning that the same consequent fuzzy set is specified for different rules. One of the inputs to each node is the output delivered from Layer 3 (firing strength), and the other inputs are CNN-related inputs , which are the outputs of the feedback term node. The feedback term node is described in the feedback layer part of this section. Combining the two kinds of inputs in Layer 4, we obtain the whole function performed by this layer as

and

(10)

where , , and are the feedback template, control template, and bias of the th CNN, respectively, as defined in (1), and is a sigmoid function, as defined in (13).

Layer 5: Each node in this layer corresponds to one output variable. The node integrates all the actions recommended by Layer 4 and acts as a defuzzifier with

and

(11)
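As a small illustration of how Layers 4 and 5 combine the individual CNNs, the sketch below follows the description given with Fig. 8 in Section IV: each Layer-4 node outputs the CNN output multiplied by the firing strength of its rule, and the final output is the sum of the Layer-4 outputs. The function name `rfcnn_output` and the per-pixel scalar interface are assumptions made for this example; the exact node functions are those of (10) and (11).

```python
from typing import Sequence


def rfcnn_output(firing_strengths: Sequence[float], cnn_outputs: Sequence[float]) -> float:
    """Combine the per-rule CNN outputs for one pixel.

    Each Layer-4 node scales the output of its CNN by the firing strength of
    the matching rule; the Layer-5 node sums those products.
    """
    if len(firing_strengths) != len(cnn_outputs):
        raise ValueError("one CNN output is needed per rule")
    # Layer 4: firing strength times CNN output, one value per rule.
    layer4 = [w * y for w, y in zip(firing_strengths, cnn_outputs)]
    # Layer 5: plain sum of the Layer-4 outputs (no normalization assumed).
    return sum(layer4)
```

For example, with firing strengths `[0.5, 0.25, 0.25]` and CNN outputs `[1.0, -1.0, 1.0]`, the combined output is 0.5: the rule that matches the pixel best dominates the result.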

Feedback Layer: As shown in Fig. 1, this self-feedback layer characterizes the consequents of the RFCNN as a CNN template. Two types of nodes are used in this layer: the square node, called the context node, and the circle node, called the feedback term node. Each context node is associated with a feedback term node. The number of context nodes (and thus the number of feedback term nodes) is the same as that of output term nodes in Layer 4. The inputs to a context node are from its corresponding output term nodes , the input variables from Layer 1 , and the template bias . The output of its associated feedback term node is fed to the original node in Layer 4. The context node functions as the state (the summation of input part) of the th CNN

(12)

Fig. 3. Learning algorithm for the proposed RFCNN.

As to the feedback term node, the membership function is used to approximate the piecewise-linear function used in the CNN. With this choice, the feedback term node evaluates the output by

(13)

This output is connected to its corresponding node in Layer 4, which characterizes the consequents of the RFCNN as a CNN template.
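The reason for replacing the piecewise-linear CNN output with a smooth function can be seen in a short Python sketch. The piecewise-linear output has zero slope once it saturates and a kink at ±1, whereas a sigmoid has a well-defined, nonzero derivative everywhere, which is what a gradient-based learning rule needs. The hyperbolic-tangent form and the `slope` parameter used here are assumptions for illustration only; the sigmoid actually used by the RFCNN is the one defined in (13).

```python
import math


def pwl_output(x: float) -> float:
    """Standard CNN piecewise-linear output, saturating at -1 and +1."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def sigmoid_output(x: float, slope: float = 1.0) -> float:
    """Smooth, odd, saturating stand-in for the PWL output (hypothetical form)."""
    return math.tanh(slope * x)


def sigmoid_derivative(x: float, slope: float = 1.0) -> float:
    """Derivative of sigmoid_output with respect to x; never exactly flat."""
    y = sigmoid_output(x, slope)
    return slope * (1.0 - y * y)
```

Both functions are odd and bounded by ±1, but only the smooth one passes a usable gradient back to the template coefficients when the cell state is large.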

III. LEARNING ALGORITHMS FOR RFCNN

Two types of learning, structure and parameter learning, are used concurrently for the RFCNN. The structure learning includes both the precondition and consequent structure identification of a fuzzy IF-THEN rule. In the RFCNN, the structure learning includes the fuzzy division of the problem domain (precondition structure identification) and the creation of fuzzy rules and CNNs (consequent structure identification). The precondition structure identification corresponds to the input-space partitioning and can be formulated as a combinational optimization problem with the following two objectives: to reduce the number of rules generated and to reduce the number of fuzzy sets on the universe of discourse of each input variable. As to the consequent structure identification, the main task is to decide when to generate a new consequent term (or a new CNN) for the output variable. In our system, we propose an online ICA mixture model to realize the precondition and consequent structure identification part of the RFCNN.

For the parameter learning, the parameters of each CNN template in the consequent parts are adjusted by the ordered derivative algorithm to minimize a given cost function. The parameters in the precondition part are adjusted by the online ICA mixture model algorithm. The RFCNN can be used for normal operation at any time during the learning process without repeated training on the input–output patterns when online operation is required. There are no rules (i.e., no nodes in the network except the input–output nodes) in this network initially. They are created dynamically as learning proceeds, upon receiving online incoming training data, by performing the following learning processes simultaneously (see Fig. 3).

As shown in Fig. 3, learning processes (1) and (2) belong to the structure learning phase and (3) belongs to the parameter learning phase. The details of these learning processes are described in the rest of this section.

Fig. 4. Fuzzy partitions of 2-D input space. (a) Grid-based partitioning. (b) IF-THEN rules based on grid-based partitioning. (c) Clustering-based partitioning. (d) IF-THEN rules based on clustering-based partitioning.

A. Input–Output Space Partitioning

Efficient partition of input–output data will result in faster convergence and better performance for the RFCNN. The most direct way is to partition the input space into a grid, where each grid cell represents a fuzzy IF-THEN rule [see Fig. 4(a)]. This is called grid-based partitioning. The major problem with this kind of partition is that the number of fuzzy rules (and thus the number of CNNs) increases exponentially with the number of input variables and grows quickly with the number of partitions per variable. A more flexible partition method, the clustering-based approach, which clusters the input training vectors in the input space, reduces the numbers of rules and CNNs [20]–[23]. In fact, although the number of membership functions in Fig. 4(d) is larger than that in Fig. 4(b), there are only five rules in Fig. 4(d), whereas there are nine rules in Fig. 4(b). By observing the projected membership functions in Fig. 4(c), we find that some membership functions projected from different clusters have high similarity degrees. These highly similar membership functions can be checked and merged by a similarity measure. In this paper, we propose a clustering method based on a new online ICA mixture model to provide a better partition of the input–output space for the proposed RFCNN. The background and algorithm of the proposed online ICA mixture model for clustering will be described in the following subsections.
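The growth of the rule base under grid-based partitioning is easy to quantify. The minimal Python sketch below counts one rule per grid cell; the function name `grid_rule_count` is introduced here for illustration only.

```python
def grid_rule_count(partitions_per_input: int, num_inputs: int) -> int:
    """Number of fuzzy rules when every grid cell becomes one IF-THEN rule."""
    if partitions_per_input < 1 or num_inputs < 1:
        raise ValueError("both arguments must be positive")
    # Every combination of one fuzzy set per input variable is a separate rule.
    return partitions_per_input ** num_inputs
```

With three fuzzy sets on each of two inputs, the grid gives 3² = 9 rules, as in Fig. 4(b). With the nine inputs of a 3 × 3 window and the same three fuzzy sets per input, it would give 3⁹ = 19 683 rules, each tied to its own CNN, which is why a clustering-based partition is preferred.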

1) ICA Mixture Model: Several methods for input space partition have been proposed to cluster the input training vectors in the input space, such as the Kohonen learning rule, the hyperbox method, product-space partitioning, the fuzzy c-means method, the electromagnetic algorithm, etc. [26]–[29]. Those methods are usually based on Gaussian membership functions. In general, the observed data can be categorized into several mutually exclusive classes [30]. When the data in each class are modeled as multivariate Gaussian, the model is called a Gaussian mixture model (GMM), which is widely used throughout the fields of machine learning and statistics. One major drawback of GMMs is that, as the dimension of the problem space increases, the size of each covariance matrix, , becomes prohibitively large. This problem was solved by Tipping and Bishop [31], who replaced each Gaussian with a probabilistic principal component analysis (PCA) model. This allowed the dimensionality of each covariance to be effectively reduced while maintaining the richness of the model class. ICA [24] is a technique that exploits the higher-order statistical structure of the data, and it has recently gained attention due to its successful applications to signal processing problems including speech enhancement, discrete signal processing, and image processing. The goal of ICA is to linearly transform the data such that the transformed variables are as statistically independent from each other as possible. Basically, it finds directions in the input space that lead to independent components instead of just uncorrelated ones, as PCA does, so it can be used to dynamically reduce not only the number of rules but also the number of membership functions under a pre-specified accuracy requirement.

Another drawback of GMMs is that they are based on Gaussian functions; in some situations, the classes cannot be separated from each other. The GMM is therefore generalized by assuming that the data in each class are generated by a linear combination of independent non-Gaussian sources [33]. This model is called the ICA mixture model. It allows modeling of classes with non-Gaussian structure, such as platykurtic or leptokurtic probability density functions, and it uses the gradient ascent method to maximize the log-likelihood function. In previous applications, this approach showed improved performance in data classification problems [34] and in learning efficient codes for representing different types of images [25]. The advantage of this model is that increasing the number of classes provides greater flexibility in modeling structure and in finding more features than GMMs or standard ICA algorithms. Although the ICA mixture model has many advantages, its cluster number must be given beforehand and its learning scheme is suitable only for off-line rather than online operation. Therefore, in the following section, we propose an online ICA mixture model to provide better dynamic partitioning of the input–output space for the proposed RFCNN.

2) Online ICA Mixture Model for Dynamic Clustering: The proposed online ICA mixture model is derived from the conventional ICA mixture model. To enable online operation, we define a criterion to determine whether the number of clusters should be increased for any incoming training pattern. For each incoming pattern to the RFCNN, the resulting firing strength of a fuzzy rule can be interpreted as the degree to which the incoming pattern belongs to the corresponding cluster. This likelihood can be represented as

(14) where denotes the incoming pattern at time , and is the log likelihood value indicating the degree that the input data, , belongs to the th cluster for . Now, we assume that the number of clusters at time is . Then, the total probability at time is

(15) Therefore, the posterior probability is

(16) where is the prior probability at preceding time, which can be obtained by former calculation result of the th cluster. Hence, the probability at this moment can be calculated by the following:

(17) Then, the posterior probability in (16) can be obtained.
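The step from per-cluster log-likelihoods and priors to posterior probabilities is an application of Bayes' rule, and it can be sketched in a few lines of Python. The function name `cluster_posteriors` and its list-based interface are assumptions for this example; the per-cluster log-likelihood itself is the quantity defined in (14) and is taken here as given. Working with shifted logarithms keeps the computation stable when the likelihoods are very small.

```python
import math
from typing import List, Sequence


def cluster_posteriors(log_likelihoods: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Posterior probability of each cluster for one incoming pattern.

    log_likelihoods : log p(pattern | cluster k), one value per cluster
    priors          : p(cluster k) from the preceding time step, each > 0
    """
    if len(log_likelihoods) != len(priors):
        raise ValueError("one prior is needed per cluster")
    # log of (likelihood * prior) for every cluster
    joint_logs = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Subtract the largest value before exponentiating to avoid underflow.
    shift = max(joint_logs)
    weights = [math.exp(v - shift) for v in joint_logs]
    total = sum(weights)  # proportional to the total probability of the pattern
    return [w / total for w in weights]
```

For two equally likely clusters with likelihoods 0.2 and 0.6, the posteriors are 0.25 and 0.75. A natural use in an online setting, assumed here rather than taken from the text, is to feed these posteriors into the prior used at the next time step.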

Based on the above derivation, we can obtain the following criterion for the generation of a new fuzzy rule (i.e., a new CNN). Let be the newly incoming pattern at time . Defining (18) If , then a new rule is generated, where is a pre-specified threshold value that decays during the learning process. Once a new rule is generated, the next step is to assign initial values to the corresponding membership functions. If , the new incoming data point is added to an existing cluster, and we have to update the parameters of each cluster, such as the mean , the covariance matrix , and the criterion of data distribution that determines whether the distribution of the data is super-Gaussian or sub-Gaussian, using the previous calculation results. They are defined in (19)–(21), shown at the bottom of the next page. In these, the function is defined as the criterion function that allows automatic switching between super-Gaussian and sub-Gaussian models, and (21) can be further derived as

(22) where

(23) Finally, the independent axes , representing the axis of the th cluster, can be obtained by the following formulations:

(24) and

(25) In (24), the function is called the component-wise nonlinearity function. If the data distribution is super-Gaussian, it is defined as . Otherwise, if the data distribution is sub-Gaussian, it is defined as .

Since the online ICA mixture model algorithm can automatically determine the number of clusters according to new incoming data, it solves the problem of the conventional ICA mixture model that the number of clusters has to be given beforehand.

B. Structure Learning Algorithm of RFCNN With On-Line ICA Mixture Model

The way the input space is partitioned determines the number of rules extracted from the training data as well as the number of fuzzy sets on the universe of discourse of each input variable. We define a criterion to determine whether a new cluster (i.e., a new fuzzy rule or a new CNN) should be added. Let of cluster be the newly incoming pattern at time . Defining (26)

where is the log likelihood value indicating the degree to which the input data, , belongs to the th cluster, and the superscript is a maximum log likelihood value among all log likelihood values. If , the number of clusters is not increased, where is a pre-specified threshold value that decays during the learning process. In this case, the new incoming pattern is added to an existing cluster and the parameters of this cluster are updated accordingly. Conversely, if , the number of clusters is increased. The threshold value is determined by experiments.

The whole algorithm for the generation of new fuzzy rules as well as fuzzy sets in each input variable is shown step by step in Fig. 5. In PART 2 of Fig. 5, the threshold determines how many rules will be generated, where should be negative since it is taken in natural log. For a lower value of , more rules will be generated. Similarly, determines how many output clusters will be generated, and a lower value of will result in more output clusters. For the output space partitioning, the same approach as in (14) is used. The generation of a new output cluster corresponds to the generation of a new CNN.

Fig. 5. Algorithm of input space partitioning.

Suppose a new input cluster is formed after the presentation of the current input–output training pair ; then, the consequent part is constructed by the algorithm shown in Fig. 6.

The above algorithm is based on the fact that different preconditions of different rules may be mapped to the same consequent term, i.e., the same CNN. Since only the center of each output membership function is used for defuzzification, the consequent part of each rule may simply be regarded as a singleton. Compared to general fuzzy rule-based models with singleton output, where each rule has its own individual singleton value, fewer parameters are needed in the consequent part of the RFCNN, especially when the number of rules is large.

(19)

(20)

Fig. 6. Algorithm of output space partitioning.

C. Parameter Learning Algorithm of RFCNN by Ordered Derivative Calculus

After the network structure is adjusted according to the current training pattern, the network enters the parameter identification phase to adjust the parameters of the network optimally based on the same training pattern. Notice that the following parameter learning is performed on the whole network after structure learning, regardless of whether the nodes (links) are newly added or already existed. Since the RFCNN is a dynamic system with feedback connections, the backpropagation learning algorithm cannot be applied to it directly. Also, due to the online learning property of the RFCNN, off-line learning algorithms for recurrent neural networks, such as backpropagation through time and time-dependent recurrent backpropagation [17], cannot be applied here. Instead, the ordered derivative [34], which is a partial derivative whose constant and varying terms are defined using an ordered set of equations, is used to derive our learning algorithm. The ordered set of equations, described layer by layer in Section II, is summarized in (28)–(33). Our goal is to minimize the error function

(27)

where is the desired output, is the current output, and is . For each training data set, starting at the input nodes, a forward pass is used to compute the activity levels of all the nodes in the network to obtain the current output . In the following, dependency on time is omitted unless emphasis on temporal relationships is required.

Summarizing the node functions defined in Section II, the function performed by the network is

(28) (29) where (30) (31) (32)

and (1) is redefined as the following equation for clarity:

(33) With the above formula and the error function defined in (27), we can derive the update rules for the free parameters in the RFCNN as follows.

Update rule of (the parameter of feedback template of the th CNN) is (34) (35) where (36) and (37) where (38) and (39) Hence, the parameter is updated by

(40) Similarly, the parameter (the parameters of the control template of the th CNN) is updated by

(41) and the parameter (the bias of the th CNN) is updated by

(42) As shown in (37) to (39), the update rules are in recursive form. The value is equal to zero initially. The remaining free parameters in the RFCNN are obtained during the structure-learning phase by the online ICA mixture model algorithm proposed in the last section. Notice that the same parameter learning rules for the RFCNN can also be obtained with the real-time recurrent learning (RTRL) scheme [35].

Fig. 7. Training images. (a) Input image. (b) Desired output.

Of course, other existing online learning algorithms [36], [37] for tuning the weights of recurrent neural networks could also be adopted for tuning the RFCNN.
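The recursive character of such update rules can be illustrated on a toy problem. The Python sketch below is not the RFCNN update of (34)–(42); it assumes a single recurrent cell y(t+1) = tanh(a·y(t) + b·u + i) with a scalar feedback weight `a`, and a squared error E = ½(d − y)² as the cost. All names (`run_cell`, `update_feedback_weight`, `lr`) are hypothetical. The derivative of the output with respect to `a` is carried forward in time, starts at zero, and combines the direct dependence on `a` with the indirect dependence through the previous output.

```python
import math
from typing import Tuple


def run_cell(a: float, b: float, i: float, u: float, steps: int) -> Tuple[float, float]:
    """Iterate y(t+1) = tanh(a*y(t) + b*u + i) and track dy/da recursively."""
    y, dy_da = 0.0, 0.0  # the derivative is zero initially
    for _ in range(steps):
        y_next = math.tanh(a * y + b * u + i)
        # Direct term y(t) plus the path through the previous output y(t).
        dy_da = (1.0 - y_next * y_next) * (y + a * dy_da)
        y = y_next
    return y, dy_da


def update_feedback_weight(
    a: float, b: float, i: float, u: float, target: float, steps: int = 5, lr: float = 0.1
) -> float:
    """One online gradient step on the feedback weight a."""
    y, dy_da = run_cell(a, b, i, u, steps)
    error = target - y
    # With E = 0.5 * error**2, dE/da = -error * dy/da, so descend along +error * dy/da.
    return a + lr * error * dy_da
```

Because the derivative is accumulated alongside the forward pass, no stored history of past states is needed, which is the property that makes this family of rules usable online.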

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

The capability of the proposed RFCNN is demonstrated on real-world defect inspection problems. Automatic defect inspection systems are becoming more and more important in industrial production lines. Especially in the electronics industry, an attempt is often made to achieve almost 100% quality control of all components and final goods. Here, we are interested in the defect inspection of the color filter, which is one of the components of the thin film transistor liquid crystal display (TFT-LCD) module and gives each LCD pixel its own color. The difficulties in the defect inspection of color filters are their complex texture and the need for high-speed processing. For high-speed processing, the CNN is a good way to achieve defect inspection. Moreover, different kinds of defects in color filters need different CNN templates, and some complex defects cannot be detected by a single CNN. Therefore, the proposed RFCNN is a good alternative for detecting defects in color filter images. To train the RFCNN, we use a 3 × 3 window to get the system inputs and set the whole image as the inputs of the RFCNN. The 3 × 3 window covers the central pixel and its eight connected neighbors. The training image and corresponding desired output are shown in Fig. 7(a) and (b). We set the threshold and learning rate as for the clustering algorithm. As mentioned in Section III, there are no rules (and no CNNs) in the RFCNN initially. They are created dynamically as learning proceeds, upon receiving online incoming training data, by performing the learning processes shown in Fig. 3. When the learning processes were done, three clusters (three fuzzy rules and CNN templates) were obtained. For an example color filter, it takes about 1 min to learn the structure (interconnection set) and 2 min to learn the parameters on a Pentium IV 2.0-GHz PC. However, the training can be done off-line, so it is not a problem for the online processing of the CNN, which takes very little time.
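The way a 3 × 3 window turns an image into nine-dimensional input vectors can be sketched as follows. The function name `window_inputs`, the nested-list image representation, and in particular the border handling (coordinates are clamped so that edge pixels reuse their nearest neighbor) are assumptions made for this illustration; the treatment of image borders is not specified in the text.

```python
from typing import List

Image = List[List[float]]


def window_inputs(image: Image) -> List[List[List[float]]]:
    """Return, for every pixel, the 9 values of its 3 x 3 neighborhood (row-major)."""
    rows, cols = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        # Clamp coordinates so border pixels reuse their nearest neighbor (an assumption).
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return image[r][c]

    return [
        [
            [pixel(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

For the 3 × 3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the vector at the center pixel is `[1, 2, 3, 4, 5, 6, 7, 8, 9]`, i.e., the central pixel together with its eight connected neighbors. Each such vector is one input pattern presented to the network.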

Fig. 8. Outputs of Layer 3, Layer 4, and the feedback layer for the training image. (a)–(c) Outputs of the three Layer-4 nodes, respectively. (d)–(f) Outputs of the three CNNs in the feedback layer, respectively. (g)–(i) Outputs of the three Layer-3 nodes, respectively (firing strength of each rule).

Fig. 8 shows the outputs of Layer 3, Layer 4, and the feedback layer for the training image. Fig. 8(a)–(c) shows the outputs of the three Layer-4 nodes, i.e., the outputs of the three CNNs in the feedback layer multiplied by the outputs of the corresponding Layer-3 nodes (the firing strength of each rule). Fig. 8(d)–(f) shows the outputs of the three CNNs in the feedback layer. Fig. 8(g)–(i) shows the outputs of the three Layer-3 nodes (the firing strength of each rule). The sum of the outputs of the three Layer-4 nodes [i.e., Fig. 8(a)–(c)] forms the final output of the RFCNN. From Fig. 8(a)–(c), we can see that CNN 1 takes care of the defect texture on the right side of the training image, and CNNs 2 and 3 mainly take care of the defect textures on the left side of the training image. The template of each learned CNN is given as follows:

Based on the learned structure and parameters of the RFCNN, we test three images, as shown in Fig. 9. Fig. 9(a), (c), and (e) shows the testing images, and Fig. 9(b), (d), and (f) shows the corresponding defect inspection results. From Fig. 9(a)–(f), we can see that the learned structure and CNN templates of the RFCNN are well suited to detecting the defects in color filter images. We have also verified that the detection results remain good if the images are shifted. This is because the RFCNN considers only the central pixel and its eight connected neighbors, and these still form regular patterns after the images are shifted. Therefore, if the images are shifted, the network need not be retrained.
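The shift argument can be checked on a toy case. The sketch below is a deliberately simplified stand-in, not the RFCNN itself: it applies a single hypothetical 3 × 3 feedforward template once per pixel with a hard threshold and wrap-around borders, with no recurrent dynamics and no fuzzy rules. The template values, the striped test image, and the names `apply_template` and `shift` are all assumptions. Under these assumptions, shifting the input merely shifts the output, because each output pixel depends only on its own 3 × 3 neighborhood.

```python
def apply_template(image, template, bias=0.0):
    """Apply a 3x3 feedforward template at every pixel (wrap-around borders)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = bias
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbor = image[(r + dr) % rows][(c + dc) % cols]
                    total += template[dr + 1][dc + 1] * neighbor
            # Hard threshold to a bipolar output (+1 / -1).
            out[r][c] = 1.0 if total > 0 else -1.0
    return out


def shift(image, dr, dc):
    """Circularly shift an image down by dr rows and right by dc columns."""
    rows, cols = len(image), len(image[0])
    return [[image[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]


# Hypothetical template; the learned templates are not reproduced here.
TEMPLATE = [[-1, -1, -1],
            [-1, 8, -1],
            [-1, -1, -1]]

# Regular vertical stripes with one pixel that breaks the pattern.
image = [[1 if c % 2 == 0 else -1 for c in range(6)] for _ in range(6)]
image[2][3] = 1

detected = apply_template(image, TEMPLATE)
detected_after_shift = apply_template(shift(image, 1, 2), TEMPLATE)
print(detected_after_shift == shift(detected, 1, 2))  # True
```

The wrap-around border is chosen only so that the equality holds exactly at the image edges; with other border rules the property holds away from the border.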

Fig. 9. Experimental (testing) results of the learned RFCNN. (a), (c), and (e) are input testing images. (b), (d), and (f) are the corresponding detection results.

Fig. 10. Training images for the GA. (a) and (c) are input images. (b) and (d) are the corresponding desired outputs.

The conventional CNN-based methods for defect inspection [38]–[41] use one CNN template or a set of CNN templates, which can be designed by experienced engineers or learned from examples, to detect defects. To compare the RFCNN with conventional methods, we performed experiments using a single CNN whose template is learned by the genetic algorithm (GA). We find that the training image shown in Fig. 7(a) cannot be learned well by a single CNN. However, if we use the training images and corresponding desired outputs shown in Fig. 10(a)–(d), the CNN template can be learned well by the GA. This implies that different kinds of defects in a color filter need different CNN templates. That is, we could first identify the categories of defects and then use the GA to learn one CNN template per defect category. However, this raises two related questions. First, how many defect categories, and hence how many CNN templates, should there be? Second, how can we be sure which defects belong to the same category? In other words, what is the corresponding desired output for the uncategorized defects of a color filter? It is therefore difficult to apply the divide-and-conquer principle manually when learning CNN templates by the GA. The proposed RFCNN provides a good alternative that resolves this dilemma.
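For readers unfamiliar with GA-based template learning, the following is a minimal, generic sketch of the idea. It is not the GA configuration used in these experiments, which the text does not specify. The chromosome layout (nine template weights plus a bias), the one-pass thresholded CNN stand-in, the operators, and all names and constants are assumptions chosen for illustration.

```python
import random


def cnn_output(image, template, bias):
    """One-pass 3x3 feedforward template with hard threshold (wrap-around)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = bias
            for k in range(9):
                dr, dc = k // 3 - 1, k % 3 - 1
                total += template[k] * image[(r + dr) % rows][(c + dc) % cols]
            row.append(1 if total > 0 else -1)
        out.append(row)
    return out


def fitness(genes, image, desired):
    """Number of pixels whose output matches the desired output."""
    out = cnn_output(image, genes[:9], genes[9])
    return sum(o == d for ro, rd in zip(out, desired) for o, d in zip(ro, rd))


def evolve(image, desired, generations=30, pop_size=20, seed=0):
    """Elitist GA: keep the better half, refill by crossover plus mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(10)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, image, desired), reverse=True)
        history.append(fitness(population[0], image, desired))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 10)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(10)] += rng.gauss(0, 0.3)  # point mutation
            children.append(child)
        population = elite + children
    population.sort(key=lambda g: fitness(g, image, desired), reverse=True)
    return population[0], history


# Toy data: vertical stripes with one defect; the desired output marks it.
image = [[1 if c % 2 == 0 else -1 for c in range(6)] for _ in range(6)]
image[2][3] = 1
desired = [[-1] * 6 for _ in range(6)]
desired[2][3] = 1

best, history = evolve(image, desired)
print(fitness(best, image, desired), "of 36 pixels matched")
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases. Whether a single template can reach a perfect match depends on the defect type, which is the limitation discussed above.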

To make the RFCNN converge more quickly during learning, the GA can be used to learn some CNN templates with which to initialize the consequent part of the RFCNN. Although this experiment focuses on the defect inspection of color filters, the proposed RFCNN can also be applied to other images with regular patterns, such as texture webs.

The main idea of the proposed RFCNN is an integrated system of an FIS and CNNs that constructs fuzzy rules and CNN templates automatically. The color filter defect inspection example has been used to verify the capability of the RFCNN. Beyond the defect inspection of color filters, we believe that such an integrated CNN system has the potential to solve more complex intelligent problems, such as modeling biological phenomena, as well as other applications. Since the CNN offers high-speed processing based on analog circuit realization, it would be very useful to realize the RFCNN with analog circuits. As studied in [7], [11], the elementary fuzzy-logic computations, such as the , and fuzzification operator in a fixed neighborhood, have already been designed in CNN. Therefore, implementing the RFCNN in hardware is a promising and feasible direction for future work. An implementation scheme to realize the RFCNN includes the following two steps. First, use the RFCNN to learn the fuzzy rules and CNN templates. Second, construct an FIS based on the learned fuzzy rules and CNN templates.
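The combination that such an FIS would have to realize, as described for Layer 4 earlier in this section, can be sketched as follows: each CNN output is multiplied pixel by pixel by the firing strength of its rule, and the products are summed. The function name `combine` and the toy values are assumptions; the firing strengths and CNN outputs would come from the learned membership functions and templates.

```python
def combine(firing_strengths, cnn_outputs):
    """Sum over rules of (firing strength x CNN output), pixel by pixel."""
    rows, cols = len(cnn_outputs[0]), len(cnn_outputs[0][0])
    final = [[0.0] * cols for _ in range(rows)]
    for strength, output in zip(firing_strengths, cnn_outputs):
        for r in range(rows):
            for c in range(cols):
                # Layer-4 node: rule firing strength times CNN output.
                final[r][c] += strength[r][c] * output[r][c]
    return final


# Toy 1 x 2 image with three rules (illustrative values only).
firing_strengths = [[[1.0, 0.0]],   # rule 1 dominates the first pixel
                    [[0.0, 0.5]],   # rules 2 and 3 share the second pixel
                    [[0.0, 0.5]]]
cnn_outputs = [[[1, -1]],
               [[-1, 1]],
               [[-1, 1]]]

print(combine(firing_strengths, cnn_outputs))  # [[1.0, 1.0]]
```

In this toy case the first pixel is decided entirely by CNN 1 and the second by CNNs 2 and 3, which mirrors how each CNN handles its own fuzzily separated region.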

To take into account the nonidealities or mismatch caused by manufacturing, several measures are available:

1) We may add upper-bound and lower-bound constraints to the learned parameters in the learning algorithm.

2) Interval parameter learning, as in [42], can be used so that a tolerance range for the deviation of the parameters (weights) is achieved.

3) Since the proposed RFCNN can learn the structure and parameters automatically, the number of CNNs and other nodes can be increased automatically until the required accuracy is achieved.
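The first measure, bounding the learned parameters, can be sketched as a gradient step followed by clipping to the allowed range. This is one common way to impose such constraints and is shown as an assumption, not as the authors' procedure; the function name `bounded_update`, the bounds, and the numeric values are illustrative.

```python
def bounded_update(params, grads, learning_rate, lower, upper):
    """Gradient-descent step followed by clipping each parameter to [lower, upper]."""
    updated = []
    for p, g in zip(params, grads):
        p_new = p - learning_rate * g
        # Keep the parameter inside the range the hardware can realize.
        updated.append(min(max(p_new, lower), upper))
    return updated


# Illustrative template weights and gradients (not from the paper).
params = [0.5, -1.5, 1.75]
grads = [0.5, 2.0, -1.0]

print(bounded_update(params, grads, learning_rate=0.5, lower=-2.0, upper=2.0))
# [0.25, -2.0, 2.0]
```

The second and third weights would have left the range after the plain update (to -2.5 and 2.25), so they are held at the bounds.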

V. CONCLUSION

In this paper, we propose a novel framework, called RFCNN, for automatically constructing a multiple-CNN integrated neural system. This CNN-based FNN can automatically learn its proper network structure and parameters simultaneously. The structure learning includes the creation of fuzzy rules and CNNs with a new online adaptive ICA mixture-model technique. The parameter learning includes the tuning of fuzzy membership functions and CNN templates based on the ordered derivative calculus. The proposed RFCNN provides a solution to the current dilemma on the decision of templates and/or fuzzy rules in the existing integrated (fuzzy) CNN systems. In order to verify the capability of the RFCNN, a real-world defect inspection problem has been demonstrated. The experimental results show that the proposed scheme is effective and promising. Our future work includes extending the RFCNN to include the coupled CNNs and finding more application examples.


REFERENCES

[1] L. O. Chua and L. Yang, “Cellular neural networks: Theory,” IEEE Trans. Circuits Syst., vol. 35, pp. 1257–1272, Oct. 1988.

[2] L. O. Chua and L. Yang, “Cellular neural networks: Applications,” IEEE Trans. Circuits Syst., vol. 35, pp. 1273–1290, Oct. 1988.

[3] J. Hámori and T. Roska, Receptive Field Atlas of the Retinotopic Visual Pathway and Some Other Sensory Organs Using Dynamic Cellular Network Models, Budapest, Hungary: Analogical and Neural Computing Laboratory, MTA-SZTAKI, DNS-8-2000.

[4] I. Szatmári, D. Bálya, G. Tímár, C. Rekeczky, and T. Roska, “Multi-channel spatio-temporal topographic processing for visual search and navigation,” in Proc. SPIE Microtechnologies for the New Millennium, Gran Canaria, Spain, May 2003, Paper 5119-38.

[5] T. Yang, L. B. Yang, C. W. Wu, and L. O. Chua, “Fuzzy cellular neural networks: Applications,” in Proc. Cellular Neural Networks Application (CNNA’96), pp. 225–230.

[6] T. Yang and L. B. Yang, “Fuzzy cellular neural network: A new paradigm for image processing,” Int. J. Circuit Theory Applicat., vol. 25, no. 6, pp. 469–481, 1997.

[7] M. Balsi and F. Voci, “Implementation of fuzzy rule based image processing on the CNN universal machine,” in Proc. Eur. Conf. Circuit Theory and Design (ECCTD’99), 1999, pp. 1167–1170.

[8] M. Balsi and F. Voci, “Fuzzy reasoning for the design of CNN-based image processing systems,” in Proc. IEEE Symp. Circuits and System (ISCAS’00), Geneva, Switzerland, May 28–31, 2000, pp. 405–408.

[9] F. Colodro and A. Torralba, “Cellular neuro-fuzzy networks (CNFNs), a new class of cellular networks,” in Proc. 5th IEEE Int. Conf. Fuzzy Systems, vol. 1, Sept. 8–11, 1996, pp. 517–521.

[10] Cs. Rekeczky, T. Roska, and A. Ushida, “CNN-based difference-controlled adaptive nonlinear image filters,” Int. J. Circuit Theory Applicat., vol. 26, pp. 375–423, 1998.

[11] Cs. Rekeczky, Á. Tahy, Z. Végh, and T. Roska, “CNN based spatio-temporal nonlinear filtering and endocardial boundary detection in echocardiography,” Int. J. Circuit Theory Applicat., vol. 27, pp. 171–207, 1999.

[12] J. A. Nossek, “Design and learning with cellular neural networks,” Int. J. Circuit Theory Applicat., no. 24, pp. 15–24, 1996.

[13] A. Zarandy, “The art of CNN template design,” Int. J. Circuit Theory Applicat., no. 27, pp. 5–23, 1999.

[14] T. Kozek, T. Roska, and L. O. Chua, “Genetic algorithm for CNN template learning,” IEEE Trans. Circuits Syst. I, vol. 40, pp. 392–402, June 1993.

[15] B. Kosko, Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[16] C. T. Lin, Neural Fuzzy Control Systems With Structure and Parameter Learning. New York: World Scientific, 1994.

[17] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neural-Fuzzy Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.

[18] R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall, 1997.

[19] D. Nauck, F. Klawonn, and R. Kruse, Foundations of Neuro-Fuzzy Systems. New York: Wiley, 1997.

[20] L. Wang and R. Langari, “Building Sugeno-type models using fuzzy discretization and orthogonal parameter estimation techniques,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 454–458, Nov. 1995.

[21] E. H. Ruspini, “Recent development in fuzzy clustering,” Fuzzy Set and Possibility Theory, pp. 113–147, 1982.

[22] C. T. Sun and J. S. Jang, “A neuro-fuzzy classifier and its applications,” in Proc. IEEE Int. Conf. Fuzzy Syst., vol. 1, San Francisco, CA, Mar. 1993, pp. 94–98.

[23] C. F. Juang and C. T. Lin, “An online self-constructing neural fuzzy inference network and its applications,” IEEE Trans. Fuzzy Syst., vol. 6, pp. 12–32, Feb. 1998.

[24] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.

[25] C. Jutten and J. Herault et al., “Independent components analysis (ICA) versus principal components analysis,” in Signal Processing IV: Theories and Applications, EUSIPCO-88, J. Lacoume et al., Eds. Amsterdam, The Netherlands: Elsevier, 1988, pp. 643–646.

[26] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Boston, MA: Kluwer, 1999.

[27] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.

[28] F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis. New York: Wiley, 1999.

[29] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley, 1997.

[30] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[31] L. X. Wang, Adaptive Fuzzy Systems and Control. Englewood Cliffs, NJ: Prentice-Hall, 1994.

[32] T. W. Lee, M. S. Lewicki, and T. J. Sejnowski, “ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 1078–1089, Oct. 2000.

[33] T. W. Lee, M. S. Lewicki, and T. J. Sejnowski, “ICA mixture models for unsupervised and automatic context switching,” in Proc. Int. Workshop Independent Component Analysis, 1999, pp. 209–214.

[34] P. Werbos, “Beyond regression: New tools for prediction and analysis in the behavioral sciences,” Ph.D. dissertation, Dep. Appl. Math., Harvard Univ., Cambridge, MA, Aug. 1974.

[35] R. J. Williams and D. Zipser, “A learning algorithm for continually running recurrent neural networks,” Neural Comput., vol. 1, no. 2, pp. 270–280, 1989.

[36] B. A. Pearlmutter, “Gradient calculations for dynamic recurrent neural networks: A survey,” IEEE Trans. Neural Networks, vol. 6, pp. 1212–1228, Sept. 1995.

[37] S. W. Piché, “Steepest descent algorithms for neural-network controllers and filters,” IEEE Trans. Neural Networks, vol. 5, pp. 198–212, Mar. 1994.

[38] V. Preciado, D. Guinea, R. Montufar, and J. Vicente, “Real-time inspection of metal laminates by means of CNNs,” Proc. SPIE, vol. 4301, no. 39, pp. 260–270, 2001.

[39] D. Guinea, A. Gordaliza, J. Vicente, and M. C. García-Alegre, “CNN based visual processing for industrial inspection,” Proc. SPIE, vol. 3966, no. 45, pp. 315–322, 2000.

[40] C. L. Chang and C. T. Lin, “CNN-based defect inspection in images with regular pattern,” in Proc. 16th Eur. Conf. Circuit Theory and Design (ECCTD’03), 2003, pp. I221–I224.

[41] R. Perfetti and L. Terzoli, “Analogic CNN algorithms for textile applications,” Int. J. Circuit Theory Applicat., no. 28, pp. 77–85, 2000.

[42] C. T. Lin and Y. C. Lu, “A neural fuzzy system with fuzzy supervised learning,” IEEE Trans. Syst., Man, Cybern. B, vol. 26, pp. 744–763, May 1996.

Chin-Teng Lin (S’88–M’91–SM’99) received the B.S. degree in control engineering from the National Chiao-Tung University (NCTU), Hsinchu, Taiwan, R.O.C., and the M.S.E.E. and Ph.D. degrees in electrical engineering from Purdue University, Lafayette, IN, in 1986, 1989, and 1992, respectively. Since August 1992, he has been with the College of Electrical Engineering and Computer Science, NCTU, where he is currently the Associate Dean of the college and a Professor in the Electrical and Control Engineering Department. He has also served as the Director of the Brain Research Center, NCTU Branch, University System of Taiwan, since September 2003. He served as the Director of the Research and Development Office, NCTU, from 1998 to 2000, and as the Chairman of the Electrical and Control Engineering Department from 2000 to 2003. His current research interests are neural networks, fuzzy systems, cellular neural networks, FNNs, neural engineering, algorithms and very-large-scale integration design for pattern recognition, intelligent control, multimedia (including image/video and speech/audio) signal processing, and intelligent transportation systems. He is the coauthor of the book Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems (Englewood Cliffs, NJ: Prentice-Hall, 1996), and the author of Neural Fuzzy Control Systems with Structure and Parameter Learning (Singapore: World Scientific). He has published over 75 journal papers in the areas of neural networks, fuzzy systems, multimedia hardware/software, and soft computing, including 56 IEEE journal papers.

Dr. Lin has won the Outstanding Research Award granted by the National Science Council, Taiwan, from 1997 to the present; the Outstanding Electrical Engineering Professor Award granted by the Chinese Institute of Electrical Engineering in 1997; the Outstanding Engineering Professor Award granted by the Chinese Institute of Engineering in 2000; and the 2002 Taiwan Outstanding Information-Technology Expert Award. He was elected as one of the 38th Ten Outstanding Rising Stars in Taiwan, R.O.C., in 2000. He currently serves as an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B, and IEEE TRANSACTIONS ON FUZZY SYSTEMS. He is also a member of the IEEE Circuits and Systems Society (CASS), the IEEE Neural Network Society, the IEEE Computer Society, the IEEE Robotics and Automation Society, and the IEEE Systems, Man, and Cybernetics Society. He is the Distinguished Lecturer representing the NSATC of IEEE CASS from 2003 to 2005. He has been a Council member of the International Fuzzy System Association since 2000, a member of the Board of Government of the Asia Pacific Neural Network Assembly since 2000, and an Executive Council member (Supervisor) of the Chinese Automation Association since 1998. He is a member of Tau Beta Pi, Eta Kappa Nu, and Phi Kappa Phi.

Chun-Lung Chang received the B.S. degree in automatic control engineering from the Feng-Chia University, Taichung, Taiwan, R.O.C., and the M.S. degree in power mechanical engineering from the National Tsing-Hua University, Hsinchu, Taiwan, R.O.C., in 1990 and 1992, respectively. He is currently working toward the Ph.D. degree in electrical and control engineering at National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

He is also a Researcher in the Mechanical Industry Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C. His current research interests are neural networks, fuzzy control, image processing, and computer vision.

Wen-Chang Cheng received the B.S. degree in electronics engineering from National Cheng Kung University, Tainan, Taiwan, R.O.C., and the M.S. degree in electronics engineering from National Chung Cheng University, Chiayi, Taiwan, R.O.C., in 1997 and 1999, respectively. He is currently working toward the Ph.D. degree in electrical and control engineering at National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

His current research interests include neuro-fuzzy systems, neural networks, image processing, machine learning, and artificial intelligence.