An entropy-based quantum neuro-fuzzy inference system for classification applications


Neurocomputing 70 (2007) 2502–2516


Cheng-Jian Lin (a,*), I-Fang Chung (b), Cheng-Hung Chen (c)

(a) Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung County, Taiwan
(b) Institute of Bioinformatics, National Yang-Ming University, Taiwan
(c) Department of Electrical and Control Engineering, National Chiao-Tung University, Taiwan

Received 13 May 2005; received in revised form 3 August 2006; accepted 7 August 2006

Available online 26 October 2006

Abstract

In this paper, an entropy-based quantum neuro-fuzzy inference system (EQNFIS) for classification applications is proposed. The EQNFIS model is a five-layer structure that combines a traditional Takagi–Sugeno–Kang (TSK)-type fuzzy model with quantum membership functions. Layer 2 of the EQNFIS model contains quantum membership functions, which are multilevel activation functions. Each quantum membership function is composed of the sum of sigmoid functions shifted by quantum intervals. A self-constructing learning algorithm, which consists of the self-clustering algorithm (SCA), quantum fuzzy entropy, and the backpropagation algorithm, is also proposed. The proposed SCA method is a fast, one-pass algorithm that dynamically estimates the number of clusters in an input data space. Quantum fuzzy entropy is employed to evaluate the information on pattern distribution in the pattern space, and this information is used to determine the number of quantum levels. The backpropagation algorithm is used to tune the adjustable parameters. Simulations were conducted to show the performance and applicability of the proposed model.

Keywords: Classiﬁcation; Entropy-based fuzzy model; Quantum function; Self-clustering method; Neural fuzzy network

1. Introduction

Classification is one of the most frequent decision-making tasks performed by humans. A classification problem occurs when an object needs to be assigned to a predefined group or class based on a number of observed attributes related to that object. Many problems in business, science, industry, and medicine can be treated as classification problems. Traditional statistical classification procedures, such as discriminant analysis, are built on Bayesian decision theory [1]. In these procedures, an underlying probability model must be assumed in order to calculate the a posteriori probability upon which a classification decision is made. One major limitation of statistical models is that they work well only when the underlying assumptions are correct. The effectiveness of these methods depends to a large extent on the various assumptions or conditions under which the models are developed. Users must have a good knowledge of both data properties and model capabilities before the models can be successfully applied.

Neural networks [17] have emerged as an important tool for classification tasks. Recent and extensive research on neural classification has established that neural networks are promising alternatives to various conventional classification methods. However, it is difficult to understand the meaning associated with each neuron and each weight in a neural network. A fuzzy entropy measure [9] has been employed to partition the input feature space into decision regions and to select relevant features with good separability for the classification task. However, compared with neural networks, fuzzy logic systems lack learning ability. In summary, in contrast to pure neural or fuzzy methods, the neural fuzzy method [3,6,13,14,16,20] possesses the advantages of both neural networks and fuzzy systems. Neuro-fuzzy systems (NFS) bring the low-level learning and computational power of neural networks into fuzzy systems and give the high-level, human-like thinking and reasoning of fuzzy systems to neural networks.

doi:10.1016/j.neucom.2006.08.008
* Corresponding author.

Two typical types of neuro-fuzzy systems are the Mamdani-type and Takagi–Sugeno–Kang (TSK)-type neuro-fuzzy systems. For Mamdani-type neuro-fuzzy systems [11,21], the minimum fuzzy implication is used in fuzzy reasoning. For TSK-type neuro-fuzzy systems [4,5,19], the antecedent is defined in the same way as in the Mamdani type, while the consequent is a linear function of the input variables. Many researchers [4,5] have shown that TSK-type neuro-fuzzy systems achieve better network size and learning accuracy than Mamdani-type neuro-fuzzy systems.

Recently, quantum neural networks (QNNs) have been developed to overcome limitations of conventional neural networks (NNs) [2,7,15]. Both conventional NNs and QNNs satisfy the requirements outlined in [10] for a universal function approximator. More specifically, QNNs can identify overlaps between data because they can approximate any arbitrary membership profile to any degree of accuracy. However, QNNs and NNs are generally disadvantaged by their "black box" format; they also lack a systematic way to determine the appropriate model structure, have no localizability, and converge slowly.

In this paper, an entropy-based quantum neuro-fuzzy inference system (EQNFIS) is proposed. The EQNFIS model is a five-layer structure that combines a traditional TSK-type fuzzy model with quantum membership functions. Layer 2 of the EQNFIS model contains quantum membership functions, which are multilevel activation functions. Each quantum membership function is composed of the sum of sigmoid functions shifted by quantum intervals. The quantum intervals add an additional degree of freedom that can be exploited during the learning process to capture and quantify the structure of the input space.

A self-constructing learning algorithm for the EQNFIS is also proposed. First, a structure learning scheme is used to determine proper input space partitioning and to find the center of each cluster; in addition, quantum fuzzy entropy is used to determine the number of quantum levels, which reflects the actual distribution of the classification patterns. Second, a supervised learning scheme is used to adjust the parameters to obtain the desired outputs. The proposed learning algorithm uses the self-clustering algorithm (SCA) and quantum fuzzy entropy to perform structure learning, and the backpropagation algorithm to perform parameter learning. Finally, we evaluate the performance of the proposed EQNFIS model on two classification problems.

This paper is organized as follows. Section 2 describes the quantum membership function and the structure of the EQNFIS model. Section 3 describes the learning algorithm of the EQNFIS model, including the self-clustering algorithm, quantum fuzzy entropy, and the backpropagation algorithm. In Section 4, the EQNFIS model is used to classify the Iris data and the Wisconsin breast cancer data to demonstrate its learning capability, and our approach is compared with other methods in the literature. Finally, conclusions are given in the last section.

2. The structure of the EQNFIS

The EQNFIS uses fuzzy if-then rules of the following form:

$$R_j:\ \text{IF } x_1 \text{ is } Q_{1j} \text{ and } \ldots \text{ and } x_n \text{ is } Q_{nj} \ \text{ THEN } y \text{ is } a_{0j} + \sum_{i=1}^{n} a_{ij} x_i, \qquad (1)$$

where $x_i$ and $y$ are the input and output variables, respectively; $Q_{ij}$ is the linguistic term of the precondition part with quantum membership function $\mu_{Q_{ij}}$; $a_{0j}$ and $a_{ij}$ are the parameters of the consequent part; $n$ is the number of input dimensions; and $R_j$ is the $j$th fuzzy rule.

The membership function of the precondition part discussed in this paper differs from the typical Gaussian membership function: we adopt the quantum membership function to approximate the desired results. The response of the $j$th quantum membership function for the $i$th input can be written as

$$Q_{ij} = \frac{1}{ns_{ij}} \sum_{r=1}^{ns_{ij}} \left[ \frac{1}{1+\exp\left(-\beta\left(x_i - m_{ij} + |\theta_{ij}^{r}|\right)\right)}\, U(x_i; -\infty, m_{ij}) + \frac{\exp\left(-\beta\left(x_i - m_{ij} - |\theta_{ij}^{r}|\right)\right)}{1+\exp\left(-\beta\left(x_i - m_{ij} - |\theta_{ij}^{r}|\right)\right)}\, U(x_i; m_{ij}, \infty) \right], \qquad (2)$$

where

$$U(x_i; a, b) = \begin{cases} 1 & \text{if } a \le x_i < b, \\ 0 & \text{otherwise}, \end{cases}$$

$\beta$ is the slope factor, $\theta_{ij}^{r}$ is the quantum interval, $m_{ij}$ is the center of the quantum membership function, and $ns_{ij}$ is the number of levels in the quantum membership function for the $j$th rule of the $i$th input. Therefore, the fuzzy if-then rule can be written as follows:

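$$R_j:\ \text{IF } x_1 \text{ is } \mu\left(m_{1j}; \theta_{1j}^{r}\right) \text{ and } \ldots \text{ and } x_i \text{ is } \mu\left(m_{ij}; \theta_{ij}^{r}\right) \text{ and } \ldots \text{ and } x_n \text{ is } \mu\left(m_{nj}; \theta_{nj}^{r}\right) \ \text{ THEN } y \text{ is } a_{0j} + a_{1j}x_1 + \cdots + a_{ij}x_i + \cdots + a_{nj}x_n. \qquad (3)$$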

Fig. 1 shows the response of a three-level quantum membership function.
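To make Eq. (2) concrete, the following is a minimal Python sketch of one quantum membership function. It is an illustration, not the authors' code: the names `quantum_mf`, `thetas`, and `beta` are hypothetical, and each logistic term of Eq. (2) is rewritten in the algebraically equivalent, overflow-safe form 1/(1 + e^(-u)), with the branch chosen by the step function U.

```python
import math

def sigmoid(u):
    """Numerically stable logistic function 1 / (1 + exp(-u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def quantum_mf(x, m, thetas, beta):
    """Quantum membership function of Eq. (2) for one input and one rule.

    x      -- input value x_i
    m      -- center m_ij
    thetas -- quantum intervals theta_ij^r, r = 1..ns_ij
    beta   -- slope factor
    """
    total = 0.0
    for theta in thetas:
        t = abs(theta)
        if x < m:
            # U(x; -inf, m) = 1: rising term 1 / (1 + exp(-beta(x - m + |theta|)))
            total += sigmoid(beta * (x - m + t))
        else:
            # U(x; m, inf) = 1: exp(-beta(x - m - |theta|)) / (1 + exp(-beta(x - m - |theta|)))
            total += sigmoid(-beta * (x - m - t))
    return total / len(thetas)  # average over the ns_ij levels
```

With a steep slope (for example β = 50) and intervals 1, 2, 3 around m = 0, an input of 1.5 receives a membership close to 2/3, which illustrates the staircase ("multilevel") shape of the function; the response is also symmetric about the center.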

The structure of the EQNFIS, which is organized into n input variables, p term nodes for each input variable, one output node, and n × p membership function nodes, is shown in Fig. 2. We now introduce the operation of the nodes in each layer of the EQNFIS model. In the following description, $u^{(l)}$ denotes the output of a node in the $l$th layer.

Layer 1 (Input Node): No computation is done in this layer. Each node in this layer is an input node, which corresponds to one input variable and which only transmits input values to the next layer directly.


Layer 2 (Membership Function Node): Nodes in this layer correspond to one linguistic label of the input variables in layer 1 and a unit of memory. That is, the membership value specifying the degree to which an input value and a unit of memory belong to a fuzzy set is calculated in layer 2. With the quantum membership function, the operation performed in layer 2 is

$$u^{(2)}_{ij} = \frac{1}{ns_{ij}} \sum_{r=1}^{ns_{ij}} \left[ \frac{1}{1+\exp\left(-\beta\left(u^{(1)}_i - m_{ij} + |\theta_{ij}^{r}|\right)\right)}\, U\left(u^{(1)}_i; -\infty, m_{ij}\right) + \frac{\exp\left(-\beta\left(u^{(1)}_i - m_{ij} - |\theta_{ij}^{r}|\right)\right)}{1+\exp\left(-\beta\left(u^{(1)}_i - m_{ij} - |\theta_{ij}^{r}|\right)\right)}\, U\left(u^{(1)}_i; m_{ij}, \infty\right) \right], \qquad (5)$$

where the step function $U(\cdot)$, the slope factor $\beta$, the quantum interval $\theta_{ij}^{r}$, the center $m_{ij}$, and the number of levels $ns_{ij}$ are defined as in Eq. (2).

Layer 3 (Rule Node): Nodes in this layer represent the precondition part of one fuzzy logic rule. They receive one-dimensional membership degrees of the associated rule from a set of nodes in layer 2. Here, the product operator is used to perform the IF-condition matching of the fuzzy rules. As a result, the output function of each inference node is

$$u^{(3)}_j = \prod_{i} u^{(2)}_{ij}, \qquad (6)$$

where the product $\prod_i u^{(2)}_{ij}$ of a rule node represents the firing strength of its corresponding rule.

[Fig. 1. Response of a three-level quantum membership function, panels (a) and (b); membership degree from 0 to 1 over inputs from -30 to 30.]

Layer 4 (Consequent Node): Nodes in this layer are called consequent nodes. The input to a node in layer 4 is the output delivered from layer 3, and the other inputs are the input variables from layer 1, as depicted in Fig. 2. For this kind of node, we have

$$u^{(4)}_j = u^{(3)}_j \left( a_{0j} + \sum_{i=1}^{n} a_{ij} x_i \right), \qquad (7)$$

where the summation is over all the inputs and $a_{ij}$ are the corresponding parameters of the consequent part.

Layer 5 (Output Node): Each node in this layer corresponds to one output variable. The node integrates all the actions recommended by layers 3 and 4 and acts as a defuzzifier with

$$y = u^{(5)} = \frac{\sum_{j=1}^{p} u^{(4)}_j}{\sum_{j=1}^{p} u^{(3)}_j} = \frac{\sum_{j=1}^{p} u^{(3)}_j \left( a_{0j} + \sum_{i=1}^{n} a_{ij} x_i \right)}{\sum_{j=1}^{p} u^{(3)}_j}, \qquad (8)$$

where p is the number of fuzzy rules.
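The layer-by-layer computation in Eqs. (5) to (8) can be sketched for a single output as follows. This is a minimal illustration under our own assumptions: rules are stored as hypothetical dictionaries with keys `m` (centers), `thetas` (one list of quantum intervals per input), `a0`, and `a` (consequent parameters); all rules share one slope factor β; and the sketch does not guard against every firing strength being zero.

```python
import math

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def quantum_mf(x, m, thetas, beta):
    """Layer 2, Eq. (5), written with the stable logistic function."""
    s = 0.0
    for theta in thetas:
        t = abs(theta)
        s += sigmoid(beta * (x - m + t)) if x < m else sigmoid(-beta * (x - m - t))
    return s / len(thetas)

def eqnfis_forward(x, rules, beta):
    """Single-output forward pass; Layer 1 passes the input vector x unchanged."""
    num = 0.0  # sum over rules of u4_j
    den = 0.0  # sum over rules of u3_j
    for rule in rules:
        u3 = 1.0  # Layer 3: product of the rule's memberships, Eq. (6)
        for xi, m, thetas in zip(x, rule["m"], rule["thetas"]):
            u3 *= quantum_mf(xi, m, thetas, beta)
        # Layer 4: firing strength times the linear TSK consequent, Eq. (7)
        linear = rule["a0"] + sum(a * xi for a, xi in zip(rule["a"], x))
        num += u3 * linear
        den += u3
    return num / den  # Layer 5: weighted-average defuzzification, Eq. (8)
```

Because Eq. (8) is a weighted average, two rules that fire equally strongly yield the mean of their consequents, and a network with a single rule reproduces that rule's linear consequent exactly.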

3. A learning algorithm for the EQNFIS model

In this section, we present a learning algorithm for the proposed EQNFIS model. It consists of two schemes. First, a structure learning scheme is used to determine proper input space partitioning and to find the center of each cluster; in addition, quantum fuzzy entropy is used to decide the number of quantum levels, which reflects the actual distribution of the classification patterns. Second, a supervised learning scheme is used to adjust the parameters to obtain the desired outputs. The proposed learning algorithm uses the self-clustering algorithm (SCA) and quantum fuzzy entropy to perform structure learning, and the backpropagation algorithm to perform parameter learning.

3.1. Structure learning

The first step in structure learning is to determine the number of rules from the training data using the SCA, as well as the number of fuzzy sets in the universe of discourse for each input variable, since one cluster in the input space corresponds to one potential fuzzy logic rule, with $m_{ij}$ and $\theta_{ij}^{r}$ representing the center and the quantum interval, respectively. Simultaneously, we employ quantum fuzzy entropy to determine the appropriate number of quantum levels. Once the SCA has been run and the quantum intervals and the number of quantum levels have been determined, the quantum membership functions are fully specified.

Fig. 2. Structure of the proposed EQNFIS (Layer 1: input nodes; Layer 2: membership function nodes; Layer 3: rule nodes; Layer 4: consequent nodes; Layer 5: output nodes).

3.1.1. The self-clustering algorithm

Layer 2 of the EQNFIS model can be viewed as a function that maps input patterns to new features, namely the membership degrees. Hence, the discriminative ability of these new features is determined by the centers of the quantum membership functions. To achieve good classification, centers are best selected based on their ability to provide large class separation.

A clustering method, called the SCA, is proposed to implement scatter partitioning of the input space. Without any optimization, the online SCA is a fast, one-pass algorithm for a dynamic estimation of the number of clusters in a set of data and for ﬁnding the current centers of clusters in the input data space. It is a distance-based connectionist-clustering algorithm. In any cluster, the maximum distance between a sample point and the cluster center is less than a threshold value which has been set as a clustering parameter and which would affect the number of clusters to be estimated. The notations in the SCA are described as follows:

$P_i$: the $i$th input sample
$C$: a cluster center; $C_j$: the $j$th cluster center
$R$: the number of clusters
$C_m$: the cluster center with the minimum distance, to which the current sample $P_i$ belongs
$Dc$: the diagonal distance of cluster $C$
$Wc$: the boundary width of cluster $C$
$D_{thr}$: the threshold value of the distance
$Dist_{ij}$: the distance between the current sample $P_i$ and the $j$th cluster center
$Dist_{im}$: the distance between the current sample $P_i$ and the cluster center $C_m$ with the minimum distance
$x$, $y$: the x- and y-dimensions in the diagram

In the clustering process, the data samples come from a data stream. The process starts with an empty set of clusters. When a new cluster is created, its cluster center $C$ is defined, and its cluster distance $Dc$ and cluster widths $Wc$ are initially set to zero. As more samples are presented one after another, some existing clusters are updated by moving their centers and increasing their cluster distances and cluster widths. Which cluster is updated, and by how much, depends on the position of the current sample in the input space. A cluster is no longer updated once its cluster distance $Dc$ reaches the threshold value $D_{thr}$. The threshold $D_{thr}$ is therefore an important parameter of the clustering process. A low threshold value leads to the learning of fine clusters (i.e., more rules are generated), whereas a high threshold value leads to the learning of coarse clusters (i.e., fewer rules are generated). The selection of $D_{thr}$ thus critically affects the simulation results, and its value is chosen by practical experimentation or trial and error. In this study, $D_{thr}$ is generally set between 0.5 and 1 times the sum of the sample variances.
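As a rough aid for the $D_{thr}$ heuristic above, the following sketch computes the suggested range. It assumes that "the sum of the sample variances" means the sum of the per-dimension population variances of the training samples; the function name `dthr_range` is hypothetical.

```python
from statistics import pvariance

def dthr_range(samples):
    """Suggested D_thr range: 0.5 to 1 times the summed per-dimension variances.

    samples -- list of equal-length feature vectors (assumed non-empty)
    """
    dims = len(samples[0])
    # Sum of population variances, one per input dimension (our assumption).
    total_var = sum(pvariance([s[d] for s in samples]) for d in range(dims))
    return 0.5 * total_var, total_var
```

For two 2-D samples at (0, 0) and (2, 2), each dimension has variance 1, so the suggested range is 1 to 2.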

In this paper, we use two-dimensional feature spaces as an example to explain the proposed clustering algorithm.

Fig. 3. A brief clustering process using the SCA with samples P1 to P9 in 2-D space. (a) The sample P1 causes the SCA to create a new cluster center C1. (b) P2: update cluster center C1; P3: create a new cluster center C2; P4: do nothing. (c) P5: update cluster C1; P6: do nothing; P7: update cluster center C2; P8: create a new cluster C3. (d) P9: update cluster C1.

Fig. 3 brieﬂy shows the SCA clustering process in two-input space. The SCA is described as follows.

Step 1: Randomly shuffle the order of the original data samples. Create the first cluster by taking the position of the first sample from the input stream as the first cluster center $C_1$, and set its cluster distance $Dc_1$ and cluster widths $Wc_{11}$ and $Wc_{21}$ to zero, as shown in Fig. 3(a).

Step 2: If all samples of the data stream have been processed, the algorithm is finished. Otherwise, the current input sample $P_i$ is taken, and the distances between this sample and all $R$ already created cluster centers $C_j$, $Dist_{ij} = \lVert P_i - C_j \rVert$, $j = 1, 2, \ldots, R$, are calculated.

Step 3: If $Dist_{ij} \le Dc_j$ for at least one $j = 1, 2, \ldots, R$, the current sample $P_i$ belongs to the cluster $C_m$ with the minimum distance

$$Dist_{im} = \lVert P_i - C_m \rVert = \min_{j} \lVert P_i - C_j \rVert, \quad j = 1, 2, \ldots, R. \qquad (9)$$

In this case, no new cluster is created and no existing cluster is updated, as in the cases of P4 and P6 shown in Fig. 3. The algorithm then returns to Step 2. Otherwise, the algorithm goes to the next step.

Step 4: Find the cluster with center $C_m$ and cluster distance $Dc_m$ among all $R$ existing clusters by calculating the values $S_{ij} = Wc_{ij} + Dc_j$, $j = 1, 2, \ldots, R$, and then choosing the cluster center $C_m$ with the minimum value $S_{im}$:

$$S_{im} = Wc_{im} + Dc_m = \min_{j} S_{ij}, \quad j = 1, 2, \ldots, R. \qquad (10)$$

In Eq. (9), the maximum distance from any cluster center to the samples that belong to the cluster is not greater than the threshold $D_{thr}$, even though the algorithm does not keep any information about past samples. However, the formulation in Eq. (10) considers only the distance between the input data and the cluster centers. A special situation arises when a given point P10 is at the same distance from both cluster centers, i.e., $Dist_{10,1} = Dist_{10,2}$, as shown in Fig. 4. With the technique above, the cluster C2, which has the smaller cluster distance $Dc_2$, would be selected for expansion according to Eq. (10). However, this causes the number of clusters to increase quickly. To avoid this problem, we apply the following rule:

If $Dist_{10,1} = Dist_{10,2}$ and $Dc_1 > Dc_2$, then $Dc_m = Dc_1$.

That is, when the distances between the input data and both clusters are the same, the cluster with the larger cluster distance (here $Dc_1$) is chosen.

Step 5: If $S_{im}$ is greater than $D_{thr}$, the sample $P_i$ does not belong to any existing cluster. A new cluster is created in the same way as described in Step 1, as in the cases of P3 and P8 shown in Fig. 3, and the algorithm returns to Step 2.

Step 6: If $S_{im}$ is not greater than $D_{thr}$, the cluster $C_m$ is updated by moving its center $C_m$ and increasing its cluster distance $Dc_m$ and cluster widths $Wc_{1m}$ and $Wc_{2m}$. The parameters are updated by the following equations:

$$Wc^{new}_{1m} = \frac{\lVert C_{m,x} - P_{i,x} \rVert + Wc_{1m}}{2}, \qquad (11)$$

$$Wc^{new}_{2m} = \frac{\lVert C_{m,y} - P_{i,y} \rVert + Wc_{2m}}{2}, \qquad (12)$$

$$C^{new}_{m,x} = \lVert P_{i,x} - D^{new}_{1m} \rVert, \qquad (13)$$

$$C^{new}_{m,y} = \lVert P_{i,y} - D^{new}_{2m} \rVert, \qquad (14)$$

$$Dc^{new}_{m} = S_{im}/2, \qquad (15)$$

where $C_{m,x}$ and $C_{m,y}$ are the x- and y-dimension values of $C_m$, and $P_{i,x}$ and $P_{i,y}$ are the x- and y-dimension values of $P_i$, as in the cases of P2, P5, P7, and P9 shown in Fig. 3. The algorithm then returns to Step 2.
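Steps 1 to 6 can be sketched in Python for 2-D samples, as in Fig. 3. This is a hedged reading rather than the authors' implementation, and it rests on two assumptions that should be checked against the original: (i) in Step 4 the score is taken as $S_{ij} = Dist_{ij} + Dc_j$, the form used by the evolving clustering method (ECM) that this step resembles, instead of the $Wc$ term printed in Eq. (10); and (ii) the $D^{new}$ terms in Eqs. (13) and (14) are read as the updated widths $Wc^{new}$, so that the new center lies $Wc^{new}$ inward from $P_i$ along each axis. The tie-breaking rule for equidistant clusters is included; the function names are hypothetical.

```python
import math
import random

def new_cluster(p):
    """Steps 1 and 5: a fresh cluster centred on p with zero distance and widths."""
    return {"c": list(p), "dc": 0.0, "wc": [0.0, 0.0]}

def sca(samples, d_thr, shuffle=True, seed=0):
    """Sketch of the SCA (Steps 1-6) for 2-D samples.

    Returns clusters as dicts with center 'c', cluster distance 'dc' and
    per-dimension boundary widths 'wc' = [Wc_1, Wc_2].
    """
    data = [tuple(p) for p in samples]
    if shuffle:
        random.Random(seed).shuffle(data)  # Step 1: randomize the sample order
    clusters = []
    for p in data:
        if not clusters:
            clusters.append(new_cluster(p))  # Step 1: first sample -> first center
            continue
        dist = [math.dist(p, cl["c"]) for cl in clusters]  # Step 2
        if any(d <= cl["dc"] for d, cl in zip(dist, clusters)):
            continue  # Step 3: P_i already lies inside an existing cluster
        # Step 4 (assumption): score S_j = Dist_ij + Dc_j, choose the minimum
        s = [d + cl["dc"] for d, cl in zip(dist, clusters)]
        m = min(range(len(clusters)), key=s.__getitem__)
        # Tie rule: among clusters equally distant from P_i, prefer the larger Dc
        ties = [j for j in range(len(clusters)) if math.isclose(dist[j], dist[m])]
        m = max(ties, key=lambda j: clusters[j]["dc"])
        if s[m] > d_thr:
            clusters.append(new_cluster(p))  # Step 5: open a new cluster
            continue
        cl = clusters[m]  # Step 6: grow cluster m toward P_i
        for k in range(2):
            cl["wc"][k] = (abs(cl["c"][k] - p[k]) + cl["wc"][k]) / 2  # Eqs. (11)-(12)
            # Eqs. (13)-(14), read (assumption) as: new center lies Wc_new inward from P_i
            step = math.copysign(1.0, p[k] - cl["c"][k]) if p[k] != cl["c"][k] else 0.0
            cl["c"][k] = p[k] - step * cl["wc"][k]
        cl["dc"] = s[m] / 2  # Eq. (15)
    return clusters
```

For example, with the unshuffled points (0, 0), (2, 2), (1.5, 1.0), (10, 10) and $D_{thr} = 3$, the first two samples merge into one cluster centred at (1, 1), the third falls inside it, and the distant fourth sample opens a second cluster.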

In this way, the maximum distance from any cluster center to the samples that belong to the cluster is not greater than the threshold value $D_{thr}$, even though the algorithm does not keep any information about past samples. The number of rules, and the center and quantum intervals of each quantum membership function, are then defined by the following equations:

$$m_{ij} = C_j, \quad j = 1, 2, \ldots, R, \qquad (16)$$

$$\theta^{r}_{ij} = \frac{r\, D_j}{(ns_{ij}+1)/2}, \quad r = 1, 2, \ldots, ns_{ij};\ j = 1, 2, \ldots, R, \qquad (17)$$

$$R = \text{the number of clusters}. \qquad (18)$$

3.1.2. Quantum fuzzy entropy

Once the center and the quantum intervals of the quantum membership function have been determined, the number of quantum levels in each dimension must be chosen. This number has a profound effect on learning efficiency and classification accuracy. If the number of quantum levels is too large, training and classification take too long, and overfitting may result. On the other hand, if the number of quantum levels is too small, each decision region may be too large to fit the distribution of input patterns, and classification performance may suffer.

[Fig. 4. A sample P10 at equal distances Dist10,1 and Dist10,2 from the cluster centers C1 and C2, whose cluster distances are Dc1 and Dc2.]

Therefore, the selection of the optimal number of quantum levels is an important task. In this subsection, we will investigate a systematic method to select the appropriate number of quantum levels. The proposed criterion is based on quantum fuzzy entropy, since it has the ability to reﬂect the actual distribution of pattern space.

Fig. 5 shows the distribution of the pattern space for a cluster after the SCA clustering process in two-input space and illustrates the proposed quantum fuzzy entropy of the quantum interval for each dimension of the cluster. The steps for selecting the number of quantum levels for each dimension of each cluster are as follows.

Step 1: Set the initial number of quantum levels, ns, to 1.

Step 2: Locate the centers and the quantum intervals. The self-clustering algorithm will be used to locate the center and the quantum interval of each cluster.

Step 3: Assign a quantum membership function to each cluster. In order to apply quantum fuzzy entropy to calculate the distribution information of patterns in a cluster, we have to assign a quantum membership function to each cluster.

Step 4: Compute the total quantum fuzzy entropy over all clusters in each dimension for ns and ns + 1 quantum levels (initially ns = 1 and 2). The quantum fuzzy entropy is computed for all clusters in each dimension to obtain the distribution information of the patterns projected onto that dimension. Quantum fuzzy entropy is defined as follows:

(1) Let $X = \{x_1, x_2, \ldots, x_{nc}\}$ be a classification set with elements $x_i$ distributed in a pattern space, where $i = 1, 2, \ldots, nc$.

(2) Let $\tilde{Q}$ be a quantum fuzzy set defined in the quantum interval of a pattern space. The quantum membership degree of the element $x_i$ in the quantum fuzzy set $\tilde{Q}$ is denoted by $\mu_{\tilde{Q}}(x_i)$.

(3) Let $CL_1, CL_2, \ldots, CL_p$ represent the $p$ classes into which the $nc$ elements are divided.

(4) Let $TCL_j(x_{nc})$ denote the set of elements of class $j$ in the cluster $X$; it is a subset of $X$.

(5) The sub-degree $SD_j$ of the elements of class $j$ in the quantum interval with respect to the quantum fuzzy set $\tilde{Q}$, where $j = 1, 2, \ldots, p$, is defined as

$$SD_j = \frac{\sum_{x \in TCL_j(x_{nc})} \mu_{\tilde{Q}}(x)}{\sum_{x \in X} \mu_{\tilde{Q}}(x)}. \qquad (19)$$

(6) The quantum fuzzy entropy $QFE_{CL_j}(\tilde{Q})$ of the elements of class $j$ in the quantum interval is defined as

$$QFE_{CL_j}(\tilde{Q}) = -SD_j \log SD_j. \qquad (20)$$

(7) The quantum fuzzy entropy $QFE(\tilde{Q})$ in the cluster $X$ for the elements within the quantum interval is defined as

$$QFE(\tilde{Q}) = \sum_{j=1}^{p} QFE_{CL_j}(\tilde{Q}). \qquad (21)$$

(8) In this step, we compute the quantum fuzzy entropy for ns = 1 and ns = 2 quantum levels, as shown in Fig. 6.

Step 5: If the total quantum fuzzy entropy for ns + 1 quantum levels is less than that for ns quantum levels, set ns = ns + 1 and go to Step 2. Otherwise, go to Step 6.

Step 6: The term ns represents the number of quantum levels in a speciﬁed dimension. Since the quantum fuzzy entropy does not decrease, we stop increasing the quantum level in this dimension, and we let ns be the number of quantum levels in this dimension.
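The quantum-level selection of Steps 1 to 6, together with Eqs. (17) and (19) to (21), can be sketched as below. Assumptions: the natural logarithm is used in Eq. (20) (the base does not change which ns is selected); each cluster is described by a hypothetical tuple (values, labels, m, d) holding the patterns projected onto one dimension, their class labels, the center, and the quantity $D_j$ of Eq. (17); a cap `max_levels` is added to guarantee termination; and passing a single cluster gives per-cluster selection, while passing all clusters gives the total entropy of Step 4.

```python
import math

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def quantum_mf(x, m, thetas, beta):
    """Quantum membership function of Eq. (2)."""
    s = 0.0
    for theta in thetas:
        t = abs(theta)
        s += sigmoid(beta * (x - m + t)) if x < m else sigmoid(-beta * (x - m - t))
    return s / len(thetas)

def quantum_intervals(d, ns):
    """Eq. (17): theta^r = r * D / ((ns + 1) / 2), r = 1..ns."""
    return [r * d / ((ns + 1) / 2) for r in range(1, ns + 1)]

def qfe(values, labels, m, thetas, beta):
    """Quantum fuzzy entropy of one cluster along one dimension, Eqs. (19)-(21)."""
    mu = [quantum_mf(x, m, thetas, beta) for x in values]
    total = sum(mu)
    if total == 0.0:
        return 0.0
    entropy = 0.0
    for cls in set(labels):
        sd = sum(u for u, y in zip(mu, labels) if y == cls) / total  # Eq. (19)
        if sd > 0.0:
            entropy -= sd * math.log(sd)  # Eqs. (20)-(21), natural log assumed
    return entropy

def select_levels(clusters, beta, max_levels=10):
    """Steps 1-6: increase ns while the total entropy keeps decreasing."""
    def total(ns):
        return sum(qfe(v, y, m, quantum_intervals(d, ns), beta) for v, y, m, d in clusters)

    ns = 1  # Step 1
    while ns < max_levels and total(ns + 1) < total(ns):  # Steps 4-5
        ns += 1
    return ns  # Step 6
```

A cluster containing a single class has zero entropy for every ns, so the loop stops at ns = 1; two classes with equal membership mass give an entropy of log 2.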

3.1.3. The parameter learning

After the network structure is determined by the SCA, the network enters the parameter learning phase to adjust its parameters based on the training patterns. The learning process involves minimizing a given cost function: the gradient of the cost function is computed, and the parameters are adjusted along the negative gradient. The backpropagation algorithm is used for this supervised learning. For clarity, we consider the single-output case; the goal is to minimize the cost function $E$, defined as

$$E = \frac{1}{2}\left[y - y^{d}\right]^2, \qquad (22)$$

where $y^{d}$ is the desired output and $y$ is the current output. The parameter learning algorithm based on backpropagation is then described as follows. The error term to be propagated is

$$\delta_e = -\frac{\partial E}{\partial y} = y^{d} - y. \qquad (23)$$

[Fig. 5. Distribution of the pattern space (x1 versus x2) after the SCA clustering process, with clusters c1, c2, and c3.]

Fig. 6. The pattern distribution with the corresponding quantum membership function. (a) The number of quantum levels is one; (b) the number of quantum levels is two.

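The consequent parameters are updated by the amounts

$$\Delta a_{0j} = -\frac{\partial E}{\partial a_{0j}} = -\left[\frac{\partial E}{\partial u^{(5)}} \frac{\partial u^{(5)}}{\partial u^{(4)}_j}\right] \left[\frac{\partial u^{(4)}_j}{\partial a_{0j}}\right] = \frac{\delta_e\, u^{(3)}_j}{\sum_{l=1}^{p} u^{(3)}_l} \qquad (24)$$

and

$$\Delta a_{ij} = -\frac{\partial E}{\partial a_{ij}} = -\left[\frac{\partial E}{\partial u^{(5)}} \frac{\partial u^{(5)}}{\partial u^{(4)}_j}\right] \left[\frac{\partial u^{(4)}_j}{\partial a_{ij}}\right] = \frac{\delta_e\, u^{(3)}_j\, x_i}{\sum_{l=1}^{p} u^{(3)}_l}. \qquad (25)$$

The consequent parameters are then updated according to the following equations:

$$a_{0j}(t+1) = a_{0j}(t) + \eta_a \Delta a_{0j}, \qquad (26)$$

$$a_{ij}(t+1) = a_{ij}(t) + \eta_a \Delta a_{ij}, \qquad (27)$$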

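where $\eta_a$ is the learning rate of the consequent parameters and $t$ denotes the iteration number. The output error (i.e., the difference between the desired output and the current output) is then backpropagated to the quantum function neurons of the hidden layer to update their centers and quantum intervals. According to the chain rule, the update of the center is

$$\Delta m_{ij} = -\frac{\partial E}{\partial m_{ij}} = -\frac{\partial E}{\partial u^{(5)}}\frac{\partial u^{(5)}}{\partial m_{ij}} = \delta_e \left[\frac{\left(a_{0j} + \sum_{k=1}^{n} a_{kj} x_k\right)\sum_{l=1}^{p} u^{(3)}_l - \sum_{l=1}^{p} u^{(3)}_l \left(a_{0l} + \sum_{k=1}^{n} a_{kl} x_k\right)}{\left(\sum_{l=1}^{p} u^{(3)}_l\right)^2}\right] \prod_{q=1,\,q\neq i}^{n} Q_{qj}\ \frac{1}{ns_{ij}} \sum_{r=1}^{ns_{ij}} \left[ -\frac{\beta \exp\left(-\beta\left(x_i - m_{ij} + |\theta^{r}_{ij}|\right)\right)}{\left(1 + \exp\left(-\beta\left(x_i - m_{ij} + |\theta^{r}_{ij}|\right)\right)\right)^2}\, U(x_i; -\infty, m_{ij}) + \frac{\beta \exp\left(-\beta\left(x_i - m_{ij} - |\theta^{r}_{ij}|\right)\right)}{\left(1 + \exp\left(-\beta\left(x_i - m_{ij} - |\theta^{r}_{ij}|\right)\right)\right)^2}\, U(x_i; m_{ij}, \infty) \right]. \qquad (28)$$

The update of the quantum interval is as follows. If $\theta^{r}_{ij} \ge 0$, then

$$\Delta \theta^{r}_{ij} = -\frac{\partial E}{\partial \theta^{r}_{ij}} = -\frac{\partial E}{\partial u^{(5)}}\frac{\partial u^{(5)}}{\partial \theta^{r}_{ij}} = \delta_e \left[\frac{\left(a_{0j} + \sum_{k=1}^{n} a_{kj} x_k\right)\sum_{l=1}^{p} u^{(3)}_l - \sum_{l=1}^{p} u^{(3)}_l \left(a_{0l} + \sum_{k=1}^{n} a_{kl} x_k\right)}{\left(\sum_{l=1}^{p} u^{(3)}_l\right)^2}\right] \prod_{q=1,\,q\neq i}^{n} Q_{qj}\ \frac{1}{ns_{ij}} \left[ \frac{\beta \exp\left(-\beta\left(x_i - m_{ij} + \theta^{r}_{ij}\right)\right)}{\left(1 + \exp\left(-\beta\left(x_i - m_{ij} + \theta^{r}_{ij}\right)\right)\right)^2}\, U(x_i; -\infty, m_{ij}) + \frac{\beta \exp\left(-\beta\left(x_i - m_{ij} - \theta^{r}_{ij}\right)\right)}{\left(1 + \exp\left(-\beta\left(x_i - m_{ij} - \theta^{r}_{ij}\right)\right)\right)^2}\, U(x_i; m_{ij}, \infty) \right]; \qquad (29)$$

otherwise ($\theta^{r}_{ij} < 0$),

$$\Delta \theta^{r}_{ij} = -\frac{\partial E}{\partial \theta^{r}_{ij}} = -\frac{\partial E}{\partial u^{(5)}}\frac{\partial u^{(5)}}{\partial \theta^{r}_{ij}} = \delta_e \left[\frac{\left(a_{0j} + \sum_{k=1}^{n} a_{kj} x_k\right)\sum_{l=1}^{p} u^{(3)}_l - \sum_{l=1}^{p} u^{(3)}_l \left(a_{0l} + \sum_{k=1}^{n} a_{kl} x_k\right)}{\left(\sum_{l=1}^{p} u^{(3)}_l\right)^2}\right] \prod_{q=1,\,q\neq i}^{n} Q_{qj}\ \frac{1}{ns_{ij}} \left[ -\frac{\beta \exp\left(-\beta\left(x_i - m_{ij} - \theta^{r}_{ij}\right)\right)}{\left(1 + \exp\left(-\beta\left(x_i - m_{ij} - \theta^{r}_{ij}\right)\right)\right)^2}\, U(x_i; -\infty, m_{ij}) - \frac{\beta \exp\left(-\beta\left(x_i - m_{ij} + \theta^{r}_{ij}\right)\right)}{\left(1 + \exp\left(-\beta\left(x_i - m_{ij} + \theta^{r}_{ij}\right)\right)\right)^2}\, U(x_i; m_{ij}, \infty) \right]. \qquad (30)$$

The centers and quantum intervals of the quantum function neurons in this layer are updated as follows:

$$m_{ij}(t+1) = m_{ij}(t) + \eta_m \Delta m_{ij}, \qquad (31)$$

$$\theta^{r}_{ij}(t+1) = \theta^{r}_{ij}(t) + \eta_\theta \Delta \theta^{r}_{ij}, \qquad (32)$$

where $\eta_m$ and $\eta_\theta$ are the learning rates of the center and the quantum interval of the quantum function neurons, respectively.

4. Illustrative examples

In this section, we evaluate the performance of the proposed EQNFIS model on two well-known benchmark data sets for classification. The first example uses the Iris data, and the second example uses the Wisconsin breast cancer data. Both data sets are available from the University of California, Irvine, via the anonymous ftp site ftp://ftp.ics.uci.edu/pub/machine-learning-databases.

In the following simulations, the parameters and the number of training epochs were chosen based on the desired accuracy; in short, training of the EQNFIS model was stopped once the desired accuracy had been reached.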
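Before turning to the examples, note that the chain-rule factors $\partial Q_{ij}/\partial m_{ij}$ and $\partial Q_{ij}/\partial \theta^{r}_{ij}$ inside Eqs. (28) to (30) are easy to get wrong by a sign. The sketch below is our own check, not the authors' code: it implements these two factors for the quantum membership function of Eq. (2) and compares them with central finite differences. The full updates additionally multiply them by $\delta_e$, the bracketed defuzzification term, and the product of the other memberships of rule $j$. All function names are hypothetical.

```python
import math

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def quantum_mf(x, m, thetas, beta):
    """Quantum membership function of Eq. (2)."""
    s = 0.0
    for theta in thetas:
        t = abs(theta)
        s += sigmoid(beta * (x - m + t)) if x < m else sigmoid(-beta * (x - m - t))
    return s / len(thetas)

def dq_dm(x, m, thetas, beta):
    """dQ_ij/dm_ij: the averaged sum over r at the end of Eq. (28)."""
    g = 0.0
    for theta in thetas:
        t = abs(theta)
        if x < m:
            s = sigmoid(beta * (x - m + t))
            g -= beta * s * (1.0 - s)  # left branch: -beta e^z / (1 + e^z)^2
        else:
            s = sigmoid(-beta * (x - m - t))
            g += beta * s * (1.0 - s)  # right branch: +beta e^w / (1 + e^w)^2
    return g / len(thetas)

def dq_dtheta(x, m, thetas, r, beta):
    """dQ_ij/dtheta_ij^r: last bracket of Eq. (29) (theta >= 0) or Eq. (30) (theta < 0)."""
    theta = thetas[r]
    t, sign = abs(theta), (1.0 if theta >= 0 else -1.0)
    s = sigmoid(beta * (x - m + t)) if x < m else sigmoid(-beta * (x - m - t))
    return sign * beta * s * (1.0 - s) / len(thetas)

def central_diff(f, v, h=1e-6):
    """Central finite-difference estimate of f'(v)."""
    return (f(v + h) - f(v - h)) / (2 * h)
```

For instance, at x = 0.7, m = 0, intervals (0.5, 1.2) and β = 4, `dq_dm` and the finite-difference estimate agree to well within 1e-6, and the same holds on the left branch and for a negative quantum interval.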

4.1. Example 1: Iris data classification

The Fisher–Anderson Iris data consist of four input measurements, namely sepal length (sl), sepal width (sw), petal length (pl), and petal width (pw), of 150 specimens of the iris plant. Three species of iris were used: Iris setosa, Iris versicolor, and Iris virginica, with fifty instances of each species. The measurements are shown in Fig. 7.

In the Iris data experiment, 25 instances with four features were randomly selected from each species as the training set (i.e., a total of 75 of the 150 patterns were used as training data), and the remaining instances were used as the testing set. For the SCA, we chose $D_{thr} = 4.5$. Furthermore, we determined the number of quantum levels for each dimension of each cluster using quantum fuzzy entropy; these numbers are tabulated in Table 1. After structure learning, three clusters were generated.
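The sampling protocol above (25 randomly chosen training instances per species, the rest for testing) can be sketched as follows. Loading the data is omitted, and the function name `split_per_class` and the fixed seed are our assumptions; the paper does not state how the random selection was seeded.

```python
import random

def split_per_class(labels, n_train_per_class=25, seed=0):
    """Pick n_train_per_class random indices per class for training; the rest are for testing."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        chosen = set(rng.sample(idxs, n_train_per_class))
        train += sorted(chosen)
        test += [i for i in idxs if i not in chosen]
    return train, test
```

With 50 labels per species, this yields 75 training and 75 testing indices, with exactly 25 training instances from each species.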

The network then entered the parameter learning phase. We set the learning rate to η = 0.01 and trained the EQNFIS model with the number of quantum levels selected for each dimension of each cluster. After 100 training epochs, the final root-mean-square (RMS) error was 0.0138. Three fuzzy logic rules were generated:

Rule 1: IF sl is μ(6.38; 0.45, 0.77, 1.12, 1.50, 1.87) and sw is μ(2.64; 0.17, 0.30, 0.45, 0.62, 0.79) and pl is μ(5.77; 0.91, 1.46) and pw is μ(1.77; 0.44)
THEN y1 is 4.9 + 0.27sl - 0.36sw - 0.03pl + 0.46pw and y2 is 0.20 - 0.36sl + 0.32sw + 0.57pl - 0.78pw and y3 is 0.91 - 0.20sl - 0.33sw + 0.43pl - 0.51pw

Rule 2: IF sl is μ(5.52; 1.09) and sw is μ(2.89; 0.26, 0.40) and pl is μ(4.09; 1.37) and pw is μ(1.29; 0.29, 0.55)
THEN y1 is 0.49 - 0.01sl + 0.45sw - 0.04pl - 0.15pw and y2 is 0.27 + 0.41sl - 0.67sw + 0.05pl - 0.33pw and y3 is 0.65 + 0.20sl + 0.27sw + 0.12pl - 0.67pw

Rule 3: IF sl is μ(4.90; 0.90) and sw is μ(3.30; 1.36) and pl is μ(1.29; 0.58) and pw is μ(0.18; -0.53)
THEN y1 is 0.03 + 0.28sl - 0.22sw - 0.30pl - 0.79pw and y2 is 0.07 + 0.79sl - 0.57sw - 0.68pl - 0.72pw and y3 is 0.63 + 0.62sl - 0.71sw - 0.89pl + 0.09pw

Fig. 7. Iris data: Iris setosa, Iris versicolor, and Iris virginica (sepal length, sepal width, petal length, and petal width plotted against sample index).

Table 1
The number of quantum levels (ns) for each dimension (#1 to #4) of each cluster

Figs. 8(a) to (f) show the distribution of the training patterns and the final assignment of the fuzzy rules (i.e., the distribution of the input membership functions). The boundary of each rectangle represents a rule with a firing strength of 0.5.

Fig. 8. The distribution of input training patterns and final assignment of three rules. (a) For the Sepal Length and Sepal Width dimensions. (b) For the Petal Length and Petal Width dimensions. (c) For the Sepal Length and Petal Length dimensions. (d) For the Sepal Width and Petal Width dimensions. (e) For the Sepal Width and Petal Length dimensions. (f) For the Sepal Length and Petal Width dimensions.

We compared the testing accuracy of our model with that of other methods: the traditional multilayer neural network (NN) with 12 hidden nodes and 84 parameters; the standard radial basis function network (RBFN) with the SCA, with 8 hidden nodes and 88 parameters; and the EQNFIS with 3 fuzzy rules and 80 parameters using the same number of quantum levels (ns = 2, 3, or 5) for every dimension of every cluster. Five experiments were conducted; in each, the classification accuracy on the testing set was computed for the traditional multilayer NN, the RBFN with the SCA, the EQNFIS with 2, 3, and 5 quantum levels, and the proposed EQNFIS with quantum fuzzy entropy, and the results were averaged.

During the learning phase, 100 epochs of training were performed. Fig. 9 shows the learning curves of the proposed EQNFIS model with quantum fuzzy entropy, the EQNFIS with three quantum levels for each dimension of each cluster, and the RBFN with the SCA. The figure shows a smaller rms error and faster convergence for the EQNFIS model than for the RBFN model. In this example, the five experiments used different orderings of the training samples. Table 2 shows that, across these five orderings, the EQNFIS model with quantum fuzzy entropy achieved accuracies ranging from 96% to 98.67%, with a mean re-substitution accuracy of 97.33%; its average classification accuracy was higher than that of the other methods. Table 3 compares the learning speed (i.e., CPU time) of the EQNFIS model with those of the NN and the RBFN. According to Table 3, the average learning times of the NN, the RBFN with the SCA, and the EQNFIS were 4.6219, 2.1270, and 1.9843 s, respectively, measured on a personal computer with an Intel Pentium 4 (2500 MHz) CPU. Table 4 compares the classification results of the EQNFIS model with those of other classifiers [8,9,18,20,22] on the Iris data. The average classification accuracy of the EQNFIS model (97.33%) equals that of SANFIS [20] and is higher than those of the other methods.

[Figure 9: RMS error (0 to 0.06) versus epochs (0 to 100); curves: EQNFIS with quantum fuzzy entropy, EQNFIS (3), RBFN]

Fig. 9. Learning curves of the EQNFIS with quantum fuzzy entropy, the EQNFIS with the three quantum levels for each dimension of each cluster, and the RBFN with the SCA model.

Table 2
Classification accuracy (%) using various methods for the Iris data

Experiment #   Neural network   RBFN with SCA   EQNFIS (2)   EQNFIS (3)   EQNFIS (5)   EQNFIS with quantum fuzzy entropy
1
2              92               93.33           94.67        96           94.67        96
3              97.33            94.67           94.67        96           96           97.33
4              97.33            98.67           97.33        97.33        97.33        98.67
5              94.67            94.67           94.67        96           94.67        96
Average (%)    95.47            96              96           96.8         96.27        97.33

Table 3
The average learning time (s) using various methods for the Iris data

Experiment #   Neural network   RBFN with SCA   EQNFIS with quantum fuzzy entropy
1              4.6563           2.1406          1.9688
2              4.6094           2.1250          2.0156
3              4.5781           2.0781          2.0313
4              4.6250           2.1563          1.9219
5              4.6406           2.1350          1.9844
Average (s)    4.6219           2.1270          1.9843

Table 4
Average re-substitution accuracy comparison of various models for the Iris data classification problem

Methods               Average re-substitution accuracy (%)
FEBFC [9]             96.91
SANFIS [20]           97.33
FMMC [18]             97.3
FUNLVQ+GFENCE [8]     96.3
Wu and Chen's [22]    96.21
EQNFIS                97.33
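Because each Iris test set is small, the accuracies in Table 2 are coarse. Assuming the half/half protocol leaves 75 of the 150 Iris patterns for testing (the 75 is our inference, not stated in this section), every reported value corresponds to a whole number of correctly classified patterns, and one pattern is worth 100/75 ≈ 1.33 percentage points. The helper names below are hypothetical.

```python
N_TEST = 75  # assumption: half of the 150 Iris patterns are held out for testing

def accuracy_percent(correct: int, total: int = N_TEST) -> float:
    """Accuracy in percent for `correct` hits out of `total` test patterns."""
    return 100.0 * correct / total

def implied_correct(acc_percent: float, total: int = N_TEST) -> int:
    """Nearest whole number of correctly classified patterns for a reported accuracy."""
    return round(acc_percent * total / 100.0)

if __name__ == "__main__":
    # Distinct accuracy values that appear in Table 2.
    for acc in (92, 93.33, 94.67, 96, 97.33, 98.67):
        k = implied_correct(acc)
        print(f"{acc:6.2f}% -> {k}/{N_TEST} correct, {N_TEST - k} misclassified")
    print(f"one pattern = {accuracy_percent(1):.2f} percentage points")
```

Under this assumption, the 96% to 98.67% range reported for the EQNFIS with quantum fuzzy entropy corresponds to 72 to 74 correct test patterns, a spread of two patterns.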

4.2. Example 2: Wisconsin breast cancer diagnostic data

The Wisconsin breast cancer diagnostic data set contains 699 patterns distributed into two output classes, benign and malignant. Each pattern consists of nine input features: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. A total of 458 patterns are in the benign class and the other 241 patterns are in the malignant class. Since 16 patterns contain missing values, we used the remaining 683 patterns to evaluate the performance of the proposed EQNFIS model. For comparison with other models, we used half of the 683 patterns as the training set and the remaining patterns as the testing set.
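The preprocessing just described (drop incomplete patterns, then split the rest in half at random) is easy to reproduce. The sketch below assumes missing values are marked with "?", as in the commonly distributed UCI version of this data set, and uses the hypothetical helper names `drop_missing` and `half_split`; the fixed seed only makes a run repeatable and does not reproduce the paper's particular split.

```python
import random
from typing import List, Sequence, Tuple

Row = List[str]

def drop_missing(rows: Sequence[Row], marker: str = "?") -> List[Row]:
    """Discard every row that contains the missing-value marker in any field."""
    return [row for row in rows if marker not in row]

def half_split(rows: Sequence[Row], seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then use the first half for training and the rest for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) // 2
    return shuffled[:n_train], shuffled[n_train:]

if __name__ == "__main__":
    # Synthetic stand-in for the data file: 683 complete rows and 16 rows with a missing field.
    complete = [["1"] * 11 for _ in range(683)]
    incomplete = [["1"] * 6 + ["?"] + ["1"] * 4 for _ in range(16)]
    clean = drop_missing(complete + incomplete)
    train, test = half_split(clean)
    print(len(clean), len(train), len(test))  # 683 341 342
```

With 683 complete patterns, integer halving gives 341 training and 342 testing patterns; which half receives the odd pattern is a choice of this sketch.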

Experimental conditions were the same as in the previous experiment, and the training half was selected at random. For the SCA, we chose the parameter Dthr = 35. Furthermore, we used quantum fuzzy entropy to determine the number of quantum levels for each dimension of each cluster, as tabulated in Table 5. After the structure learning phase, two clusters were generated.
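The structure-learning step above hinges on the SCA's distance threshold Dthr. The SCA itself is defined earlier in the paper; as a rough illustration only, the sketch below implements a generic one-pass, distance-threshold clustering that, like the SCA, decides the number of clusters on the fly. The Euclidean distance, the running-mean center update, and the name `one_pass_threshold_clustering` are our assumptions, so the toy threshold used here is not comparable to Dthr = 35.

```python
import math
from typing import List, Sequence, Tuple

def one_pass_threshold_clustering(
    patterns: Sequence[Sequence[float]], d_thr: float
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical one-pass clustering in the spirit of the SCA.

    Each pattern joins its nearest cluster if that center lies within d_thr
    (Euclidean distance, assumed); otherwise it opens a new cluster.
    Centers are running means of their members.
    """
    centers: List[List[float]] = []
    counts: List[int] = []
    for x in patterns:
        if centers:
            dists = [math.dist(x, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            if dists[j] <= d_thr:
                counts[j] += 1
                # Incremental mean: c <- c + (x - c) / n
                centers[j] = [c + (xi - c) / counts[j] for c, xi in zip(centers[j], x)]
                continue
        # No cluster yet, or the nearest center is too far: open a new cluster.
        centers.append([float(v) for v in x])
        counts.append(1)
    return centers, counts

if __name__ == "__main__":
    toy = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
    centers, counts = one_pass_threshold_clustering(toy, d_thr=3.0)
    print(len(centers), counts)  # 2 [3, 3]
```

In this sketch a larger threshold yields fewer, broader clusters and a smaller one yields more clusters, which mirrors the role Dthr plays in the SCA; because there is a single pass, presenting the patterns in a different order can change the result.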

The network then entered the parameter learning phase. We set the learning rate to η = 0.05 and trained the EQNFIS model with different numbers of quantum levels for each dimension of each cluster. Five experiments were again conducted. In each, we measured the classification accuracy on the testing set, and its average over the five runs, for the neural network with 7 hidden nodes and 77 parameters, the RBFN with the SCA with 4 hidden nodes and 80 parameters, the EQNFIS model with 2 fuzzy rules and 77 parameters using 1 and 2 quantum levels, and the proposed EQNFIS with quantum fuzzy entropy. During the supervised learning phase, 100 epochs of training were performed. Fig. 10 shows the learning curves of the proposed EQNFIS model with quantum fuzzy entropy, the EQNFIS with two quantum levels for each dimension of each cluster, and the RBFN with the SCA. Our model reaches a smaller rms error and converges more quickly.
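The parameter-learning phase tunes the network by backpropagation with learning rate η = 0.05. A full EQNFIS gradient also covers the premise parameters (centers, widths, and quantum intervals); the sketch below covers only the linear TSK consequents, assuming the output is the firing-strength-weighted average of the rule consequents and the error is E = ½(y − d)². Under those assumptions ∂E/∂a_kj = (y − d) φ̄_k x_j, where φ̄_k is the normalized firing strength of rule k and x_0 = 1. The names `tsk_output` and `consequent_step` and all numeric values are hypothetical.

```python
from typing import List, Sequence

def tsk_output(phi: Sequence[float], coeffs: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Assumed TSK output: firing-strength-weighted average of linear consequents."""
    xs = [1.0, *x]  # x_0 = 1 multiplies the constant term a_k0
    total = sum(phi)
    return sum(p * sum(a * xi for a, xi in zip(ak, xs)) for p, ak in zip(phi, coeffs)) / total

def consequent_step(phi: Sequence[float], coeffs: Sequence[Sequence[float]],
                    x: Sequence[float], target: float, eta: float = 0.05) -> List[List[float]]:
    """One gradient-descent step on E = 0.5 * (y - d)^2 for the consequent parameters:
    dE/da_kj = (y - d) * (phi_k / sum(phi)) * x_j."""
    err = tsk_output(phi, coeffs, x) - target
    xs = [1.0, *x]
    total = sum(phi)
    return [[a - eta * err * (p / total) * xi for a, xi in zip(ak, xs)]
            for p, ak in zip(phi, coeffs)]

if __name__ == "__main__":
    phi = [0.8, 0.2]                               # hypothetical firing strengths of two rules
    coeffs = [[0.1, 0.2, -0.1], [0.0, 0.3, 0.4]]   # hypothetical [a_k0, a_k1, a_k2]
    x, d = [0.5, 0.3], 1.0                         # hypothetical (normalized) input and target
    for epoch in range(3):
        coeffs = consequent_step(phi, coeffs, x, d)
        print(epoch, round(0.5 * (tsk_output(phi, coeffs, x) - d) ** 2, 6))
```

With η = 0.05 and the small inputs used here, each step moves the output toward the target. With large unnormalized inputs the same η can overshoot, because the change in y per step scales with η Σ_k φ̄_k² Σ_j x_j².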

Table 6 shows that the EQNFIS model with quantum fuzzy entropy achieves high accuracy, ranging from 97.37% to 98.54% across the five experiments, with a mean re-substitution accuracy of 97.72%. Its average classification accuracy was higher than those of the other methods. Table 7 shows the CPU time of the EQNFIS model, the NN, and the RBFN: according to Table 7, the average learning times of the NN, the RBFN with the SCA, and the EQNFIS were 9.5375, 6.1094, and 5.9781 s, respectively. We also compared the testing accuracy of our model with that of other methods [9,12,13,17,20]. Table 8 compares the learned EQNFIS models with other fuzzy logic system, neural network, and neuro-fuzzy classifiers; the average classification accuracy of the EQNFIS model (97.72%) was higher than those of the other methods.

[Figure 10: RMS error (0 to 0.1) versus epochs (0 to 100); curves: EQNFIS with quantum fuzzy entropy, EQNFIS (2), RBFN]

Fig. 10. Learning curves from the EQNFIS with quantum fuzzy entropy, the EQNFIS with the two quantum levels for each dimension of each cluster, and the RBFN with the SCA model.

Table 5
The number of quantum levels (ns) for each dimension of each cluster

Cluster   Dimension
          #1   #2   #3   #4   #5   #6   #7   #8   #9
#1        1    1    1    1    1    2    1    1    1

Table 6
Classification accuracy (%) for the Wisconsin breast cancer diagnostic data

Experiment #   Neural network   RBFN with SCA   EQNFIS (1)   EQNFIS (2)   EQNFIS with quantum fuzzy entropy
1              96.49            95.32           97.37        97.37        97.66
2              97.08            95.61           97.66        98.54        98.54
3              94.44            93.86           97.37        97.37        97.37
4              97.37            94.74           97.66        97.37        97.37
5              96.49            94.74           97.66        97.37        97.66
Average (%)    96.37            94.85           97.54        97.6         97.72
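As an arithmetic check on Table 6, the Average row can be recomputed from the five per-experiment accuracies. The short script below does so with `statistics.mean` and reports the column with the highest mean; the dictionary keys simply mirror the table headings, and `column_means` is a hypothetical helper name.

```python
from statistics import mean
from typing import Dict, List

# Per-experiment accuracies (%) copied from Table 6 (experiments 1-5).
TABLE6: Dict[str, List[float]] = {
    "Neural network": [96.49, 97.08, 94.44, 97.37, 96.49],
    "RBFN with SCA": [95.32, 95.61, 93.86, 94.74, 94.74],
    "EQNFIS (1)": [97.37, 97.66, 97.37, 97.66, 97.66],
    "EQNFIS (2)": [97.37, 98.54, 97.37, 97.37, 97.37],
    "EQNFIS with quantum fuzzy entropy": [97.66, 98.54, 97.37, 97.37, 97.66],
}

def column_means(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean of each column, rounded to two decimals as in the table's Average row."""
    return {model: round(mean(values), 2) for model, values in table.items()}

if __name__ == "__main__":
    means = column_means(TABLE6)
    for model, m in means.items():
        print(f"{model:35s} {m:6.2f}")
    print("highest mean:", max(means, key=means.get))
```

Rounded to two decimals, the recomputed means agree with the Average row of Table 6 (96.37, 94.85, 97.54, 97.6, and 97.72).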

Table 7
The average learning time (s) using various methods for the Wisconsin breast cancer diagnostic data

Experiment #   Neural network   RBFN with SCA   EQNFIS with quantum fuzzy entropy
1              9.5938           6.1094          6.0156
2              9.4531           6.0469          6.0781
3              9.5469           6.0625          5.8750
4              9.5156           6.0938          5.9688
5              9.5781           6.2344          5.9531
Average (s)    9.5375           6.1094          5.9781

Table 8
Average accuracy comparison of various models for the Wisconsin breast cancer diagnostic data

Models           Average accuracy (%)
NNFS [17]        94.15
FEBFC [9]        95.14
NEFCLASS [13]    92.7
SANFIS [20]      96.3
MSC [12]         94.9
EQNFIS           97.72

5. Conclusion

In this paper, an entropy-based quantum neuro-fuzzy inference system (EQNFIS) was proposed for classification applications. The EQNFIS model is a five-layer structure that combines the traditional Takagi–Sugeno–Kang (TSK) fuzzy model with quantum membership functions. A self-constructing learning algorithm, which consists of the self-clustering algorithm (SCA), quantum fuzzy entropy, and the backpropagation algorithm, was also proposed. The advantages of the proposed EQNFIS model are summarized as follows: (1) it converges quickly; (2) it uses an online, fast, one-pass self-constructing learning algorithm; (3) it reaches a lower rms error than the RBFN with the SCA; and (4) its classification accuracy is equal to or higher than that of the compared models. Simulation results on the Iris and Wisconsin breast cancer diagnostic data show that the average classification accuracy of the EQNFIS model matches or exceeds that of the other methods. In addition to the simulations in this paper, the proposed EQNFIS model has been applied to face detection in color images in our laboratory.

Acknowledgement

This work was supported by the National Science Council, R.O.C., under Grant no. NSC94-2218-E-324-004.

References

[1] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.

[2] L. Fei, Z. Shengmei, Z. Baoyu, Quantum neural network in speech recognition, Proc. IEEE Int. Conf. Signal Process. 6 (2000) 1267–1270.

[3] S. Halgamuge, M. Glesner, Neural networks in designing fuzzy systems for real world applications, Fuzzy Sets Syst. 65 (1994) 1–12.

[4] J.-S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (1993) 665–685.

[5] C.F. Juang, C.T. Lin, An on-line self-constructing neural fuzzy inference network and its applications, IEEE Trans. Fuzzy Syst. 6 (1) (1998) 12–31.

[6] N. Kasabov, Learning fuzzy rules and approximate reasoning in fuzzy neural networks and hybrid systems, Fuzzy Sets Syst. 82 (1996) 135–149.

[7] R. Kretzschmar, R. Bueler, N.B. Karayiannis, F. Eggimann, Quantum neural networks versus conventional feedforward neural networks: an experimental study, Proc. IEEE Int. Conf. Signal Process. 1 (2000) 328–337.

[8] H.M. Lee, A neural network classifier with disjunctive fuzzy information, Neural Networks 11 (6) (1998) 1113–1125.

[9] H.M. Lee, C.M. Chen, J.M. Chen, Y.L. Jou, An efficient fuzzy classifier with feature selection based on fuzzy entropy, IEEE Trans. Syst. Man Cybern. B 31 (2001) 426–432.

[10] M. Leshno, V.Y. Lin, A. Pinkus, S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks 6 (6) (1993) 861–867.

[11] C.J. Lin, C.T. Lin, An ART-based fuzzy adaptive learning control network, IEEE Trans. Fuzzy Syst. 5 (4) (1997) 477–496.

[12] B.C. Lovell, A.P. Bradley, The multiscale classifier, IEEE Trans. Pattern Anal. Machine Intell. 18 (1996) 124–137.

[13] D. Nauck, R. Kruse, A neuro-fuzzy method to learn fuzzy classification rules from data, Fuzzy Sets Syst. 89 (1997) 277–288.

[14] S. Paul, S. Kumar, Subsethood-product fuzzy neural inference system (SuPFuNIS), IEEE Trans. Neural Networks 13 (3) (2002) 578–599.

[15] G. Purushothaman, N.B. Karayiannis, Quantum neural networks (QNNs): inherently fuzzy feedforward neural networks, IEEE Trans. Neural Networks 8 (3) (1997) 679–693.

[16] M. Russo, FuGeNeSys: a fuzzy genetic neural system for fuzzy modeling, IEEE Trans. Fuzzy Syst. 6 (1998) 373–388.

[17] R. Setiono, H. Liu, Neural-network feature selector, IEEE Trans. Neural Networks 8 (3) (1997) 654–662.

[18] P.K. Simpson, Fuzzy min-max neural networks, Part I: Classification, IEEE Trans. Neural Networks 3 (1992) 776–786.

[19] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst. Man Cybern. SMC-15 (1985) 116–132.

[20] J.S. Wang, C.S. George Lee, Self-adaptive neuro-fuzzy inference systems for classification applications, IEEE Trans. Fuzzy Syst. 10 (6) (2002) 790–802.

[21] L.X. Wang, J.M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern. 22 (6) (1992) 1414–1427.

[22] T.P. Wu, S.M. Chen, A new method for constructing membership functions and fuzzy rules from training examples, IEEE Trans. Syst. Man Cybern. B 29 (1999) 25–40.

Cheng-Jian Lin received the B.S. degree in electrical engineering from Ta-Tung University, Taiwan, R.O.C., in 1986, and the M.S. and Ph.D. degrees in electrical and control engineering from the National Chiao-Tung University, Taiwan, R.O.C., in 1991 and 1996, respectively. From April 1996 to July 1999, he was an Associate Professor in the Department of Electronic Engineering, Nan-Kai College, Nantou, Taiwan, R.O.C. Since August 1999, he has been with the Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung, Taiwan, R.O.C., where he is currently a Professor. He served as chairman of the department from 2001 to 2005. His current research interests are neural networks, fuzzy systems, pattern recognition, intelligent control, bioinformatics, and FPGA design. He has published more than 100 papers in refereed journals and conference proceedings. Dr. Lin is a member of Phi Tau Phi. He is also a member of the Chinese Fuzzy Systems Association (CFSA), the Chinese Automation Association, the Taiwanese Association for Artificial Intelligence (TAAI), the IEICE (The Institute of Electronics, Information and Communication Engineers), and the IEEE Computational Intelligence Society. He is an executive committee member of TAAI. Dr. Lin currently serves as an Associate Editor of the International Journal of Applied Science and Engineering.

I-Fang Chung received the B.S. and M.S. degrees in control engineering from the National Chiao-Tung University (NCTU), Taiwan, in 1993 and 1995, respectively, and the Ph.D. degree in electrical and control engineering from NCTU in 2000. From 2000 to 2003, he was a Research Assistant Professor in Electrical and Control Engineering, NCTU. From 2003 to 2004, he was a postdoctoral fellow in the Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, Tokyo University, Japan. Since 2004, he has been an Assistant Professor at the Institute of Bioinformatics, National Yang-Ming University, Taiwan. His current research interests are bioinformatics, machine learning, biomedical engineering, biomedical signal processing, and fuzzy neural networks.

Cheng-Hung Chen was born in Kaohsiung, Taiwan, R.O.C., in 1979. He received the B.S. and M.S. degrees in computer science and information engineering from the Chaoyang University of Technology, Taiwan, R.O.C., in 2002 and 2004, respectively. He is currently pursuing the Ph.D. degree in electrical and control engineering at the National Chiao-Tung University, Taiwan, R.O.C. His current research interests are neural networks, fuzzy logic systems, intelligent control, and pattern recognition.