Engineering Applications of Artificial Intelligence 16 (2003) 709–716
Elicitation of classification rules by fuzzy data mining
Yi-Chung Hu a,*, Gwo-Hshiung Tzeng b
a Department of Business Administration, Chung Yuan Christian University, Chung-Li 320, Taiwan, ROC
b Institute of Management of Technology, National Chiao Tung University, Hsinchu 300, Taiwan, ROC
Received 25 November 2002; received in revised form 22 May 2003; accepted 18 September 2003
Abstract
Data mining techniques can be used to find potentially useful patterns in data and to ease the knowledge acquisition bottleneck in building prototype rule-based systems. Based on the partition methods of the simple-fuzzy-partition-based method (SFPBM) proposed by Hu et al. (Comput. Ind. Eng. 43(4) (2002) 735), this paper proposes a new fuzzy data mining technique consisting of two phases to find fuzzy if–then rules for classification problems: one phase finds frequent fuzzy grids by using a pre-specified simple fuzzy partition method to divide each quantitative attribute, and the other generates fuzzy classification rules from the frequent fuzzy grids. To improve the classification performance of the proposed method, we incorporate the adaptive rules proposed by Nozaki et al. (IEEE Trans. Fuzzy Syst. 4(3) (1996) 238) to adjust the confidence of each classification rule. Regarding classification generalization ability, simulation results on the iris data demonstrate that the proposed method can effectively derive fuzzy classification rules from training samples.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Data mining; Fuzzy sets; Classification problems; Association rules
1. Introduction
Pattern classification is the problem of partitioning a pattern space into classes and then assigning a pattern to one of those classes (Kim and Bang, 2000). Classification problems play an important role in industrial engineering, for example in group technology (Chuang et al., 1999), and in engineering applications such as optical character recognition and facial recognition (Kim and Bang, 2000).
Data mining is the exploration and analysis of data in order to discover meaningful patterns (Berry and Linoff, 1997). The aim of this paper is to propose a fuzzy data mining method that can automatically find a set of fuzzy if–then rules for classification problems. Indeed, data mining problems involving classification can be viewed within a common framework of rule discovery (Agrawal et al., 1993a). The advantage of mining fuzzy if–then rules for classification problems is that knowledge acquisition can be achieved by users carefully checking the rules discovered from training patterns. Additionally, data mining can also ease the knowledge acquisition bottleneck in building prototype expert systems (Hong et al., 2000) or rule-based systems.
The discovery of association rules is an important topic in data mining. Association rules elicited from transaction databases have been applied to help decision makers determine which items are frequently purchased together by customers (Berry and Linoff, 1997; Han and Kamber, 2001). Initially, Agrawal et al. (1993b) proposed a method to find frequent itemsets. Subsequently, Agrawal et al. (1996) proposed an influential algorithm named the Apriori algorithm, consisting of two phases. In the first phase, frequent itemsets are generated: a candidate k-itemset (k ≥ 1) containing k items is frequent (i.e., a frequent k-itemset) if its support is larger than or equal to a user-specified minimum support. In the second phase, association rules are generated from the frequent itemsets discovered in the first phase.
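The two-phase idea can be illustrated with a small sketch of the first (frequent-itemset) phase; the transaction data and minimum support below are hypothetical, and the sketch relies on the support check alone, omitting Apriori's subset-pruning refinement.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Find all itemsets whose support (fraction of transactions
    containing the itemset) is at least min_support. Candidate
    (k+1)-itemsets are built only from frequent k-itemsets, since any
    subset of a frequent itemset must itself be frequent."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    frequent = {}
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    k = 1
    while current:
        survivors = [c for c in current if support(c) >= min_support]
        frequent.update((c, support(c)) for c in survivors)
        # join step: unite frequent k-itemsets that share k-1 items
        current = list({a | b for a, b in combinations(survivors, 2)
                        if len(a | b) == k + 1})
        k += 1
    return frequent

# hypothetical market-basket data
T = [frozenset(t) for t in (["bread", "milk"],
                            ["bread", "butter", "milk"],
                            ["butter", "milk"],
                            ["bread", "butter"])]
freq = apriori_frequent_itemsets(T, min_support=0.5)
```

With this data, every single item and every pair is frequent at min_support = 0.5, while the 3-itemset appears in only one of four transactions and is discarded.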
Additionally, the comprehensibility of the fuzzy representation to human users is also a criterion in designing a fuzzy system. Simple fuzzy partition methods are thus preferable (Ishibuchi et al., 1999). In these methods, each attribute used to describe a sample is viewed as a linguistic variable (Zadeh, 1975a, b, 1976). Based on the partition methods used in the simple-fuzzy-partition-based method (SFPBM) (Hu et al., 2002), this paper proposes a fuzzy data mining method for eliciting fuzzy classification rules.
*Corresponding author. Tel.: 2655130; fax: +886-3-2655199.
E-mail address: ychu@cycu.edu.tw (Y.-C. Hu).
0952-1976/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.engappai.2003.09.007
Since the classification performance can be improved by adjusting the grade of certainty of fuzzy rules, the adaptive rules proposed by Nozaki et al. (1996) are incorporated into the proposed method to adjust the fuzzy confidence of each fuzzy rule. Regarding classification generalization ability, simulation results on the iris data (Anderson, 1935) demonstrate that the proposed method performs well in comparison with other fuzzy classification methods. This shows that applications of the proposed method to engineering problems are feasible.
The rest of this paper is organized as follows. The STDM and the MTDM are introduced in Section 2. In Section 3, we present definitions of the fuzzy support and the fuzzy confidence, and the two phases of the proposed method are presented in detail. In Section 4, the performance of the proposed method is examined by computer simulation on the iris data. Discussions and conclusions are presented in Section 5.
2. Simple fuzzy partition methods
The simple fuzzy partition methods are based on the concept of linguistic variables (Zadeh, 1975a, b, 1976). Formally, a linguistic variable is characterized by a quintuple (x, T(x), U, G, M) (Pedrycz and Gomide, 1998; Zimmermann, 1996), in which x is the name of the variable; T(x) denotes the set of names of linguistic values or terms of x; U denotes a universe of discourse; G is a syntactic rule for generating the values of x; and M is a semantic rule for associating each linguistic value with its meaning.
Each attribute can be partitioned by its linguistic values with pre-specified membership functions, such as triangular functions. Simple fuzzy grids or grid partitions (Ishibuchi et al., 1995; Jang and Sun, 1995) in the feature space are thus obtained. The advantage of the simple fuzzy partition method is that the linguistic interpretation of each fuzzy set is easily obtained. Fuzzy partition methods have been widely used in pattern recognition and fuzzy reasoning, with applications to pattern classification by Ishibuchi et al. (1992, 1995, 1999), Ravi and Zimmermann (2000), and Ravi et al. (2000), to fuzzy neural networks (Jang, 1993), and to fuzzy rule generation by Wang and Mendel (1992).
If both x1 and x2 are partitioned by three linguistic values, then the feature space can be divided into nine two-dimensional (2-dim) fuzzy grids, as shown in Fig. 1. The shaded fuzzy subspace, denoted by A^{Width}_{3,1} × A^{Length}_{3,3}, stands for a 2-dim fuzzy grid whose linguistic value is "small AND large".
Two partition types used in SFPBM are employed in the proposed method: one is the multiple type division method (MTDM), and the other is the single type division method (STDM). If K is the maximum number of linguistic values on each quantitative attribute, then the MTDM partitions each quantitative attribute sequentially into 3, 4, ..., K linguistic values, yielding (3 + 4 + ... + K) linguistic values in total. For the STDM, only K linguistic values are defined.
Nomenclature

d	number of attributes used to describe each sample, where 1 ≤ d
k	dimension of one fuzzy grid, where 1 ≤ k ≤ d
K	maximum number of linguistic values defined in each quantitative attribute, where K ≥ 3
A^{x_m}_{K_i,j_m}	j_m-th linguistic value of the K_i linguistic values defined in attribute x_m, where 1 ≤ m ≤ d; 3 ≤ K_i ≤ K for the MTDM and K_i = K for the STDM; and 1 ≤ j_m ≤ K_i
μ^{x_m}_{K_i,j_m}	membership function of A^{x_m}_{K_i,j_m}
t_p	p-th training sample, where t_p = (t_p1, t_p2, ..., t_pd) and t_pi is the attribute value with respect to the i-th attribute

[Fig. 1. Nine 2-dim fuzzy grids formed by the linguistic values A^{x1}_{3,1}, A^{x1}_{3,2}, A^{x1}_{3,3} on Width (x1) and A^{x2}_{3,1}, A^{x2}_{3,2}, A^{x2}_{3,3} on Length (x2).]

For simplicity, a membership function with triangular shape is used for each linguistic value of the quantitative attributes; indeed, Pedrycz (1994) pointed out the usefulness and effectiveness of triangular membership functions in fuzzy modeling. A membership function such as μ^{Width}_{K,j1} is represented as follows:

μ^{Width}_{K,j1}(x) = max{1 − |x − a^{K}_{j1}| / b^{K}, 0},   (1)

where

a^{K}_{j1} = mi + (ma − mi)(j1 − 1)/(K − 1),   (2)
b^{K} = (ma − mi)/(K − 1),   (3)

and ma is the maximum value of the domain and mi is the minimum value. Each linguistic value is actually viewed as a candidate one-dimensional (1-dim) fuzzy grid in the proposed method. It is clear that the set of candidate 1-dim fuzzy grids generated for a pre-specified K by the STDM is contained in that generated by the MTDM.
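Eqs. (1)–(3) can be sketched directly; the function below is a minimal implementation in which j, K, and the domain bounds mi and ma are all parameters:

```python
def triangular_membership(x, j, K, mi, ma):
    """Degree to which x belongs to the j-th of K triangular linguistic
    values on [mi, ma] (Eqs. (1)-(3)): the peaks a are evenly spaced and
    the half-width b equals the spacing between adjacent peaks."""
    a = mi + (ma - mi) * (j - 1) / (K - 1)   # Eq. (2)
    b = (ma - mi) / (K - 1)                  # Eq. (3)
    return max(1.0 - abs(x - a) / b, 0.0)    # Eq. (1)

# with K = 3 on [0, 1] the peaks sit at 0.0, 0.5 and 1.0; at any x in the
# domain the three membership degrees sum to 1, a property of this partition
degrees = [triangular_membership(0.25, j, 3, 0.0, 1.0) for j in (1, 2, 3)]
```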
For example, if we divide both "Width" (denoted by x1) and "Length" (denoted by x2) into four linguistic values, then {A^{Width}_{4,1}, A^{Width}_{4,2}, A^{Width}_{4,3}, A^{Width}_{4,4}, A^{Length}_{4,1}, A^{Length}_{4,2}, A^{Length}_{4,3}, A^{Length}_{4,4}} is generated by the STDM, and {A^{Width}_{3,1}, A^{Width}_{3,2}, A^{Width}_{3,3}, A^{Length}_{3,1}, A^{Length}_{3,2}, A^{Length}_{3,3}, A^{Width}_{4,1}, A^{Width}_{4,2}, A^{Width}_{4,3}, A^{Width}_{4,4}, A^{Length}_{4,1}, A^{Length}_{4,2}, A^{Length}_{4,3}, A^{Length}_{4,4}} is generated by the MTDM when K = 4.
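The difference between the two candidate sets can be sketched as an enumeration; the tuple encoding (attribute, K_i, j) of a grid A^{x}_{K_i,j} is an illustrative choice, not part of the original method:

```python
def candidate_1dim_grids(attributes, K, method):
    """Enumerate candidate 1-dim fuzzy grids A^{x}_{Ki,j}: the STDM uses
    exactly K linguistic values per attribute, while the MTDM uses every
    partition into 3, 4, ..., K linguistic values simultaneously."""
    partitions = range(3, K + 1) if method == "MTDM" else [K]
    return [(x, Ki, j)
            for x in attributes
            for Ki in partitions
            for j in range(1, Ki + 1)]

stdm = candidate_1dim_grids(["Width", "Length"], K=4, method="STDM")
mtdm = candidate_1dim_grids(["Width", "Length"], K=4, method="MTDM")
# STDM yields 2*4 = 8 grids and MTDM yields 2*(3+4) = 14, matching the
# example above; the STDM set is contained in the MTDM set.
```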
A significant task is how to use the candidate 1-dim fuzzy grids to generate the other frequent fuzzy grids and the fuzzy classification rules. An effective method is described in the following section.
3. Discovering fuzzy classification rules
In the proposed method, frequent fuzzy grids and fuzzy classification rules are generated by phases I and II, respectively. One fuzzy partition method (i.e., the STDM or the MTDM) must be specified before performing the proposed algorithm.
The main difference between the proposed method and SFPBM is that SFPBM did not consider all of the information distributed in the pattern space during the mining process. That is, SFPBM ignored those fuzzy subspaces containing any two linguistic values belonging to different K_i partitions; thus, SFPBM cannot generate a fuzzy grid such as A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{x_{k−1}}_{K_{k−1},j_{k−1}} × A^{xk}_{Kk,jk} with unequal K_i. For example, if two quantitative attributes, say x1 and x2, are partitioned into four linguistic values (i.e., K = 4) with the MTDM, then fuzzy subspaces or candidate fuzzy grids A^{x1}_{K1,j1} × A^{x2}_{K2,j2} are not considered by SFPBM when K1 is not equal to K2 (e.g., A^{Width}_{4,2} × A^{Length}_{3,3} or A^{Width}_{3,1} × A^{Length}_{4,1}). However, it is possible that these ignored subspaces, which are considered in the proposed method, are useful. It should be noted that, since K_i = 2 is somewhat coarse, only K_i ≥ 3 is considered in SFPBM and the proposed method.
In this section, we describe the two phases of the proposed method in Sections 3.1 and 3.2.
3.1. Phase I: generate frequent fuzzy grids
Suppose each quantitative attribute x_m is divided into K linguistic values. Without loss of generality, consider a candidate k-dim fuzzy grid A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk}, which is a fuzzy set, with 3 ≤ K1, K2, ..., Kk ≤ K for the MTDM and K1 = K2 = ... = Kk = K for the STDM. The degree to which t_p belongs to this fuzzy grid can be computed as μ^{x1}_{K1,j1}(t_p1) · μ^{x2}_{K2,j2}(t_p2) ··· μ^{xk}_{Kk,jk}(t_pk). To check whether this fuzzy grid is frequent or not, the fuzzy support (Ishibuchi et al., 2001; Hu et al., 2002) of the grid, computed with the algebraic product as the t-norm operator for the fuzzy intersection, is defined as follows:

FS(A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk})
  = (1/n) Σ_{p=1}^{n} μ_{A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk}}(t_p)
  = (1/n) Σ_{p=1}^{n} μ^{x1}_{K1,j1}(t_p1) · μ^{x2}_{K2,j2}(t_p2) ··· μ^{xk}_{Kk,jk}(t_pk).   (4)

When FS(A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk}) is larger than or equal to the user-specified minimum fuzzy support (min FS), A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk} is a frequent k-dim fuzzy grid. For any two grids A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk} and A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk} × A^{x_{k+1}}_{K_{k+1},j_{k+1}}, since μ of the latter does not exceed μ of the former for every t_p, by (4) the containment A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk} × A^{x_{k+1}}_{K_{k+1},j_{k+1}} ⊆ A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk} holds. It is thus obvious that any subset of a frequent fuzzy grid must also be frequent.
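Eq. (4) and the monotonicity argument can be sketched as follows; the membership matrix below is hypothetical:

```python
from math import prod

def fuzzy_support(memberships):
    """Fuzzy support of a k-dim fuzzy grid (Eq. (4)) under the algebraic
    product t-norm: for each sample, multiply its k membership degrees,
    then average the products over the n samples.

    memberships: list of per-sample rows [mu^{x1}(t_p1), ..., mu^{xk}(t_pk)]."""
    return sum(prod(row) for row in memberships) / len(memberships)

# hypothetical 2-dim grid evaluated on two samples
mu = [[1.0, 0.5],   # sample t1
      [0.2, 0.5]]   # sample t2
fs2 = fuzzy_support(mu)                       # (0.5 + 0.1) / 2 = 0.3
# appending a further linguistic value can only shrink each product, so
# the fuzzy support of a superset grid never exceeds that of its subset
fs3 = fuzzy_support([row + [0.8] for row in mu])
```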
Like SFPBM, a table FGTTFS is implemented to generate frequent fuzzy grids. FGTTFS consists of the following substructures:
(a) Fuzzy grid table (FG): each row represents a fuzzy grid, and each column represents a linguistic value A^{x_m}_{K_i,j_m}.
(b) Transaction table (TT): each column represents t_p, and each element records the membership degree of the corresponding fuzzy grid.
(c) Column FS: stores the fuzzy support corresponding to the fuzzy grid in FG.
An initial tabular FGTTFS is shown as Table 1 as an example, from which we can see that there are two samples, t1 and t2, with two attributes, x1 and x2. Both x1 and x2 are divided into three linguistic values (i.e., K = 3). Assume that x2 is the attribute of class labels.
Since each row of FG is a bit string consisting of 0s and 1s, FG[u] and FG[v] (i.e., the u-th row and v-th row of FG) can be paired to generate certain desired results by applying Boolean operations. For example, if we apply the OR operation on two rows, FG[1] = (1, 0, 0, 0, 0, 0) (i.e., A^{x1}_{3,1}) and FG[4] = (0, 0, 0, 1, 0, 0) (i.e., A^{x2}_{3,1}), then (FG[1] OR FG[4]) = (1, 0, 0, 1, 0, 0), corresponding to a candidate 2-dim fuzzy grid A^{x1}_{3,1} × A^{x2}_{3,1}, is generated. Then, by (4),

FS(A^{x1}_{3,1} × A^{x2}_{3,1}) = [μ^{x1}_{3,1}(t_11) · μ^{x2}_{3,1}(t_12) + μ^{x1}_{3,1}(t_21) · μ^{x2}_{3,1}(t_22)] / 2

is obtained (the inner product of TT[1] and TT[4] divided by 2) and compared with min FS. However, any two linguistic values defined in the same attribute cannot be contained in the same candidate k-dim fuzzy grid (k ≥ 2); for example, (1, 1, 0, 0, 0, 0) and (0, 0, 0, 1, 0, 1) are invalid. In the Apriori algorithm, two frequent (k−1)-itemsets that share (k−2) items are joined into a candidate k-itemset. Similarly, two frequent (k−1)-dim fuzzy grids that share (k−2) linguistic values can be used to derive a candidate k-dim (2 ≤ k ≤ d) fuzzy grid. For example, if A^{x1}_{3,2} × A^{x2}_{3,1} and A^{x1}_{3,2} × A^{x3}_{3,3} are frequent, then these two grids, which share A^{x1}_{3,2}, can be used to generate A^{x1}_{3,2} × A^{x2}_{3,1} × A^{x3}_{3,3}. Then μ_{A^{x1}_{3,2} × A^{x2}_{3,1} × A^{x3}_{3,3}}(t_p) = μ^{x1}_{3,2}(t_p1) · μ^{x2}_{3,1}(t_p2) · μ^{x3}_{3,3}(t_p3) is computed.

3.2. Phase II: generate fuzzy classification rules
The general type R of the fuzzy associative classification rule is stated as follows:

Rule R: A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk} ⇒ A^{xa}_{C,ja} with FC(R),   (5)

where xa (1 ≤ a ≤ d) is the class-label attribute and FC(R) is the fuzzy confidence of the rule. The above rule represents: if x1 is A^{x1}_{K1,j1} and x2 is A^{x2}_{K2,j2} and ... and xk is A^{xk}_{Kk,jk}, then xa is A^{xa}_{C,ja}. The left-hand side of "⇒" is the antecedent part of R, and the right-hand side is the consequent part. FC(R) can be viewed as the grade of certainty of R. Since (A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk} × A^{xa}_{C,ja}) ⊆ (A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk}) holds, R can be generated from the frequent grids A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk} × A^{xa}_{C,ja} and A^{x1}_{K1,j1} × ... × A^{xk}_{Kk,jk}. We define the fuzzy confidence (Ishibuchi et al., 2001; Hu et al., 2002) of R (i.e., FC(R)) as follows:

FC(R) = FS(A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk} × A^{xa}_{C,ja}) / FS(A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xk}_{Kk,jk}).   (6)
Unlike SFPBM, the proposed method tries to reserve all fuzzy rules, because it is not easy for users to specify an appropriate threshold; the user-specified minimum fuzzy confidence (min FC) (Ishibuchi et al., 2001; Hu et al., 2002) is set to zero for simplicity. We again apply Boolean operations to obtain the antecedent part and consequent part of each rule. For example, suppose FG[u] = (1, 0, 0, 0, 0, 0) and FG[v] = (1, 0, 0, 1, 0, 0) correspond to frequent fuzzy grids L_u and L_v, respectively, where L_v ⊂ L_u. Then FG[u] AND FG[v] = (1, 0, 0, 0, 0, 0), corresponding to the frequent fuzzy grid A^{x1}_{3,1}, is generated as the antecedent part of a rule, say R, and FG[u] XOR FG[v] = (0, 0, 0, 1, 0, 0), corresponding to the frequent fuzzy grid A^{x2}_{3,1}, is generated as the consequent part of R. Then FC(R) = FS(A^{x1}_{3,1} × A^{x2}_{3,1}) / FS(A^{x1}_{3,1}) is easily obtained by (6).
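The AND/XOR construction and Eq. (6) can be sketched as below; the bit rows and fuzzy support values are hypothetical, and grids are keyed by their bit tuples:

```python
def rule_from_grids(fg_u, fg_v, fs):
    """Derive a rule from frequent fuzzy grids L_u and L_v (L_v a strict
    subset of L_u as fuzzy sets) encoded as bit rows: AND extracts the
    antecedent, XOR the consequent, and FC(R) = FS(L_v)/FS(L_u) per
    Eq. (6). fs maps bit tuples to fuzzy supports."""
    antecedent = tuple(a & b for a, b in zip(fg_u, fg_v))   # FG[u] AND FG[v]
    consequent = tuple(a ^ b for a, b in zip(fg_u, fg_v))   # FG[u] XOR FG[v]
    fc = fs[tuple(fg_v)] / fs[tuple(fg_u)]
    return antecedent, consequent, fc

fs = {(1, 0, 0, 0, 0, 0): 0.40,      # hypothetical FS of A^{x1}_{3,1}
      (1, 0, 0, 1, 0, 0): 0.30}      # hypothetical FS of A^{x1}_{3,1} x A^{x2}_{3,1}
ant, con, fc = rule_from_grids((1, 0, 0, 0, 0, 0), (1, 0, 0, 1, 0, 0), fs)
```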
Redundant rules must be further eliminated in order to achieve compactness (Hu et al., 2002). If two rules R and S have the same consequent part and the antecedent part of R is contained in that of S, then R is redundant and can be discarded, while S is temporarily reserved. For example, if S is "A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{x_{k−1}}_{K_{k−1},j_{k−1}} ⇒ A^{xa}_{C,ja}", then R can be eliminated. This is because the number of antecedent conditions should be minimized.
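A minimal sketch of this pruning step, representing each rule as a pair (antecedent condition tuple, consequent label); the rule encoding is illustrative:

```python
def remove_redundant(rules):
    """Keep only non-redundant rules: a rule is discarded when another
    rule has the same consequent and a strict subset of its antecedent
    conditions (so the discarded rule's antecedent fuzzy set is contained
    in the kept rule's), minimizing the number of antecedent conditions."""
    kept = []
    for ant_r, con_r in rules:
        redundant = any(con_s == con_r and set(ant_s) < set(ant_r)
                        for ant_s, con_s in rules)
        if not redundant:
            kept.append((ant_r, con_r))
    return kept

rules = [((("x1", 3, 2),), "Class 1"),
         ((("x1", 3, 2), ("x2", 3, 1)), "Class 1")]   # extra condition: redundant
compact = remove_redundant(rules)
```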
On the other hand, to improve the classification performance of fuzzy rule-based systems, Nozaki et al. (1996) proposed adaptive rules to adjust the grade of certainty of each rule. These useful rules are incorporated into the proposed method. The adaptive procedure for adjusting fuzzy confidences is as follows:

Set the maximum number of iterations Jmax. Set J to zero.
Repeat
  J = J + 1
  For each training sample t_p do
    a. Find the "firing" fuzzy rule R_b.
    b. If t_p is correctly classified, then FC(R_b) is adjusted as
       FC(R_b) = FC(R_b) + η1(1 − FC(R_b));   (7)
       otherwise, FC(R_b) is adjusted as
       FC(R_b) = FC(R_b) − η2·FC(R_b),   (8)
       where η1 and η2 are learning rates.
  End
Until J = Jmax

Table 1
Initial table FGTTFS for an example. Each row lists a fuzzy grid, its FG bit string over the columns (A^{x1}_{3,1}, A^{x1}_{3,2}, A^{x1}_{3,3}, A^{x2}_{3,1}, A^{x2}_{3,2}, A^{x2}_{3,3}), its TT entries for t1 and t2, and its FS.

A^{x1}_{3,1}: FG = (1, 0, 0, 0, 0, 0); TT = (μ^{x1}_{3,1}(t11), μ^{x1}_{3,1}(t21)); FS(A^{x1}_{3,1})
A^{x1}_{3,2}: FG = (0, 1, 0, 0, 0, 0); TT = (μ^{x1}_{3,2}(t11), μ^{x1}_{3,2}(t21)); FS(A^{x1}_{3,2})
A^{x1}_{3,3}: FG = (0, 0, 1, 0, 0, 0); TT = (μ^{x1}_{3,3}(t11), μ^{x1}_{3,3}(t21)); FS(A^{x1}_{3,3})
A^{x2}_{3,1}: FG = (0, 0, 0, 1, 0, 0); TT = (μ^{x2}_{3,1}(t12), μ^{x2}_{3,1}(t22)); FS(A^{x2}_{3,1})
A^{x2}_{3,2}: FG = (0, 0, 0, 0, 1, 0); TT = (μ^{x2}_{3,2}(t12), μ^{x2}_{3,2}(t22)); FS(A^{x2}_{3,2})
A^{x2}_{3,3}: FG = (0, 0, 0, 0, 0, 1); TT = (μ^{x2}_{3,3}(t12), μ^{x2}_{3,3}(t22)); FS(A^{x2}_{3,3})
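One confidence update of the adaptive procedure can be sketched as follows (Eqs. (7) and (8)); the default learning rates match the values used in the experiment below:

```python
def adapt_confidence(fc, correctly_classified, eta1=0.001, eta2=0.1):
    """One adaptive update of the firing rule's fuzzy confidence:
    Eq. (7) moves FC toward 1 after a correct classification, and
    Eq. (8) shrinks FC after a misclassification; 0 < eta1 << eta2 < 1."""
    if correctly_classified:
        return fc + eta1 * (1.0 - fc)    # Eq. (7)
    return fc - eta2 * fc                # Eq. (8)
```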
The firing rule is found by determining the class label of t_p through the fuzzy rules derived by the proposed method. Without loss of generality, if the antecedent part of a fuzzy classification rule R_t is A^{x1}_{K1,j1} × A^{x2}_{K2,j2} × ... × A^{xt}_{Kt,jt}, then we can calculate its firing strength ω_t for t_p as follows:

ω_t = μ^{x1}_{K1,j1}(t_p1) · μ^{x2}_{K2,j2}(t_p2) ··· μ^{xt}_{Kt,jt}(t_pt).   (9)

Then t_p is assigned to the class label in the consequent part of the "firing" rule, say R_b, if

ω_b · FC(R_b) = max_j { ω_j · FC(R_j) | R_j ∈ TR },   (10)

where TR is the set of fuzzy rules generated by the proposed method. The adaptive rules are then employed to adjust the fuzzy confidence of R_b: if t_p is correctly classified, FC(R_b) is increased; otherwise, FC(R_b) is decreased. Nozaki et al. (1996) also suggested that the learning rates be specified such that 0 < η1 ≪ η2 < 1. In the experiment, η1 = 0.001, η2 = 0.1, and Jmax = 500 are used. In the subsequent section, experimental results from the iris data demonstrate the effectiveness of the proposed method. However, the aim of the experiment is to show the feasibility and problem-solving capability of the proposed method for classification problems. That is, methods for acquiring appropriate parameter specifications to obtain higher classification accuracy rates and a smaller number of fuzzy if–then rules are not considered in this paper.
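The single-winner classification of Eqs. (9) and (10) can be sketched as below; the rules, membership functions, and sample are hypothetical:

```python
def classify(sample, rules):
    """Assign sample to the consequent class of the rule maximizing
    firing_strength * FC (Eqs. (9)-(10)). Each rule is (antecedent,
    label, fc), where antecedent is a list of (attribute_index,
    membership_function) pairs; the firing strength is the product of
    the antecedent membership degrees."""
    best_label, best_score = None, -1.0
    for antecedent, label, fc in rules:
        omega = 1.0
        for attr, mu in antecedent:           # Eq. (9)
            omega *= mu(sample[attr])
        if omega * fc > best_score:           # Eq. (10)
            best_label, best_score = label, omega * fc
    return best_label

# two triangular memberships on [0, 1] and two hypothetical rules
mu_low = lambda v: max(1.0 - v / 0.5, 0.0)
mu_high = lambda v: max(1.0 - abs(v - 1.0) / 0.5, 0.0)
rules = [([(0, mu_low)], "Class 1", 0.9),
         ([(0, mu_high)], "Class 2", 0.8)]
label = classify([0.1], rules)
```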
4. Experimental results
The classification performances of the proposed methodwith two fuzzy partition types are examined by computer simulations. We employ the proposed methodto findfuzzy classification rules from the iris data that consists of three classes and each class consists of 50 samples. Moreover, class 2 overlaps with class 3. Suppose that the attributes ‘‘sepal length’’, ‘‘sepal width’’, ‘‘petal length’’, and ‘‘petal width’’ are denoted by x1; x2; x3; and x4respectively. x5denote ‘‘class label’’ (i.e., d ¼ 5) to which tp¼ ðtp1; tp2; ?; tp5Þ; (1ppp150)
belongs. Only three linguistic values can be defined in x5; they are Aclasslabel3;1 : ‘‘Class 1’’, Aclasslabel3;2 : ‘‘Class 2’’, and Aclasslabel
3;3 : ‘‘Class 3’’ without doubt.
K = 6 is first considered for each attribute except x5. Simulation results with different user-specified minimum fuzzy supports are shown in Tables 2 and 3 for the MTDM and the STDM, respectively. Tables 2 and 3 indicate that classification rates are more sensitive to larger min FS (i.e., min FS = 0.18, 0.20). Therefore, a smaller min FS for both the MTDM and the STDM should be a better choice when all non-redundant rules are reserved; that is, a larger min FS may discard useful fuzzy grids and thus reduce the effectiveness of the fuzzy rules. From Tables 2 and 3, we can see that the best classification rate obtained by the MTDM (100.00%) is higher than that obtained by the STDM (97.33%). In comparison with the STDM, the MTDM uses more fuzzy if–then rules to classify samples. The best results of SFPBM and the proposed method are also summarized in Table 4. Except for min FS = 0.20, the best results of the proposed method with the MTDM outperform those of SFPBM with the MTDM for each value of min FS. The proposed method with the STDM outperforms SFPBM with the STDM for each value of min FS.
Simulation results with min FS = 0.05 and different values of K are shown in Tables 5 and 6 for the MTDM and the STDM, respectively. From Tables 5 and 6, we can see that the classification rates seem not to be sensitive to K for either partition method. Therefore, the choice of K does not seem to be a serious problem from the viewpoint of
Table 2
Simulation results by the proposed method with the MTDM (K = 6)

Min FS	Classification rate (%)	Number of rules
0.05	100.00	101
0.10	100.00	71
0.15	100.00	48
0.18	96.67	35
0.20	96.00	28

Table 3
Simulation results by the proposed method with the STDM (K = 6)

Min FS	Classification rate (%)	Number of rules
0.05	97.33	30
0.10	97.33	17
0.15	95.33	11
0.18	93.33	5
0.20	93.33	5

Table 4
Classification rates (%) of SFPBM and the proposed method with K = 6 and various min FS

	MTDM		STDM
Min FS	SFPBM	Proposed method	SFPBM	Proposed method
0.05	96.67	100.00	96.67	97.33
0.10	96.67	100.00	96.67	97.33
0.15	96.67	100.00	92.67	95.33
0.20	96.67	96.00	88.67	93.33
classification rates. The classification rates of the MTDM also outperform those of the STDM for each value of K, and the best result from the STDM (99.33%) is slightly worse than that of the MTDM. From Tables 2–6, we can also see that, since min FS and min FC are not optimized to reduce the number of rules, a large number of rules are generated when the MTDM is used for various K. Although setting appropriate values of min FS and min FC is significant work, this topic is not discussed in this paper for simplicity.
Some significant fuzzy if–then rule-based classification systems using simple fuzzy partition methods have been proposed, such as the simple-fuzzy-grid method (Ishibuchi et al., 1992), the multi-rule-table method (Ishibuchi et al., 1992), the pruning method (Nozaki et al., 1996), and the GA-based method (Ishibuchi et al., 1995). Simulation results of the aforementioned methods reported by Nozaki et al. (1996) are summarized in Table 7, together with the best results of the proposed method with the MTDM or the STDM. From the viewpoint of classification rates, the proposed method with the STDM or the MTDM works well in comparison with other fuzzy if–then rule-based classifiers. It is noted that the best results of SFPBM with the MTDM or the STDM can be obtained by setting appropriate values of min FS and min FC (e.g., min FS = 0.10 and min FC = 0.80).
In the above simulations, all 150 samples are used in the training process to generate fuzzy rules. To examine the generalization ability of the proposed method, we perform the leave-one-out technique, which is an almost unbiased estimator of the true error rate of a classifier (Weiss and Kulikowski, 1991). In each iteration of the leave-one-out technique, fuzzy if–then rules are generated from 149 training samples and tested on the single remaining sample; this procedure is repeated until each of the 150 samples has been used as a test sample. We also choose other values of min FS to examine the relationship between min FS and the generalization ability of the proposed method. Simulation results with lower values of min FS (i.e., 0.05, 0.10, 0.15) are shown in Table 8. The proposed method with the MTDM seems not to be sensitive to min FS, and its best classification rate is 96.67%; the proposed method with the STDM is more sensitive to min FS, with a best classification rate of 95.33%. Therefore, from the viewpoint of generalization ability, we may conclude that the proposed method with the MTDM works more robustly than with the STDM.
Based on the leave-one-out technique, we compare the proposed method with the above-mentioned fuzzy rule-based systems. The simulation results are summarized in Table 9, including the best result of the proposed method with the MTDM or the STDM. From the viewpoint of classification rates, the proposed method with the MTDM performs well in comparison with other fuzzy if–then
Table 5
Simulation results by the proposed method with the MTDM for various K (min FS = 0.05)

K	Classification rate (%)	Number of rules
4	100.00	46
5	100.00	71
6	100.00	101
7	100.00	131

Table 6
Simulation results by the proposed method with the STDM for various K (min FS = 0.05)

K	Classification rate (%)	Number of rules
4	97.33	25
5	98.00	25
6	97.33	30
7	99.33	30

Table 7
Simulation results by various fuzzy if–then rule-based classification systems

Method	Classification rate (%)
The proposed method with MTDM	100.00
The proposed method with STDM	99.33
Simple-fuzzy-grid	98.67
Multi-rule-table	95.33
Pruning	100.00
GA-based	99.47
SFPBM with MTDM	96.67
SFPBM with STDM	96.67

Table 8
Classification rates by the leave-one-out technique for the MTDM and the STDM

Method	Minimum fuzzy support
	0.05	0.10	0.15
MTDM	95.33	96.67	95.33
STDM	92.67	94.00	95.33

Table 9
Simulation results by the leave-one-out technique for various fuzzy if–then rule-based classification systems

Method	Classification rate (%)
The proposed method with MTDM	96.67
The proposed method with STDM	95.33
Simple-fuzzy-grid	96.67
Multi-rule-table	94.67
Pruning	93.33
GA-based	94.67
SFPBM with MTDM	96.67
SFPBM with STDM	96.67
rule-based classifiers. However, it should be noted that the classification performance of the GA-based method can be greatly improved by carefully tuning its parameters (e.g., to 97.33%). We also find that the best rate of SFPBM with the STDM (96.67%) outperforms that of the proposed method with the STDM (95.33%). This means that reserving all non-redundant rules in the latter method may lead to overfitting.
On the other hand, classification rates of nine fuzzy classification methods for the iris data, estimated by the leave-one-out technique, were reported by Grabisch and Dispot (1992): fuzzy integral with perceptron criterion, fuzzy integral with quadratic criterion, minimum operator, fast heuristic search with Sugeno integral, simulated annealing with Sugeno integral, fuzzy k-nearest neighbor, fuzzy c-means, fuzzy c-means for histograms, and hierarchical fuzzy c-means. From the summarized results shown in Table 10, the best result (96.67%) was obtained by the fuzzy integral with quadratic criterion or the fuzzy k-nearest-neighbor method. The best result of the proposed method with the MTDM (96.67%) equals the best result of these nine fuzzy methods, whereas the best result of the proposed method with the STDM (95.33%) is slightly worse than those of the fuzzy integral with quadratic criterion, the minimum operator, and the fuzzy k-nearest neighbor.
5. Discussions and conclusions
In this paper, we propose a two-phase fuzzy data mining technique that finds fuzzy association rules for classification problems, based on the SFPBM proposed by Hu et al. (2002). There are three main differences between the proposed method and SFPBM. First, fuzzy subspaces ignored by SFPBM are considered in the proposed method. Second, all non-redundant fuzzy if–then rules take part in the mining process by setting min FC to zero. Third, the adaptive rules proposed by Nozaki et al. (1996) are incorporated into the proposed method to improve its classification performance. From the summarized results shown in Table 4, we can see that the proposed method with the STDM or the MTDM performs well in comparison with SFPBM with the STDM or the MTDM.
The generalization ability of the proposed method is examined on the iris data, indicating that the best classification rate of the MTDM apparently outperforms that of the STDM. Simulation results with various parameter specifications (i.e., min FS and K) also demonstrate that the proposed method can effectively derive fuzzy classification rules.
On the other hand, for simplicity we do not discuss how to set appropriate values of min FS and min FC, although this is significant work. Since these parameters are not optimized to reduce the number of rules, as shown in the previous section, a large number of rules are generated when the STDM or the MTDM is used for various K. Therefore, it is necessary to develop methods, such as genetic algorithms (Goldberg, 1989), to automatically determine appropriate values of min FS and min FC so as to obtain higher classification performance with a compact set of fuzzy if–then classification rules. The proposed method may then be viewed as an effective knowledge acquisition tool for classification problems.
Moreover, since fuzzy knowledge representation can facilitate interaction between an expert system and its users (Zimmermann, 1996), it is necessary to extend the proposed method to find other types of fuzzy association rules to ease the fuzzy knowledge acquisition bottleneck in building prototype expert systems or fuzzy rule-based systems. The aforementioned issues are left for future work. Additionally, Hong et al. (2001) discussed the relationship between computation time and the number of rules for a fuzzy data mining technique; we consider that their study will provide useful suggestions for improving our method.
References
Anderson, E., 1935. The irises of the Gaspé Peninsula. Bulletin of the American Iris Society 59, 2–5.
Agrawal, R., Imielinski, T., Swami, A., 1993a. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering 5 (6), 914–925.
Agrawal, R., Imielinski, T., Swami, A., 1993b. Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, D.C. pp. 207–216.
Table 10
Classification accuracy rates of various fuzzy classification methods for the iris data

Fuzzy method	Classification rate (%)
Perceptron criterion	95.33
Quadratic criterion	96.67
Minimum operator	96.00
Fast heuristic search	92.00
Simulated annealing	91.33
Fuzzy k-nearest neighbor	96.67
Fuzzy c-means	93.33
Fuzzy c-means for histograms	93.33
Hierarchical fuzzy c-means	95.33
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I., 1996. Fast discovery of association rules. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, pp. 307–328.
Berry, M., Linoff, G., 1997. Data Mining Techniques: For Marketing, Sales, and Customer Support. Wiley, New York.
Chuang, J.H., Wang, P.H., Wu, M.C., 1999. Automatic classification of block-shaped parts based on their 2D projections. Computers and Industrial Engineering 36 (3), 697–718.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.
Grabisch, M., Dispot, F., 1992. A comparison of some methods of fuzzy classification on real data. In: Proceedings of the Second International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, pp. 659–662.
Han, J.W., Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.
Hong, T.P., Wang, T.T., Wang, S.L., Chien, B.C., 2000. Learning a coverage set of maximally general fuzzy rules by rough sets. Expert Systems with Applications 19 (2), 97–103.
Hong, T.P., Kuo, C.S., Chi, S.C., 2001. Trade-off between computation time and number of rules for fuzzy mining from quantitative data. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9 (5), 587–604.
Hu, Y.C., Chen, R.S., Tzeng, G.H., 2002. Mining fuzzy association rules for classification problems. Computers and Industrial Engineering 43 (4), 735–750.
Ishibuchi, H., Nozaki, K., Tanaka, H., 1992. Distributed representation of fuzzy rules and its application to pattern classification. Fuzzy Sets and Systems 52 (1), 21–32.
Ishibuchi, H., Nozaki, K., Yamamoto, N., Tanaka, H., 1995. Selecting fuzzy if–then rules for classification problems using genetic algorithms. IEEE Transactions on Fuzzy Systems 3 (3), 260–270.
Ishibuchi, H., Nakashima, T., Murata, T., 1999. Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems. IEEE Transactions on Systems, Man, and Cybernetics 29 (5), 601–618.
Ishibuchi, H., Yamamoto, T., Nakashima, T., 2001. Fuzzy data mining: effect of fuzzy discretization. In: Proceedings of the First IEEE International Conference on Data Mining, San Jose, USA, pp. 241–248.
Jang, J.S.R., 1993. ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23 (3), 665–685.
Jang, J.S.R., Sun, C.T., 1995. Neuro-fuzzy modeling and control. Proceedings of the IEEE 83 (3), 378–406.
Kim, D., Bang, S.Y., 2000. A handwritten numeral character classification using tolerant rough set. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (9), 923–937.
Nozaki, K., Ishibuchi, H., Tanaka, H., 1996. Adaptive fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems 4 (3), 238–250.
Pedrycz, W., 1994. Why triangular membership functions? Fuzzy Sets and Systems 64, 21–30.
Pedrycz, W., Gomide, F., 1998. An Introduction to Fuzzy Sets: Analysis and Design. MIT Press, Cambridge, MA.
Ravi, V., Zimmermann, H.-J., 2000. Fuzzy rule based classification with FeatureSelector and modified threshold accepting. European Journal of Operational Research 123 (1), 16–28.
Ravi, V., Reddy, P.J., Zimmermann, H.-J., 2000. Pattern classification with principal component analysis and fuzzy rule bases. European Journal of Operational Research 126, 526–533.
Wang, L.X., Mendel, J.M., 1992. Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man, and Cybernetics 22 (6), 1414–1427.
Weiss, S.M., Kulikowski, C.A., 1991. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, Los Altos, CA.
Zadeh, L.A., 1975a. The concept of a linguistic variable and its application to approximate reasoning (Part 1). Information Science 8 (3), 199–249.
Zadeh, L.A., 1975b. The concept of a linguistic variable and its application to approximate reasoning (Part 2). Information Science 8 (4), 301–357.
Zadeh, L.A., 1976. The concept of a linguistic variable and its application to approximate reasoning (Part 3). Information Science 9 (1), 43–80.
Zimmermann, H.-J., 1996. Fuzzy Set Theory and Its Applications. Kluwer, Boston.