Elicitation of classification rules by fuzzy data mining


Engineering Applications of Artificial Intelligence 16 (2003) 709–716


Yi-Chung Hu (a, *), Gwo-Hshiung Tzeng (b)

(a) Department of Business Administration, Chung Yuan Christian University, Chung-Li 320, Taiwan, ROC
(b) Institute of Management of Technology, National Chiao Tung University, Hsinchu 300, Taiwan, ROC

Received 25 November 2002; received in revised form 22 May 2003; accepted 18 September 2003

Abstract

Data mining techniques can be used to find potentially useful patterns in data and to ease the knowledge acquisition bottleneck in building prototype rule-based systems. Based on the partition methods presented in the simple-fuzzy-partition-based method (SFPBM) proposed by Hu et al. (Comput. Ind. Eng. 43(4) (2002) 735), this paper proposes a new two-phase fuzzy data mining technique for finding fuzzy if–then rules for classification problems: the first phase finds frequent fuzzy grids by using a pre-specified simple fuzzy partition method to divide each quantitative attribute, and the second generates fuzzy classification rules from the frequent fuzzy grids. To improve the classification performance of the proposed method, we incorporate the adaptive rules proposed by Nozaki et al. (IEEE Trans. Fuzzy Syst. 4(3) (1996) 238) to adjust the confidence of each classification rule. With respect to generalization ability, simulation results on the iris data demonstrate that the proposed method may effectively derive fuzzy classification rules from training samples.

Keywords: Data mining; Fuzzy sets; Classification problems; Association rules

1. Introduction

Pattern classification is a problem that partitions a pattern space into classes and then assigns a pattern to one of those classes (Kim and Bang, 2000). Classification problems play an important role in industrial engineering, for example in group technology (Chuang et al., 1999), and in engineering applications such as optical character recognition (OCR) and facial recognition (Kim and Bang, 2000).

Data mining is the exploration and analysis of data in order to discover meaningful patterns (Berry and Linoff, 1997). The aim of this paper is to propose a fuzzy data mining method that can automatically find a set of fuzzy if–then rules for classification problems. Data mining problems involving classification can be viewed within a common framework of rule discovery (Agrawal et al., 1993a). The advantage of mining fuzzy if–then rules for classification problems is that users can acquire knowledge by carefully checking the rules discovered from training patterns. Additionally, data mining can ease the knowledge acquisition bottleneck in building prototype expert systems (Hong et al., 2000) or rule-based systems.

The discovery of association rules is an important topic in data mining. Association rules elicited from transaction databases have been applied to help decision makers determine which items are frequently purchased together by customers (Berry and Linoff, 1997; Han and Kamber, 2001). Agrawal et al. (1993b) initially proposed a method to find frequent itemsets. Subsequently, Agrawal et al. (1996) proposed an influential algorithm, the Apriori algorithm, which consists of two phases. In the first phase, frequent itemsets are generated: a candidate k-itemset ($k \ge 1$), which contains k items, is frequent (i.e., it is a frequent k-itemset) if its support is larger than or equal to a user-specified minimum support. In the second phase, association rules are generated from the frequent itemsets discovered in the first phase.
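To make the first Apriori phase concrete, here is a minimal Python sketch of frequent-itemset generation with the usual join and subset-pruning steps. It is an illustration under stated assumptions, not the code of Agrawal et al.: transactions are Python sets, support is the fraction of transactions containing an itemset, the second (rule-generation) phase is omitted, and the function name `apriori_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-itemsets.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: two frequent (k-1)-itemsets whose union has k items share k-2 items.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count supports only for the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent
```

For four toy transactions {bread, milk}, {bread, butter}, {bread, milk, butter}, {milk} and a minimum support of 0.5, {milk, butter} appears in only one transaction, so it is infrequent and the 3-itemset {bread, milk, butter} is pruned without being counted.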

Additionally, the comprehensibility of a fuzzy representation to human users is also a criterion in designing a fuzzy system, so simple fuzzy partition methods are preferable (Ishibuchi et al., 1999). In such methods, each attribute used to describe the sample data is viewed as a linguistic variable (Zadeh, 1975a, b, 1976). Based on the partition methods used in the simple-fuzzy-partition-based method (SFPBM) (Hu et al., 2002), this paper proposes a fuzzy data mining method for eliciting fuzzy classification rules for classification problems.

*Corresponding author. Tel.: 2655130; fax: +886-3-2655199.

E-mail address: [email protected] (Y.-C. Hu).

Since classification performance can be improved by adjusting the grade of certainty of fuzzy rules, the adaptive rules proposed by Nozaki et al. (1996) are incorporated into the proposed method to adjust the fuzzy confidence of each fuzzy rule. With respect to generalization ability, simulation results on the iris data (Anderson, 1935) demonstrate that the proposed method performs well in comparison with other fuzzy classification methods. This suggests that applying the proposed method to engineering problems is feasible.

The rest of this paper is organized as follows. STDM and MTDM are introduced in Section 2. In Section 3, we present the definitions of fuzzy support and fuzzy confidence, and describe the two phases of the proposed method in detail. In Section 4, the performance of the proposed method is examined by computer simulation on the iris data. Discussions and conclusions are presented in Section 5.

2. Simple fuzzy partition methods

We first recall the concept of linguistic variables (Zadeh, 1975a, b, 1976). Formally, a linguistic variable is characterized by a quintuple (Pedrycz and Gomide, 1998; Zimmermann, 1996) denoted by $(x, T(x), U, G, M)$, in which $x$ is the name of the variable; $T(x)$ denotes the set of names of linguistic values, or terms, of $x$; $U$ denotes a universe of discourse; $G$ is a syntactic rule for generating values of $x$; and $M$ is a semantic rule for associating a linguistic value with a meaning.

Each attribute can be partitioned by its linguistic values with pre-specified membership functions, such as triangular functions. Simple fuzzy grids, or grid partitions (Ishibuchi et al., 1995; Jang and Sun, 1995), in the feature space are thus obtained. The advantage of the simple fuzzy partition method is that the linguistic interpretation of each fuzzy set is easily obtained. Fuzzy partition methods have been widely used in pattern recognition and fuzzy reasoning, for example in pattern classification by Ishibuchi et al. (1992, 1995, 1999), Ravi and Zimmermann (2000), and Ravi et al. (2000), in fuzzy neural networks (Jang, 1993), and in fuzzy rule generation by Wang and Mendel (1992).

If both $x_1$ and $x_2$ are partitioned into three linguistic values, the feature space can be divided into nine two-dimensional (2-dim) fuzzy grids, as shown in Fig. 1. The shaded fuzzy subspace denoted by $A^{Width}_{3,1}\times A^{Length}_{3,3}$ stands for a 2-dim fuzzy grid whose linguistic value is "small AND large".

Two partition types used in SFPBM are employed in the proposed method: the multiple type division method (MTDM) and the single type division method (STDM). If $K$ is the maximum number of linguistic values on each quantitative attribute, then MTDM partitions each quantitative attribute into $(3 + 4 + \cdots + K)$ linguistic values; in other words, each quantitative attribute is sequentially divided into $3, 4, \ldots, K$ linguistic values. In STDM, only $K$ linguistic values are defined. For example, with $K = 4$, MTDM defines $3 + 4 = 7$ linguistic values per attribute, whereas STDM defines 4.

Nomenclature

$d$: number of attributes used to describe each sample, where $1 \le d$
$k$: dimension of one fuzzy grid, where $1 \le k \le d$
$K$: maximum number of linguistic values defined in each quantitative attribute, where $K \ge 3$
$A^{x_m}_{K_i,j_m}$: $j_m$th linguistic value of the $K_i$ linguistic values defined in attribute $x_m$, where $1 \le m \le d$; $3 \le K_i \le K$ for MTDM, $K_i = K$ for STDM, and $1 \le j_m \le K_i$
$\mu^{x_m}_{K_i,j_m}$: membership function of $A^{x_m}_{K_i,j_m}$
$t_p$: $p$th training sample, where $t_p = (t_{p1}, t_{p2}, \ldots, t_{pd})$ and $t_{pi}$ is the value of the $i$th attribute

[Fig. 1. Axes: Width ($x_1$), with linguistic values $A^{x_1}_{3,1}$, $A^{x_1}_{3,2}$, $A^{x_1}_{3,3}$, and Length ($x_2$), with linguistic values $A^{x_2}_{3,1}$, $A^{x_2}_{3,2}$, $A^{x_2}_{3,3}$; both axes range from 0.0 to 1.0.]

For simplicity, a triangular membership function is used for each linguistic value of the quantitative attributes. We emphasize, however, that Pedrycz (1994) pointed out the usefulness and effectiveness of triangular membership functions in fuzzy modeling. A membership function such as $\mu^{Width}_{K,j_1}$ is represented as follows:

$$\mu^{Width}_{K,j_1}(x) = \max\left\{1 - \frac{|x - a^K_{j_1}|}{b^K},\ 0\right\}, \tag{1}$$

$$a^K_{j_1} = mi + (ma - mi)\,\frac{j_1 - 1}{K - 1}, \tag{2}$$

$$b^K = \frac{ma - mi}{K - 1}, \tag{3}$$

where $ma$ and $mi$ are the maximum and minimum values of the domain, respectively. In the proposed method, each linguistic value is viewed as a candidate one-dimensional (1-dim) fuzzy grid. Clearly, the set of candidate 1-dim fuzzy grids generated by STDM for a pre-specified $K$ is contained in the set generated by MTDM.
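Eqs. (1)–(3) translate directly into a small function. This is a sketch only; the name `triangular_membership` and the argument order are our own choices, not the authors'.

```python
def triangular_membership(x, K, j, mi, ma):
    """Membership of x in the j-th of K triangular linguistic values on [mi, ma].

    Eq. (2): centre a = mi + (ma - mi)(j - 1)/(K - 1)
    Eq. (3): half-width b = (ma - mi)/(K - 1)
    Eq. (1): membership = max(1 - |x - a|/b, 0)
    """
    if K < 2 or not 1 <= j <= K:
        raise ValueError("require K >= 2 and 1 <= j <= K")
    if ma <= mi:
        raise ValueError("require ma > mi")
    b = (ma - mi) / (K - 1)                  # Eq. (3)
    a = mi + (ma - mi) * (j - 1) / (K - 1)   # Eq. (2)
    return max(1.0 - abs(x - a) / b, 0.0)    # Eq. (1)
```

With $K = 3$ on $[0, 1]$ the centres are 0, 0.5 and 1, so $x = 0.25$ belongs to the first and second linguistic values with degree 0.5 each. Because neighbouring triangles overlap in this way, the degrees of any $x$ in $[mi, ma]$ over the $K$ linguistic values sum to 1.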

For example, if both "Width" (denoted by $x_1$) and "Length" (denoted by $x_2$) are divided into four linguistic values ($K = 4$), then STDM generates

$$\{A^{Width}_{4,1}, A^{Width}_{4,2}, A^{Width}_{4,3}, A^{Width}_{4,4}, A^{Length}_{4,1}, A^{Length}_{4,2}, A^{Length}_{4,3}, A^{Length}_{4,4}\},$$

whereas MTDM generates

$$\{A^{Width}_{3,1}, A^{Width}_{3,2}, A^{Width}_{3,3}, A^{Length}_{3,1}, A^{Length}_{3,2}, A^{Length}_{3,3}, A^{Width}_{4,1}, A^{Width}_{4,2}, A^{Width}_{4,3}, A^{Width}_{4,4}, A^{Length}_{4,1}, A^{Length}_{4,2}, A^{Length}_{4,3}, A^{Length}_{4,4}\}.$$
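The two candidate sets above can be enumerated mechanically. In this sketch a linguistic value $A^{attr}_{K_i,j}$ is represented as an `(attr, K_i, j)` tuple; both that encoding and the function name `candidate_1dim_grids` are assumptions made for illustration.

```python
def candidate_1dim_grids(attributes, K, method):
    """Return candidate 1-dim fuzzy grids as (attribute, K_i, j) triples, i.e. A^{attribute}_{K_i, j}."""
    if K < 3:
        raise ValueError("K >= 3 is required")
    if method == "STDM":
        partitions = [K]               # only K linguistic values per attribute
    elif method == "MTDM":
        partitions = range(3, K + 1)   # 3, 4, ..., K linguistic values per attribute
    else:
        raise ValueError("method must be 'STDM' or 'MTDM'")
    return [(attr, Ki, j)
            for Ki in partitions
            for attr in attributes
            for j in range(1, Ki + 1)]
```

For `["Width", "Length"]` with `K = 4`, STDM yields 8 triples and MTDM yields 14. The STDM set is a subset of the MTDM set, matching the two sets listed above.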

A significant task is how to use the candidate 1-dim fuzzy grids to generate the other frequent fuzzy grids and the fuzzy classification rules. An effective method for doing so is described in the following section.

3. Discovering fuzzy classification rules

In the proposed method, frequent fuzzy grids and fuzzy classification rules are generated in Phases I and II, respectively. One fuzzy partition method (i.e., STDM or MTDM) must be specified before the proposed algorithm is run.

The main difference between the proposed method and SFPBM is that SFPBM does not consider all the information distributed in the pattern space during the mining process: it ignores fuzzy subspaces containing any two linguistic values that belong to different $K_i$ partitions. Thus, SFPBM cannot generate a fuzzy subspace such as $A^{x_1}_{K_1,j_1}\times A^{x_2}_{K_2,j_2}\times\cdots\times A^{x_{k-1}}_{K_{k-1},j_{k-1}}\times A^{x_k}_{K_k,j_k}$ whose $K_i$ are not all equal. For example, if two quantitative attributes, say $x_1$ and $x_2$, are partitioned with $K = 4$ using MTDM, then the fuzzy subspaces (candidate fuzzy grids) $A^{x_1}_{K_1,j_1}\times A^{x_2}_{K_2,j_2}$ with $K_1 \ne K_2$ (e.g., $A^{Width}_{4,2}\times A^{Length}_{3,3}$ or $A^{Width}_{3,1}\times A^{Length}_{4,1}$) are not considered by SFPBM. However, these ignored subspaces, which the proposed method does consider, may be useful. Note that because $K_i = 2$ yields a rather coarse partition, only $K_i \ge 3$ is considered in both SFPBM and the proposed method.

The two phases of the proposed method are described in Sections 3.1 and 3.2, respectively.

3.1. Phase I: generate frequent fuzzy grids

Suppose each quantitative attribute $x_m$ is partitioned as described in Section 2, with at most $K$ linguistic values. Without loss of generality, consider a candidate k-dim fuzzy grid $A^{x_1}_{K_1,j_1}\times A^{x_2}_{K_2,j_2}\times\cdots\times A^{x_k}_{K_k,j_k}$, which is a fuzzy set, where $3 \le K_1, K_2, \ldots, K_k \le K$ for MTDM and $K_1 = K_2 = \cdots = K_k = K$ for STDM. The degree to which $t_p$ belongs to this fuzzy grid, $\mu_{A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}}(t_p)$, is computed as $\mu^{x_1}_{K_1,j_1}(t_{p1})\,\mu^{x_2}_{K_2,j_2}(t_{p2})\cdots\mu^{x_k}_{K_k,j_k}(t_{pk})$. To check whether this fuzzy grid is frequent, its fuzzy support (Ishibuchi et al., 2001; Hu et al., 2002) is defined with the algebraic product, a t-norm operator for fuzzy intersection, as follows:

$$FS\left(A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\right) = \frac{1}{n}\sum_{p=1}^{n}\mu_{A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}}(t_p) = \frac{1}{n}\sum_{p=1}^{n}\mu^{x_1}_{K_1,j_1}(t_{p1})\,\mu^{x_2}_{K_2,j_2}(t_{p2})\cdots\mu^{x_k}_{K_k,j_k}(t_{pk}). \tag{4}$$

When $FS(A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k})$ is larger than or equal to the user-specified minimum fuzzy support (min FS), $A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}$ is a frequent k-dim fuzzy grid. Now consider any two fuzzy grids $A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}$ and $A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\times A^{x_{k+1}}_{K_{k+1},j_{k+1}}$. Since every membership degree lies in $[0, 1]$,

$$\mu_{A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\times A^{x_{k+1}}_{K_{k+1},j_{k+1}}}(t_p) \le \mu_{A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}}(t_p),$$

so $A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\times A^{x_{k+1}}_{K_{k+1},j_{k+1}} \subseteq A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}$ holds and, by (4), the support of the former cannot exceed that of the latter. Hence any fuzzy grid formed from a subset of the linguistic values of a frequent fuzzy grid must also be frequent.
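Eq. (4) and the monotonicity argument can be checked numerically with a short sketch. The assumptions are as follows: every attribute is normalized to $[0, 1]$, a grid is a list of `(attribute index, K_i, j)` triples, and the helper names are hypothetical.

```python
def mu(x, K, j, mi=0.0, ma=1.0):
    """Triangular membership of Eqs. (1)-(3); the [0, 1] domain default is an assumption."""
    b = (ma - mi) / (K - 1)
    a = mi + (ma - mi) * (j - 1) / (K - 1)
    return max(1.0 - abs(x - a) / b, 0.0)


def grid_membership(sample, grid):
    """Degree to which a sample belongs to a fuzzy grid: algebraic product of its memberships."""
    degree = 1.0
    for attr, K, j in grid:   # grid = [(attribute index, K_i, j), ...]
        degree *= mu(sample[attr], K, j)
    return degree


def fuzzy_support(samples, grid):
    """Eq. (4): average membership of the n samples in the grid."""
    return sum(grid_membership(t, grid) for t in samples) / len(samples)
```

For the samples (0.0, 0.0), (0.25, 1.0) and (1.0, 0.5), the 1-dim grid `[(0, 3, 1)]` has support 0.5, while the 2-dim grid `[(0, 3, 1), (1, 3, 1)]` has support 1/3. Adding a linguistic value can only multiply each sample's degree by a factor in $[0, 1]$, so the support can never increase.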

As in SFPBM, a table called FGTTFS is used to generate frequent fuzzy grids. FGTTFS consists of the following substructures:

(a) Fuzzy grid table (FG): each row represents a fuzzy grid, and each column represents a linguistic value $A^{x_m}_{K_i,j_m}$.

(b) Transaction table (TT): each column represents a sample $t_p$, and each element records the membership degree of $t_p$ in the corresponding fuzzy grid.

(c) Column FS: stores the fuzzy support of the corresponding fuzzy grid in FG.

An initial FGTTFS is shown in Table 1 as an example. It contains two samples, $t_1$ and $t_2$, with two attributes, $x_1$ and $x_2$. Both $x_1$ and $x_2$ are divided into three linguistic values (i.e., $K = 3$). Assume that $x_2$ is the class-label attribute.
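The FG rows of Table 1 are bit strings, so the Boolean operations used in Phase I are easy to sketch. The assumptions are as follows: rows are Python tuples of 0/1, each column is labelled by its `(attribute, K_i, j)`, and the names `COLUMNS`, `bit_or`, `is_valid` and `join_candidates` are hypothetical. Following the Apriori analogy of Phase I, the join keeps the OR of two frequent (k−1)-dim rows only if it has exactly k ones (that is, the rows share k−2 linguistic values) and it does not use two linguistic values of the same attribute. Apriori's extra subset-pruning step is not part of the Phase I description, so it is left out.

```python
from itertools import combinations

# FG columns of Table 1, labelled (attribute, K_i, j): A^{x1}_{3,1} ... A^{x2}_{3,3}.
COLUMNS = [("x1", 3, 1), ("x1", 3, 2), ("x1", 3, 3),
           ("x2", 3, 1), ("x2", 3, 2), ("x2", 3, 3)]


def bit_or(u, v):
    """Element-wise OR of two FG rows."""
    return tuple(a | b for a, b in zip(u, v))


def is_valid(row, columns=COLUMNS):
    """A grid may not contain two linguistic values defined on the same attribute."""
    attrs = [columns[i][0] for i, bit in enumerate(row) if bit]
    return len(attrs) == len(set(attrs))


def join_candidates(frequent_rows, k, columns=COLUMNS):
    """Pair frequent (k-1)-dim rows sharing k-2 linguistic values into valid k-dim candidates."""
    candidates = set()
    for u, v in combinations(frequent_rows, 2):
        w = bit_or(u, v)
        if sum(w) == k and is_valid(w, columns):
            candidates.add(w)
    return candidates
```

For example, OR-ing FG[1] = (1, 0, 0, 0, 0, 0) and FG[4] = (0, 0, 0, 1, 0, 0) gives (1, 0, 0, 1, 0, 0), that is, $A^{x_1}_{3,1}\times A^{x_2}_{3,1}$. The rows (1, 1, 0, 0, 0, 0) and (0, 0, 0, 1, 0, 1) are rejected as invalid.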

Since each row of FG is a bit string of 0s and 1s, FG[u] and FG[v] (i.e., the uth and vth rows of FG) can be paired to generate the desired results by applying Boolean operations. For example, applying the OR operation to FG[1] = (1, 0, 0, 0, 0, 0) (i.e., $A^{x_1}_{3,1}$) and FG[4] = (0, 0, 0, 1, 0, 0) (i.e., $A^{x_2}_{3,1}$) yields (FG[1] OR FG[4]) = (1, 0, 0, 1, 0, 0), which corresponds to the candidate 2-dim fuzzy grid $A^{x_1}_{3,1}\times A^{x_2}_{3,1}$. Then

$$FS(A^{x_1}_{3,1}\times A^{x_2}_{3,1}) = \frac{\mu_{A^{x_1}_{3,1}\times A^{x_2}_{3,1}}(t_1) + \mu_{A^{x_1}_{3,1}\times A^{x_2}_{3,1}}(t_2)}{2} = \frac{\mu^{x_1}_{3,1}(t_{11})\,\mu^{x_2}_{3,1}(t_{12}) + \mu^{x_1}_{3,1}(t_{21})\,\mu^{x_2}_{3,1}(t_{22})}{2} = \frac{TT[1]\cdot TT[4]}{2}$$

is obtained and compared with the min FS, where $TT[1]\cdot TT[4]$ denotes the inner product of the first and fourth rows of TT. However, two linguistic values defined on the same attribute cannot both be contained in the same candidate k-dim fuzzy grid ($k \ge 2$); for example, (1, 1, 0, 0, 0, 0) and (0, 0, 0, 1, 0, 1) are invalid. In the Apriori algorithm, two frequent (k−1)-itemsets that share (k−2) items are joined into a candidate k-itemset. Similarly, two frequent (k−1)-dim fuzzy grids that share (k−2) linguistic values can be used to derive a candidate k-dim fuzzy grid ($2 \le k \le d$). For example, if $A^{x_1}_{3,2}\times A^{x_2}_{3,1}$ and $A^{x_1}_{3,2}\times A^{x_3}_{3,3}$ are frequent, these two grids share $A^{x_1}_{3,2}$ and can be used to generate $A^{x_1}_{3,2}\times A^{x_2}_{3,1}\times A^{x_3}_{3,3}$. Then $\mu_{A^{x_1}_{3,2}\times A^{x_2}_{3,1}\times A^{x_3}_{3,3}}(t_p) = \mu^{x_1}_{3,2}(t_{p1})\,\mu^{x_2}_{3,1}(t_{p2})\,\mu^{x_3}_{3,3}(t_{p3})$ is computed.

3.2. Phase II: generate fuzzy classification rules

The general form of a fuzzy associative classification rule R is as follows:

$$\text{Rule } R:\ A^{x_1}_{K_1,j_1}\times A^{x_2}_{K_2,j_2}\times\cdots\times A^{x_k}_{K_k,j_k} \Rightarrow A^{x_a}_{C,i_a} \ \text{with } FC(R), \tag{5}$$

where $x_a$ ($1 \le a \le d$) is the class-label attribute, $C$ is the number of class labels, and FC(R) is the fuzzy confidence of R. The rule reads: if $x_1$ is $A^{x_1}_{K_1,j_1}$ and $x_2$ is $A^{x_2}_{K_2,j_2}$ and ... and $x_k$ is $A^{x_k}_{K_k,j_k}$, then $x_a$ is $A^{x_a}_{C,i_a}$. The left-hand side of "⇒" is the antecedent part of R, and the right-hand side is the consequent part. FC(R) can be viewed as the grade of certainty of R. Since $(A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\times A^{x_a}_{C,i_a}) \subseteq (A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k})$ holds, R can be generated from $A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\times A^{x_a}_{C,i_a}$ and $A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}$. We define the fuzzy confidence (Ishibuchi et al., 2001; Hu et al., 2002) of R as follows:

$$FC(R) = \frac{FS\left(A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\times A^{x_a}_{C,i_a}\right)}{FS\left(A^{x_1}_{K_1,j_1}\times\cdots\times A^{x_k}_{K_k,j_k}\right)}. \tag{6}$$

Unlike SFPBM, the proposed method retains all fuzzy rules, because it is not easy for users to specify an appropriate threshold. The user-specified minimum fuzzy confidence (min FC) (Ishibuchi et al., 2001; Hu et al., 2002) is set to zero for simplicity. Boolean operations are again used to obtain the antecedent and consequent parts of each rule. For example, suppose FG[u] = (1, 0, 0, 0, 0, 0) and FG[v] = (1, 0, 0, 1, 0, 0) correspond to the frequent fuzzy grids $L_u$ and $L_v$, respectively, where $L_v \subset L_u$. Then FG[u] AND FG[v] = (1, 0, 0, 0, 0, 0), corresponding to the frequent fuzzy grid $A^{x_1}_{3,1}$, is taken as the antecedent part of a rule, say R, and FG[u] XOR FG[v] = (0, 0, 0, 1, 0, 0), corresponding to the frequent fuzzy grid $A^{x_2}_{3,1}$, is taken as the consequent part of R. Then $FC(R) = FS(A^{x_1}_{3,1}\times A^{x_2}_{3,1})/FS(A^{x_1}_{3,1})$ is easily obtained by (6).
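The AND/XOR rule extraction and Eq. (6) can be sketched with supports computed directly from TT rows. The assumptions are as follows: FG rows are 0/1 tuples, `tt[i]` lists the membership degrees of all samples in the i-th 1-dim grid, the function names are hypothetical, and the TT values in the test are made up for illustration (Table 1 gives them only symbolically).

```python
def split_rule(fg_u, fg_v):
    """AND yields the antecedent (the bits of L_u); XOR yields the consequent (L_v's extra bits)."""
    antecedent = tuple(a & b for a, b in zip(fg_u, fg_v))
    consequent = tuple(a ^ b for a, b in zip(fg_u, fg_v))
    return antecedent, consequent


def support_from_tt(tt, row_bits):
    """Fuzzy support (Eq. (4)) of the grid encoded by row_bits, built from 1-dim TT rows."""
    n = len(tt[0])
    degrees = [1.0] * n
    for i, bit in enumerate(row_bits):
        if bit:
            degrees = [d * m for d, m in zip(degrees, tt[i])]
    return sum(degrees) / n


def fuzzy_confidence(tt, fg_u, fg_v):
    """Eq. (6): FC(R) = FS(antecedent x consequent) / FS(antecedent), with L_v = antecedent x consequent."""
    antecedent, _ = split_rule(fg_u, fg_v)
    return support_from_tt(tt, fg_v) / support_from_tt(tt, antecedent)
```

With the illustrative TT rows in the test, $FS(A^{x_1}_{3,1}\times A^{x_2}_{3,1}) = 0.5$ and $FS(A^{x_1}_{3,1}) = 0.75$, so $FC(R) = 2/3$.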

Redundant rules must then be eliminated to keep the rule set compact (Hu et al., 2002). If two rules R and S have the same consequent part and the antecedent part of R is contained in that of S, then R is redundant and can be discarded, while S is temporarily retained. For example, if R is the rule in (5) and S is "$A^{x_1}_{K_1,j_1}\times A^{x_2}_{K_2,j_2}\times\cdots\times A^{x_{k-1}}_{K_{k-1},j_{k-1}} \Rightarrow A^{x_a}_{C,i_a}$", then R can be eliminated, because the number of antecedent conditions should be minimized.
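The redundancy test can be sketched as follows, assuming each rule is a `(antecedent, consequent, fc)` triple whose antecedent is a frozenset of linguistic values. The name `remove_redundant` is hypothetical. "The antecedent of R is contained in that of S" is read in the fuzzy-set sense of the example above: S's linguistic values form a proper subset of R's. Like the text, the sketch does not compare fuzzy confidences.

```python
def remove_redundant(rules):
    """Drop every rule R for which some rule S has the same consequent and
    S's antecedent uses a proper subset of R's linguistic values."""
    kept = []
    for r_ant, r_con, r_fc in rules:
        redundant = any(s_con == r_con and s_ant < r_ant   # '<' = proper subset
                        for s_ant, s_con, _ in rules)
        if not redundant:
            kept.append((r_ant, r_con, r_fc))
    return kept
```

For instance, a rule "A1 × B2 ⇒ C1" is dropped when "A1 ⇒ C1" exists, whereas "A1 × B2 ⇒ C2" survives because no shorter rule has consequent C2.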

To improve the classification performance of fuzzy rule-based systems, Nozaki et al. (1996) proposed adaptive rules that adjust the grade of certainty of each rule. These adaptive rules are incorporated into the proposed method. The adaptive procedure for adjusting fuzzy confidences is as follows:

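Set the maximum number of iterations $J_{max}$.
Set $J$ to zero.
Repeat
$J = J + 1$
For each training sample $t_p$ do
a. Find the "firing" fuzzy rule $R_b$.
b. If $t_p$ is correctly classified, then FC($R_b$) is adjusted as follows:

$$FC(R_b) = FC(R_b) + \eta_1\,(1 - FC(R_b)) \tag{7}$$

otherwise, FC($R_b$) is adjusted as follows:

$$FC(R_b) = FC(R_b) - \eta_2\,FC(R_b) \tag{8}$$

where $\eta_1$ and $\eta_2$ are learning rates.
End
Until $J = J_{max}$

Table 1
Initial table FGTTFS for an example (columns 2–7 form FG; columns 8–9 form TT)

| Fuzzy grid | $A^{x_1}_{3,1}$ | $A^{x_1}_{3,2}$ | $A^{x_1}_{3,3}$ | $A^{x_2}_{3,1}$ | $A^{x_2}_{3,2}$ | $A^{x_2}_{3,3}$ | $t_1$ | $t_2$ | FS |
|---|---|---|---|---|---|---|---|---|---|
| $A^{x_1}_{3,1}$ | 1 | 0 | 0 | 0 | 0 | 0 | $\mu^{x_1}_{3,1}(t_{11})$ | $\mu^{x_1}_{3,1}(t_{21})$ | $FS(A^{x_1}_{3,1})$ |
| $A^{x_1}_{3,2}$ | 0 | 1 | 0 | 0 | 0 | 0 | $\mu^{x_1}_{3,2}(t_{11})$ | $\mu^{x_1}_{3,2}(t_{21})$ | $FS(A^{x_1}_{3,2})$ |
| $A^{x_1}_{3,3}$ | 0 | 0 | 1 | 0 | 0 | 0 | $\mu^{x_1}_{3,3}(t_{11})$ | $\mu^{x_1}_{3,3}(t_{21})$ | $FS(A^{x_1}_{3,3})$ |
| $A^{x_2}_{3,1}$ | 0 | 0 | 0 | 1 | 0 | 0 | $\mu^{x_2}_{3,1}(t_{12})$ | $\mu^{x_2}_{3,1}(t_{22})$ | $FS(A^{x_2}_{3,1})$ |
| $A^{x_2}_{3,2}$ | 0 | 0 | 0 | 0 | 1 | 0 | $\mu^{x_2}_{3,2}(t_{12})$ | $\mu^{x_2}_{3,2}(t_{22})$ | $FS(A^{x_2}_{3,2})$ |
| $A^{x_2}_{3,3}$ | 0 | 0 | 0 | 0 | 0 | 1 | $\mu^{x_2}_{3,3}(t_{12})$ | $\mu^{x_2}_{3,3}(t_{22})$ | $FS(A^{x_2}_{3,3})$ |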

The firing rule is found by determining the class label of $t_p$ with the fuzzy rules derived by the proposed method. Without loss of generality, if the antecedent part of a fuzzy classification rule $R_t$ is $A^{x_1}_{K_1,j_1}\times A^{x_2}_{K_2,j_2}\times\cdots\times A^{x_t}_{K_t,j_t}$, then its firing strength $\omega_t$ for $t_p$ is calculated as follows:

$$\omega_t = \mu^{x_1}_{K_1,j_1}(t_{p1})\,\mu^{x_2}_{K_2,j_2}(t_{p2})\cdots\mu^{x_t}_{K_t,j_t}(t_{pt}) \tag{9}$$

Then $t_p$ is assigned to the class label given by the consequent part of the "firing" rule, say $R_b$, if

$$\omega_b\,FC(R_b) = \max_j\{\omega_j\,FC(R_j) \mid R_j \in TR\}, \tag{10}$$

where TR is the set of fuzzy rules generated by the proposed method.

The adaptive rules are then employed to adjust the fuzzy confidence of $R_b$: if $t_p$ is correctly classified, FC($R_b$) is increased; otherwise, FC($R_b$) is decreased. Nozaki et al. (1996) also suggested that the learning rates should satisfy $0 < \eta_1 \ll \eta_2 < 1$. In the experiment, $\eta_1 = 0.001$, $\eta_2 = 0.1$ and $J_{max} = 500$ are used. In the next section, experimental results on the iris data demonstrate the effectiveness of the proposed method. However, the aim of the experiment is to show the feasibility and problem-solving capability of the proposed method for classification problems; methods for acquiring parameter specifications that yield higher classification accuracy rates and smaller numbers of fuzzy if–then rules are not considered in this paper.
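Firing-rule selection (Eqs. (9) and (10)) and the confidence updates (Eqs. (7) and (8)) can be combined in a short sketch. Several points are assumptions not stated in the text: attribute values are normalized to $[0, 1]$; a rule is a mutable list `[antecedent, class label, FC]`, with the antecedent given as `(attribute index, K_i, j)` triples; ties go to the first rule that reaches the maximum; and a sample for which no rule fires (every product is zero) leaves all confidences unchanged. The names `classify` and `adapt` are hypothetical. The defaults mirror the settings $\eta_1 = 0.001$, $\eta_2 = 0.1$ and $J_{max} = 500$ used in the experiment.

```python
def mu(x, K, j, mi=0.0, ma=1.0):
    """Triangular membership of Eqs. (1)-(3) on an assumed [0, 1] domain."""
    b = (ma - mi) / (K - 1)
    a = mi + (ma - mi) * (j - 1) / (K - 1)
    return max(1.0 - abs(x - a) / b, 0.0)


def firing_strength(sample, antecedent):
    """Eq. (9): product of the memberships of the antecedent's linguistic values."""
    w = 1.0
    for attr, K, j in antecedent:
        w *= mu(sample[attr], K, j)
    return w


def classify(sample, rules):
    """Eq. (10): index of the rule maximizing omega_j * FC(R_j), or None if no rule fires."""
    best, best_score = None, 0.0
    for idx, (antecedent, _label, fc) in enumerate(rules):
        score = firing_strength(sample, antecedent) * fc
        if score > best_score:   # strict '>' keeps the first rule on ties (assumption)
            best, best_score = idx, score
    return best


def adapt(samples, labels, rules, eta1=0.001, eta2=0.1, j_max=500):
    """Adjust FC of the firing rule in place: Eq. (7) if correct, Eq. (8) otherwise."""
    for _ in range(j_max):
        for t, y in zip(samples, labels):
            b = classify(t, rules)
            if b is None:
                continue   # no rule fires: nothing to adjust (assumption)
            fc = rules[b][2]
            if rules[b][1] == y:
                rules[b][2] = fc + eta1 * (1.0 - fc)   # Eq. (7)
            else:
                rules[b][2] = fc - eta2 * fc           # Eq. (8)
    return rules
```

Because Eq. (7) moves FC toward 1 in small steps and Eq. (8) shrinks it by a larger fraction, a rule that repeatedly fires on misclassified samples quickly loses influence in Eq. (10). This is consistent with the suggested ordering $\eta_1 \ll \eta_2$.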

4. Experimental results

The classification performance of the proposed method with the two fuzzy partition types is examined by computer simulations. We use the proposed method to find fuzzy classification rules from the iris data, which consist of three classes with 50 samples each; class 2 overlaps with class 3. The attributes "sepal length", "sepal width", "petal length" and "petal width" are denoted by $x_1$, $x_2$, $x_3$ and $x_4$, respectively, and $x_5$ denotes the class label (i.e., $d = 5$) to which $t_p = (t_{p1}, t_{p2}, \ldots, t_{p5})$ ($1 \le p \le 150$) belongs. Only three linguistic values can be defined in $x_5$: $A^{class\,label}_{3,1}$: "Class 1", $A^{class\,label}_{3,2}$: "Class 2", and $A^{class\,label}_{3,3}$: "Class 3".

$K = 6$ is first used for each attribute except $x_5$. Simulation results with different user-specified minimum fuzzy supports are shown in Tables 2 and 3 for MTDM and STDM, respectively. Tables 2 and 3 indicate that classification rates are more sensitive to larger values of min FS (i.e., min FS = 0.18 and 0.20). Therefore, a smaller min FS should be a better choice for both MTDM and STDM when all non-redundant rules are retained; a larger min FS may discard more useful fuzzy grids and thus reduce the effectiveness of the fuzzy rules. From Tables 2 and 3, the best classification rate obtained by MTDM (100.00%) is higher than that obtained by STDM (97.33%), although MTDM uses more fuzzy if–then rules to classify samples. The best results of SFPBM and the proposed method are summarized in Table 4. Except for min FS = 0.20, the proposed method with MTDM outperforms SFPBM with MTDM for each value of min FS, and the proposed method with STDM outperforms SFPBM with STDM for every value of min FS.

Table 2
Simulation results of the proposed method with MTDM ($K = 6$)

| Min FS | Classification rate (%) | Number of rules |
|---|---|---|
| 0.05 | 100.00 | 101 |
| 0.10 | 100.00 | 71 |
| 0.15 | 100.00 | 48 |
| 0.18 | 96.67 | 35 |
| 0.20 | 96.00 | 28 |

Table 3
Simulation results of the proposed method with STDM ($K = 6$)

| Min FS | Classification rate (%) | Number of rules |
|---|---|---|
| 0.05 | 97.33 | 30 |
| 0.10 | 97.33 | 17 |
| 0.15 | 95.33 | 11 |
| 0.18 | 93.33 | 5 |
| 0.20 | 93.33 | 5 |

Table 4
Classification rates (%) of SFPBM and the proposed method with $K = 6$ and various min FS

| Min FS | MTDM: SFPBM | MTDM: proposed method | STDM: SFPBM | STDM: proposed method |
|---|---|---|---|---|
| 0.05 | 96.67 | 100.00 | 96.67 | 97.33 |
| 0.10 | 96.67 | 100.00 | 96.67 | 97.33 |
| 0.15 | 96.67 | 100.00 | 92.67 | 95.33 |
| 0.20 | 96.67 | 96.00 | 88.67 | 93.33 |

Simulation results with min FS = 0.05 and different values of $K$ are shown in Tables 5 and 6 for MTDM and STDM, respectively. The classification rates in Tables 5 and 6 do not seem sensitive to $K$ for either partition method, so the choice of $K$ does not appear to be critical for classification rates. The classification rate of MTDM is at least as high as that of STDM for each value of $K$, and the best result of STDM (99.33%) is slightly worse than that of MTDM. Tables 2–6 also show that, because min FS and min FC are not optimized to reduce the number of rules, a large number of rules are generated when MTDM is used with various $K$. Although setting appropriate values of min FS and min FC is an important issue, it is not discussed in this paper for simplicity.

Several significant fuzzy if–then rule-based classification systems using simple fuzzy partition methods have been proposed, such as the simple-fuzzy-grid method (Ishibuchi et al., 1992), the multi-rule-table method (Ishibuchi et al., 1992), the pruning method (Nozaki et al., 1996), and the GA-based method (Ishibuchi et al., 1995). The simulation results of these methods reported by Nozaki et al. (1996) are summarized in Table 7, together with the best results of the proposed method with MTDM or STDM. In terms of classification rates, the proposed method with STDM or MTDM works well in comparison with the other fuzzy if–then rule-based classifiers. Note that the best results of SFPBM with MTDM or STDM are obtained by setting appropriate values of min FS and min FC (e.g., min FS = 0.10 and min FC = 0.80).

In the above simulation, all 150 samples are used in the training process to generate fuzzy rules. To examine the generalization ability of the proposed method, we apply the leave-one-out technique, which is an almost unbiased estimator of the true error rate of a classifier (Weiss and Kulikowski, 1991). In each iteration of the leave-one-out technique, fuzzy if–then rules are generated from 149 training samples and tested on the single remaining sample. This procedure is repeated until each of the 150 samples has been used as the test sample. We also try other values of min FS to examine the relationship between min FS and the generalization ability of the proposed method. Simulation results with lower values of min FS (i.e., 0.05, 0.10 and 0.15) are shown in Table 8. The proposed method with MTDM does not seem sensitive to min FS, with a best classification rate of 96.67%, whereas the proposed method with STDM is more sensitive to min FS, with a best classification rate of 95.33%. Therefore, from the viewpoint of generalization ability, we may conclude that the proposed method works more robustly with MTDM than with STDM.
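The leave-one-out protocol described above can be expressed as a generic harness. This is a sketch: `fit` and `predict` are hypothetical callables standing in for rule generation (Phases I and II plus the adaptive adjustment) and single-winner classification, `samples` and `labels` are assumed to be Python lists, and the majority-class model is only a placeholder used to exercise the harness. It is not the proposed method.

```python
from collections import Counter


def leave_one_out_accuracy(samples, labels, fit, predict):
    """Train on n-1 samples, test on the held-out one, and repeat for every sample."""
    n = len(samples)
    correct = 0
    for i in range(n):
        train_x = samples[:i] + samples[i + 1:]   # every sample except the i-th
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / n


# Placeholder model: always predicts the majority class of its training labels.
def majority_fit(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def majority_predict(model, x):
    return model
```

With labels a, a, a, b, each held-out "a" is predicted correctly and the held-out "b" is not, giving an accuracy of 0.75. For the iris experiment, `n` would be 150 and each call to `fit` would see 149 samples.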

Using the leave-one-out technique, we compare the proposed method with the above-mentioned fuzzy rule-based systems. The simulation results are summarized in Table 9, which also shows the best result of the proposed method with MTDM or STDM. In terms of classification rates, the proposed method with MTDM performs well in comparison with the other fuzzy if–then rule-based classifiers. However, the classification performance of the GA-based method can be improved considerably by carefully tuning its parameters (e.g., to 97.33%). We also find that the best rate of SFPBM with STDM (96.67%) exceeds that of the proposed method with STDM (95.33%). This suggests that retaining all non-redundant rules in the latter method may lead to overfitting.

Table 5
Simulation results of the proposed method with MTDM for various $K$

| K | Classification rate (%) | Number of rules |
|---|---|---|
| 4 | 100.00 | 46 |
| 5 | 100.00 | 71 |
| 6 | 100.00 | 101 |
| 7 | 100.00 | 131 |

Table 6
Simulation results of the proposed method with STDM for various $K$

| K | Classification rate (%) | Number of rules |
|---|---|---|
| 4 | 97.33 | 25 |
| 5 | 98.00 | 25 |
| 6 | 97.33 | 30 |
| 7 | 99.33 | 30 |

Table 7
Simulation results of various fuzzy if–then rule-based classification systems

| Method | Classification rate (%) |
|---|---|
| The proposed method with MTDM | 100.00 |
| The proposed method with STDM | 99.33 |
| Simple-fuzzy-grid | 98.67 |
| Multi-rule-table | 95.33 |
| Pruning | 100.00 |
| GA-based | 99.47 |
| SFPBM with MTDM | 96.67 |
| SFPBM with STDM | 96.67 |

Table 8
Classification rates (%) by the leave-one-out technique for MTDM and STDM

| Method | Min FS = 0.05 | Min FS = 0.10 | Min FS = 0.15 |
|---|---|---|---|
| MTDM | 95.33 | 96.67 | 95.33 |
| STDM | 92.67 | 94.00 | 95.33 |

Table 9
Simulation results by the leave-one-out technique for various fuzzy if–then rule-based classification systems

| Method | Classification rate (%) |
|---|---|
| The proposed method with MTDM | 96.67 |
| The proposed method with STDM | 95.33 |
| Simple-fuzzy-grid | 96.67 |
| Multi-rule-table | 94.67 |
| Pruning | 93.33 |
| GA-based | 94.67 |
| SFPBM with MTDM | 96.67 |
| SFPBM with STDM | 96.67 |

In addition, Grabisch and Dispot (1992) reported leave-one-out classification rates on the iris data for nine fuzzy classification methods: fuzzy integral with perceptron criterion, fuzzy integral with quadratic criterion, minimum operator, fast heuristic search with Sugeno integral, simulated annealing with Sugeno integral, fuzzy k-nearest neighbor, fuzzy c-means, fuzzy c-means for histograms, and hierarchical fuzzy c-means. The results summarized in Table 10 show that the best result (96.67%) was obtained by the fuzzy integral with quadratic criterion and by the fuzzy k-nearest neighbor method. The best result of the proposed method with MTDM (96.67%) equals the best result of these nine fuzzy methods, whereas the best result of the proposed method with STDM (95.33%) is slightly worse than those of the fuzzy integral with quadratic criterion, the minimum operator and the fuzzy k-nearest neighbor.

5. Discussions and conclusions

In this paper, we proposed a two-phase fuzzy data mining technique that finds fuzzy association rules for classification problems, based on SFPBM proposed by Hu et al. (2002). There are three main differences between the proposed method and SFPBM. First, the fuzzy subspaces ignored by SFPBM are considered in the proposed method. Second, all non-redundant fuzzy if–then rules take part in the mining process because min FC is set to zero. Third, the adaptive rules proposed by Nozaki et al. (1996) are incorporated into the proposed method to improve classification performance.

The results summarized in Table 4 show that the proposed method with STDM or MTDM performs well in comparison with SFPBM with STDM or MTDM.

The generalization ability of the proposed method was examined on the iris data, and the best classification rate of MTDM exceeds that of STDM. Simulation results with various parameter specifications (i.e., min FS and $K$) also demonstrate that the proposed method may effectively derive fuzzy classification rules.

For simplicity, we did not discuss how to set appropriate values of min FS and min FC, although this is an important issue. Because these parameters (i.e., min FS and min FC) are not optimized to reduce the number of rules, a large number of rules are generated when STDM or MTDM is used with various $K$, as shown in the previous section. It is therefore necessary to develop methods, such as genetic algorithms (Goldberg, 1989), that automatically determine appropriate values of min FS and min FC to obtain higher classification performance with a compact set of fuzzy if–then classification rules. The proposed method could then be viewed as an effective knowledge acquisition tool for classification problems.

Moreover, since fuzzy knowledge representation can facilitate interaction between an expert system and its users (Zimmermann, 1996), the proposed method should be extended to find other types of fuzzy association rules, easing the fuzzy knowledge acquisition bottleneck in building prototype expert systems or fuzzy rule-based systems. These issues are left for future work. Additionally, Hong et al. (2001) discussed the trade-off between computation time and the number of rules in fuzzy data mining; we expect that their study will provide useful suggestions for improving our method.

References

Anderson, E., 1935. The irises of the gaspe peninsula. Bulletin of the American Iris Society 59, 2–5.

Agrawal, R., Imielinski, T., Swami, A., 1993a. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering 5 (6), 914–925.

Agrawal, R., Imielinski, T., Swami, A., 1993b. Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, D.C., pp. 207–216.

Table 10
Classification accuracy rates of various fuzzy classification methods for the iris data

| Fuzzy method | Classification rate (%) |
|---|---|
| Perceptron criterion | 95.33 |
| Quadratic criterion | 96.67 |
| Minimum operator | 96.00 |
| Fast heuristic search | 92.00 |
| Simulated annealing | 91.33 |
| Fuzzy k-nearest neighbor | 96.67 |
| Fuzzy c-means | 93.33 |
| Fuzzy c-means for histograms | 93.33 |
| Hierarchical fuzzy c-means | 95.33 |

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I., 1996. Fast discovery of association rules. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, pp. 307–328.

Berry, M., Linoff, G., 1997. Data Mining Techniques: For Marketing, Sales, and Customer Support. Wiley, New York.

Chuang, J.H., Wang, P.H., Wu, M.C., 1999. Automatic classification of block-shaped parts based on their 2D projections. Computers and Industrial Engineering 36 (3), 697–718.

Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.

Grabisch, M., Dispot, F., 1992. A comparison of some methods of fuzzy classification on real data. In: Proceedings of the Second International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, pp. 659–662.

Han, J.W., Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.

Hong, T.P., Wang, T.T., Wang, S.L., Chien, B.C., 2000. Learning a coverage set of maximally general fuzzy rules by rough sets. Expert Systems with Applications 19 (2), 97–103.

Hong, T.P., Kuo, C.S., Chi, S.C., 2001. Trade-off between computation time and number of rules for fuzzy mining from quantitative data. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9 (5), 587–604.

Hu, Y.C., Chen, R.S., Tzeng, G.H., 2002. Mining fuzzy association rules for classification problems. Computers and Industrial Engineering 43 (4), 735–750.

Ishibuchi, H., Nozaki, K., Tanaka, H., 1992. Distributed representation of fuzzy rules and its application to pattern classification. Fuzzy Sets and Systems 52 (1), 21–32.

Ishibuchi, H., Nozaki, K., Yamamoto, N., Tanaka, H., 1995. Selecting fuzzy if–then rules for classification problems using genetic algorithms. IEEE Transactions on Fuzzy Systems 3 (3), 260–270.

Ishibuchi, H., Nakashima, T., Murata, T., 1999. Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems. IEEE Transactions on Systems, Man, and Cybernetics 29 (5), 601–618.

Ishibuchi, H., Yamamoto, T., Nakashima, T., 2001. Fuzzy data mining: effect of fuzzy discretization. In: Proceedings of the First IEEE International Conference on Data Mining, San Jose, USA, pp. 241–248.

Jang, J.S.R., 1993. ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23 (3), 665–685.

Jang, J.S.R., Sun, C.T., 1995. Neuro-fuzzy modeling and control. Proceedings of the IEEE 83 (3), 378–406.

Kim, D., Bang, S.Y., 2000. A handwritten numeral character classification using tolerant rough set. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (9), 923–937.

Nozaki, K., Ishibuchi, H., Tanaka, H., 1996. Adaptive fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems 4 (3), 238–250.

Pedrycz, W., 1994. Why triangular membership functions? Fuzzy Sets and Systems 64, 21–30.

Pedrycz, W., Gomide, F., 1998. An Introduction to Fuzzy Sets: Analysis and Design. MIT Press, Cambridge, MA.

Ravi, V., Zimmermann, H.-J., 2000. Fuzzy rule based classification with FeatureSelector and modified threshold accepting. European Journal of Operational Research 123 (1), 16–28.

Ravi, V., Reddy, P.J., Zimmermann, H.-J., 2000. Pattern classification with principal component analysis and fuzzy rule bases. European Journal of Operational Research 126, 526–533.

Wang, L.X., Mendel, J.M., 1992. Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man, and Cybernetics 22 (6), 1414–1427.

Weiss, S.M., Kulikowski, C.A., 1991. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, Los Altos, CA.

Zadeh, L.A., 1975a. The concept of a linguistic variable and its application to approximate reasoning (Part 1). Information Science 8 (3), 199–249.

Zadeh, L.A., 1975b. The concept of a linguistic variable and its application to approximate reasoning (Part 2). Information Science 8 (4), 301–357.

Zadeh, L.A., 1976. The concept of a linguistic variable and its application to approximate reasoning (Part 3). Information Science 9 (1), 43–80.

Zimmermann, H.-J., 1996. Fuzzy Set Theory and Its Applications. Kluwer, Boston.