Integration of Fuzzy Classifiers with Decision Trees
I-Jen Chiang Jane Yung-jen Hsu
chiang@robot.csie.ntu.edu.tw yjhsu@csie.ntu.edu.tw Department of Computer Science and Information Engineering
National Taiwan University Taipei, Taiwan 107, R.O.C.
Abstract
It is often difficult to make accurate predictions given uncertain and noisy data for classification. Unfortunately, most real-world problems have to deal with such imperfect data. This paper presents a new model for fuzzy classification by integrating fuzzy classifiers with decision trees. In this approach, a fuzzy classification tree is constructed from the training data set. Instead of defining a specific class for a given instance, the proposed fuzzy classification scheme computes its degree of possibility for each class. The performance of the system is evaluated by empirically comparing it with a standard decision tree classifier, C4.5, on several benchmark data sets from the UCI machine learning repository.
1 Introduction
Classification techniques such as decision trees have been widely used for discovering regularities in complex data. Successful applications include process control, pattern recognition, and diagnosis. Most existing classification techniques have difficulty in dealing with uncertain and noisy data [5, 6, 9]. Unfortunately, for many real-world problems, uncertainty and noise in data cannot be ignored.
Previously, we introduced the concept of fuzzy classification trees (FCT) for domains with vague classifications. Rather than performing a two-stage process that couples decision trees with either pre-fuzzification or post-fuzzification [1, 3, 7], the fuzzy classification tree presents a theoretically sound integration of fuzzy classifiers with decision trees. The structure is very robust with respect to a large amount of noise in the data for classification.
In this paper, we present the algorithm for constructing fuzzy classification trees, as well as some empirical results on five different data sets from the UCI repository. Section 2 briefly reviews the definitions of fuzzy classification trees. Section 3 describes the basic algorithm for constructing a fuzzy classification tree from a data set. The empirical results comparing FCT with C4.5 are summarized in Section 4, followed by the conclusion.
2 Definitions
This section introduces the problem of fuzzy classification.
For any classification problem, the collection of all possible instances constitutes the instance space, which is denoted by $X$. Let $C = \{C_1, C_2, \ldots, C_n\}$ be a set of classes. A classifier or decision function, $D$, is a function that maps each instance into a class. That is, $D(x) = C_i$, where $x \in X$ and $C_i \in C$.
A fuzzy classifier is a function, $F : X \to \{(p_1, \ldots, p_n) \mid p_i \in [0, 1]\}$, such that each $p_i$ is a function defining the possibility that an instance belongs to the class $C_i$.
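As a minimal sketch of the two definitions above, the Python fragment below builds a fuzzy classifier that returns one possibility per class and, for contrast, a crisp decision function that keeps only the most possible class. The triangular possibility shape, the helper names, and the two example classes are assumptions made for illustration; the paper does not prescribe them.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Possibility = Callable[[Instance], float]


def triangular(attr: int, center: float, width: float) -> Possibility:
    """Hypothetical possibility function: 1 at `center`, 0 beyond `width`."""
    def p(x: Instance) -> float:
        return max(0.0, 1.0 - abs(x[attr] - center) / width)
    return p


def fuzzy_classifier(ps: List[Possibility]) -> Callable[[Instance], Tuple[float, ...]]:
    """F : X -> {(p_1, ..., p_n) | p_i in [0, 1]}."""
    def F(x: Instance) -> Tuple[float, ...]:
        return tuple(p(x) for p in ps)
    return F


def crisp_classifier(F: Callable[[Instance], Tuple[float, ...]]) -> Callable[[Instance], int]:
    """D(x): index of the class with the highest possibility."""
    def D(x: Instance) -> int:
        scores = F(x)
        return max(range(len(scores)), key=scores.__getitem__)
    return D


# Two illustrative classes defined on attribute 0 (assumed values).
F = fuzzy_classifier([triangular(0, 0.0, 2.0), triangular(0, 2.0, 2.0)])
D = crisp_classifier(F)
```

For an instance on the class boundary, such as `F([1.0])`, both classes receive the same possibility, which is information the crisp function `D` cannot express.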
A fuzzy classification tree (FCT) is used to implement the fuzzy classifier. Given an FCT, let $N_L$ denote the node labelled by $L$, and $B_L$ denote the branch leading into node $N_L$. The children of $N_L$ are labelled as $N_{L.i}$, where $i \in \{1, 2, 3, \ldots\}$. Each node $N_L$ is associated with a class $C_L$ and the possibility function $P_L$. Each branch $B_L$ is associated with a membership function $\mu_L$, which defines the degree of possibility for any instance $x \in X$ to be classified as class $C_L$ based on an attribute $A$. The possibility function $P_L$ is the composition of the membership functions along the branches from the root to node $N_L$. If $N_L$ is the root node, $P_L$ is set to be 1.
In general, there are many FCTs that implement the same fuzzy classifier. Entropy is used to evaluate the FCTs. The entropy function of node $N_L$ is
$$\mathrm{Info}(S_L) = -\sum_{c \in C} \frac{P_L^c}{P_L} \ln \frac{P_L^c}{P_L}.$$
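The entropy, i.e., information content, of $T_L$ can be defined as
$$\mathrm{Info}_T(S_L) = \sum_{i=1}^{b_L} \frac{P_{L.i}}{P_L}\, \mathrm{Info}(S_{L.i}),$$
where $b_L$ is the number of branches from node $N_L$, $P_L$ is the sum of the possibilities $P_L(x)$ for all $x$ in node $N_L$, and $P_L^c$ is the sum of the possibilities $P_L(x)$ for all $x$ in node $N_L$ of class $c \in C$. The information gain at node $N_L$ due to the test $\mathrm{Test}_L$ is defined by $\mathrm{Gain}(\mathrm{Test}_L) = \mathrm{Info}(S_L) - \mathrm{Info}_T(S_L)$.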
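The gain computation can be sketched in Python as follows. The node entropy follows the form of step 7 in Figure 3, with possibility sums in place of instance counts. The weighting of child entropies by their share of possibility in `info_t` is an assumption modelled on C4.5, and the data layout and function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# An instance at a node is stored as (class label, possibility P_L(x)).
NodeData = Sequence[Tuple[str, float]]


def info(node: NodeData) -> float:
    """Entropy of a node from per-class possibility sums (natural log)."""
    total = sum(p for _, p in node)                  # P_L
    if total == 0.0:
        return 0.0
    per_class: Dict[str, float] = {}
    for label, p in node:                            # P_L^c for each class c
        per_class[label] = per_class.get(label, 0.0) + p
    return -sum((pc / total) * math.log(pc / total)
                for pc in per_class.values() if pc > 0.0)


def info_t(parent: NodeData, children: Sequence[NodeData]) -> float:
    """Assumed form: child entropies weighted by their share of possibility."""
    total = sum(p for _, p in parent)
    if total == 0.0:
        return 0.0
    return sum(sum(p for _, p in child) / total * info(child)
               for child in children)


def gain(parent: NodeData, children: Sequence[NodeData]) -> float:
    """Gain(Test_L) = Info(S_L) - Info_T(S_L)."""
    return info(parent) - info_t(parent, children)


# Illustrative data: four instances, each with possibility 1.
parent = [("yes", 1.0), ("yes", 1.0), ("no", 1.0), ("no", 1.0)]
pure_split = [[("yes", 1.0), ("yes", 1.0)], [("no", 1.0), ("no", 1.0)]]
mixed_split = [[("yes", 1.0), ("no", 1.0)], [("yes", 1.0), ("no", 1.0)]]
```

A test that separates the classes completely (`pure_split`) recovers the full parent entropy as gain, while a test that leaves every branch evenly mixed (`mixed_split`) yields no gain.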
3 Construction of Fuzzy Classification Trees
In this section, an algorithm for fuzzy classification is presented. The main algorithm for constructing an FCT is shown in Figure 1.
Initially, the system is given a set of training instances, denoted by $S_0$. Let $C$ contain the set of labels corresponding to the unexpanded leaf nodes of the tree. Let $S_L$ denote the set of instances that have been assigned to node $N_L$. The algorithm starts by creating a root node $N_1$, adding its label to $C$, and initializing $S_1$ to be $S_0$.
Algorithm BUILD-FCT
[Input] A set of real-valued training instances $S_0$.
[Output] An FCT
1. $L \leftarrow 1$
2. $C \leftarrow \{1\}$
3. $S_1 \leftarrow S_0$
4. Until $C = \emptyset$
5.    $L \leftarrow \mathrm{random}(C)$
6.    $C \leftarrow C \setminus \{N_L\}$
7.    $\forall a_i$, $T_i \leftarrow$ spawn_new_tree($N_L$, $a_i$)
8.    $Best \leftarrow T_k$ s.t. $\mathrm{Info}(T_k) = \max_i \mathrm{Info}(T_i)$
9.    $Gain \leftarrow \mathrm{Info}(T_L) - \mathrm{Info}(Best)$
10.   if $Gain > \epsilon$
11.      Add the labels of all leaf nodes of $Best$ into $C$
12.      Assign subsets of $S_L$ into $S_{L.1}, \ldots, S_{L.k}$
Figure 1: The algorithm for constructing an FCT.
The procedure spawn_new_tree($N_L$, $a_i$) defines an expansion from node $N_L$ according to attribute $a_i$. The details of the procedure are shown in Figure 2. All attributes of an instance are viewed as coordinates in an n-dimensional Euclidean space.
Consider any cluster of class $C_i$ along the coordinate $a_j$ resulting from step 3. We first calculate its center of gravity by the standard geometric method. Suppose that there are $k$ branches generated by an attribute. The procedure for computing the entropy of any given FCT is defined by the algorithm in Figure 3.
Algorithm SPAWN_NEW_TREE
[Input] A leaf node $N_L$ and an attribute $a_i$.
[Output] A subtree expanded from $N_L$.
1. $\forall j$, project instances in $S_L$ of class $C_j$ onto attribute $a_i$
2. Smooth the resulting histograms using the k-median method
3. Partition each smoothed histogram into clusters
4. Create a new branch from $N_L$ for each cluster
5. Define the membership function for each branch
Figure 2: The spawn_new_tree algorithm
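Steps 1 to 3 of Figure 2, together with the center-of-gravity computation, might look like the following Python sketch for a single class and attribute. Several choices here are assumptions rather than the authors' method: the "k-median method" is read as a running median over a window of k bins, clusters are taken to be maximal runs of non-empty bins, and the bin count and sample values are invented for illustration. Steps 4 and 5 (branch creation and membership functions) are not covered.

```python
from statistics import median
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Step 1: project attribute values onto `bins` equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def smooth(counts: Sequence[float], k: int = 3) -> List[float]:
    """Step 2 (assumed reading): running median over a window of k bins.
    Bins without a full window are left unchanged."""
    half = k // 2
    out = list(counts)
    for i in range(half, len(counts) - half):
        out[i] = median(counts[i - half:i + half + 1])
    return out


def partition(smoothed: Sequence[float]) -> List[Tuple[int, int]]:
    """Step 3 (assumed rule): clusters are maximal runs of non-empty bins."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, c in enumerate(smoothed):
        if c > 0 and start is None:
            start = i
        elif c <= 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(smoothed) - 1))
    return runs


def center_of_gravity(smoothed: Sequence[float], run: Tuple[int, int],
                      lo: float, width: float) -> float:
    """Weighted mean of the bin centers inside one cluster."""
    first, last = run
    mass = sum(smoothed[first:last + 1])
    return sum(smoothed[i] * (lo + (i + 0.5) * width)
               for i in range(first, last + 1)) / mass


# Illustrative attribute values for one class (assumed data).
values = [1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 7.1, 7.6, 8.2, 8.8]
counts = histogram(values, lo=0.0, hi=10.0, bins=10)
smoothed = smooth(counts)
clusters = partition(smoothed)
centers = [center_of_gravity(smoothed, r, 0.0, 1.0) for r in clusters]
```

With these sample values the histogram splits into two clusters, each of which would become one branch of the expanded node.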
Algorithm EVALUATE-ENTROPY
[Input] An FCT with root node $N_L$.
[Output] The entropy value of $T_L$
1. $i \leftarrow 0$
2. $\forall C_j$
3.    $i \leftarrow i + 1$
4.    $P_L^{c_j} = \sum_{\forall x \in S_L} \mu_j(x)$
5.    $P_L^{i} = \sum_{\forall x \in S_L} \mu_i(x)$
6. $P_L = \sum_{c_j \text{ at } N_L} P_L^{c_j}$
7. $\mathrm{Info}(S_L) = -\sum_{c_j \text{ at } N_L} \frac{P_L^{c_j}}{P_L} \ln \frac{P_L^{c_j}}{P_L}$
The clustering method determines the performance of FCT. As the result on the Ionosphere problem shows, the accuracy of FCT can be lower than that of C4.5. Since the values of all the attributes are distributed in [-1, 1], the size of the clusters has an effect on the accuracy of FCT. Improving the clustering method is one of our future objectives.
          Golf    Monk1   Monk2   Monk3   Ionosphere
C4.5      100%    76.6%   65.3%   92.6%   96.5%
FCT       100%    86.5%   73.4%   93.2%   96.2%
5 Conclusion
This paper has presented an algorithm that integrates fuzzy classifiers with decision trees. The algorithm attempts to expand the FCT while minimizing its entropy at each step.
We have compared FCT with C4.5 using the empirical results on five data sets in the preceding section. From the noise-free data (Golf) to the data with a great amount of noise (Monk2), the accuracy of FCT is at least as high as that of C4.5; the Ionosphere data set is the one exception. C4.5 classifies an instance into exactly one class. Instances with attribute values around class boundaries are forced into a single class, which may result in wrong predictions, especially in noisy domains. Instead of making a rigid classification, it is sometimes necessary to identify more than one possible classification for a given instance.
FCTs allow multiple predictions to be made, each of which is associated with a degree of possibility. In application domains that involve a large amount of data with uncertainty, such as medicine or business, fuzzy classification trees can serve as a useful tool for generating fuzzy rules or discovering knowledge in databases.
References
1. Z. Chi and H. Yan, “ID3-Derived Fuzzy Rules and Optimized Defuzzification for Handwritten Numeral Recognition,” IEEE Trans. Fuzzy Systems, Vol. 4, No. 1, 1996, 24-31.
2. I.-J. Chiang and J. Y.-j. Hsu, “Fuzzy Classification Trees,” Proc. of Ninth Internat. Symposium on Artificial Intelligence, Cancun, Mexico, 1996.
3. K. J. Cios and L. M. Sztandera, “Continuous ID3 algorithm with Fuzzy Entropy Measures,” Proc. of the Internat. Conference on Fuzzy Systems, 1992, 469-476.
4. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
5. M. Sugeno and G. T. Kang, “Structure Identification of Fuzzy Model,” Fuzzy Sets and Systems, Vol. 28, 1988, 15-33.
6. T. Takagi and M. Sugeno, “Fuzzy Identification of Systems and its Applications to Modeling and Control,” IEEE Trans. System Man Cybernet., Vol. 15, 1985, 116-132.
7. T. Tani, M. Sakoda and K. Tanaka, “Fuzzy Modeling by ID3 Algorithm and its Applications to Prediction of Heater Outlet Temperature,” Proc. of Second IEEE Internat. Conference on Fuzzy Systems, 1992, 923-930.
8. R. Weber, “Automatic Knowledge Acquisition for Fuzzy Control Application,” Proc. of the Internat. Symposium on Fuzzy Systems, 1992.
9. Y. Yuan and M. J. Shaw, “Induction of fuzzy decision trees,” Fuzzy Sets and Systems, 69, 1995, 125-139.