Integration of fuzzy classifiers with decision trees


I-Jen Chiang and Jane Yung-jen Hsu

chiang@robot.csie.ntu.edu.tw, yjhsu@csie.ntu.edu.tw

Department of Computer Science and Information Engineering

National Taiwan University, Taipei, Taiwan 107, R.O.C.

Abstract

It is often difficult to make accurate predictions given uncertain and noisy data for classification. Unfortunately, most real-world problems have to deal with such imperfect data. This paper presents a new model for fuzzy classification by integrating fuzzy classifiers with decision trees. In this approach, a fuzzy classification tree is constructed from the training data set. Instead of defining a specific class for a given instance, the proposed fuzzy classification scheme computes its degree of possibility for each class. The performance of the system is evaluated by empirical comparison with a standard decision tree classifier, C4.5, on several benchmark data sets from the UCI machine learning repository.

1 Introduction

Classification techniques such as decision trees have been widely used for discovering regularities in complex data. Successful applications include process control, pattern recognition, and diagnosis. Most existing classification techniques have difficulty in dealing with uncertain and noisy data [5, 6, 9]. Unfortunately, for many real-world problems, uncertainty and noise in data cannot be ignored.

Previously, we have introduced the concept of fuzzy classification trees (FCT) for domains with vague classifications. Rather than performing a two-stage process that couples decision trees with either pre-fuzzification or post-fuzzification [1, 3, 7], the fuzzy classification tree presents a theoretically sound integration of fuzzy classifiers with decision trees. The structure is very robust with respect to a large amount of noise in the data for classification.

In this paper, we present the algorithm for constructing fuzzy classification trees, as well as some empirical results on five different data sets from the UCI repository. Section 2 briefly reviews the definitions of fuzzy classification trees. Section 3 describes the basic algorithm for constructing a fuzzy classification tree from a data set. The empirical results comparing FCT with C4.5 are summarized in Section 4, followed by the conclusion.


2 Definitions

This section introduces the problem of fuzzy classification.

For any classification problem, the collection of all possible instances constitutes the instance space, which is denoted by $X$. Let $C = \{C_1, C_2, \ldots, C_n\}$ be a set of classes. A classifier or decision function, $D$, is a function that maps each instance into a class. That is, $D(x) = C_i$, where $x \in X$ and $C_i \in C$.

A fuzzy classifier is a function, $F : X \to \{(p_1, \ldots, p_n) \mid p_i \in [0, 1]\}$, such that each $p_i$ is a function defining the possibility that an instance belongs to the class $C_i$.
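
To make the distinction between $D$ and $F$ concrete, the following is a minimal Python sketch and not part of the original system. The class labels `"low"` and `"high"`, the helper `triangular`, and the triangular shape of the possibility functions are all hypothetical choices made only for illustration.

```python
from typing import Callable, Dict


def triangular(center: float, width: float) -> Callable[[float], float]:
    """Hypothetical possibility function peaking at `center` (illustration only)."""
    def p(x: float) -> float:
        return max(0.0, 1.0 - abs(x - center) / width)
    return p


# One possibility function p_i per class C_i; labels and shapes are made up.
POSSIBILITY: Dict[str, Callable[[float], float]] = {
    "low": triangular(0.0, 5.0),
    "high": triangular(6.0, 5.0),
}


def fuzzy_classify(x: float) -> Dict[str, float]:
    """F(x): a degree of possibility in [0, 1] for every class."""
    return {label: p(x) for label, p in POSSIBILITY.items()}


def crisp_classify(x: float) -> str:
    """D(x): collapse F(x) to the single most possible class."""
    scores = fuzzy_classify(x)
    return max(scores, key=scores.get)
```

With these made-up functions, an instance at `x = 3.0` receives the same possibility for both classes, which is exactly the information a crisp decision function discards.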

A fuzzy classification tree (FCT) is used to implement the fuzzy classifier. Given an FCT, let $N_L$ denote the node labelled by $L$, and $B_L$ denote the branch leading into node $N_L$. The children of $N_L$ are labelled as $N_{L.i}$, where $i \in \{1, 2, 3, \ldots\}$. Each node $N_L$ is associated with a class $C_L$ and the possibility function $P_L$. Each branch $B_L$ is associated with a membership function $\mu_L$, which defines the degree of possibility for any instance $x \in X$ to be classified as class $C_L$ based on attribute $A$. The possibility function $P_L$ is the composition of the membership functions along the branches from the root to node $N_L$. If $N_L$ is the root node, $P_L$ is set to be 1.
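
The composition of membership functions along a path can be sketched as follows. The text does not say which operator combines the memberships, so this sketch assumes the minimum by default and lets the caller pass another operator such as the product; the names `possibility`, `path`, and `combine` are hypothetical.

```python
from functools import reduce
from typing import Callable, Sequence

# A membership function maps an instance (a vector of attribute values) to [0, 1].
Membership = Callable[[Sequence[float]], float]


def possibility(
    path: Sequence[Membership],
    x: Sequence[float],
    combine: Callable[[float, float], float] = min,
) -> float:
    """P_L(x): combine the branch memberships from the root down to N_L.

    An empty path stands for the root node, whose possibility is 1.
    The combining operator is an assumption of this sketch.
    """
    return reduce(combine, (mu(x) for mu in path), 1.0)
```

Starting the reduction at 1 matches the convention that $P_L$ is 1 at the root, since 1 is the neutral element for both the minimum and the product on $[0, 1]$.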

In general, there are many FCTs that implement the same fuzzy classifier. Entropy is used to evaluate the FCTs. The entropy function of node $N_L$ is

$$\mathrm{Info}(S_L) = -\sum_{c \in C} \frac{P_L^c}{P_L} \ln \frac{P_L^c}{P_L}.$$

The entropy, i.e., information content, of $T_L$ can be defined as

[equation not recoverable from the source]

where $b_L$ is the number of branches from node $N_L$, $P_L$ is the sum of the possibilities $P_L(x)$ for all $x$ in node $N_L$, and $P_L^c$ is the sum of the possibilities $P_L(x)$ for all $x$ in node $N_L$ of class $c \in C$. The information gain at node $N_L$ due to the test $\mathrm{Test}_L$ is defined by

$$\mathrm{Gain}(\mathrm{Test}_L) = \mathrm{Info}(S_L) - \mathrm{Info}_T(S_L).$$
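
The quantities above can be illustrated with a short Python sketch. The node entropy follows the form $-\sum_c (P_L^c / P_L) \ln (P_L^c / P_L)$; the entropy after a test is assumed here to be the possibility-weighted mean of the child entropies, in the style of C4.5, which is an assumption of this sketch and not a formula taken from the text. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Maps each class label c to P_L^c, the summed possibility of class c at a node.
ClassPossibility = Dict[str, float]


def node_info(p_by_class: ClassPossibility) -> float:
    """Info(S_L) computed from the per-class possibility sums at one node."""
    total = sum(p_by_class.values())  # P_L
    if total <= 0.0:
        return 0.0
    info = 0.0
    for p_c in p_by_class.values():
        if p_c > 0.0:  # a class with zero possibility contributes nothing
            ratio = p_c / total
            info -= ratio * math.log(ratio)
    return info


def split_info(children: Sequence[ClassPossibility]) -> float:
    """Assumed Info_T(S_L): possibility-weighted mean of the child entropies."""
    totals = [sum(child.values()) for child in children]
    grand_total = sum(totals)
    if grand_total <= 0.0:
        return 0.0
    return sum(t / grand_total * node_info(c) for t, c in zip(totals, children))


def gain(parent: ClassPossibility, children: Sequence[ClassPossibility]) -> float:
    """Gain(Test_L) = Info(S_L) - Info_T(S_L)."""
    return node_info(parent) - split_info(children)
```

For a node with two equally possible classes, a test that separates them perfectly yields a gain of $\ln 2$, and a test that leaves every branch with the same class mix yields a gain of 0.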

3 Construction of Fuzzy Classification Trees

In this section, an algorithm for fuzzy classification is presented. The main algorithm for constructing an FCT is shown in Figure 1.


Initially, the system is given a set of training instances, denoted by $S_0$. Let $\mathcal{C}$ contain the set of labels corresponding to the unexpanded leaf nodes of the tree. Let $S_L$ denote the set of instances that have been assigned to node $N_L$. The algorithm starts by creating a root node $N_1$, adding its label to $\mathcal{C}$, and initializing $S_1$ to be $S_0$.

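Algorithm BUILD-FCT

[Input] A set of real-valued training instances $S_0$.

[Output] An FCT.

1. $L \leftarrow 1$
2. $\mathcal{C} \leftarrow \{1\}$
3. $S_1 \leftarrow S_0$
4. Until $\mathcal{C} = \emptyset$
5.     $L \leftarrow \mathrm{random}(\mathcal{C})$
6.     $\mathcal{C} \leftarrow \mathcal{C} \setminus \{N_L\}$
7.     $\forall a_i$, $T_i \leftarrow$ spawn-new-tree($N_L$, $a_i$)
8.     $Best \leftarrow T_k$ s.t. $\mathrm{Info}(T_k) = \max_i \mathrm{Info}(T_i)$
9.     $Gain \leftarrow \mathrm{Info}(T_L) - \mathrm{Info}(Best)$
10.    if $Gain > \epsilon$
11.        Add the labels of all leaf nodes of $Best$ into $\mathcal{C}$
12.        Assign subsets of $S_L$ into $S_{L.1}, \ldots, S_{L.k}$

Figure 1: The algorithm for constructing an FCT.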
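
The control flow of the construction loop can be sketched in Python as below. This is an illustration under several assumptions, not the authors' implementation: the expansion procedure and the two entropy measures are passed in as hypothetical callables (`spawn`, `info_node`, `info_split`), subsets are plain lists with no fuzzy weighting of their own, and the candidate with the lowest entropy after expansion is chosen, on the assumption that the intent is to maximize the gain.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]  # (attribute values, class label)
Split = List[List[Instance]]            # one subset per new branch


def build_fct(
    s0: Sequence[Instance],
    n_attributes: int,
    spawn: Callable[[List[Instance], int], Split],
    info_node: Callable[[List[Instance]], float],
    info_split: Callable[[Split], float],
    epsilon: float = 1e-3,
    seed: int = 0,
) -> Tuple[Dict[str, List[Instance]], Dict[str, Tuple[int, List[str]]]]:
    """Grow a tree by repeatedly expanding a randomly chosen open leaf."""
    rng = random.Random(seed)
    subsets: Dict[str, List[Instance]] = {"1": list(s0)}  # S_1 <- S_0
    children: Dict[str, Tuple[int, List[str]]] = {}
    open_labels: List[str] = ["1"]                        # unexpanded leaves

    while open_labels:
        # Pick an unexpanded leaf at random and remove it from the open set.
        label = open_labels.pop(rng.randrange(len(open_labels)))
        s_l = subsets[label]

        # Try an expansion on every attribute; ignore trivial one-branch splits.
        candidates = [(a, spawn(s_l, a)) for a in range(n_attributes)]
        candidates = [(a, split) for a, split in candidates if len(split) > 1]
        if not candidates:
            continue

        # Assumption: the best candidate is the one with the lowest entropy.
        best_attr, best = min(candidates, key=lambda c: info_split(c[1]))
        gain = info_node(s_l) - info_split(best)

        if gain > epsilon:
            kids = []
            for i, subset in enumerate(best, start=1):
                child = f"{label}.{i}"  # child labels follow the L.i convention
                subsets[child] = subset
                open_labels.append(child)
                kids.append(child)
            children[label] = (best_attr, kids)

    return subsets, children
```

A leaf whose best expansion does not improve the entropy by more than `epsilon` is simply left unexpanded, which is what makes the loop terminate.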

The procedure spawn-new-tree($N_L$, $a_i$) defines an expansion from node $N_L$ according to attribute $a_i$. The details of the procedure are shown in Figure 2. All attributes of an instance are viewed as coordinates in an n-dimensional Euclidean space.

Consider any cluster of class $C_j$ along the coordinate $a_i$ resulting from step 3. We first calculate its center of gravity by the standard geometric method. Suppose that there are $k$ branches generated by an attribute. The procedure for computing the entropy of any given FCT is defined by the algorithm in Figure 3.


Algorithm SPAWN-NEW-TREE

[Input] A leaf node $N_L$ and an attribute $a_i$.

[Output] A subtree expanded from $N_L$.

1. $\forall j$, project instances in $S_L$ of class $C_j$ onto attribute $a_i$
2. Smooth the resulting histograms using the k-median method
3. Partition each smoothed histogram into clusters
4. Create a new branch from $N_L$ for each cluster
5. Define the membership function for each branch

Figure 2: The spawn-new-tree algorithm
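
Steps 1 to 3, together with the center-of-gravity computation, can be sketched for a single class as follows. Several details are assumptions of this sketch and are not specified in the text: the histogram uses equal-width bins, the "k-median method" is read as a running median over a window of `k` bins, and a cluster is taken to be a contiguous run of non-empty smoothed bins. The membership functions of step 5 are left out because their form is not given here.

```python
from statistics import median
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Step 1: project the attribute values of one class onto equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return counts


def median_smooth(counts: Sequence[float], k: int = 3) -> List[float]:
    """Step 2 (assumed reading): running median over a window of k bins."""
    half = k // 2
    return [
        median(counts[max(0, i - half): i + half + 1])
        for i in range(len(counts))
    ]


def clusters(counts: Sequence[float]) -> List[Tuple[int, int]]:
    """Step 3 (assumed reading): contiguous runs of non-empty bins."""
    runs, start = [], None
    for i, c in enumerate(counts):
        if c > 0 and start is None:
            start = i
        elif c <= 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(counts) - 1))
    return runs


def center_of_gravity(
    counts: Sequence[float], run: Tuple[int, int], lo: float, width: float
) -> float:
    """Mass-weighted mean of the bin centers inside one cluster."""
    first, last = run
    mass = sum(counts[first: last + 1])
    weighted = sum(
        counts[i] * (lo + (i + 0.5) * width) for i in range(first, last + 1)
    )
    return weighted / mass
```

Each cluster found this way would correspond to one new branch from $N_L$ in step 4.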

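Algorithm EVALUATE-ENTROPY

[Input] An FCT with root node $N_L$.

[Output] The entropy value of $T_L$.

1. $i \leftarrow 0$
2. $\forall C_j$
3.     $i \leftarrow i + 1$
4.     $P_L^{C_j} = \sum_{x \in S_L} \mu_j(x)$
5.     $P_L^{i} = \sum_{x \in S_L} \mu_i(x)$
6. $P_L = \sum_{C_j \text{ at } N_L} P_L^{C_j}$
7. $\mathrm{Info}(S_L) = -\sum_{C_j \text{ at } N_L} \frac{P_L^{C_j}}{P_L} \ln \frac{P_L^{C_j}}{P_L}$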
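
As a small worked instance of the node entropy $\mathrm{Info}(S_L) = -\sum_{C_j} \frac{P_L^{C_j}}{P_L} \ln \frac{P_L^{C_j}}{P_L}$, with numbers chosen purely for illustration (they do not come from the experiments), suppose two classes are present at $N_L$ with $P_L^{C_1} = 3$ and $P_L^{C_2} = 1$, so that $P_L = 4$:

```latex
\begin{align*}
\mathrm{Info}(S_L)
  &= -\left(\tfrac{3}{4}\ln\tfrac{3}{4} + \tfrac{1}{4}\ln\tfrac{1}{4}\right) \\
  &\approx -\left(0.75 \times (-0.2877) + 0.25 \times (-1.3863)\right) \\
  &\approx 0.562
\end{align*}
```

For comparison, a node containing a single class has entropy 0, and a node with two equally possible classes has entropy $\ln 2 \approx 0.693$.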

            Golf    Monk1   Monk2   Monk3   Ionosphere
    C4.5    100%    76.6%   65.3%   92.6%   96.5%
    FCT     100%    86.5%   73.4%   93.2%   96.2%

The clustering method determines the performance of FCT. As the result on the Ionosphere problem shows, the accuracy of FCT there is lower than the accuracy of C4.5. Since the values of all the attributes are distributed in [-1, 1], the size of the clusters has an effect on the accuracy of FCT. Improving the clustering method is one of our future objectives.

5 Conclusion

This paper has presented an algorithm that integrates fuzzy classifiers with decision trees. The algorithm attempts to expand the FCT while minimizing its entropy at each step.

We have compared FCT with C4.5 using the empirical results on five data sets in the above section. From the noise-free data (Golf) to the data with a great amount of noise (Monk2), the accuracy of FCT matches or exceeds that of C4.5. C4.5 classifies an instance into exactly one class. Instances with attribute values around class boundaries are forced into a single class, which may result in wrong predictions, especially in noisy domains. Instead of making a rigid classification, it is sometimes necessary to identify more than one possible classification for a given instance.

FCTs allow multiple predictions to be made, each of which is associated with a degree of possibility. In application domains that involve a large amount of data with uncertainty, such as medicine or business, fuzzy classification trees can serve as a useful tool for generating fuzzy rules or discovering knowledge in databases.

References

1. Z. Chi and H. Yan, "ID3-Derived Fuzzy Rules and Optimized Defuzzification for Handwritten Numeral Recognition," IEEE Trans. Fuzzy Systems, Vol. 4, No. 1, 1996, 24-31.

2. I.-J. Chiang and J. Y.-j. Hsu, "Fuzzy Classification Trees," Proc. of Ninth Internat. Symposium on Artificial Intelligence, Cancun, Mexico, 1996.

3. K. J. Cios and L. M. Sztandera, "Continuous ID3 Algorithm with Fuzzy Entropy Measures," Proc. of the Internat. Conference on Fuzzy Systems, 1992, 469-476.

4. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.

5. M. Sugeno and G. T. Kang, "Structure Identification of Fuzzy Model," Fuzzy Sets and Systems, Vol. 28, 1988, 15-33.

6. T. Takagi and M. Sugeno, "Fuzzy Identification of Systems and its Applications to Modeling and Control," IEEE Trans. System Man Cybernet., Vol. 15, 1985, 116-132.

7. T. Tani, M. Sakoda and K. Tanaka, "Fuzzy Modeling by ID3 Algorithm and its Applications to Prediction of Heater Outlet Temperature," Proc. of Second IEEE Internat. Conference on Fuzzy Systems, 1992, 923-930.

8. R. Weber, "Automatic Knowledge Acquisition for Fuzzy Control Application," Proc. of the Internat. Symposium on Fuzzy Systems, 1992.

9. Y. Yuan and M. J. Shaw, "Induction of Fuzzy Decision Trees," Fuzzy Sets and Systems, 69, 1995, 125-139.