
Chapter 3. Pruning Methods for Fuzzy ID3

3.1. Introduction to Pruning Methods

Pruning methods can be grouped into two classes: pre-pruning and post-pruning. Pre-pruning approaches stop growing the tree early, before it reaches the point where it perfectly classifies the training data. Post-pruning approaches allow the tree to overfit the data and then prune it back. Although pre-pruning might seem more direct, post-pruning has been found to be more successful in practice, because pre-pruning methods have difficulty estimating precisely when to stop growing the tree. Moreover, an important benefit of post-pruning methods is that they can generate a sequence of trees instead of a single one, which allows an expert to select the best one based on professional knowledge. In this thesis, we adopt the post-pruning approach.

For the generation of fuzzy decision trees, tree size is a very important issue. The aim of pruning is to reduce the number of nodes while retaining accuracy. In this thesis, we use a mathematical approach to investigate how the rules of pruning algorithms influence fuzzy decision trees.

3.2. Description of Our Pruning Methods

We have used the GA to improve classification performance and to decrease the number of rules. In this thesis, we propose three pruning methods to further minimize the number of rules. The first pruning method is described as follows:

1 ) For each rule, when a data point is classified, we keep the product of the membership value and the certainty of each class, $J(n)$, where n is the index of the class.

2 ) The $J(n)$ corresponding to the correct class of the data point gets a positive sign and the others get a negative sign.

3 ) Sum $J(n)$ over all classes to obtain the credit of the rule for classifying this data point.

...

4 ) Repeat from 1) until all data points have been classified by this rule; the accumulated sum is the final credit of this rule.

5 ) Remove the redundant rules whose credits are below a certain threshold and/or show a large drop relative to the other rules.
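
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function names `rule_credits` and `prune_by_threshold`, the list-of-lists layout of `memberships` and `certainties`, and the convention that rules with a credit at or above the threshold are kept are all hypothetical choices made for the example.

```python
def rule_credits(memberships, certainties, labels):
    """Accumulate the credit of every rule over all data points.

    memberships[i][j]: membership value of data point i in rule j (assumed layout)
    certainties[j][n]: certainty of class n attached to rule j (assumed layout)
    labels[i]:         index of the correct class of data point i
    """
    n_rules = len(certainties)
    credits = [0.0] * n_rules
    for mu_row, label in zip(memberships, labels):
        for j in range(n_rules):
            for n, certainty in enumerate(certainties[j]):
                # Step 1: product of membership value and class certainty.
                j_n = mu_row[j] * certainty
                # Steps 2-3: positive sign for the correct class, negative
                # otherwise, summed over all classes.
                credits[j] += j_n if n == label else -j_n
    # Step 4: after the loop, credits[j] is the final credit of rule j.
    return credits


def prune_by_threshold(credits, threshold):
    """Step 5: return the indices of the rules that survive pruning."""
    return [j for j, credit in enumerate(credits) if credit >= threshold]


if __name__ == "__main__":
    memberships = [[1.0, 0.5], [0.2, 0.8]]
    certainties = [[0.9, 0.1], [0.5, 0.5]]
    labels = [0, 1]
    credits = rule_credits(memberships, certainties, labels)
    print(credits, prune_by_threshold(credits, 0.5))
```

In this toy example the second rule assigns equal certainty to both classes, so its positive and negative contributions cancel and it is the one removed.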

In the second pruning method, when any data point is classified, we define the second credit formula V_j² as

where c is the class number of the training data, is the production value of the membership value which corresponds to the maximum possibility of the class for the j-th rule; is the possibility ratio assigned to the maximum class of the j-th rule.

And is the production value of the membership value which corresponds to the

second largest possibility of the class for the j-th rule, and is the possibility ratio assigned to the second largest class for the j-th rule; other and follow similarly. Value is the maximum possibility of the class which is the sums of each class assigned by each rule; is the second largest possibility of the class which is the sums of each class assigned by each rule. The credit value gets positive sign when the class of the and is the same. On the contrary, if the class of

The third pruning method improves on the second. We revise the credit formula and define the third credit V_j³ as

where is the production value of the membership value which corresponds to the

maximum possibility of the class for the j-th rule, is the production value of the membership value which corresponds to the second largest possibility of the class for the j-th rule. Value is the maximum possibility of the class which is the sums of each class assigned by each rule, is the second largest possibility of the class which is the sums of each class assigned by each rule. The credit value gets positive sign when the class of the and is the same. On the contrary, if the class of

the p², the credit value V_j³ is assigned to 10000. Instead of p², ^p²_c₋₁ may be another solution.

The credit of each rule is computed, and the rules are then arranged in order from the largest credit to the smallest. The credit represents the effectiveness of the rule in performing the classification task. A rule that is essential for classification receives a high credit value; conversely, a rule with a low credit may be insignificant or redundant. The reason is that the credit accumulates, over all instances, whether the rule assigns each instance to its true class or to a wrong class. In this way, we can prune the insignificant or inconsistent rules to obtain a smaller and more efficient rule base. After deleting the inefficient rule or rules, we retune the parameters of the pruned fuzzy ID3 tree by GA.

For example, after we obtain the credit of each rule for the training set shown in Table I, we sort and plot the total credit of every rule under each of the three pruning methods. The results are shown in Figs. 3.1(a)–(c), respectively.

We find that the credits of the sixth and seventh rules are much smaller than the others, which indicates that these two rules may be redundant. Hence, we can select the following thresholds for the three methods, respectively, and remove the redundant rules: (a) between 1 and 0.589; (b) between 0.645 and 6.965; (c) between 1.331 and 1.15. The pruned fuzzy decision trees for the training set of Table I are shown in Figs. 3.2(a)–(c), respectively. The flowchart of our genetic algorithm based fuzzy ID3 method is illustrated in Fig. 3.3.
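
One possible way to turn the "large drop" criterion into a threshold is to sort the credits and place the threshold in the middle of the largest gap between neighbouring values. The sketch below illustrates this idea only; the function name `largest_drop_threshold` and the midpoint rule are assumptions of the example, and the credit values are made up rather than taken from Fig. 3.1.

```python
def largest_drop_threshold(credits):
    """Return a threshold placed in the largest gap of the sorted credits.

    The midpoint rule is an assumption for illustration; any value inside
    the gap separates the same two groups of rules.
    """
    if len(credits) < 2:
        raise ValueError("need at least two credits to find a drop")
    ordered = sorted(credits, reverse=True)
    # Gap between each credit and the next smaller one.
    gaps = [ordered[k] - ordered[k + 1] for k in range(len(ordered) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return (ordered[k] + ordered[k + 1]) / 2.0


if __name__ == "__main__":
    # Hypothetical credits of five rules; the last two are clearly weaker.
    example = [5.0, 4.8, 4.5, 1.0, 0.6]
    print(largest_drop_threshold(example))
```

Rules whose credits fall below the returned value are the candidates for removal; the final choice can still be checked against the plotted credits.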


Fig. 3.1. The credit plot on each rule: (a) result of the first pruning method, (b) result of the second pruning method, and (c) result of the third pruning method.


Fig. 3.2. Fuzzy decision trees after pruning: (a) result of the first pruning method, (b) result of the second pruning method, (c) result of the third pruning method.

Fig. 3.3. Flowchart of genetic algorithm based fuzzy ID3 method.

Chapter 4. Simulation and Experiment

As mentioned in Chapter 2, we introduced a fuzzy ID3 algorithm to construct a fuzzy classification system whose membership functions and leaf conditions are tuned by GA. In this chapter, we apply the algorithm to classify several data sets, including continuous, discrete, and mixed-mode data sets [14], [16]. We also combine this method with each of the three pruning methods to classify these data sets and compare the results. The simulations were run on Pentium 4 3.4 GHz personal computers with 2 GB of RAM.

4.1. Description of the Data Sets

The ten well-known data sets used in the experiments are obtained from the University of California, Irvine (UCI) Repository of Machine Learning Databases [20]. Their characteristics are briefly described below.

1 ) Crude_oil: Gerrid and Lantz analyzed Crude_oil samples from three zones of sandstone. The Crude_oil data set with 56 examples has five attributes and three classes named wilhelm, submuilinia, and upper. The attributes are vanadium (in percent ash), iron (in percent ash), beryllium (in percent ash), saturated hydrocarbons (in percent area), and aromatic hydrocarbons (in percent area).

2 ) Glass Identification Database: The data set represents the problem of identifying glass samples taken from the scene of an accident. The 214 examples were originally collected by B. German of the Home Office Forensic Science Service at Aldermaston, Reading, Berkshire in the UK. The nine attributes are all real valued and fully known, representing refractive index and the percent weight of oxides such as silicon, sodium, and magnesium. The six classes are named as building windows float processed, building windows not float processed, vehicle windows float processed, containers, tableware, and headlamps.

3 ) Iris Plant Database: The Iris data set, Fisher’s classic test data (Fisher, 1936), has three classes with four-dimensional data consisting of 150 examples. The four attributes are: sepal length, sepal width, petal length, and petal width. This data set gives good results with almost all classic learning methods and has become a sort of benchmark data.

4 ) Myo_electric: The Myo_electric data set is extracted from a problem in discriminating between electrical signals observed at the human skin surface. This is a four-dimensional data set consisting of 72 examples divided into two classes.

5 ) Norm4: The data set has 800 examples consisting of 200 examples each from the four components of a mixture of four class 4-variate normals.

6 ) BUPA liver disorders: This UCI data set was donated by R. S. Forsyth. The problem is to predict whether or not a male patient has a liver disorder based on blood tests and alcohol consumption. There are two classes, six continuous attributes, and 345 examples.

7 ) Promoter Gene Sequences Database: Promoters have a region where a protein (RNA polymerase) must make contact and the helical DNA sequence must have a valid conformation so that the two pieces of the contact region spatially align. The data set with 106 examples has 57 attributes and two classes. All attributes are discrete.

8 ) StatLog Project Heart Disease dataset: This UCI data set is from the Cleveland Clinic Foundation, courtesy of R. Detrano. The problem concerns the prediction of the presence or absence of heart disease given the results of various medical tests carried out on a patient. There are two classes, seven continuous attributes, six discrete attributes, and 270 examples.

9 ) Golf: The data set with 28 examples has four attributes and two classes named play, and don’t play. There are 2 continuous and 2 discrete attributes. The attributes are outlook, temperature, humidity, and windy.

10) StatLog Project Australian Credit Approval: This credit data originates from Quinlan. This file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect confidentiality of the data. The Australian data set with 690 examples has 14 attributes and two classes. There are 6 continuous and 8 discrete attributes.

To summarize the ten data sets clearly, we list their properties in Table III. Some examples from our testing data sets are illustrated in Fig. 4.1.

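TABLE III

SUMMARY OF THE DATABASES EMPLOYED

| Data set | # of examples | # of attributes | # of continuous attributes | # of classes |
| --- | --- | --- | --- | --- |
| Crude oil | 56 | 5 | 5 | 3 |
| Glass | 214 | 9 | 9 | 6 |
| Iris | 150 | 4 | 4 | 3 |
| Myo_electric | 72 | 4 | 4 | 2 |
| Norm4 | 800 | 4 | 4 | 4 |
| Bupa | 345 | 6 | 6 | 2 |
| Promoters | 106 | 57 | 0 | 2 |
| Heart | 270 | 13 | 6 | 2 |
| Golf | 28 | 4 | 2 | 2 |
| Australian | 690 | 14 | 6 | 2 |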

Fig. 4.1. The partial examples of the Crude oil.

4.2. Simulation and Comparison

To evaluate our proposed GA based fuzzy ID3 method, we first use all examples of each data set as the training data and the same examples as the testing data. In rule pruning, we remove redundant rules as long as the learning accuracy is maintained or only slightly reduced, which we consider acceptable. We record the accuracy and the number of fuzzy rules before and after pruning. The results of the three pruning methods are shown in Tables IV, V, and VI, respectively.
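
The before-and-after comparison can be sketched as follows. This is an illustrative example, not the code used for the experiments: it assumes that each rule contributes its membership value times its class certainty, that the contributions of all active rules are summed per class, and that the class with the largest sum is predicted. The names `classify`, `accuracy`, `pruning_is_acceptable` and the default `tolerance` of 0.02 are hypothetical.

```python
def classify(mu_row, certainties, active_rules):
    """Predict a class by summing membership * certainty over active rules."""
    n_classes = len(certainties[0])
    totals = [0.0] * n_classes
    for j in active_rules:
        for n in range(n_classes):
            totals[n] += mu_row[j] * certainties[j][n]
    # Ties are resolved in favour of the lowest class index.
    return max(range(n_classes), key=totals.__getitem__)


def accuracy(memberships, certainties, labels, active_rules):
    """Fraction of data points classified correctly by the active rules."""
    hits = sum(
        classify(mu_row, certainties, active_rules) == label
        for mu_row, label in zip(memberships, labels)
    )
    return hits / len(labels)


def pruning_is_acceptable(acc_before, acc_after, tolerance=0.02):
    """Accept pruning if accuracy is kept or drops by at most `tolerance`."""
    return acc_before - acc_after <= tolerance


if __name__ == "__main__":
    memberships = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.3]]
    certainties = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    labels = [0, 1]
    before = accuracy(memberships, certainties, labels, [0, 1, 2])
    after = accuracy(memberships, certainties, labels, [0, 1])
    print(before, after, pruning_is_acceptable(before, after))
```

Here removing the third rule, which does not discriminate between the classes, leaves the accuracy unchanged, so the pruned rule base would be accepted.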

For classifying the Glass data set, we consider only five attributes, namely Na, Mg, Al, K, and Ba, chosen by feature subset selection [21]. In addition, we divide each selected feature into two partitions for Norm4 and into three partitions for Glass. If we do not reduce the attributes of this data set, we obtain too many rules after tree construction. Even without the restrictions above, the fuzzy ID3 still cannot achieve a higher learning accuracy.

TABLE IV

PERFORMANCE OF THE DATA SETS BEFORE AND AFTER PRUNING BY THE FIRST PRUNING METHOD

Before rule pruning After rule pruning Data set

TABLE V

PERFORMANCE OF THE DATA SETS BEFORE AND AFTER PRUNING BY THE SECOND PRUNING METHOD

Before rule pruning After rule pruning Data set

From Tables IV, V, and VI, we find that the accuracy decreases slightly for most data sets after rule pruning. A likely cause is that the pruning process removed some rules that were correctly classifying part of the data, so the remaining rules misclassify a few examples. We can also see that the number of rules decreases for all data sets, which shows the effectiveness of our rule pruning process.

Table VII compares the accuracy of the different pruning methods for each data set. For the Myo_electric and Golf data sets, the accuracy is the same for all pruning methods. For the others, except the Promoters data, the third pruning method is superior to the other two in accuracy. Table VIII compares the number of rules obtained with the different pruning methods for each data set. For the Iris, Myo_electric, and Norm4 data sets, the number of rules is the same for all pruning methods. For the others, except Crude_oil, Glass, and Bupa, the second pruning method yields fewer rules than the other two.

TABLE VII

COMPARISON OF THE ACCURACIES WITH DIFFERENT PRUNING METHODS

| Data set | First pruning: training acc. after rule pruning | Second pruning: training acc. after rule pruning | Third pruning: training acc. after rule pruning |
| --- | --- | --- | --- |
| Crude_oil | 97.7 | 97.7 | 100.0 |
| Glass | 69.0 | 73.6 | 74.8 |
| Iris | 97.5 | 91.6 | 97.5 |
| Myo_electric | 92.9 | 92.9 | 92.9 |
| Norm4 | 72.0 | 83.9 | 92.8 |

| Data set | First pruning: # of rules | Second pruning: # of rules | Third pruning: # of rules |
| --- | --- | --- | --- |
| Crude_oil | 8.0 | 12.0 | 10.0 |
| Glass | 9.0 | 10.0 | 8.0 |
| Iris | 3.0 | 3.0 | 3.0 |
| Myo_electric | 3.0 | 3.0 | 3.0 |
| Norm4 | 4.0 | 4.0 | 4.0 |

For a classification system, the main concern is accuracy; therefore, we further compare the two best pruning methods, i.e., the first and third pruning methods. We use five-fold cross validation, in which the instances of each data set are randomly divided among five folds. The first fold is used as the testing data and the others are used for training; the learned structure is then tested against the first fold. The same procedure is repeated with the second fold as the testing data and the others as the training data, and so on until the fifth fold. This whole procedure is repeated three times. The average accuracy and the number of rules are recorded in Tables IX and X, respectively.
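
The fold construction described above can be sketched in a few lines of Python. The sketch only shows how the example indices might be split; the generator name `five_fold_splits` and the use of a seed to obtain the repetitions are assumptions of this example, and the training and testing of the fuzzy ID3 tree itself are left out.

```python
import random


def five_fold_splits(n_examples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each fold in turn."""
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)  # random assignment of instances to folds
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for m, fold in enumerate(folds) if m != k for i in fold]
        yield train, test


if __name__ == "__main__":
    # Three repetitions of five-fold cross validation on 56 examples,
    # using a different (hypothetical) seed for each repetition.
    for repetition in range(3):
        for train, test in five_fold_splits(56, seed=repetition):
            print(repetition, len(train), len(test))
```

Every example appears in exactly one testing fold per repetition, so each repetition tests on the whole data set once.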

Table IX compares the accuracies of the first and third pruning methods. On average, for the Heart data set the accuracies of the first and third pruning methods are the same; for the other data sets, the third pruning method outperforms the first in accuracy. Similarly, Table X shows that the number of rules produced by the third pruning method is smaller than that of the first pruning method. Finally, our proposed GA based fuzzy ID3 with the third pruning method is compared to C5.0 [6]. We choose C5.0 because it is the successor of C4.5 and a state-of-the-art algorithm that works well for many decision-making problems.

TABLE IX

COMPARISON OF THE TESTING ACCURACIES WITH TWO BETTER PRUNING

METHODS

Testing acc. (five-fold CV repeated three times) Data set Pruning Method

1 2 3

TABLE X

COMPARISON OF THE NUMBER OF RULES WITH TWO BETTER PRUNING METHODS

# of rules (five-fold CV repeated three times) Data set Pruning Method

1 2 3

The accuracy comparison between our method and C5.0 is shown in Table XI, which records the testing accuracy from five-fold cross validation, repeated three times on each data set. On average, our rule base outperforms C5.0 in seven out of ten data sets, which suggests that our system generalizes better than C5.0 on these data sets. Our method is also compared to C5.0 with respect to the average number of rules: Table XII shows the number of rules generated by the two methods in the same experiment, and our rule base is smaller than that of C5.0 in seven out of ten data sets. The training time and execution time of our method for the data sets are recorded in Table XIII. This simulation was done on Pentium 4 3.4 GHz personal computers with 2 GB of RAM. The training and execution times of C5.0 are very short, less than 0.1 sec for all the above data sets. Overall, our approach tends to produce better classification accuracy with more concise rule sets than C5.0.

TABLE XI

ACCURACY COMPARISON OF OUR METHOD AND C5.0

Testing acc. (five-fold CV repeated three times) Data set Algorithm

TABLE XII

RULE NUMBER COMPARISON OF OUR METHOD AND C5.0

# of rules (five-fold CV repeated three times) Data set Algorithm

TABLE XIII

TRAINING TIME AND EXECUTION TIME OF OUR METHOD

| Data set | Training time (sec) (five-fold CV) | Execution time (10⁻³ sec) (five-fold CV) |
| --- | --- | --- |
| Crude_oil | 0.203 | 1.433 |
| Glass | 1.591 | 0.651 |
| Iris | 0.172 | 0.526 |
| Myo_electric | 0.156 | 1.107 |
| Norm4 | 0.114 | 0.098 |
| Bupa | 0.422 | 0.269 |
| Promoters | 0.024 | 5.650 |
| Heart | 0.362 | 0.411 |
| Golf | 0.289 | 3.134 |
| Australian | 0.407 | 0.115 |

Chapter 5. Conclusion

In this thesis, we proposed a genetic algorithm based fuzzy ID3 method to construct a fuzzy classification system that can accept continuous, discrete, or mixed-mode data sets. Next, we formulated three pruning methods to obtain a more efficient rule base. In the experiments, we found that the third pruning method was superior to the others; therefore, we used the genetic algorithm based fuzzy ID3 with the third pruning method to classify data. Our proposed method can directly classify mixed-mode data sets with high classification accuracy. In tests on several well-known data sets, which include continuous, discrete, and mixed-mode data sets, we obtained high classification accuracy with a small number of rules. Pruning the decision tree leads to a smaller fuzzy rule base, and the pruned rule base usually maintains the classification performance or decreases it only slightly despite the reduction in the number of rules.

Furthermore, comparing the results generated by our proposed method with those of C5.0, we find that our rule base outperforms C5.0 in seven out of ten data sets. As demonstrated in the testing, the proposed pruning method helps to improve the testing accuracy.

For the Heart, Australian, Myo_electric, and Norm4 data sets, if the number of rules of our fuzzy ID3 method is less than four, the accuracy decreases greatly after pruning. In future work, we will determine whether pruning should be applied at all when the number of rules is small. Computational cost is another concern in machine learning, and we must try to reduce the computational burden of this scheme. These will be good challenges to study in the future.

References

[1] E. Alpaydin, Introduction to Machine Learning. Cambridge, Massachusetts: MIT Press, 2004.

[2] Y. Yuan and M. J. Shaw, “Induction of fuzzy decision trees,” Fuzzy Sets Syst., vol. 69, pp. 125–139, 1995.

[3] M. S. Chen and J. Han, “Data mining: An overview from a database perspective,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 866–883, Dec. 1996.

[4] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, pp. 81–106, 1986.

[5] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.

[6] Data Mining Tools, http://www.rulequest.com/see5-info.html, 2003.

[7] L. Breiman et al., Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks/Cole, 1984.

[8] M. Umano et al., “Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems,” in Proc. Third IEEE Conf. on Fuzzy Systems, vol. 3, pp. 2113–2118, 1994.

[9] C. Z. Janikow, “Fuzzy decision trees: issues and methods,” IEEE Trans. Syst., Man, Cybern. B, vol. 28, no. 1, pp. 1–14, Feb. 1998.

[10] X. Z. Wang et al., “On the optimization of fuzzy decision trees,” Fuzzy Sets Syst., vol. 112, pp. 117–125, 2000.

[11] I. Hayashi et al., “Generation of decision trees by fuzzy ID3 with adjusting mechanism of AND/OR operators,” in IEEE Int. Conf. Fuzzy Syst., Anchorage, AK, pp. 681–685, May 1998.

[12] X. Z. Wang and J. R. Hong, “On the handling of fuzziness for continuous-valued
