
4.2.2 Extracting Decision Trees from a CDR-Tree

When users require the old decision tree, the new one, or both in addition to the concept-drifting rules, the CDR-Tree algorithm can provide them efficiently and accurately via the following extraction steps:

Step 1. To extract the old (new) classification model, ignore the splitting attribute values of all internal nodes and the class labels in all leaf nodes that belong to the new (old) instances.

Step 2. Check each node, proceeding bottom-up and left-to-right.

Step 3. For any node n_o and its sibling node(s) n_s:

(a.) If node n_o is a leaf and a singleton node (i.e., it does not have any sibling node), its parent node is removed from the CDR-Tree and node n_o is pulled up.

This situation is illustrated in Figure 4.4(a).

(b.) If node n_o is an internal and singleton node, the parent node of n_o is removed and the sub-tree rooted at n_o is pulled up. This situation is illustrated in Figure 4.4(b).

(c.) If n_s has the same splitting value as n_o, and n_o and n_s are all leaf nodes, the CDR-Tree merges them into a single node n_m. The class label of n_m is assigned by a majority vote. This situation is illustrated in Figures 4.4(c) and (d).

(d.) If n_s has the same splitting value as n_o but not all of them are leaf nodes, the CDR-Tree picks out the internal node n_m that contains the most instances among all internal sibling nodes. Except for the sub-tree ST_m rooted at n_m, all sibling nodes and their sub-trees are then removed from the CDR-Tree. The instances belonging to the removed leaf nodes and sub-trees are migrated into the internal node n_m and follow the paths of ST_m until they reach a leaf node, as Figure 4.4(e) illustrates. Note that a migrant instance may stop in an internal node n_I of ST_m if there is no branch to proceed along. In that case, the CDR-Tree uses the splitting attribute in n_I to generate a new branch and, accordingly, a new leaf node, as illustrated in Figure 4.4(f). The class labels of the leaf nodes in ST_m and of the newly generated leaf node(s) are then assigned by a majority vote.

Step 4. Repeat Steps 2 and 3 until no more nodes can be removed or merged.

Step 5. If a leaf node is not pure, continue splitting it. (A code sketch of the Step 3 operations follows Figure 4.4 below.)

Figure 4.4 Illustrations of the extraction strategy in CDR-Tree algorithm.
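To make the pull-up and merge operations of Step 3 concrete, here is a minimal sketch in Python. It is illustrative only: the thesis implementation is in Visual C++, and the Node class and the helpers pull_up_singleton and merge_leaf_siblings are hypothetical names of ours. Leaves keep the per-class counts that the CDR-Tree records during building, which is what makes the majority votes below cheap.

    from collections import Counter

    class Node:
        """A minimal CDR-Tree node (illustrative). Internal nodes split on an
        attribute; every node remembers the branch value leading to it, and
        leaves keep per-class instance counts stored during tree building."""
        def __init__(self, split_value=None):
            self.split_value = split_value   # branch value from the parent split
            self.split_attr = None           # set on internal nodes only
            self.class_counts = Counter()    # non-empty on leaves
            self.children = []
            self.parent = None

        def is_leaf(self):
            return not self.children

    def pull_up_singleton(node):
        """Step 3(a)/(b): a singleton child (a leaf or the root of a sub-tree)
        replaces its parent in the tree."""
        parent, grand = node.parent, node.parent.parent
        node.split_value = parent.split_value   # take over the parent's branch
        node.parent = grand
        if grand is not None:
            grand.children[grand.children.index(parent)] = node
        return node                              # new root when grand is None

    def merge_leaf_siblings(leaves):
        """Step 3(c): sibling leaves sharing a splitting value collapse into
        one leaf labelled by a majority vote over their pooled counts."""
        merged = Node(split_value=leaves[0].split_value)
        for leaf in leaves:
            merged.class_counts.update(leaf.class_counts)
        merged.label = merged.class_counts.most_common(1)[0][0]
        return merged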

Due to the merging strategy, some leaf nodes in a CDR-Tree might not be pure. The goal of Step 5 is to solve this problem. However, this step can be omitted if users do not need an overly detailed decision tree. Note that the CDR-Tree keeps the count information in each node during its building step; therefore, the computational cost of this extraction procedure is small. Compared to building a decision tree from scratch, the CDR-Tree can generate the decision tree much more efficiently. Below is the pseudocode of the CDR-Tree's extraction procedure.

The extraction procedure of CDR-Tree:

    CDRTreeExtract(CDR-Tree)
      If the decision tree of the old instances is requested then
        Ignore the splitting attribute values in all internal nodes and the class labels in all leaf nodes of the new instances;
      Else
        Ignore the splitting attribute values in all internal nodes and the class labels in all leaf nodes of the old instances;
      End if
      Pick out the internal node n_m with the most instances among all internal sibling nodes;
      Remove all the sibling nodes and their sub-trees except for the sub-tree ST_m rooted at n_m;
      Migrate the instances belonging to these removed leaf nodes and sub-trees into the internal node n_m;
      For each migrant instance in ST_m
        If it can reach a leaf node then
          Migrate it into that leaf node;
        Else
          Migrate it into the internal node where no branch can be followed;
        End if
      End for
      For each node in ST_m
        If it is a leaf node and contains migrant instances then
          Assign a class label to it by the majority vote;
        End if
        If it is an internal node and contains migrant instances then
          Create a new branch and the corresponding leaf node(s);
          Assign a class label to the new leaf node(s) by the majority vote;
        End if
      End for
      For each leaf node in the extracted CDR-Tree
        If it is not pure then
          Go on splitting it;
        End if
      End for
      Return the extracted decision tree
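The migration loop is the only subtle part of the procedure above. Continuing the hypothetical Node sketch from earlier, the following shows one way a migrant instance could be routed down ST_m, growing a new branch and leaf when no existing branch matches, as in Figure 4.4(f). Representing an instance as a plain feature dictionary is an assumption of this sketch, not the thesis code.

    def migrate(features, label, node):
        """Route one migrant instance down ST_m (the migration loop of the
        pseudocode). `features` maps attribute names to values; `label` is
        the instance's class. Majority voting happens afterwards, from the
        counts accumulated here."""
        while not node.is_leaf():
            value = features[node.split_attr]
            child = next((c for c in node.children if c.split_value == value), None)
            if child is None:
                # No branch to proceed along: use this node's splitting
                # attribute to grow a new branch and a new leaf node.
                child = Node(split_value=value)
                child.parent = node
                node.children.append(child)
            node = child
        node.class_counts[label] += 1
        return node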

Example 4.3: Taking the CDR-Tree in Figure 4.3 as an example, the extracted decision trees are shown in Figure 4.5, where Figure 4.5(a) is the old classification model for Table 4.1 without implementing Step 5; Figure 4.5(b) is the model for Table 4.1 with Step 5 implemented; and Figures 4.5(c) and (d) are the corresponding models for Table 4.2. Comparing these results to those in Figures 4.1 and 4.2, we find that without Step 5 there is only one misclassified instance in Figure 4.5(a) and two in Figure 4.5(c). When Step 5 is executed, Figures 4.5(b) and 4.5(d) reach 100% accuracy, as do Figures 4.1 and 4.2. Furthermore, Figure 4.5(d) is identical to Figure 4.2, while Figure 4.5(b) differs slightly from Figure 4.1. From this example we see that although the decision tree extracted from a CDR-Tree is not guaranteed to be identical to one built from scratch, it reaches comparable accuracy even without Step 5.

Figure 4.5 The extracted decision trees from Fig. 4.3: (a) the model of Table 4.1 without implementing Step 5; (b) the model of Table 4.1 with the implementation of Step 5; (c) the model of Table 4.2 without implementing Step 5; (d) the model of Table 4.2 with the implementation of Step 5.
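Step 5 is cheap for the same reason the extraction itself is: purity and the majority vote can be read directly off the counts stored in each leaf, without revisiting the training instances. A small illustrative helper, reusing the Node sketch above, with resplit standing in for the tree's usual attribute-selection routine (an assumption of this sketch):

    def finalize_leaf(leaf, resplit):
        """Label a leaf by majority vote over its stored counts; if the
        leaf is not pure, continue splitting it (Step 5)."""
        total = sum(leaf.class_counts.values())
        label, majority = leaf.class_counts.most_common(1)[0]
        leaf.label = label
        if majority < total:    # impure leaf: Step 5 resplits it
            resplit(leaf)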

4.3 Experiment and Analysis

We implemented the CDR-Tree algorithm in Microsoft Visual C++ 6.0 for experimental analysis and performance evaluation. The experimental environment and datasets are described in Section 4.3.1. In Section 4.3.2, we demonstrate how the accuracy of CDR-Trees is affected by different drifting levels. We compare the accuracy of C4.5 to that of the model extracted from the CDR-Tree in Section 4.3.3. Finally, a comparison of execution time among the CDR-Tree, the model extracted from the CDR-Tree, and C5.0 is given in Section 4.3.4.

All experiments are run on a 3.0 GHz Pentium IV machine with 512 MB of DDR memory, running Windows 2000 Professional. Due to the lack of a benchmark containing concept-drifting datasets, our experimental datasets are generated by the IBM data generator, which lets us produce several kinds of datasets to evaluate the CDR-Tree. In our experiments, four classification functions, F3, F5, F43, and F45, are randomly selected to generate the experimental datasets.

In order to analyze the performance of the CDR-Tree under different drifting ratios R% (i.e., the proportion of drifting instances to all instances), we use the four functions mentioned above to generate the required experimental datasets. For each function, the noise level is set to 5% and the dataset produced by the IBM data generator is regarded as the original/first dataset in the data stream. We then coded a program that amends the first dataset to generate a second, new dataset. The program works as follows. First, it randomly picks an instance S from the original dataset and randomly selects reference attributes a_m (1 ≤ m ≤ 5). Instances that have the same values as S in all attributes a_m are picked out. The class labels and the values of a_m in these picked-out instances are then replaced by random values from the corresponding value domains. The underlying principle is that concept drifts are caused by variations in some attributes. We limit the number of reference attributes to at most five, since drifting concepts should be caused by a few, not many, attribute values, and there are only nine basic attributes in the IBM data generator. If the number of drifting instances is less than required, the program runs another iteration to obtain more drifting instances; conversely, if more instances than required satisfy the condition, R% of the instances are randomly picked as drifting ones. As a result, each function yields five second datasets with different drifting ratios, for a total of 4 old datasets and 20 new datasets in our experiments. Every dataset contains 10,000 instances, and 10-fold cross-validation is applied in all experiments; that is, each experimental dataset is divided into 10 parts, of which nine are used as training sets and the remaining one as the testing set. In the following experiments, we use D(i) to denote a dataset generated by Function Fi and D(i,R) to represent a dataset with an R% drifting ratio derived from D(i). A sketch of this drift-injection procedure is given below.
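For concreteness, here is a small Python sketch of the drift-injection procedure just described; it is our illustration, not the program actually used in the experiments. Instances are assumed to be dictionaries with a 'class' key, and domains maps each attribute (and 'class') to its list of legal values.

    import random

    def inject_drift(dataset, attributes, domains, ratio):
        """Turn roughly ratio% of `dataset` into drifting instances,
        following the procedure described above (illustrative sketch)."""
        target = len(dataset) * ratio // 100
        drifted = set()
        while len(drifted) < target:
            seed = random.choice(dataset)                    # instance S
            refs = random.sample(attributes, random.randint(1, 5))  # a_m
            matches = [i for i, inst in enumerate(dataset)
                       if all(inst[a] == seed[a] for a in refs)]
            random.shuffle(matches)
            for i in matches:
                if len(drifted) >= target:
                    break                                    # keep exactly R% drifting
                for a in refs:                               # perturb reference attributes
                    dataset[i][a] = random.choice(domains[a])
                dataset[i]['class'] = random.choice(domains['class'])
                drifted.add(i)
        return dataset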