Fundamental Concept - A Genetic-Algorithm-Based Fuzzy ID3 Method for a Hierarchical Scene Analysis System

Our proposed image analysis system relies heavily on fuzzy set theory, the genetic algorithm, and the fuzzy ID3 algorithm. We therefore begin by explaining them in detail.

2.1. Introduction to Fuzzy ID3

Knowledge acquisition from data is very important in knowledge engineering. A popular and efficient method is the ID3 algorithm [9]. The ID3 approach to pattern recognition and classification consists of a procedure for synthesizing an efficient decision tree for classifying patterns that have non-numeric feature values. The decision tree can also be expressed in the form of rules. Therefore, ID3 is often thought of as an inductive inference procedure for machine learning or rule acquisition.

The fuzzy ID3 (FID3) algorithm [10], [11] extends ID3 to incorporate fuzzy notation. The decision tree built by the fuzzy ID3 algorithm is similar to that of the ID3 algorithm. Fuzzy ID3 applies to a data set containing numeric feature values instead of symbolic features, and it generates a fuzzy decision tree using fuzzy sets. A fuzzy decision tree consists of nodes for training features, edges for branching on the fuzzy sets of the given feature values, and leaf nodes for the final decision classes with certainties.

The feature ranking step is optional, since we can use the features in any arbitrary order, but it is desirable because it can reduce the size of the tree and hence produce an efficient and accurate decision tree. While constructing the decision tree, we use a genetic algorithm to tune the membership functions of the feature fuzzy sets and the parameters of the decision tree, in order to improve the classification performance and reduce the number of rules.

2.2. Feature Ranking

When we start to construct a decision tree, we have to choose the most important feature among all the features. The order in which the features are used to construct the decision tree is therefore an important issue. To construct a decision tree with high accuracy and small size, the order of the features is evaluated using information gain [12].

The process of deciding the order of the features is called feature ranking.

The information theory that underpins the information gain criterion can be summarized in one statement: the information conveyed by a message depends on its probability and can be measured in bits as minus the logarithm to base 2 of that probability. For example, if there are 8 equally probable messages, the information conveyed by any one of them is $-\log_2(1/8)$, or 3 bits. The information gain criterion therefore provides a mechanism for ranking a set of features so that the most favorable order can be chosen.
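The bit measure described above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `information_bits` is not from the original work.

```python
import math


def information_bits(probability: float) -> float:
    """Information conveyed by a message of the given probability, in bits."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# One of 8 equally probable messages carries 3 bits.
bits = information_bits(1 / 8)
```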

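Assume that we have a training data set $D$, where each training datum has $l$ features $A_1, A_2, \ldots, A_l$ and one class from $C = \{C_1, C_2, \ldots, C_m\}$, and that the fuzzy sets $F_{i1}, F_{i2}, \ldots, F_{im}$ are defined for the feature $A_i$. Let $D^{C_k}$ be the fuzzy subset of $D$ whose class is $C_k$, and let $|D|$ be the sum of the membership values in a fuzzy set $D$ of training data.

We use the information gain $G(A_i, D)$ to estimate the gain of each feature and to decide the order of the features, where $A_i$ represents the $i$th feature. In the standard fuzzy ID3 formulation [10], [11],

$$G(A_i, D) = I(D) - E(A_i, D),$$

$$I(D) = -\sum_{k=1}^{m} p_k \log_2 p_k, \qquad p_k = \frac{|D^{C_k}|}{|D|},$$

$$E(A_i, D) = \sum_{j=1}^{m} p_{ij}\, I(D_{F_{ij}}), \qquad p_{ij} = \frac{|D_{F_{ij}}|}{\sum_{j'=1}^{m} |D_{F_{ij'}}|},$$

where $D_{F_{ij}}$ is the fuzzy subset of $D$ obtained by branching on the fuzzy set $F_{ij}$, and $E(A_i, D)$ is the expected information after branching according to the feature $A_i$. We select the feature with the maximum information gain for constructing the decision tree at the root, and we assign a higher rank to a feature with a higher $G(A_i, D)$. Because the feature ranking procedure influences the performance and size of the decision tree, a good ranking yields both a smaller number of rules and a higher accuracy.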
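As a rough illustration of feature ranking, the sketch below computes a fuzzy information gain from membership values. It assumes the usual fuzzy ID3 definitions, G = I(D) - E(A_i, D), with class proportions taken from sums of membership values. The function names and the data layout (`weights`, `memberships`) are hypothetical and chosen only for this example.

```python
import math


def fuzzy_entropy(weights, labels):
    """I(D) = -sum_k p_k log2 p_k, where p_k = |D^{C_k}| / |D|.

    weights[n] is the membership value of datum n in the fuzzy set D.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    entropy = 0.0
    for cls in set(labels):
        mass = sum(w for w, y in zip(weights, labels) if y == cls)
        if mass > 0:
            p = mass / total
            entropy -= p * math.log2(p)
    return entropy


def fuzzy_information_gain(weights, labels, memberships):
    """G(A_i, D) = I(D) - E(A_i, D) (assumed standard form).

    memberships[j][n] is the membership of datum n in fuzzy set F_ij.
    """
    # Membership in each subset = membership in D times membership in F_ij.
    subsets = [[w * m for w, m in zip(weights, column)] for column in memberships]
    sizes = [sum(subset) for subset in subsets]
    total = sum(sizes)
    if total == 0:
        return 0.0
    expected = sum(
        size / total * fuzzy_entropy(subset, labels)
        for subset, size in zip(subsets, sizes)
    )
    return fuzzy_entropy(weights, labels) - expected
```

Ranking then amounts to evaluating `fuzzy_information_gain` once per feature and sorting the features by the result in descending order.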

2.3. Tree Construction

Assume that we have a training data set $D$, where each training datum has $l$ features $A_1, A_2, \ldots, A_l$ and one class from $C = \{C_1, C_2, \ldots, C_m\}$, and that the fuzzy sets $F_{i1}, F_{i2}, \ldots, F_{im}$ are defined for the feature $A_i$. Let $D^{C_k}$ be the fuzzy subset of $D$ whose class is $C_k$. Then the algorithm to generate a fuzzy decision tree is as follows:

1) Generate the root node that has a set of all training data, i.e., a fuzzy set of all training data with the unit membership value.

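2) If a node $t$ with a fuzzy set of data $D$ satisfies any of the following conditions:

2.1) The proportion of the data of some class $C_k$, $|D^{C_k}|/|D|$, is greater than or equal to the threshold $\theta_r$,

2.2) The size of the data set, $|D|$, is less than the threshold $\theta_n$,

2.3) There are no attributes for more classifications,

then it is a leaf node, and we assign the certainty $|D^{C_k}|/|D|$ of every class $C_k$ to this node.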

3) If it does not satisfy the above conditions, it is not a leaf node, and the internal node is generated as follows:

3.1) Select the feature with the next largest $G(A_i, D)$ value as the test feature $A_{test}$.

3.2) Divide $D$ into fuzzy subsets $D_1, D_2, \ldots, D_m$ according to the test feature, where the membership value of a datum in $D_j$ is the product of its membership value in $D$ and the value of $F_{test,j}$ at its value of $A_{test}$.

3.4) Replace $D$ by $D_j$ ($j = 1, 2, \ldots, m$) and repeat from 2) recursively until every path ends in a leaf node.
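The recursion in steps 1) to 3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the leaf test uses a class-proportion threshold `theta_r` and a data-size threshold `theta_n` (assumed names and default values), the features are supplied already ranked by information gain, and the tree is stored as nested dictionaries.

```python
def build_fuzzy_tree(data, labels, weights, ranked_features, fuzzy_sets,
                     theta_r=0.9, theta_n=1.0):
    """Recursively build a fuzzy decision tree as nested dicts.

    data[n][a]      : value of feature a for datum n
    weights[n]      : membership of datum n in this node's fuzzy set D
    ranked_features : feature indices, highest information gain first
    fuzzy_sets[a]   : list of membership functions for feature a
    """
    total = sum(weights)
    # Certainty of each class at this node: |D^{C_k}| / |D|.
    certainty = {
        c: (sum(w for w, y in zip(weights, labels) if y == c) / total
            if total > 0 else 0.0)
        for c in sorted(set(labels))
    }
    is_leaf = (
        total < theta_n                          # too little data (assumed test)
        or max(certainty.values()) >= theta_r    # one class dominates (assumed test)
        or not ranked_features                   # no features left to test
    )
    if is_leaf:
        return {"leaf": True, "certainty": certainty}

    test, rest = ranked_features[0], ranked_features[1:]
    children = []
    for membership in fuzzy_sets[test]:
        # Membership in the subset = membership in D times membership in F_test,j.
        child_weights = [w * membership(x[test]) for w, x in zip(weights, data)]
        children.append(build_fuzzy_tree(data, labels, child_weights, rest,
                                         fuzzy_sets, theta_r, theta_n))
    return {"leaf": False, "feature": test, "children": children}
```

The root call passes a weight of 1 for every datum, which corresponds to the fuzzy set of all training data with unit membership value in step 1).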

2.4. Fuzzy Set Discretization

Continuous-valued features have to be discretized prior to selection, typically by partitioning the range of the feature into subranges. In ID3-like algorithms, a threshold value partitions a continuous-valued feature into two subranges. We call this threshold value a cut point. An objection that may be raised is that such a discretization scheme can produce “bad” cut points, especially when there are more than two classes in the problem.

This drawback can be overcome by using a discretization algorithm called class-attribute interdependence maximization (CAIM) [13]. CAIM has been shown to generate discretization schemes that almost always have the highest dependence between the class labels and the discrete intervals, and always with significantly fewer intervals. Nevertheless, such a crisp set is unnatural in the real world. A fuzzy set instead introduces vagueness by eliminating the sharp boundary that divides members from nonmembers of the group, so the transition between full membership and nonmembership is gradual rather than abrupt. Hence, we introduce Gaussian-type membership functions for each feature in our fuzzy ID3 algorithm.

The fuzzy ID3 scheme is determined by its parameters, which include the thresholds $\theta_r$ and $\theta_n$ and the membership functions of each feature fuzzy set. A good selection of the fuzzy rule base, leaf node thresholds, and membership functions greatly improves the accuracy of the decision trees. To this end, a genetic algorithm (GA) based scheme is used, because the essentially nonlinear nature of decision trees limits the feasibility of traditional gradient methods. In our work, GA [14] is used to tune the leaf node thresholds $\theta_r$ and $\theta_n$ and the parameters of the membership functions of each feature. The membership function for each feature is of Gaussian type and is given by

$$m(x) = \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),$$

where $x$ is the corresponding feature value of the datum, with mean $\mu$ and variance $\sigma^2$.
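For illustration, a Gaussian-type membership function can be written as below. The sketch assumes the common form exp(-(x - mean)^2 / (2 sigma^2)); the exact normalization used in the original work is not recoverable from the text, and the function name is hypothetical.

```python
import math


def gaussian_membership(x: float, mean: float, sigma: float) -> float:
    """Gaussian-type membership value of x, in (0, 1], peaking at the mean.

    Assumed form: exp(-(x - mean)^2 / (2 * sigma^2)), with sigma > 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))
```

The value is 1 at the mean and falls off smoothly on both sides, which gives the gradual transition between membership and nonmembership.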

Thus, for each membership function, two parameters, $\mu$ and $\sigma$, must be tuned. To minimize the number of rules and maximize the accuracy, the fitness function [15] combines the following quantities: $A$, the accuracy of the classification; $A_0$, the lowest accuracy of the classification in the current population; $L$, the average depth of the decision tree; and η, the influence of the average depth. When the tuning procedure starts, we set η to a value greater than A−A₀. This means that reducing the average depth of the decision tree takes priority over maximizing the accuracy. The thresholds are therefore tuned so that the data are classified at lower depths, and nodes at greater depths of the tree become redundant. As the GA evolves, we gradually decrease η so that maximizing the accuracy starts to dominate, reducing η to zero in $k$ steps. After $k$ steps, η stays at zero; in other words, once η reaches zero we focus on improving the accuracy. In this way we can decrease the number of rules without losing classification performance.
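The decay of η can be sketched as a simple schedule. The text only states that η is reduced to zero in k steps; the linear shape used here, and the names `eta0` and `generation`, are assumptions made for illustration.

```python
def eta_schedule(eta0: float, k: int, generation: int) -> float:
    """Linearly decrease eta from eta0 to 0 over k steps, then hold at 0."""
    if k <= 0 or generation >= k:
        return 0.0
    return eta0 * (1.0 - generation / k)
```

With `eta0` chosen larger than A−A₀ at the start, early generations favor shallow trees and later generations favor accuracy.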

We now illustrate one cycle of the tuning process. Assume we have a data set with four features and three classes, so that there are twelve membership functions. Each membership function has two parameters, $\mu$ and $\sigma$, and there are two thresholds, $\theta_r$ and $\theta_n$, in addition. Thus we have to tune 26 parameters in total.

Initially we assign random numbers to these parameters. In our setting, $\mu$ is randomly chosen from the range between the minimum and maximum values of the corresponding feature among the data, and $\sigma$ is chosen between 0 and the standard deviation of that feature.
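The parameter count and the random initialization can be sketched as follows. The function name, the flat list layout, and the sampling ranges for the two thresholds are assumptions; the text specifies ranges only for the mean and the spread.

```python
import random
import statistics


def random_parameters(data, sets_per_feature=3, rng=None):
    """Draw one random parameter vector: (mean, sigma) per fuzzy set, then two thresholds."""
    rng = rng or random.Random()
    params = []
    for column in zip(*data):  # one column per feature
        spread = statistics.pstdev(column)
        for _ in range(sets_per_feature):
            params.append(rng.uniform(min(column), max(column)))  # mean within feature range
            params.append(rng.uniform(0.0, spread))               # sigma up to the std. dev.
    params.append(rng.random())  # first threshold, range [0, 1) is an assumption
    params.append(rng.random())  # second threshold, range is an assumption
    return params
```

For four features with three fuzzy sets each, the vector has 4 × 3 × 2 + 2 = 26 entries, matching the count in the text.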

There are several encodings for GA, and the choice depends heavily on the problem. Binary encoding is the most common, mainly because the earliest GA research used this type of encoding and because of its relative simplicity. In binary encoding, every chromosome is a string of bits, 0 or 1. Crossover and mutation are the two basic operators of GA, and the performance of GA depends heavily on them. There are many ways to perform crossover and mutation; we briefly describe how we perform these two operators.

As shown in Fig. 2.1, the multi-point crossover method selects two crossover points: the binary string from the beginning of the chromosome to the first crossover point is copied from the first parent, the part from the first to the second crossover point is copied from the other parent, and the rest is copied from the first parent again. We repeat this procedure until the end of the parent chromosomes. This process produces two new offspring chromosomes, each of which is similar to both parents. There are other ways to perform crossover; for example, we can choose more crossover points.

Fig. 2.1. Multi point crossover.
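A crossover with two cut points on bit strings can be sketched as follows. The function name and the use of Python strings for chromosomes are illustrative choices, not taken from the original work.

```python
import random


def two_point_crossover(parent_a: str, parent_b: str, rng=None):
    """Swap the segment between two random cut points of two equal-length bit strings."""
    rng = rng or random.Random()
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # Two distinct cut points i < j anywhere from 0 to len (inclusive).
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b
```

Each child keeps the head and tail of one parent and takes the middle segment from the other, so both offspring resemble both parents.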

Crossover can be quite complicated and depends mainly on the encoding of the chromosomes. A crossover designed for a specific problem can improve the performance of the genetic algorithm.

After crossover is performed, mutation takes place with a given probability. Mutation is intended to prevent all solutions in the population from falling into a local optimum of the problem being solved. As shown in Fig. 2.2, the mutation operation randomly changes the offspring.

In the case of binary encoding, we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1.

Fig. 2.2. Mutation.
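Bit-flip mutation on a binary chromosome can be sketched in a few lines. The function name and the per-bit probability parameter `pm` are illustrative.

```python
import random


def mutate(chromosome: str, pm: float, rng=None) -> str:
    """Flip each bit of a binary string independently with probability pm."""
    rng = rng or random.Random()
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < pm else bit
        for bit in chromosome
    )
```

With a small `pm`, most offspring pass through unchanged and only an occasional bit is flipped, which keeps some diversity in the population.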

In our GA scheme, assume we generate 50 chromosomes of these parameters and use them to generate decision trees. After each decision tree is generated, suppose, for example, that one individual has an accuracy of 87% and an average depth of 3.5, and that the lowest accuracy in this population is 72%. The fitness of this individual is then evaluated from $A = 0.87$, $A_0 = 0.72$, and $L = 3.5$. Accordingly, we apply the reproduction, crossover, and mutation operators to generate new chromosomes and continue until the predetermined condition is met. Here we select the crossover probability $p_c = 1$ and the mutation probability $p_m = 0.0001$ for the GA.
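One way to picture the overall cycle of reproduction, crossover, and mutation is the generic loop below. It is a sketch under several assumptions: reproduction is done by roulette-wheel selection, the population size is even, and the fitness is any user-supplied function on bit strings (the toy example counts ones). It does not reproduce the fitness function of [15] or the decoding of chromosomes into tree parameters.

```python
import random


def evolve(fitness, length=16, population_size=50, generations=30,
           pc=1.0, pm=0.0001, seed=0):
    """Run a plain binary GA and return the fittest chromosome of the last generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(population_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        lowest = min(scores)
        # Reproduction: roulette-wheel selection on shifted, strictly positive scores.
        weights = [s - lowest + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=population_size)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < pc:
                # Crossover with two cut points.
                i, j = sorted(rng.sample(range(length + 1), 2))
                a, b = a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]
            offspring += [a, b]
        # Mutation: flip each bit independently with probability pm.
        population = [
            "".join(("1" if bit == "0" else "0") if rng.random() < pm else bit
                    for bit in child)
            for child in offspring
        ]
    return max(population, key=fitness)


# Toy usage: maximize the number of 1 bits.
best = evolve(lambda chromosome: chromosome.count("1"))
```

In the setting of this section, the fitness callback would decode the chromosome into thresholds and membership parameters, build the tree, and score it.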

The advantage of GA lies in its parallelism. GA travels through the search space using many individuals in each generation, so it is less likely than other methods to get stuck in a local extremum. The disadvantage of GA is its computational time: GA can be slower than other methods. However, since we can terminate the computation at any time, the longer run is acceptable. For some problems, choosing and implementing the encoding and the fitness function can be difficult, even though GA is powerful. Fig. 2.4 shows the flowchart of our genetic algorithm based fuzzy ID3 method.

2.5. Fuzzy Rule Inference

According to the rule base, inference on the decision tree starts from the root node and iteratively tests each node indicated by the rule until a leaf node is reached. Note that we have recorded the certainty values $|D^{C_k}|/|D|$ at the leaf nodes, as mentioned above, and they represent the certainty of each class for the corresponding rule.

Since we obtain the $|D^{C_k}|/|D|$ values at each leaf node, the node is assigned every class name with its certainty value $|D^{C_k}|/|D|$. In other words, every leaf node carries all class names with their corresponding certainty values. The rule produced by each leaf node therefore classifies the data into every class with a certainty value and does not directly assign the data to one specific class. For example, the fuzzy rule extracted from a leaf node is as follows:

IF X1 is F12 AND X2 is F21
THEN Class 1 with certainty 0.3 and Class 2 with certainty 0.7. (2.10)

In the pre-condition of the above rule, X1 assumes the membership value in the second fuzzy set F12 of the first feature and X2 assumes the membership value in the first fuzzy set F21 of the second feature. In the consequent part, there are two certainty values, 0.3 certainty value for class 1 and 0.7 certainty value for class 2. The steps of using the rule base to classify are as follows:

1) For each extracted fuzzy rule, multiply the membership values of the testing datum in the corresponding fuzzy sets along the path from the root to the leaf node. The product is the firing strength: if the membership value of the testing datum in the $i$th fuzzy set on the path is $m_i$, the firing strength is given by $\prod_i m_i$.

2) Multiply the certainty of each class at the leaf node associated with the current fuzzy rule by the firing strength, and denote the values as $J(n)$, $n = 1, 2, \ldots,$ class number.

3) Repeat 1) and 2) until all rules have been evaluated.

4) Sum the results of 3) over all the rules, summing each class separately.

5) Assign the testing datum to the class with the maximum value in 4).

For example, a simple decision tree with 2 features, 3 subsets and 3 classes is shown in Fig. 2.3. This decision tree has four leaf nodes, F11, F21, F22, and F13, each with certainties for the classes C1, C2, and C3. In addition, the membership values of the testing datum are shown on the branches. We can thus use these 4 fuzzy rules to classify the testing datum as follows:

class 1: ΣJ(1) = 0.1*0.1 + 0.5*0.8*0.3 + 0.5*0.3*0.3 + 0.9*0.8 = 0.895
class 2: ΣJ(2) = 0.1*0.2 + 0.5*0.8*0.7 + 0.5*0.3*0.4 + 0.9*0.1 = 0.45
class 3: ΣJ(3) = 0.1*0.7 + 0.5*0.8*0.0 + 0.5*0.3*0.3 + 0.9*0.1 = 0.205

Fig. 2.3. An example of decision tree.

The testing datum is assigned to class 1 because ΣJ(1) is the maximum. Note that classifying one testing datum requires evaluating all fuzzy rules rather than relying on one specific rule. In this way, we use all rules together to decide the class of every testing datum instead of generating a new specific rule only for some specific data. Therefore, we can reduce the size of the rule base without losing performance.
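The five inference steps and the worked example can be reproduced with a short script. The representation of a rule as a pair (membership values along the path, certainties per class) and the function name `classify` are illustrative; the numbers are those of the example in this section.

```python
from math import prod


def classify(rules, num_classes):
    """Return (1-based winning class, per-class sums) for one testing datum.

    Each rule is (memberships along the path, certainty of each class at the leaf).
    """
    totals = [0.0] * num_classes
    for memberships, certainties in rules:
        strength = prod(memberships)              # step 1: firing strength
        for n, certainty in enumerate(certainties):
            totals[n] += strength * certainty     # steps 2 to 4: weight and accumulate
    best = max(range(num_classes), key=lambda n: totals[n])
    return best + 1, totals                       # step 5: class with the maximum sum


# The four rules of the example: leaves F11, F21, F22, and F13.
rules = [
    ([0.1],      [0.1, 0.2, 0.7]),
    ([0.5, 0.8], [0.3, 0.7, 0.0]),
    ([0.5, 0.3], [0.3, 0.4, 0.3]),
    ([0.9],      [0.8, 0.1, 0.1]),
]
winner, sums = classify(rules, num_classes=3)
```

Here `winner` is class 1 and `sums` is approximately 0.895, 0.45, and 0.205, in agreement with the hand calculation.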

Fig. 2.4. Flowchart of genetic algorithm based fuzzy ID3 method.
