
Chapter 4 An Evolutionary Neural Fuzzy Classifier Using Bacterial Foraging Oriented by Particle Swarm Optimization

4.3 Illustrative Examples

In this section, we evaluate the classification performance of the proposed NFS-BFPSO method using two well-known benchmark data sets and one skin color detection problem. The first example uses the iris data and the second example uses the Wisconsin breast cancer data. The two benchmark data sets are available from the University of California, Irvine, via the anonymous ftp address ftp://ftp.ics.uci.edu/pub/machine-learning-databases. In the following simulations, the parameters and the number of training epochs were chosen according to the desired accuracy; that is, training of the NFS with BFPSO was stopped once the desired learning accuracy had been reached.

Example 1: Iris Data Classification

The Fisher-Anderson iris data consists of four input measurements, sepal length (sl), sepal width (sw), petal length (pl), and petal width (pw), on 150 specimens of the iris plant. Three species of iris are involved, Iris setosa, Iris versicolor, and Iris virginica, and each species contains 50 instances. The measurements are shown in Figure 4.2.

In the iris data experiments, 25 instances with four features from each species were randomly selected as the training set (i.e., a total of 75 training patterns were used as the training data set), and the remaining 75 instances were used as the testing set.
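This stratified split is straightforward to reproduce. The sketch below is our own illustration of it (the function name and NumPy layout are assumptions, not code from the thesis):

```python
import numpy as np

def split_iris(X, y, n_train_per_class=25, seed=0):
    """Randomly pick n_train_per_class patterns of each species for the
    training set; the remaining patterns form the testing set."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for cls in np.unique(y):
        members = np.flatnonzero(y == cls)
        train_idx.extend(rng.choice(members, n_train_per_class, replace=False))
    train_idx = np.asarray(train_idx)
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Usage with the 150x4 iris matrix X and integer labels y (0, 1, 2):
#   X_train, y_train, X_test, y_test = split_iris(X, y)
```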

Once the NFS was trained, all 150 patterns of the iris data were presented to the trained NFS, and the re-substitution error was computed. In this example, three fuzzy rules were adopted. After 4000 generations, the final fitness value was 0.9278.

Figure 4.3 (a)-(f) shows the distribution of the training patterns and the final assignment of the fuzzy rules (i.e., the distribution of the input membership functions). Since the region covered by a Gaussian membership function is unbounded, in Figure 4.3 (a)-(f) the boundary of each ellipse represents a firing strength of 0.5 for the corresponding rule. We compared the testing accuracy of the proposed method with that of two other methods: the neural fuzzy system with bacterial foraging optimization (NFS-BFO) and the neural fuzzy system with particle swarm optimization (NFS-PSO). The experiments computed the classification accuracy on the testing set, and its average over runs, for the NFS-BFO method, the NFS-PSO method, and the proposed NFS-BFPSO method.
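The 0.5-level ellipses in Figure 4.3 follow directly from the rule definition. Assuming each rule fires with the product of Gaussian memberships, $F(\mathbf{x}) = \prod_i \exp(-(x_i - m_{ij})^2/\sigma_{ij}^2)$, the contour $F = 0.5$ is the axis-aligned ellipse $\sum_i (x_i - m_{ij})^2/\sigma_{ij}^2 = \ln 2$. The sketch below traces this boundary for one two-dimensional projection; the means and widths are illustrative values, not the trained parameters:

```python
import numpy as np

# Illustrative rule parameters for one 2-D projection (not the trained values).
m = np.array([5.0, 3.4])       # rule means, e.g., sepal length and sepal width
sigma = np.array([0.6, 0.4])   # rule widths

# F(x) = exp(-(x1-m1)^2/s1^2) * exp(-(x2-m2)^2/s2^2) equals 0.5 exactly on
# the ellipse (x1-m1)^2/s1^2 + (x2-m2)^2/s2^2 = ln 2.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
radius = np.sqrt(np.log(2.0))
boundary = m + (sigma * radius) * np.c_[np.cos(theta), np.sin(theta)]

# Each row of `boundary` is a point where the rule's firing strength is 0.5.
```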

Figure 4.2: Iris data: Iris setosa, Iris versicolor, and Iris virginica.

During the learning phase, the learning curves from the proposed NFS-BFPSO method, the NFS-BFO method, and the NFS-PSO method are shown in Figure 4.4.

Table 4.1 shows that the experiments with the NFS-BFPSO method result in high accuracy, ranging from 96% to 98.67%; the mean re-substitution accuracy was 97.6%. The average classification accuracy of the NFS-BFPSO method was better than that of the other methods. Table 4.2 compares the classification results of the NFS-BFPSO method with those of other methods [28], [102], [108]-[110] on the iris data. The results show that the proposed NFS-BFPSO method maintains a comparable average re-substitution accuracy.

(a) For the Sepal Length and Sepal Width dimensions.

(b) For the Petal Length and Petal Width dimensions.

(c) For the Sepal Length and Petal Length dimensions.

(d) For the Sepal Width and Petal Width dimensions.

(e) For the Sepal Width and Petal Length dimensions.

(f) For the Sepal Length and Petal Width dimensions.

Figure 4.3: The distribution of input training patterns and final assignment of three rules.

Figure 4.4: Learning curves of the NFS-BFPSO method, the NFS-BFO method, and the NFS-PSO method.

Table 4.1: Classification accuracy using various methods for the iris data.

Experiment #    NFS-BFO    NFS-PSO    NFS-BFPSO
1               96         98.67      98.67
2               92         93.33      96
3               97.33      94.67      98.67
4               97.33      98.67      97.33
5               94.67      94.67      97.33
Average (%)     95.47      96         97.6

Table 4.2: Average re-substitution accuracy comparison of various models for the iris data classification problem.

Models                   Average re-substitution accuracy (%)
FEBFC [102]              96.91
SANFIS [28]              97.33
FMMC [108]               97.3
FUNLVQ+GFENCE [109]      96.3
Wu-and-Chen's [110]      96.21
NFS-BFPSO                97.6

Example 2: Wisconsin Breast Cancer Diagnostic Data Classification

The Wisconsin breast cancer diagnostic data set contains 699 patterns distributed into two output classes, "benign" and "malignant." Each pattern consists of nine input features: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. Of these patterns, 458 are in the benign class and the remaining 241 are in the malignant class. Since 16 patterns contain missing values, we used the remaining 683 patterns to evaluate the performance of the proposed NFS-BFPSO method. To compare the performance with other models, we used half of the 683 patterns as the training set and the remaining patterns as the testing set.

Experimental conditions were the same as in the previous experiment. The training patterns were chosen randomly, and the remaining patterns were used for testing. The experiments computed the classification accuracy on the testing set, and its average over runs, for the NFS-BFO method, the NFS-PSO method, and the proposed NFS-BFPSO method.

During the supervised learning phase, 4000 epochs of training were performed.

Figure 4.5 shows the membership functions for each input feature. The learning curves of the proposed NFS-BFPSO method, the NFS-BFO method, and the NFS-PSO method are shown in Figure 4.6. The performance of the NFS-BFPSO method is better than that of the other models.

Table 4.3 shows that the experiments with the NFS-BFPSO method result in high accuracy, ranging from 97.66% to 98.54%; the mean re-substitution accuracy was 97.95%. We also compared the testing accuracy of our model with that of other methods [26], [28], [101], [102], [111]. Table 4.4 compares the learned NFS-BFPSO method with other fuzzy, neural network, and neural fuzzy systems. The average classification accuracy of the NFS-BFPSO method is better than that of the other methods.

Figure 4.5: Input membership functions for breast cancer classification.

Figure 4.6: Learning curves from the NFS-BFPSO method, the NFS-BFO method and the NFS-PSO method.

Table 4.3: Classification accuracy for the Wisconsin breast cancer diagnostic data.

Experiment #    NFS-BFO    NFS-PSO    NFS-BFPSO
1               95.32      96.49      97.66
2               95.61      97.08      98.54
3               93.86      94.44      97.66
4               94.74      97.37      97.95
5               94.74      96.49      97.95
Average (%)     94.85      96.37      97.95

Table 4.4: Average accuracy comparison of various models for Wisconsin breast cancer diagnostic data.

Models           Average re-substitution accuracy (%)
NNFS [101]       94.15
FEBFC [102]      95.14
SANFIS [28]      96.3
NEFCLASS [26]    92.7
MSC [111]        94.9
NFS-BFPSO        97.95

Example 3: Skin Color Detection

The description of the system is the same as in Section 3.4. Unlike the previous chapter, in which four rules constituted the neuro-fuzzy classifier, we set three fuzzy rules in this example. In addition, the parameter learning method is changed to the BFPSO method.

In this example, the performance of the NFS-BFPSO method is compared with that of the NFS-BFO method and the NFS-PSO method. The learning curves are shown in Figure 4.7, from which we find that the performance of the proposed NFS-BFPSO method is superior to that of the other methods. In addition, the training and testing accuracy rates of the various models are tabulated in Table 4.5.

The CIT facial database consists of complex backgrounds and diverse lighting. From the comparison data listed in Table 4.5, the average testing accuracy rate is 82.39% for the NFS-BFO method, 83.64% for the NFS-PSO method, and 85.82% for the proposed NFS-BFPSO method. This demonstrates that, although the CIT database is more complex, the added complexity does not markedly decrease the accuracy rate: the proposed NFS-BFPSO method maintains a superior accuracy rate. The color images from the CIT database are shown in Figure 4.8. A well-trained network can generate binary outputs (1/0 for skin/non-skin) to detect a facial region. Figure 4.9 shows that our model accurately determines a facial region.
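The detection stage therefore reduces to classifying each pixel's (Y, Cb, Cr) vector and thresholding the output to a binary mask. A minimal sketch follows, in which `trained_nfs` is a hypothetical stand-in for the trained NFS-BFPSO classifier and the RGB-to-YCbCr conversion uses the standard ITU-R BT.601 coefficients:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an HxWx3 uint8 RGB image to YCbCr (ITU-R BT.601)."""
    rgb = img.astype(np.float64)
    y  =  0.299    * rgb[..., 0] + 0.587    * rgb[..., 1] + 0.114    * rgb[..., 2]
    cb = -0.168736 * rgb[..., 0] - 0.331264 * rgb[..., 1] + 0.5      * rgb[..., 2] + 128.0
    cr =  0.5      * rgb[..., 0] - 0.418688 * rgb[..., 1] - 0.081312 * rgb[..., 2] + 128.0
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(img, trained_nfs):
    """Per-pixel binary skin map: 1 for skin, 0 for non-skin."""
    ycbcr = rgb_to_ycbcr(img).reshape(-1, 3)
    scores = trained_nfs(ycbcr)            # classifier output per pixel
    return (scores > 0.5).astype(np.uint8).reshape(img.shape[:2])
```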

Figure 4.7: The learning curves of the three methods using the CIT database.

Table 4.5: Performance comparison with various existing models from the CIT database.

Method                           NFS-BFPSO    NFS-PSO    NFS-BFO
Average training accuracy rate   97.63%       96.77%     96.5%
Average testing accuracy rate    85.82%       83.64%     82.39%

Figure 4.8: Original face images from the CIT database.

Figure 4.9: Results of skin color detection with three-dimensional input (Y, Cb, Cr).

4.4 Concluding Remarks

This chapter proposes an efficient evolutionary learning method, bacterial foraging oriented by particle swarm optimization (BFPSO), for the neural fuzzy system (NFS) in classification applications. The proposed BFPSO method attempts to make judicious use of the exploration and exploitation abilities of the search space and is therefore likely to avoid false and premature convergence in many cases.

The advantages of the proposed BFPSO method are summarized as follows: 1) BFPSO incorporates an elite-selection mechanism, which gives near-optimal solutions a chance to reproduce; 2) BFPSO records the best previous solution and the global best solution to guide the evolution; and 3) BFPSO balances the exploration and exploitation abilities over the search space. The three examples showed that the proposed NFS-BFPSO method improves system performance in terms of fast learning convergence and a high correct classification rate.
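For readers who want the mechanics, the sketch below condenses one chemotactic step of a BFPSO-style hybrid: the random tumble direction of bacterial foraging is blended with the PSO attraction toward the personal and global bests, and the bacterium keeps swimming while the cost improves. The coefficients and exact update rule are illustrative assumptions, not the thesis's precise formulation:

```python
import numpy as np

def bfpso_step(pos, pbest, gbest, cost, step_size=0.1,
               c1=1.2, c2=0.6, swim_len=4, rng=None):
    """One PSO-oriented chemotactic step for a population of bacteria."""
    rng = rng or np.random.default_rng()
    new_pos = pos.copy()
    for i in range(len(pos)):
        rand_dir = rng.normal(size=pos.shape[1])
        rand_dir /= np.linalg.norm(rand_dir)
        # Tumble direction oriented by cognitive and social attraction.
        direction = (rand_dir
                     + c1 * rng.random() * (pbest[i] - pos[i])
                     + c2 * rng.random() * (gbest - pos[i]))
        direction /= np.linalg.norm(direction) + 1e-12
        j_prev = cost(pos[i])
        new_pos[i] = pos[i] + step_size * direction        # tumble
        for _ in range(swim_len):                          # swim while improving
            if cost(new_pos[i]) >= j_prev:
                break
            j_prev = cost(new_pos[i])
            new_pos[i] = new_pos[i] + step_size * direction
    return new_pos

# Usage sketch: 20 bacteria minimizing a sphere cost in 5 dimensions.
rng = np.random.default_rng(1)
pos = rng.uniform(-1.0, 1.0, size=(20, 5))
cost = lambda x: float(np.sum(x ** 2))
pbest = pos.copy()
gbest = pos[np.argmin([cost(p) for p in pos])].copy()
pos = bfpso_step(pos, pbest, gbest, cost, rng=rng)
```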

Chapter 5 Nonlinear System Control Using Functional-Link-Based Neuro-Fuzzy Network Model Embedded with Modified Particle Swarm Optimizer

Nonlinear system control is an important tool that is adopted to improve control performance and achieve robust fault-tolerant behavior. Among nonlinear control techniques, those based on artificial neural networks and fuzzy systems have become popular research topics in recent years [112-114], because classical control theory usually requires a mathematical model to design the controller, and the inaccuracy of the mathematical modeling of plants usually degrades the performance of the controller, especially for nonlinear and complex control problems [115]. In contrast, both fuzzy system controllers and artificial neural network controllers provide key advantages over traditional adaptive control systems. Although traditional neural networks can learn from data and feedback, the meaning associated with each neuron and each weight in the network is not easily interpreted. Fuzzy logic models, on the other hand, are easy to interpret, because they use linguistic terms and the structure of IF-THEN rules. However, fuzzy systems lack an effective learning algorithm for refining the membership functions to minimize output errors.

According to the literature reviewed above, and in contrast to pure neural or fuzzy methods, neural fuzzy network (NFN) systems [8-34] possess the advantages of both neural networks and fuzzy systems. NFNs bring the low-level learning and computational power of neural networks into fuzzy systems and give the high-level human-like thinking and reasoning of fuzzy systems to neural networks.

This chapter presents a PSO-based learning algorithm for the neural fuzzy system (NFS) in nonlinear system control applications. PSO is an efficient tool for optimization and search problems; however, it easily becomes trapped in local optima because of its information-sharing mechanism. Many studies have shown that mutation operators can help PSO prevent premature convergence [116-118]. To prevent the basic PSO from becoming trapped in local optima, we modify it by adding a diversity scheme, called the distance-based mutation operator, which strongly encourages a global search and gives the particles a greater chance of converging to the global optimum. The proposed learning algorithm is therefore called distance-based mutation particle swarm optimization (DMPSO).

The idea behind the proposed DMPSO learning algorithm is that there are only two kinds of convergence: 1) convergence to a local optimum and 2) convergence to the global optimum. If convergence to a local optimum has occurred, meaning that the basic PSO is trapped, it is a good time to apply the mutation operator to help the PSO escape from the local optimum. If convergence to the global optimum has occurred, applying the mutation operator does no harm, since the PSO will naturally converge again to the global optimum.

5.1 Learning Scheme for the FLNFN Model

This section presents the learning scheme for constructing the FLNFN model. The proposed learning scheme comprises a structure learning phase and a parameter learning phase.

Figure 5.1: Flowchart of the proposed learning scheme for the FLNFN model.

Figure 5.1 presents the flowchart of the learning scheme for the FLNFN model.

Structure learning is based on the entropy measure, which is used to determine whether a new rule should be added to satisfy the fuzzy partitioning of the input variables. Parameter learning is based on the proposed evolutionary learning algorithm, which minimizes a given cost function by adjusting the link weights in the consequent part and the parameters of the membership functions. Initially, the network contains no nodes except the input-output nodes; that is, the FLNFN model contains no rules. The nodes are created automatically as learning proceeds, upon the reception of incoming training data, through the structure and parameter learning processes. Once the learning process is completed, the trained FLNFN can act as the nonlinear system controller. The following two sections detail the structure learning phase and the parameter learning phase.
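In outline, the scheme of Figure 5.1 can be read as the loop below; `entropy_measures`, `add_rule`, and `optimizer.minimize` are placeholder names for the mechanisms detailed in Sections 5.2 and 5.3, so this is a skeleton rather than the thesis's implementation:

```python
def train_flnfn(model, data, optimizer, tau=0.28):
    """Two-phase learning scheme of Figure 5.1 (placeholder method names)."""
    for x, _y in data:
        # Structure learning: entropy-based rule generation (Section 5.2).
        em_bar = tau ** model.n_inputs          # threshold EM_bar
        if not model.rules or max(model.entropy_measures(x)) <= em_bar:
            model.add_rule(x)                   # mean at x, preset width/weight
    # Parameter learning: adjust all free parameters on the same data
    # with the evolutionary search of Section 5.3 (e.g., DMPSO).
    optimizer.minimize(model, data)
    return model
```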

5.2 Structure Learning Phase

The foremost step in structure learning is to determine whether a new rule should be extracted from the training data and to determine the number of fuzzy sets in the universe of discourse of each input variable, since one cluster in the input space corresponds to one potential fuzzy logic rule, in which $m_{ij}$ and $\sigma_{ij}$ represent the mean and standard deviation of that cluster, respectively. For each incoming pattern $\mathbf{x}$, the rule firing strength can be regarded as the degree to which the incoming pattern belongs to the corresponding cluster. The entropy measure between each data point and each membership function is calculated based on a similarity measure: a data point close to a cluster mean yields a high entropy value, whereas a data point far from every existing cluster yields a low one. Therefore, the entropy values between data points and the current membership functions are calculated to determine whether or not to add a new rule. For computational efficiency, the entropy measure can be calculated using the firing strength from $u_{ij}^{(2)}$ as

$$EM_j = \sum_{i=1}^{N} -D_{ij} \log_2 D_{ij} \qquad (5.1)$$

where $D_{ij}$ is the similarity, derived from $u_{ij}^{(2)}$, between the $i$th input value and the $j$th membership function. The criterion used to generate a new fuzzy rule and new functional-link bases for new incoming data is described as follows. The maximum entropy measure is found as

$$EM_{\max} = \max_{1 \le j \le R(t)} EM_j \qquad (5.2)$$

where $R(t)$ is the number of existing rules at time $t$. If $EM_{\max} \le \overline{EM}$, then a new rule is generated, where $\overline{EM} \in (0, 1)$ is a prespecified threshold that decays during the learning process.

In the structure learning phase, the threshold $\overline{EM}$ is an important parameter. The threshold is set between zero and one. A low threshold leads to the learning of coarse clusters (i.e., fewer rules are generated), whereas a high threshold leads to the learning of fine clusters (i.e., more rules are generated). If the threshold value equals zero, all the training data belong to the same cluster in the input space. The selection of the threshold value $\overline{EM}$ therefore critically affects the simulation results. As a result of extensive experiments and a careful examination of the threshold value $\overline{EM}$ over the range $[0, 1]$, we concluded that there is a relationship between the threshold value $\overline{EM}$ and the number of input variables $N$. Accordingly, $\overline{EM} = \tau^{N}$, where $\tau$ belongs to the range $[0.26, 0.3]$.
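Putting Eqs. (5.1) and (5.2) together with the threshold rule gives a compact rule-generation test. In the sketch below, the similarity is taken as $D_{ij} = \exp(-1/u_{ij}^{(2)})$, a monotone function of the membership value; this concrete choice of $D_{ij}$, the default width `sigma_init`, and `tau = 0.28` are assumptions made for illustration:

```python
import numpy as np

def entropy_measures(x, means, sigmas):
    """EM_j of Eq. (5.1) for each of the R existing rules.
    means, sigmas have shape (R, N); x has shape (N,)."""
    u = np.exp(-((x - means) ** 2) / sigmas ** 2)     # memberships u_ij^(2)
    d = np.exp(-1.0 / np.maximum(u, 1e-12))           # assumed similarity D_ij
    return np.sum(-d * np.log2(np.maximum(d, 1e-300)), axis=1)

def maybe_add_rule(x, means, sigmas, tau=0.28, sigma_init=0.5):
    """Add a rule when EM_max <= EM_bar = tau**N (Eq. (5.2))."""
    n = len(x)
    no_rule = means.shape[0] == 0
    if no_rule or entropy_measures(x, means, sigmas).max() <= tau ** n:
        means = np.vstack([means.reshape(-1, n), x])            # m_new = x
        sigmas = np.vstack([sigmas.reshape(-1, n),
                            np.full((1, n), sigma_init)])       # preset width
    return means, sigmas

# Usage: start with no rules and stream three 2-D patterns.
means, sigmas = np.empty((0, 2)), np.empty((0, 2))
for x in np.array([[0.1, 0.2], [0.15, 0.22], [0.9, 0.9]]):
    means, sigmas = maybe_add_rule(x, means, sigmas)
print(means)   # the first and third patterns each spawn a rule
```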

Once a new rule has been generated, the next step is to assign an initial mean and standard deviation to the new membership function and a corresponding link weight to the consequent part. Since the goal is to minimize the objective function, the mean, standard deviation, and weight are all adjustable later in the parameter learning phase. Hence, the mean, standard deviation, and weight for the new rule are set as

$$m_{ij}^{\mathrm{new}} = x_i, \qquad \sigma_{ij}^{\mathrm{new}} = \sigma_{\mathrm{init}}, \qquad w_{kj}^{\mathrm{new}} = \mathrm{random} \qquad (5.3)$$

where $x_i$ is the current input datum, $\sigma_{\mathrm{init}}$ is a prespecified constant, and the new consequent link weights are initialized with small random values.

After the network structure has been adjusted according to the current training data, the network enters the parameter learning phase to adjust the parameters of the network optimally based on the same training data.

5.3 Parameter Learning Phase

Ratnaweera et al. [61] stated that the lack of population diversity in PSO algorithms is understood to be a factor in their convergence on local optima. Therefore, the addition of a mutation operator to PSO should enhance its global search capacity and thus improve its performance. There are mainly two types of mutation operators: one is based on particle position [118] and the other on particle velocity [117]. The former is the more common technique, and the mutation operator proposed in this research is also based on particle position.

In [116], Li, Yang, and Korejo modified the PSO by adding a mutation operator that provides a chance to escape from local optima. They focused on determining which random generator of the mutation operator is best for improving the population. However, the timing of the application of the mutation operator is the most important factor. If the mutation operator is applied too early, when the particles are not yet nearly convergent, the local search ability of the PSO is destroyed. If it is applied too late, the parameter learning algorithm becomes very inefficient. Hence, when to apply the mutation operator is an important issue. In our study, we used the distances between particles as a measure to determine whether the mutation operator needs to be applied, and the modified PSO we used is the so-called distance-based mutation particle swarm optimization (DMPSO).

Compared with the basic PSO, the DMPSO introduces a convergence detection unit that detects the convergence status of the particles. If the particles have converged, the mutation operator is applied; otherwise, it is skipped.

The convergence detection unit computes the average distance from every particle to the particle that has the global best value as

$$\bar{d}(t) = \frac{1}{S} \sum_{i=1}^{S} \left\| X_i(t) - Gbest(t) \right\| \qquad (5.4)$$

where $S$ is the swarm size, $X_i(t)$ is the position of the $i$th particle, and $Gbest(t)$ is the global best position at iteration $t$. After the average distance is computed, the threshold $Th_{\mathrm{conv}}$ is used to determine whether the particles are close enough according to

$$\bar{d}(t) \le Th_{\mathrm{conv}} \qquad (5.5)$$

If all particles are close enough, meaning that they are converging to the same position, the mutation operator is applied; otherwise, it is skipped.

In this study, every particle has its own mutation probability. If the average distance is greater than $Th_{\mathrm{conv}}$, implying that the majority of particles have not converged, the mutation probability is set to zero: no particle mutates, and every particle behaves as in the generic PSO. If the average distance is less than $Th_{\mathrm{conv}}$, meaning that all particles are converging to the same position $Gbest(t)$, the mutation probability ($MP$) of each particle is computed as

$$MP(t) = 1 - \frac{progress(t)}{S} \qquad (5.6)$$

where the success flag $s_i(t)$ is set to 1 only when the $i$th particle is successfully evolved at the $t$th iteration, meaning that its local best fitness value is improved at the $t$th iteration, and $progress(t) = \sum_{i=1}^{S} s_i(t)$ is the number of successfully evolved particles at time step $t$.

The design of the mutation probability is based on the ratio of the improved population: if the ratio of improved particles is higher, the mutation probability becomes smaller. Most particles are then moving toward the best values they have currently found, and the lower probability guarantees that the direction of the moving group will not be destroyed by the mutation operator. On the other hand, if most particles do not improve their fitness values, the population is in a stable state. There are two possibilities. The first is that the particles have converged to the global optimum (or near-global optimum); applying the mutation operator at this moment will not destroy the moving group, because the particles still remember the global optimum, and the mutated particles will move back toward it in the next iteration. The second is that the particles have converged to a local optimum; in other words, they have fallen into a trap. The mutation operator provides a chance to escape from the trap: if some particles mutate and the new positions have better fitness values, the swarm will follow them away from the local optimum.
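A compact sketch ties the pieces together: the convergence test of Eqs. (5.4) and (5.5) gates the mutation, and the mutation probability falls as the fraction of recently improved particles rises, in the spirit of Eq. (5.6). The additive Gaussian jump and the specific numerical settings are illustrative assumptions:

```python
import numpy as np

def dmpso_mutation(pos, gbest, improved, th_conv=0.05, scale=0.5, rng=None):
    """Apply distance-based mutation to converged swarms only."""
    rng = rng or np.random.default_rng()
    s = pos.shape[0]
    # Eq. (5.4): average distance from every particle to the global best.
    avg_dist = np.mean(np.linalg.norm(pos - gbest, axis=1))
    if avg_dist >= th_conv:                 # Eq. (5.5) fails: still exploring
        return pos
    # Eq. (5.6): fewer recently improving particles -> higher MP.
    mp = 1.0 - np.count_nonzero(improved) / s
    mutate = rng.random(s) < mp
    pos = pos.copy()
    pos[mutate] += scale * rng.normal(size=pos[mutate].shape)   # position jump
    return pos

# Usage: a tightly converged swarm in which no particle improved recently.
rng = np.random.default_rng(7)
pos = np.full((10, 3), 0.5) + 0.001 * rng.normal(size=(10, 3))
improved = np.zeros(10, dtype=bool)         # progress(t) = 0 -> MP = 1
pos = dmpso_mutation(pos, pos[0].copy(), improved, rng=rng)
```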
