ORIGINAL ARTICLE
Data mining–based hierarchical cooperative coevolutionary
algorithm for TSK-type neuro-fuzzy networks design
Chi-Yao Hsu · Sheng-Fuu Lin · Jyun-Wei Chang
Received: 2 January 2012 / Accepted: 12 April 2012 / Published online: 25 April 2012 © Springer-Verlag London Limited 2012
Abstract This study proposes a data mining–based hierarchical cooperative coevolutionary algorithm (DMHCCA) for designing TSK-type neuro-fuzzy networks. The proposed DMHCCA consists of two levels of evolution: the neuro-level evolution (NULE) and the network-level evolution (NWLE). In NULE, a data mining–based evolutionary learning algorithm is used to evolve neurons. Good combinations of neurons evolved in NULE are preserved as the initial population of NWLE. In NWLE, this population is mated and mutated to produce new network structures. In turn, the good neurons of the networks evolved in NWLE are inserted back into NULE. Thus, through the interaction of the two levels, the neurons are evolved locally and the network structure is evolved globally. Simulation results of DMHCCA are reported and compared with those of other existing models. The application of DMHCCA to a three-dimensional (3D) surface alignment task is also described, and the experimental results show better performance than other alignment systems.
Keywords Hierarchical cooperative coevolutionary algorithm · Neuro-level evolution · Network-level evolution · Data mining–based evolutionary learning algorithm · Three-dimensional surface alignment
1 Introduction
In recent years, fuzzy systems have become a popular research topic [1–10], especially for solving nonlinear and complex problems [11–14]. The reason is that fuzzy systems use fuzzy sets, instead of a mathematical model, to design controllers. Therefore, fuzzy systems avoid the degradation in controller performance caused by inaccurate mathematical modeling.
A fuzzy system consists of a set of fuzzy if–then rules, which are usually selected on the basis of a substantial amount of heuristic observation to express knowledge of proper strategies. It is therefore difficult for human experts to examine all the input–output data of a complex system to find proper rules for a fuzzy system. To address this challenge, several approaches have been proposed for generating if–then rules from numerical data [2, 3, 6]. These methods are all developed for supervised learning; that is, the correct "target" output value is given for each input pattern to guide the network's learning. Among them, the best-known supervised learning algorithm is back-propagation (BP) [3, 6], which is a powerful technique for tuning network parameters. In addition, M. Riedmiller and H. Braun [7] proposed a direct adaptive method for faster back-propagation learning: the RPROP algorithm. In their results, RPROP performed better than the gradient-descent method. Although the BP and RPROP algorithms are widely used to minimize the error function when training networks, they may become trapped in local minima and fail to find the global solution. In addition, the performance of BP training depends on the initial values of the system parameters. Moreover, for different network topologies, new mathematical expressions must be derived for each network layer.
C.-Y. Hsu · S.-F. Lin (✉) · J.-W. Chang
Department of Electrical Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan, ROC
e-mail: [email protected]

Considering the above disadvantages, one may obtain suboptimal performance even with a suitable neural fuzzy network topology. Hence, techniques capable of training the parameters and finding a global solution while optimizing the overall structure are needed. To this end, evolutionary algorithms appear to be better candidates than the BP algorithm. Recently, evolutionary fuzzy models have become a popular research field [15–28]. An evolutionary fuzzy model generates a fuzzy system automatically by incorporating evolutionary learning procedures. Among these models, the best known are genetic fuzzy models, that is, fuzzy models augmented by a learning process based on genetic algorithms (GAs). Although genetic fuzzy models are used to seek optimal solutions, they may have limitations such as fixed chromosome lengths and predefined parameters. Hence, several improved evolutionary algorithms [22–28] have been proposed to address these limitations. In [22], Bandyopadhyay et al. used a variable-length genetic algorithm that allows chromosomes of different lengths in the same population. Carse et al. [23] used a genetic algorithm to evolve fuzzy rule-based controllers. In [24], the authors presented an efficient immune symbiotic evolution learning algorithm for a compensatory neuro-fuzzy controller, and their experimental results showed that the approach was applied to several nonlinear control problems. Lin et al. [25] proposed a novel self-constructing evolutionary algorithm for designing a TSK-type fuzzy model, which achieved good results on the water bath temperature control problem. Gomez and Schmidhuber addressed these limitations in several works [26, 27]. In their work, enforced subpopulations (ESP) use subpopulations of neurons for fitness evaluation and overall control. As shown in [26], using subpopulations to evaluate solutions locally yields better performance than using a single population to evaluate the solution.
Nevertheless, ESP does not preserve the good combinations of subpopulations with high fitness, so information about potentially good combinations of subpopulations is lost. In [28], Lin and Hsu proposed a hybrid evolutionary learning algorithm that combines the compact genetic algorithm and a modified variable-length genetic algorithm to perform structure/parameter learning and construct a network dynamically. More recently, Hsu and Lin [29] proposed a multi-groups cooperation-based symbiotic evolution (MGCSE) to train a TSK-type neuro-fuzzy network (TNFN). Their results showed that MGCSE obtains better performance and convergence than symbiotic evolution. Although MGCSE is a good approach for training a TNFN, it may not be suitable for complex problems, because complex problems require a large number of parameters to be trained, which can slow convergence. In addition, MGCSE performs random group combination to construct a network. Although this sustains diversity, there is no systematic way to identify suitable groups for selecting chromosomes.
Although the above evolutionary learning algorithms [22–29] improve on traditional genetic algorithms, they may suffer from one or more of the following problems: (1) random group selection of fuzzy rules, (2) a low convergence rate as the problem becomes complex, and (3) loss of potentially good combinations of fuzzy rules.
To this end, this study proposes a data mining–based hierarchical cooperative coevolutionary algorithm (DMHCCA) to address the problems of the evolutionary learning algorithms mentioned above. The idea of DMHCCA is to use two levels of evolution: the neuro level and the network level. At the neuro level, to solve the problem of random group selection, this paper uses a data mining–based evolutionary learning algorithm (DELA) to evolve neurons. We adopt a data mining approach because data mining has been widely used in several fields [30, 31]. Data mining is a method of extracting information from a database of so-called "transactions," and it can be regarded as a new way of performing data analysis. One aim of data mining is to find association rules among sets of items that occur frequently in transactions. To achieve this aim, several methods have been proposed [32–34], and a comprehensive survey of discovering frequent item sets and association rules is presented in [35]. In [32], the authors proposed a mining method that finds large item sets in order to discover the association rules in transactions. Han et al. [33] proposed frequent pattern growth (FP-growth) to mine frequent patterns without candidate generation. In Han's work, items that occur more frequently have a better chance of sharing information than items that occur less frequently. Wu et al. [34] proposed a data mining method based on the GA that efficiently improves the traditional GA by using analysis and confidence parameters. Accordingly, DELA adopts a data mining method to select groups of fuzzy rules systematically, which solves the problem of random group selection. In addition, regularized least squares (RLS) is used to estimate the consequent parameters and thus increase the convergence rate. At the network level, good combinations of neurons (fuzzy rules) are preserved and evolved into new ones.
Moreover, DMHCCA introduces variable antecedent-part crossover (VAC) and variable antecedent-part mutation (VAM) at the network level so that variable-length chromosomes can be mated and mutated. In this way, DMHCCA improves on hierarchical enforced subpopulations (HESP) [27], in which only networks of fixed length can be evaluated in one generation. In the first example in our experimental section, DMHCCA is compared with HESP, and the results show that DMHCCA outperforms HESP. In the second example, a three-dimensional (3D) surface alignment task is used to examine the learning performance of DMHCCA. The experimental results show that the TNFN-based alignment approach trained by DMHCCA is better than other alignment systems.
This paper is organized as follows. In Sect. 2, the TSK-type neuro-fuzzy network is introduced. The proposed DMHCCA is described in Sect. 3. In Sect. 4, illustrative examples are presented. The conclusions are given in the last section.
2 Structure of TSK-type neuro-fuzzy network (TNFN)
A TSK-type neuro-fuzzy network (TNFN) [5] employs implication and aggregation methods different from those of a standard Mamdani fuzzy system. The authors of [6, 36] have shown that a TSK-type NFN can achieve a smaller network size and better learning accuracy than a Mamdani-type NFN. In a TNFN, instead of a fuzzy set, the consequent part of a fuzzy rule is a linear combination of the crisp inputs. The fuzzy rule of a TNFN is shown in Eq. (1), where n and j denote the number of input dimensions and the index of the fuzzy rule, respectively.
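$$
\begin{aligned}
&\text{IF } x_1 \text{ is } A_{1j}(m_{1j},\sigma_{1j}) \text{ and } x_2 \text{ is } A_{2j}(m_{2j},\sigma_{2j}) \ldots \text{ and } x_n \text{ is } A_{nj}(m_{nj},\sigma_{nj})\\
&\text{THEN } y' = w_{0j} + w_{1j}x_1 + \cdots + w_{nj}x_n
\end{aligned} \tag{1}
$$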
The structure of a TNFN, a five-layer network, is shown in Fig. 1, where n is the number of input dimensions. In the TNFN, the firing strength of a fuzzy rule is calculated by applying the following "AND" operation to the membership degrees of each input variable in its corresponding fuzzy sets:
$$
u^{(3)}_j = \prod_{i=1}^{n} \exp\left(-\frac{\left[u^{(1)}_i - m_{ij}\right]^2}{\sigma_{ij}^2}\right) \tag{2}
$$
where $u^{(1)}_i = x_i$ and $u^{(3)}_j$ are the outputs of the 1st and 3rd layers, and $m_{ij}$ and $\sigma_{ij}$ are the center and the width of the Gaussian membership function of the jth term of the ith input variable $x_i$, respectively.
Fig. 1 Structure of the TSK-type neuro-fuzzy network
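The output of the neuro-fuzzy network is computed by:
$$
y = u^{(5)} = \frac{\sum_{j=1}^{M} u^{(4)}_j}{\sum_{j=1}^{M} u^{(3)}_j} = \frac{\sum_{j=1}^{M} u^{(3)}_j \left(w_{0j} + \sum_{i=1}^{n} w_{ij}x_i\right)}{\sum_{j=1}^{M} u^{(3)}_j} \tag{3}
$$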
where $u^{(5)}$ is the output of the 5th layer, $w_{ij}$ is the weight for the ith input dimension and the jth rule node, and M is the number of fuzzy rules.
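To make Eqs. (2) and (3) concrete, the following Python sketch computes the TNFN output for one input vector. It is a minimal illustration, not the authors' implementation; the array layout (centers and widths stored as n × M arrays, consequent weights as an (n + 1) × M array whose first row holds $w_{0j}$) and the function name `tnfn_output` are our assumptions.

```python
import numpy as np


def tnfn_output(x, m, sigma, w):
    """Forward pass of a TSK-type neuro-fuzzy network, Eqs. (2)-(3).

    x:     input vector, shape (n,)
    m:     Gaussian centers m_ij, shape (n, M)
    sigma: Gaussian widths sigma_ij, shape (n, M)
    w:     consequent weights, shape (n + 1, M); row 0 holds w_0j
    """
    x = np.asarray(x, dtype=float)
    # Layer 3: firing strength u_j = product over inputs of Gaussian memberships
    u3 = np.prod(np.exp(-((x[:, None] - m) ** 2) / sigma ** 2), axis=0)
    # Layer 4: firing strength times the linear consequent w_0j + sum_i w_ij x_i
    u4 = u3 * (w[0] + x @ w[1:])
    # Layer 5: weighted average over the M rules
    return u4.sum() / u3.sum()
```

With a single rule, the firing strength cancels in Eq. (3) and the output reduces to the linear consequent; for example, x = 2 with $w_{0} = 1$ and $w_{1} = 3$ gives 7 up to floating-point rounding.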
3 Data mining–based hierarchical cooperative coevolutionary algorithm
The learning process of DMHCCA is shown in Fig. 2. As shown in this figure, DMHCCA involves two major evolutions: the neuro-level evolution and the network-level evolution. The blocks "insert good networks" and "insert good neurons" connect the neuro-level and network-level evolutions: they transfer good results evolved at one level to the other level. Once good neurons or networks are received, the received chromosomes are mated with existing chromosomes to yield new offspring. Therefore, by exchanging good information between the two levels of evolution, the algorithm has a better chance of finding the globally optimal solution.
3.1 Neuro-level evolution
In this subsection, we discuss the neuro-level evolution (NULE). To account for the structure of the TNFN, NULE constructs a TNFN from a variable number of chromosomes and uses the RLS method to obtain the consequent parameters. The structure of the chromosomes used to construct TSK-type neuro-fuzzy networks (TNFNs) in NULE is shown in Fig. 3. In this figure, each antecedent part of a fuzzy rule is a chromosome selected from a group, $P_{size}$ denotes the number of groups in a population, and $M_k$ denotes the number of rules used to construct the TNFN.
Having described the structure of the chromosomes used to construct TNFNs, we now describe the coding step of NULE and the RLS method in detail:
(1) Coding step: The coding structure of chromosomes in the proposed NULE is shown in Fig. 4. This figure describes the antecedent part of a fuzzy rule with the form given in Eq. (1), where $m_{ij}$ and $\sigma_{ij}$ are the mean and deviation of the Gaussian membership function for the ith dimension and the jth rule node, respectively. Each pair $(m, \sigma)$ corresponds to a neuron in Layer 2 of a TNFN. Evolving the antecedent part of a fuzzy rule is therefore like evolving neurons, so this level of evolution is called the neuro-level evolution.
(2) RLS method: Assume a TSK-type neural fuzzy model composed of m fuzzy rules of the following form:
$$
R_j:\ \text{IF } x_1 \text{ is } A_{1j} \ldots \text{ and } x_n \text{ is } A_{nj},\ \text{THEN } y_j = w^j_0 + w^j_1 x_1 + \cdots + w^j_n x_n \tag{4}
$$
where j = 1, …, m and $A_{ij}$ is the linguistic term for input i and rule j. From Eq. (4), the output can be written as:
$$
y = \frac{\sum_{j=1}^{m} u_j y_j}{\sum_{j=1}^{m} u_j} = \hat{u}_1 y_1 + \hat{u}_2 y_2 + \cdots + \hat{u}_m y_m \tag{5}
$$
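where $u_j$ is the firing strength of rule j and $\hat{u}_j = u_j/(u_1 + \cdots + u_m)$. The equation above can then be expressed as:
$$
y = \hat{u}_1\left(w^1_0 + w^1_1 x_1 + \cdots + w^1_n x_n\right) + \cdots + \hat{u}_m\left(w^m_0 + w^m_1 x_1 + \cdots + w^m_n x_n\right) = \mathbf{a}^T W \tag{6}
$$
where $W = [W_1^T \cdots W_m^T]^T$, $W_j = [w^j_0 \cdots w^j_n]^T$, j = 1, …, m, and $\mathbf{a}$ is the column vector
$$
\mathbf{a} = [\hat{u}_1\ \ \hat{u}_1 x_1 \cdots \hat{u}_1 x_n\ \ \hat{u}_2\ \ \hat{u}_2 x_1 \cdots \hat{u}_2 x_n \cdots \hat{u}_m\ \ \hat{u}_m x_1 \cdots \hat{u}_m x_n]^T
$$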
Since y and $\mathbf{a}$ are known, the only unknown is the consequent parameter vector W. Suppose a set of training inputs and desired outputs $\{x(t), y_d(t)\}_{t=1}^{M}$ is given. Then Eq. (6) can be rewritten as:
$$
AW = Y_d \tag{7}
$$
where $A = [\mathbf{a}(1)\ \mathbf{a}(2) \cdots \mathbf{a}(M)]^T$ and $Y_d = [y_d(1)\ y_d(2) \cdots y_d(M)]^T$.
To obtain a smooth estimate, regularization is adopted, and the approximate solution can be written as:
$$
\hat{W} = \left(A^T A + \lambda I\right)^{-1} A^T Y_d \tag{8}
$$
where λ is a regularization parameter that controls the smoothness. Equation (8) thus completes the estimation of the consequent parts of the fuzzy rules.
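The consequent estimation in Eqs. (5) to (8) amounts to ridge regression on the normalized firing strengths. The sketch below is a hedged illustration in Python/NumPy: the function name `estimate_consequents`, the array shapes, and the default value of λ (`lam`) are our assumptions, and the linear system is solved with `numpy.linalg.solve` instead of forming the matrix inverse explicitly.

```python
import numpy as np


def estimate_consequents(X, U, y_d, lam=1e-3):
    """Regularized least-squares estimate of W, Eq. (8).

    X:   training inputs, shape (T, n)
    U:   firing strengths u_j of each rule for each sample, shape (T, m)
    y_d: desired outputs, shape (T,)
    Returns W reshaped to (m, n + 1); row j is [w_0^j, w_1^j, ..., w_n^j].
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    y_d = np.asarray(y_d, dtype=float)
    T, n = X.shape
    m = U.shape[1]
    U_hat = U / U.sum(axis=1, keepdims=True)      # normalized strengths, Eq. (5)
    X1 = np.hstack([np.ones((T, 1)), X])           # [1, x_1, ..., x_n]
    # Row t of A is a(t)^T = [u1, u1 x1, ..., u1 xn, ..., um, um x1, ..., um xn]
    A = (U_hat[:, :, None] * X1[:, None, :]).reshape(T, m * (n + 1))
    # Solve (A^T A + lam I) W = A^T Y_d rather than inverting the matrix
    W = np.linalg.solve(A.T @ A + lam * np.eye(m * (n + 1)), A.T @ y_d)
    return W.reshape(m, n + 1)
```

Solving the regularized normal equations directly is numerically preferable to computing the inverse. With a single rule and noise-free data y = 2 + 3x, the estimate is close to [[2, 3]] for a small λ.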
The learning process of NULE involves eight operators: initialization, the self-adaptive method, the data mining–based selection method, fitness assignment, reproduction, crossover, mutation, and inserting good networks. The whole learning process is described below:
Fig. 2 Learning process of DMHCCA
a. Initialization: Before the neuro-level evolution starts, the initial groups of individuals must be generated. The initial groups are generated randomly within a predefined range. The following formulations show how the initial chromosomes in each group are generated:
$$
\text{Deviation: } \mathrm{Chr}_{g,c}[p] = \mathrm{random}[\sigma_{min}, \sigma_{max}], \quad p = 2, 4, \ldots, 2n;\ g = 1, 2, \ldots, P_{size};\ c = 1, 2, \ldots, N_C \tag{9}
$$
$$
\text{Mean: } \mathrm{Chr}_{g,c}[p] = \mathrm{random}[m_{min}, m_{max}], \quad p = 1, 3, \ldots, 2n-1 \tag{10}
$$
where $\mathrm{Chr}_{g,c}$ represents the cth chromosome in the gth group, $N_C$ is the total number of chromosomes in each group, p denotes the pth gene in $\mathrm{Chr}_{g,c}$, and $[\sigma_{min}, \sigma_{max}]$ and $[m_{min}, m_{max}]$ are the predefined ranges used to generate the chromosomes.
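Eqs. (9) and (10) can be sketched as follows. This minimal version assumes that each chromosome is stored as a NumPy array of 2n genes with means and deviations interleaved (1-based odd positions for means and even positions for deviations, as in the equations); the function name `init_groups` and its arguments are hypothetical.

```python
import numpy as np


def init_groups(p_size, n_c, n, m_range, sigma_range, rng=None):
    """Random initial chromosomes following Eqs. (9)-(10).

    Returns an array of shape (p_size, n_c, 2n), i.e. P_size groups of N_C
    chromosomes each, with genes ordered [m_1j, sigma_1j, m_2j, sigma_2j, ...].
    """
    rng = np.random.default_rng() if rng is None else rng
    chroms = np.empty((p_size, n_c, 2 * n))
    # 0-based indices 0, 2, ... correspond to 1-based p = 1, 3, ... (means)
    chroms[..., 0::2] = rng.uniform(*m_range, size=(p_size, n_c, n))
    # 0-based indices 1, 3, ... correspond to 1-based p = 2, 4, ... (deviations)
    chroms[..., 1::2] = rng.uniform(*sigma_range, size=(p_size, n_c, n))
    return chroms
```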
b. Self-adaptive method (SAM): To select the number of fuzzy rules automatically, the proposed DMHCCA adopts the self-adaptive method (SAM) from our previous research [37] to determine the suitability of TNFN models with different numbers of fuzzy rules. SAM encodes a probability vector $V_{M_k}$ that represents the suitability of a TNFN with $M_k$ rules. In addition, the minimum and maximum numbers of rules must be predefined in SAM to bound the number of fuzzy rules, that is, $[M_{min}, M_{max}]$. The processing steps of SAM are as follows:
Step 1. Update the probability vectors $V_{M_k}$ according to the following equations:
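$$
V_{M_k} = \begin{cases} V_{M_k} + (\text{Upt\_value}_{M_k} \cdot \lambda), & \text{if } \text{Avg} \le \text{fit}_{M_k} \\ V_{M_k} - (\text{Upt\_value}_{M_k} \cdot \lambda), & \text{otherwise} \end{cases} \tag{11}
$$
$$
\text{Avg} = \sum_{M_k = M_{min}}^{M_{max}} \text{fit}_{M_k} \Big/ (M_{max} - M_{min} + 1) \tag{12}
$$
$$
\text{Upt\_value}_{M_k} = \text{fit}_{M_k} \Big/ \sum_{M_k = M_{min}}^{M_{max}} \text{fit}_{M_k} \tag{13}
$$
$$
\text{if } \text{Fitness}_{M_k} \ge (\text{Best\_Fitness}_{M_k} - \text{ThreadFitnessvalue}) \text{ then } \text{fit}_{M_k} = \text{fit}_{M_k} + \text{Fitness}_{M_k} \tag{14}
$$
where $V_{M_k}$ is the probability vector, λ is a predefined threshold value, Avg is the average fitness value over the whole population, $\text{Best\_Fitness}_{M_k}$ is the best fitness value of the TNFNs with $M_k$ rules, and $\text{fit}_{M_k}$ is the sum of the fitness values of the TNFNs with $M_k$ rules.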
Step 2. Determine the selection times of the TNFNs with different numbers of rules according to the probability vectors as follows:
$$
\text{Rp}_{M_k} = (\text{Selection\_Times}) \cdot \left(V_{M_k} / \text{Total\_Velocy}\right) \tag{15}
$$
$$
\text{Total\_Velocy} = \sum_{M_k = M_{min}}^{M_{max}} V_{M_k} \tag{16}
$$
where $M_k = M_{min}, M_{min}+1, \ldots, M_{max}$, Selection_Times is the total number of selections in each generation, and $\text{Rp}_{M_k}$ is the number of times a TNFN with $M_k$ rules is selected in one generation.
Step 3. Accumulator calculation: If the best combination of chromosomes does not improve in the current generation, the accumulator is updated as follows:
$$
\text{if } \text{Best\_Fitness}_g = \text{Best\_Fitness} \text{ then } \text{Accumulator} = \text{Accumulator} + 1 \tag{17}
$$
where $\text{Best\_Fitness}_g$ is the best fitness value of the best combination of chromosomes in the gth generation, and Best_Fitness is the best fitness value of the best combination of chromosomes found up to the current generation.
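Steps 1 and 2 of SAM can be illustrated with the hedged sketch below. Two details are assumptions on our part: the comparison in Eq. (11) is taken as $\text{fit}_{M_k} \ge \text{Avg}$, and the selection counts of Eq. (15) are rounded to integers. The names `sam_update` and `sam_selection_times` are hypothetical, and the accumulated `fit` values are assumed to have been collected with Eq. (14) beforehand.

```python
import numpy as np


def sam_update(V, fit, lam):
    """Step 1 of SAM, Eqs. (11)-(13).

    V:   probability values V_{M_k}, one per rule count M_min..M_max
    fit: accumulated fitness fit_{M_k} per rule count (collected via Eq. (14))
    lam: predefined step size (lambda in Eq. (11))
    """
    V = np.asarray(V, dtype=float)
    fit = np.asarray(fit, dtype=float)
    avg = fit.mean()              # Eq. (12): average over M_max - M_min + 1 entries
    upt_value = fit / fit.sum()   # Eq. (13)
    step = upt_value * lam
    # Eq. (11): increase V where fit_{M_k} >= Avg (assumed direction), else decrease
    return np.where(fit >= avg, V + step, V - step)


def sam_selection_times(V, selection_times):
    """Step 2 of SAM, Eqs. (15)-(16); rounding to integers is our assumption."""
    V = np.asarray(V, dtype=float)
    total_velocy = V.sum()        # Eq. (16)
    return np.rint(selection_times * V / total_velocy).astype(int)
```

Nothing in Eq. (11) itself keeps $V_{M_k}$ positive, so a practical implementation may need to clip it; that safeguard is not part of the method as described.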
c. Data mining–based selection method (DMSM): This process performs the selection step, which involves the selection of groups and the selection of chromosomes.
(1) Selection of groups: This paper proposes DMSM to determine suitable groups from which chromosomes are selected to form a TNFN. In DMSM, suitable groups are selected according to association rules that indicate good performance, whereas groups derived from association rules that indicate bad performance are avoided. To perform DMSM, we use a transaction-built action and an association rule mining action to select well-performing groups. The details of these two actions are described as follows.
Action 1. Transaction-built action.
The aims of this action are twofold: to accumulate the transaction set and to select groups. To accumulate the transaction set, transactions are built using the following equations:
$$
\text{if } \text{Fitness}_{M_k} \ge (\text{Best\_Fitness}_{M_k} - \text{ThreadFitnessvalue}) \text{ then } \text{Transaction}_j[i] = \text{TFCRuleSet}_{M_k}[i],\ \text{PerformanceIndex} = g \tag{18}
$$
$$
\text{if } \text{Fitness}_{M_k} < (\text{Best\_Fitness}_{M_k} - \text{ThreadFitnessvalue}) \text{ then } \text{Transaction}_j[i] = \text{TFCRuleSet}_{M_k}[i],\ \text{PerformanceIndex} = b \tag{19}
$$
where i = 1, 2, …, $M_k$; $M_k = M_{min}, M_{min}+1, \ldots, M_{max}$; j = 1, 2, …, TransactionNum; $\text{Fitness}_{M_k}$ is the fitness value of the TNFN with $M_k$ rules; ThreadFitnessvalue is a predefined value; TransactionNum is the total number of transactions; $\text{Transaction}_j[i]$ is the ith item in the jth transaction; $\text{TFCRuleSet}_{M_k}[i]$ is the ith group among the $M_k$ groups used for chromosome selection; and PerformanceIndex = g and PerformanceIndex = b denote good and bad performance, respectively. The transactions therefore have the form shown in Table 1. As shown in Table 1, the first transaction indicates that the three-rule TNFN formed by the first, fourth, and eighth groups has "good" performance, whereas the second transaction indicates that the four-rule TNFN formed by the second, fourth, seventh, and tenth groups has "bad" performance.
Fig. 4 Coding an antecedent part of a fuzzy rule into a chromosome in NULE
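The labeling rule of Eqs. (18) and (19) reduces to a single threshold test. In this minimal sketch (the function name `build_transaction` is hypothetical), a transaction is represented as a frozen set of group indices paired with the performance index "g" or "b":

```python
def build_transaction(groups, fitness, best_fitness, thread_fitness_value):
    """Label one constructed TNFN as a transaction, Eqs. (18)-(19).

    groups: indices of the M_k groups used to build the TNFN
    Returns (items, performance_index), where performance_index is "g" or "b".
    """
    if fitness >= best_fitness - thread_fitness_value:
        label = "g"   # Eq. (18): good performance
    else:
        label = "b"   # Eq. (19): bad performance
    return frozenset(groups), label
```

For instance, a three-rule TNFN built from groups 1, 4, and 8 with fitness 0.95, when the best fitness is 0.97 and ThreadFitnessvalue is 0.05, yields the first row of Table 1.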
Regarding the group selection, DMSM selects groups using the following equation:
$$
\text{if } \text{Accumulator} \le \text{NormalTimes} \text{ then } \text{GroupIndex}[i] = \text{Random}[1, P_{Size}] \tag{20}
$$
where i = 1, 2, …, $M_k$; $M_k = M_{min}, M_{min}+1, \ldots, M_{max}$; Accumulator determines which action is adopted; GroupIndex[i] is the ith selected group of the $M_k$ groups; and $P_{Size}$ is the number of groups in a population. If the best fitness value does not improve for a sufficient number of generations (NormalTimes), DMSM selects groups according to the association rule mining action.
Action 2. Association rule mining action.
In the association rule mining action, suitable groups are selected according to association rules. To produce association rules with good performance, the frequent groups must first be found. We adopt the FP-growth method described in [33] to find the frequent groups. The frequent groups found are then compared with the good- and bad-performance transactions shown in Table 1 to compute the confidence degree, as follows:
$$
\text{confidence}(\text{frequent groups} \Rightarrow \text{good}) = P(\text{good} \mid \text{frequent groups}) = \frac{\text{supp}(\text{frequent groups} \cup \text{good})}{\text{supp}(\text{frequent groups} \cup \text{good}) + \text{supp}(\text{frequent groups} \cup \text{bad})} \tag{21}
$$
where P(good | frequent groups) is the conditional probability, frequent groups ∪ good (or bad) is the union of the frequent groups and good (or bad) performance, and supp(frequent groups ∪ good (or bad)) is the number of transactions in which the frequent groups occur with good (or bad) performance. The rule is valid if
$$
\text{confidence}(\text{frequent groups} \Rightarrow \text{good}) \ge \text{minconf} \tag{22}
$$
where minconf is the minimum confidence given by a user or an expert. Hence, if a rule satisfies Eq. (22), the frequent groups can be considered suitable groups. For example, if the confidence of {2, 5, 8} ⇒ {g} is larger than the minimum confidence, this association rule is produced, indicating that the combination of the second, fifth, and eighth groups has "good" performance. In this way, association rules are produced from the frequent groups, and the AssociatedGoodPool, which contains all frequent groups that satisfy Eq. (22), is generated.
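The confidence test of Eqs. (21) and (22) can be illustrated as follows. The paper finds frequent groups with FP-growth [33]; for brevity, this sketch swaps in a brute-force enumeration of small group subsets, which gives the same support counts on small transaction sets but does not scale like FP-growth. The function name, the `min_support` threshold, and the `max_size` limit are our assumptions.

```python
from collections import Counter
from itertools import combinations


def associated_good_pool(transactions, min_support, min_conf, max_size=3):
    """Return frequent group sets whose rule 'groups => good' meets min_conf.

    transactions: list of (frozenset_of_groups, label), label "g" or "b".
    Frequent sets are enumerated by brute force (a stand-in for FP-growth).
    """
    good, bad = Counter(), Counter()
    for items, label in transactions:
        counter = good if label == "g" else bad
        for size in range(1, min(max_size, len(items)) + 1):
            for subset in combinations(sorted(items), size):
                counter[subset] += 1
    pool = []
    for subset in set(good) | set(bad):
        supp_good, supp_bad = good[subset], bad[subset]
        if supp_good + supp_bad < min_support:
            continue  # not frequent enough
        confidence = supp_good / (supp_good + supp_bad)   # Eq. (21)
        if confidence >= min_conf:                          # Eq. (22)
            pool.append(subset)
    return sorted(pool)
```

For example, if {2, 5} appears in two good transactions and one bad transaction, its confidence is 2/3, so it enters the pool for minconf = 0.6 but not for minconf = 0.7.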
After the association rules are constructed, DMSM selects groups according to them. The group indices are selected from the associated good groups according to the following equation:
$$
\text{if } \text{NormalTimes} < \text{Accumulator} \le \text{ExploreTimes} \text{ then } \text{GroupIndex}[i] = w, \text{ where } w = \text{GoodItemSet}[q] = \text{Random}[\text{AssociatedGoodPool}] \tag{23}
$$
where q = 1, 2, …, AssociatedGoodPoolNum; i = 1, 2, …, $M_k$; $M_k = M_{min}, M_{min}+1, \ldots, M_{max}$; ExploreTimes is a predefined value that determines how long the association rule mining action is performed; AssociatedGoodPool is the set of good item sets obtained from the association rules; AssociatedGoodPoolNum is the total number of sets in AssociatedGoodPool; and GoodItemSet[q] is a good item set randomly selected from AssociatedGoodPool. In Eq. (23), if $M_k$ is greater than the size of GoodItemSet, the remaining groups are selected using Eq. (20). If the best fitness value still does not improve for a sufficient number of generations (ExploreTimes), DMSM returns to the transaction-built action and sets Accumulator = 0.
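The switching between Eq. (20) and Eq. (23), driven by Accumulator, can be sketched as below. This is a hedged reading of the text: the function name `select_groups` is hypothetical, group indices are 1-based, a good item set longer than $M_k$ is truncated (the text does not cover that case), and repeated random groups are not prevented.

```python
import random


def select_groups(m_k, p_size, accumulator, normal_times, explore_times,
                  good_pool, rng=random):
    """Choose M_k 1-based group indices following Eqs. (20) and (23).

    good_pool: list of good item sets (tuples of group indices), i.e. the
    AssociatedGoodPool. Returns (group_indices, accumulator).
    """
    if accumulator > explore_times:
        accumulator = 0  # back to the transaction-built action
    chosen = []
    if normal_times < accumulator <= explore_times and good_pool:
        # Eq. (23): take a randomly chosen good item set (truncated to M_k)
        chosen = list(rng.choice(good_pool))[:m_k]
    # Eq. (20), also used to fill the remaining positions after Eq. (23)
    while len(chosen) < m_k:
        chosen.append(rng.randint(1, p_size))
    return chosen, accumulator
```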
(2) Selection of chromosomes: After the $M_k$ groups are selected, $M_k$ chromosomes are selected from them as follows:
$$
\text{ChromosomeIndex}[i] = q \tag{24}
$$
where q = Random[1, $N_C$], i = 1, 2, …, $M_k$, $N_C$ is the total number of chromosomes in each group, and ChromosomeIndex[i] is the index of the chromosome selected from the ith group.
d. Fitness assignment: The fitness value of an individual is assigned by the following steps:
Step 1. Choose $M_k$ antecedent parts of fuzzy rules from the $M_k$ groups of size $N_C$, and use the RLS method to construct a TNFN; repeat this $\text{Rp}_{M_k}$ times. The $M_k$ groups are obtained from the DMSM.

Table 1 Transactions in the DMSM
Transaction index | Groups | Performance index
1 | 1, 4, 8 | g
2 | 2, 4, 7, 10 | b
… | … | …
Step 2. Evaluate every TNFN generated in Step 1 to obtain its fitness value. In this paper, the fitness value is defined as:
$$
\text{Fitness Value} = 1/\left(1 + E(y, \bar{y})\right) \tag{25}
$$
$$
E(y, \bar{y}) = \sum_{i=1}^{N} (y_i - \bar{y}_i)^2 \tag{26}
$$
where $y_i$ and $\bar{y}_i$ are the desired and predicted values of the ith output, respectively, $E(y, \bar{y})$ is the error function, and N is the number of training data in each generation.
Step 3. Divide the fitness value by $M_k$ and add the result to the fitness records of the selected antecedent parts of fuzzy rules.
Step 4. Divide the accumulated fitness value of each chromosome from the $M_k$ groups by the number of times it has been selected.
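Steps 2 to 4 of the fitness assignment can be sketched as follows. This minimal Python version assumes a dictionary keyed by (group, chromosome) pairs for the fitness records; all names are hypothetical.

```python
import numpy as np


def network_fitness(y_desired, y_pred):
    """Eqs. (25)-(26): fitness = 1 / (1 + sum of squared errors)."""
    err = np.sum((np.asarray(y_desired, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2)
    return 1.0 / (1.0 + err)


def share_fitness(records, selected, fitness):
    """Step 3: credit each selected chromosome with fitness / M_k.

    records:  dict mapping (group, chromosome) -> [accumulated fitness, times selected]
    selected: list of (group, chromosome) pairs used in one TNFN (length M_k)
    """
    share = fitness / len(selected)
    for key in selected:
        entry = records.setdefault(key, [0.0, 0])
        entry[0] += share
        entry[1] += 1


def average_fitness(records):
    """Step 4: divide the accumulated fitness by the number of selections."""
    return {key: total / count for key, (total, count) in records.items()}
```

For example, if chromosome (0, 1) takes part in two two-rule networks with fitness 0.8 and 0.4, it receives 0.4 and 0.2, and its averaged fitness after Step 4 is 0.3.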
e. Reproduction: Reproduction uses the elite-based reproduction strategy (ERS) [29]. In ERS, the chromosome with the best performance in each group is always kept. For the remaining chromosomes in each group, the roulette-wheel selection method [38] is used for reproduction. The well-performing chromosomes in the top half of each group [21] then proceed to the next generation, and the other half is generated by applying crossover and mutation to the chromosomes in the top half of the parent individuals.
f. Crossover: In this step, a two-point crossover strategy [38] is adopted. Once the crossover points are selected, new individuals are created by exchanging the genes between the selected sites of the parents. These offspring inherit the parents' merits.
g. Mutation: The mutation step provides new information to every group by randomly altering the allele of a gene in an individual. Mutation can thus explore new regions of the search space and help prevent the search from becoming trapped in a local minimum. In the mutation step, uniform mutation [39] is adopted, and the mutated gene is drawn randomly from the domain of the corresponding variable.
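The crossover and mutation operators of NULE can be sketched with NumPy's random `Generator`. This is an illustration under assumptions: parents are equal-length arrays of genes, the two cut points are drawn uniformly and distinct, and each gene's domain is given as `low`/`high` arrays of the same shape as the chromosome (for NULE genes, alternating mean and deviation ranges).

```python
import numpy as np


def two_point_crossover(p1, p2, rng):
    """Swap the genes between two distinct random cut points."""
    a, b = sorted(rng.choice(len(p1) + 1, size=2, replace=False))
    c1, c2 = p1.copy(), p2.copy()
    c1[a:b], c2[a:b] = p2[a:b], p1[a:b]
    return c1, c2


def uniform_mutation(chrom, low, high, rate, rng):
    """With probability `rate`, replace each gene by a uniform draw from
    its domain [low[k], high[k])."""
    chrom = chrom.copy()
    mask = rng.random(chrom.shape) < rate
    chrom[mask] = rng.uniform(low, high)[mask]
    return chrom
```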
h. Insert good networks: Since Selection_Times networks are constructed in every generation, the fitness value of each network is recorded and compared with those of the networks in the network-level evolution. If a network's fitness is better than that of the worst network in the network-level evolution, the network is inserted into the network-level evolution.
If the number of generations reaches a predefined maximum or the best fitness value exceeds a fitness threshold, DMHCCA terminates and outputs the final results.
3.2 Network-level evolution
In this subsection, the network-level evolution (NWLE) is discussed. NWLE involves six main operations: receiving good networks, reproduction, variable antecedent-part crossover, variable antecedent-part mutation, evaluation, and inserting good neurons. The details of these operations are described as follows:
a. Receive good networks: Before the network-level evolution starts, N well-performing networks are received from the neuro-level evolution as chromosomes. The coding structure of chromosomes in the network-level evolution is shown in Fig. 5. In this figure, each block of a chromosome describes the antecedent part of a fuzzy rule with the form given in Eq. (4), where $m_{ij}$ and $\sigma_{ij}$ are the mean and deviation of the Gaussian membership function for the ith dimension and jth rule node, respectively. The consequent part of a fuzzy rule is not encoded into the chromosomes because regularized least squares is used to estimate it. The chromosomes are then sorted in preparation for reproduction.
b. Reproduction: Reproduction is a process in which strings are copied according to their fitness values. In this step, the roulette-wheel selection method is used. The well-performing chromosomes in the top half of the population proceed to the next generation. The other half is generated by applying variable antecedent-part crossover and variable antecedent-part mutation to the chromosomes in the top half of the parent individuals.
Fig. 5 Coding the antecedent parts of fuzzy rules into a chromosome in the network-level evolution
c. Variable antecedent-part crossover: In the network-level evolution, the variable antecedent-part crossover (VAC) is proposed to perform crossover. In VAC, two parents are selected using the roulette-wheel selection method [38]. Because the selected parents may have different lengths, misalignment of individuals must be avoided in the crossover operation; antecedent-part crossover is proposed to address this problem. Here, "antecedent-part" means that crossover is performed only on the antecedent parts of the fuzzy rules. In VAC, two-point crossover [38] is adopted, so new individuals are generated by exchanging the genes between the selected sites of the parents. To avoid misalignment, the crossover points are chosen so that they do not exceed the length of the shorter of the two parent chromosomes. Two individuals with different lengths undergoing the VAC operation are shown in Fig. 6, where $AR_j$ represents the parameters of the antecedent part of the jth rule in the TNFN, and $R_k$ indicates that there are k fuzzy rules in a TNFN. After VAC is performed, the new offspring replace the individuals with poor performance.
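A hedged sketch of roulette-wheel parent selection and VAC is given below. We assume that a network chromosome is a Python list of per-rule antecedent arrays ($AR_j$) and that the two cut points fall on rule boundaries inside the shorter parent; the text does not state whether cuts are placed at rule or gene granularity. The function names are hypothetical.

```python
import numpy as np


def roulette_select(fitness, rng):
    """Roulette-wheel selection: index drawn with probability proportional to fitness."""
    p = np.asarray(fitness, dtype=float)
    return rng.choice(len(p), p=p / p.sum())


def vac(parent1, parent2, rng):
    """Variable antecedent-part crossover (sketch).

    Each parent is a list of rule antecedents AR_j (one gene array per rule);
    the parents may hold different numbers of rules. Both cut points lie
    within the shorter parent, so only aligned rule positions are exchanged.
    """
    shortest = min(len(parent1), len(parent2))
    a, b = sorted(rng.choice(shortest + 1, size=2, replace=False))
    child1 = [rule.copy() for rule in parent1]
    child2 = [rule.copy() for rule in parent2]
    child1[a:b], child2[a:b] = child2[a:b], child1[a:b]
    return child1, child2
```

Because the exchanged slice has the same length in both children, each child keeps its parent's number of rules: a 3-rule and a 5-rule parent again produce a 3-rule and a 5-rule child.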
d. Variable antecedent-part mutation: The mutation operator randomly alters the allele of a gene, providing new information to the population at the site of an individual. In the network-level evolution, the variable antecedent-part mutation (VAM) is adopted to perform mutation; its benefit is that it can be applied to chromosomes of different lengths. The VAM operation on an individual is shown in Fig. 7, where AR denotes the antecedent part of a fuzzy rule. In VAM, uniform mutation [39] is adopted, and the mutated gene is drawn randomly from the domain of the corresponding variable.
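VAM can be sketched as follows, assuming that a network chromosome is a list of per-rule antecedent arrays with genes ordered [m, σ, m, σ, …]. The name `vam` and the per-gene mutation probability `rate` are our assumptions; the uniform draw from each variable's domain follows the description above.

```python
import numpy as np


def vam(chromosome, m_range, sigma_range, rate, rng):
    """Variable antecedent-part mutation (sketch).

    Applies uniform mutation gene by gene to every rule antecedent AR_j,
    whatever the number of rules; even (0-based) genes are means drawn from
    m_range, odd genes are deviations drawn from sigma_range.
    """
    mutated = []
    for rule in chromosome:
        rule = rule.copy()
        for k in range(len(rule)):
            if rng.random() < rate:
                low, high = m_range if k % 2 == 0 else sigma_range
                rule[k] = rng.uniform(low, high)
        mutated.append(rule)
    return mutated
```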
e. Evaluation: This step evaluates the fitness of each chromosome in the population that has not yet been evaluated; a higher fitness value indicates better performance. Since each chromosome includes only the antecedent parts of the fuzzy rules, the consequent parts are undefined. Thus, as in the fitness assignment of NULE, the RLS method is used to estimate the consequent parts of the fuzzy rules.
f. Insert good neurons: After the evaluation operation, if a network has a higher fitness value than the best network in the neuro level, its neurons are inserted into the corresponding groups of the subpopulation in the neuro-level evolution.
In short, the purpose of NWLE is to preserve the good combinations of fuzzy rules produced by NULE and to evolve the structure of the resulting neural fuzzy networks; that is, NWLE fine-tunes the results evolved by NULE. Accordingly, NULE is the major evolution for evolving TNFNs, and it largely determines the effectiveness of the proposed DMHCCA model.
4 Illustrative examples
To verify the proposed DMHCCA, two examples are discussed in this section. The first is the prediction of the Mackey–Glass time series, and the second is a three-dimensional (3D) surface alignment task. Using these examples, this study compares DMHCCA with other methods. The initial parameters for the two examples are given in Table 2. The initial parameters of the proposed DMHCCA are determined by the parameter exploration method in [40], which was the first study on parameter exploration. As shown in [40], a small population size is good for initial performance, whereas a large population size is good for long-term performance. Moreover, a low mutation rate is good for online performance, and a high mutation rate is good for off-line performance. We therefore adjust the parameters of DMHCCA according to these criteria of the parameter exploration method.
4.1 Example 1: Prediction of Mackey–Glass time series
The Mackey–Glass time series is a common benchmark for testing different learning algorithms. Thus, we use this chaotic time series to perform an extensive analysis of the proposed algorithm and other evolutionary algorithms.

Fig. 6 The variable antecedent-part crossover operation in the network-level evolution

Fig. 7 The variable antecedent-part mutation operation in the network-level evolution
The Mackey–Glass time series is generated from the following delay differential equation:
$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t) \tag{27}$$
Crowder [41] extracted 1,000 input–output data pairs {x, y_d}, each consisting of four past values of x(t) and one future value, that is,

$$[x(t-18),\ x(t-12),\ x(t-6),\ x(t);\ x(t+6)] \tag{28}$$

where τ = 17 and x(0) = 1.2. DMHCCA therefore has four inputs, corresponding to these values of x(t), and one output representing the value x(t + Δt), where Δt is the prediction horizon (Δt = 6 here). The first 500 pairs [from x(1) to x(500)] form the training data set, and the remaining 500 pairs [from x(501) to x(1,000)] form the testing data set used to validate the proposed method. The chromosome values are floating-point numbers initialized by DMHCCA. The fitness function defined in Eqs. (25) and (26) is used to train the neural fuzzy network. The evolutionary learning runs for 500 generations and is repeated 50 times. For comparative analysis, this study adopts the root mean square error (RMSE), which is defined as follows:
$$\mathrm{RMSE} = \left[ \frac{1}{N_t} \sum_{l=1}^{N_t} \left( Y_l(t+6) - Y_l^d(t+6) \right)^2 \right]^{1/2}, \tag{29}$$

where N_t is the number of testing data, Y_l^d(t+6) = x(t+6) is the desired value, and Y_l(t+6) is the value predicted by the model with four inputs and one output. For comparison with other algorithms in this example, 12, 12, 10, and 12 fuzzy rules are set, according to the parameter exploration method [40], for hierarchical enforced subpopulations (HESP) [27], enforced subpopulations (ESP) [26], traditional symbiotic evolution (TSE) [42], and traditional genetic algorithm (TGA) [19], respectively. In the exploration, the population size ranges over 10–250 in increments of 10, the crossover rate over 0.25–1 in increments of 0.05, and the mutation rate over 0–0.3 in increments of 0.01. The resulting parameter settings for HESP, ESP, TSE, and TGA are as follows: (1) the population sizes are 30, 30, 200, and 50, respectively; (2) the crossover rates are 0.6, 0.6, 0.4, and 0.7, respectively; and (3) the mutation rates are 0.04, 0.05, 0.05, and 0.04, respectively. As with DMHCCA, the evolutionary learning of each method runs for 500 generations and is repeated 50 times.
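The data preparation and error measure of this example can be sketched in Python as follows. The sketch integrates Eq. (27) with forward Euler (step 0.1, history x(t) = 0 for t < 0), samples the series at integer times, and builds the pairs of Eq. (28). The integration scheme, step size, initial history, and starting time t = 118 are assumptions, since this section does not state them. The first 500 pairs are used for training and the remaining 500 for testing, and `rmse` implements Eq. (29).

```python
import numpy as np

def mackey_glass(n_steps, tau=17.0, x0=1.2, dt=0.1):
    """Forward-Euler integration of Eq. (27); x(t) = 0 for t < 0 is assumed."""
    delay = int(round(tau / dt))
    x = np.zeros(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        x_tau = x[n - delay] if n >= delay else 0.0
        x[n + 1] = x[n] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[n])
    return x

def make_pairs(t_start=118, n_pairs=1000, dt=0.1):
    """Eq. (28) pairs: inputs [x(t-18), x(t-12), x(t-6), x(t)], target x(t+6)."""
    per_unit = int(round(1.0 / dt))
    t_end = t_start + n_pairs - 1 + 6
    x = mackey_glass(t_end * per_unit, dt=dt)[::per_unit]   # x at t = 0, 1, ..., t_end
    t = np.arange(t_start, t_start + n_pairs)
    inputs = np.stack([x[t - 18], x[t - 12], x[t - 6], x[t]], axis=1)
    return inputs, x[t + 6]

def rmse(pred, target):
    """Eq. (29): root mean square error over the N_t test samples."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(target)) ** 2)))

X, y = make_pairs()
X_train, y_train = X[:500], y[:500]   # first 500 pairs: training set
X_test, y_test = X[500:], y[500:]     # remaining 500 pairs: testing set
```

A trained model's predictions on `X_test` can then be scored with `rmse(predictions, y_test)`, which corresponds to the quantity reported in Table 3.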
Table 3 lists the generalization capabilities of the proposed DMHCCA, HESP [27], ESP [26], TSE [42], and TGA [19]. As shown in Table 3, DMHCCA obtains a lower RMSE than the other methods.
Furthermore, this example also compares the running time of DMHCCA with that of the other methods. Here, the running time is the time required for the fitness of an algorithm to converge to a predefined value. The results of four algorithms over 50 runs are reported in Table 4. As shown in this table, the proposed DMHCCA is faster than HESP, ESP, and TSE.
4.2 Example 2: 3D surface alignment task
In this example, we apply DMHCCA to a 3D surface alignment task, a real-world problem that aims to align two surfaces. Figure 8
Table 2 Initial parameters of DMHCCA before training

Parameter | NULE | NWLE
Psize | 40 | 20
Nc | 20 | None
Selection_Times | 50 | None
NormalTimes | 10 | None
ExploreTimes | 15 | None
Crossover rate | 0.6 | 0.7
Mutation rate | 0.2 | 0.1
[Mmin, Mmax] | [3, 25] | None
[mmin, mmax] | [-10, 10] | [-10, 10]
[σmin, σmax] | [1, 15] | [1, 15]
minconf | 60 % | None
RLS parameter (λ) | 0.003 | 0.003
Table 3 Performance comparison of various existing models (RMSE)

Method | Best | Mean | Worst | STD
DMHCCA | 0.0032 | 0.0048 | 0.0082 | 0.0011
HESP | 0.0076 | 0.0092 | 0.0012 | 0.0014
ESP | 0.0092 | 0.011 | 0.015 | 0.0016
TSE | 0.015 | 0.019 | 0.024 | 0.0024
TGA | 0.021 | 0.029 | 0.064 | 0.013
Table 4 Comparison of the running time of various algorithms

Method | Best (s) | Worst (s) | Mean (s)
DMHCCA | 6.55 | 57.26 | 23.39
HESP | 15.36 | 107.86 | 56.25
ESP | 18.76 | 128.43 | 66.19
TSE | 24.48 | 192.71 | 152.75
illustrates the procedure of a 3D surface alignment task. As shown in this figure, the 3D scene is scanned by a 3D imaging laser scanner, where the size of the scanned scene is 256 × 256 with a 20° field of view. Each pixel in the range image stores a range value, that is, the distance from the sensed point to the scanner. In other words, each range value can be regarded as a 3D point with respect to the scanner, so the scanner can serve as the origin of a coordinate system in which every sensed range value is represented. To this end, the 3D position of each pixel is obtained by transforming the range data into Cartesian coordinates. The region of interest (ROI) is extracted by using the segmentation algorithm described in [43]. The reference model is the target 3D surface with which the ROI is to be aligned. Thus, the purpose of the 3D surface alignment task is to align the ROI with the reference model. The problem of 3D surface alignment has been addressed by several methods [44–48]. Among them, a coarse-to-fine technique is a useful way of performing 3D surface alignment [44, 45]. Coarse alignment provides an approximate transformation for aligning two surfaces; such alignment must be efficient and accurate. Fine alignment takes the initial guess of the transformation given by coarse alignment as a starting point and iteratively minimizes the distance between the input surface and the destination surface. Thus, this study utilizes a TNFN-based coarse-to-fine method to perform 3D surface alignment.
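The conversion from range data to Cartesian coordinates can be sketched in Python as below. The scanner geometry is an assumption: pixel columns and rows are taken to be spread uniformly over the 20° field of view in azimuth and elevation, centred on the z axis, with the scanner at the origin. A real scanner's calibration model may differ. The sketch only shows that each range value becomes a 3D point whose distance to the origin equals the measured range.

```python
import numpy as np

def range_image_to_points(ranges, fov_deg=20.0):
    """Convert an H x W range image into an (H*W, 3) array of scanner-frame points.

    Hypothetical geometry: columns span the horizontal and rows the vertical
    field of view uniformly, centred on the optical (z) axis.
    """
    h, w = ranges.shape
    half = np.deg2rad(fov_deg) / 2.0
    az = np.linspace(-half, half, w)        # azimuth angle per column
    el = np.linspace(-half, half, h)        # elevation angle per row
    az, el = np.meshgrid(az, el)            # both arrays have shape (h, w)
    x = ranges * np.cos(el) * np.sin(az)
    y = ranges * np.sin(el)
    z = ranges * np.cos(el) * np.cos(az)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

For a 256 × 256 scan this yields 65,536 points, from which the ROI would then be segmented.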
Regarding coarse alignment of 3D surfaces, this study adopts a TNFN-based coarse alignment approach. This approach captures the VFH of multiple views of a 3D object as the input of a TNFN, and the desired output of the TNFN is the pose corresponding to the captured feature. Thus, the feature inputs and the desired outputs form the training data with which DMHCCA trains a TNFN. Once the TNFN has been trained, feeding the VFH of an arbitrary view of an object into it yields an estimated pose. The estimated pose is then used to transform the input point cloud so that it is coarsely aligned with the reference model.
Regarding fine alignment of 3D surfaces, similar to the neural network method (NNM) [45], a TNFN-based fine alignment approach combines TNFN-based surface modeling (with the TNFN trained by DMHCCA) with the downhill simplex optimization method to iteratively reduce the distance from the input 3D surface to the reference surface. To examine the alignment accuracy, 2,000 synthesized point cloud sets are generated randomly within the ranges given in Table 5. For training the TNFN, 70 % of the point clouds (1,400) form the training data set and the remaining 30 % (600) form the testing data set. The initial parameters used by DMHCCA for TNFN training are listed in Table 2.
Fig. 8 Procedure of a 3D surface alignment task
Table 5 Range of 3D rigid transformation parameters

3D rigid transformation parameter | Range of affine transformation parameter
φ (degree), roll | [-10, 10]
ϕ (degree), yaw | [-90, 90]
θ (degree), pitch | [0, 90]
x (m) | [-0.2, 0.2]
y (m) | [-0.2, 0.2]
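The random synthesis of transformations within the Table 5 ranges can be sketched in Python as follows. Several details are assumptions made for illustration: the rotation composition order (roll about x, then pitch about y, then yaw about z, i.e. Rz·Ry·Rx), the zero translation along z, and the helper names. The sketch samples 2,000 parameter sets, each of which would generate one synthesized point cloud from a reference cloud, and splits them 1,400/600 as described above.

```python
import numpy as np

# Ranges from Table 5; the Z-Y-X (yaw-pitch-roll) composition order is an assumption.
RANGES = {"roll": (-10, 10), "yaw": (-90, 90), "pitch": (0, 90),
          "x": (-0.2, 0.2), "y": (-0.2, 0.2)}

def rotation(roll, yaw, pitch):
    """Rotation matrix Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in degrees."""
    r, p, w = np.deg2rad([roll, pitch, yaw])
    rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
    ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    rz = np.array([[np.cos(w), -np.sin(w), 0], [np.sin(w), np.cos(w), 0], [0, 0, 1]])
    return rz @ ry @ rx

def sample_params(n, rng):
    """Draw n parameter sets uniformly within the Table 5 ranges."""
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()} for _ in range(n)]

def transform_points(points, prm):
    """Apply one rigid transformation to an (N, 3) point cloud."""
    R = rotation(prm["roll"], prm["yaw"], prm["pitch"])
    t = np.array([prm["x"], prm["y"], 0.0])
    return points @ R.T + t

rng = np.random.default_rng(0)
params = sample_params(2000, rng)
train, test = params[:1400], params[1400:]   # 70 % / 30 % split
```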
For setting the parameters of NNM, according to [45], a two-layer neural network is used to model the vertebral surface, where the first layer has 20 nodes and the second layer has 10 nodes. In this example, practical experiments showed that a first layer of 30 nodes and a second layer of 20 nodes give good results for modeling the surface of the reference model. In addition, the back-propagation algorithm is used to train the neural network, and training stops when the error between the network output and the desired distance value falls below 0.001 or the number of iterations reaches 1,000. NNM adopts these parameters to train a neural network that models the reference surface.
Since execution time and alignment accuracy are two major issues for a surface alignment system, we use them as the evaluation criteria for the proposed alignment system.
4.2.1 Alignment accuracy
To evaluate the alignment accuracy, we compare the proposed method with NNM [45] and iterative closest point (ICP) [48]. For a fair comparison, the same stopping criterion is used for every alignment method: the alignment procedure terminates when the number of iterations reaches 100 or the alignment error falls below 0.0005. The alignment errors over the 600 testing sets of point clouds are listed in Table 6, where RMSE denotes the root mean square error. As this table shows, the proposed method yields the lowest alignment error among the compared systems. Figure 9a–c presents a real alignment example (with the ROI extracted as in Fig. 8) for the proposed TNFN-based method, NNM, and ICP, where the blue and red point clouds represent the testing and reference model data, respectively. In this example, the fine alignment errors of the proposed method, NNM, and ICP are 0.0558, 0.1121, and 0.0569 m, respectively. This result indicates that the proposed TNFN-based method can achieve high accuracy on real 3D point cloud data. Regarding alignment speed, the execution times of the proposed system, NNM, and ICP in this example are 1.71, 2.13, and 7.93 s, respectively. Therefore, the proposed method aligns faster than NNM and ICP.
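For reference, the ICP baseline [48] with the stopping rule stated above (at most 100 iterations, or alignment error below 0.0005) can be sketched in Python as follows. This is a minimal point-to-point ICP with brute-force nearest neighbours and an SVD-based (Kabsch) rigid fit, not the implementation used in the experiments. Measuring the alignment error as the RMS nearest-neighbour distance is an assumption.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # exclude reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def nearest(cur, ref):
    """Brute-force nearest reference point for each current point, plus RMS distance."""
    d2 = ((cur[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    nn = d2.argmin(axis=1)
    return nn, float(np.sqrt(d2[np.arange(len(cur)), nn].mean()))

def icp(src, ref, max_iter=100, tol=5e-4):
    """Point-to-point ICP; stops after max_iter iterations or when RMS error < tol."""
    cur = src.copy()
    nn, err = nearest(cur, ref)
    n_iter = 0
    while err >= tol and n_iter < max_iter:
        R, t = best_rigid_transform(cur, ref[nn])
        cur = cur @ R.T + t
        nn, err = nearest(cur, ref)
        n_iter += 1
    return cur, err, n_iter
```

The brute-force correspondence search is quadratic in the number of points, which is one reason ICP-style fine alignment can be slow on dense clouds. Practical implementations usually replace it with a spatial index such as a k-d tree.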
4.2.2 Alignment speed
In consideration of alignment speed, we calculate the average execution time of aligning the 600 testing sets of point clouds. These results are also listed in Table 6. As shown in this table, the execution time of the proposed TNFN-based method is shorter than those of NNM and ICP.

Table 6 Results of alignment accuracy and execution time

Method | Average RMSE (m) | Average execution time (s)
TNFN-based coarse-to-fine alignment | 0.0651 | 3.19
NNM | 0.1357 | 4.21
ICP | 0.0667 | 46.26
5 Conclusion
In this paper, DMHCCA is proposed for designing TNFNs. The proposed DMHCCA involves two-level evolutions: NULE and NWLE. NULE combines DELA and RLS not only to choose groups of fuzzy rules systematically but also to increase the rate of convergence. NWLE uses VAC and VAM to enable mating and mutation of variable-length chromosomes. The mutual evolution of NULE and NWLE allows the neurons and the network structure to be evolved locally and globally, respectively. According to the simulation results on the Mackey–Glass benchmark, the proposed DMHCCA exhibits better performance than the other learning methods. Besides, a 3D surface alignment task is used to examine the learning ability of DMHCCA. The experimental results show that the DMHCCA-trained TNFN-based alignment method is superior to the other alignment systems.
Although DMHCCA obtains better results than the other learning algorithms, it still has a limitation. Specifically, the hierarchy has only two levels for training the structure and parameters of neural fuzzy networks. As application problems become more complex, the number of hierarchical levels may need to increase to match them. Thus, future work will consider multi-level hierarchies and investigate how to coordinate these levels to adapt the model to complex problems.
Acknowledgments This work was funded by Contract NSC 99-2623-E-009-006-D from the National Science Council, Taiwan, R.O.C.
Appendix
The following contents list the abbreviations used in this paper.
Data mining–based hierarchical cooperative coevolutionary algorithm, DMHCCA; neuro-level evolution, NULE; network-level evolution, NWLE; data mining–based evolutionary learning algorithm, DELA; three-dimensional, 3D; back-propagation, BP; genetic algorithm, GA; enforced subpopulations, ESP; multi-groups cooperation-based symbiotic evolution, MGCSE; TSK-type neuro-fuzzy network, TNFN; regularized least square, RLS; variable antecedent-part crossover, VAC; variable antecedent-part mutation, VAM; hierarchical enforced subpopulations, HESP; self-adaptive method, SAM; data mining–based selection method, DMSM; elite-based reproduction strategy, ERS; traditional symbiotic evolution, TSE; traditional genetic algorithm, TGA; region of interest, ROI; neural network method, NNM; iterative closest point, ICP.
References
1. Lin CT, Lee CSG (1996) Neural fuzzy systems: a neuro-fuzzy synergism to intelligent system. Prentice-Hall, Englewood Cliffs, NJ
2. Towell GG, Shavlik JW (1993) Extracting refined rules from knowledge-based neural networks. Mach Learn 13:71–101
3. Lin CJ, Lin CT (1997) An ART-based fuzzy adaptive learning control network. IEEE Trans Fuzzy Syst 5(4):477–496
4. Wang LX, Mendel JM (1992) Generating fuzzy rules by learning from examples. IEEE Trans Syst Man Cybern 22(6):1414–1427
5. Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern 15:116–132
6. Juang CF, Lin CT (1998) An on-line self-constructing neural fuzzy inference network and its applications. IEEE Trans Fuzzy Syst 6(1):12–31
7. Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: Proceedings of IEEE international conference on neural networks, pp 586–591
8. Lin FJ, Lin CH, Shen PH (2001) Self-constructing fuzzy neural network speed controller for permanent-magnet synchronous motor drive. IEEE Trans Fuzzy Syst 9(5):751–759
9. Takagi H, Suzuki N, Koda T, Kojima Y (1992) Neural networks designed on approximated reasoning architecture and their application. IEEE Trans Neural Netw 3(5):752–759
10. Mizutani E, Jang J-SR (1995) Coactive neural fuzzy modeling. In: Proceedings of IEEE international conference on neural networks, pp 760–765
11. Lin C-J, Chin C-C (2004) Prediction and identification using wavelet-based recurrent fuzzy neural networks. IEEE Trans Syst Man Cybern Part B 34(5):2144–2154
12. Narendra KS, Parthasarathy K (1990) Identification and control of dynamical systems using neural networks. IEEE Trans Neural Netw 1:4–27
13. Juang CF, Lin CT (1999) A recurrent self-organizing neural fuzzy inference network. IEEE Trans Neural Netw 10(4):828–845
14. Mastorocostas PA, Theocharis JB (2002) A recurrent fuzzy-neural model for dynamic system identification. IEEE Trans Syst Man Cybern 32(2):176–190
15. Goldberg DE (1989) Genetic algorithms in search optimization and machine learning. Addison-Wesley, Reading, MA
16. Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge, MA
17. Fogel LJ (1994) Evolutionary programming in perspective: the top-down view. In: Zurada JM, Marks RJ II, Goldberg C (eds) Computational intelligence: imitating life. IEEE Press, Piscataway, NJ
18. Li M, Wang Z (2009) A hybrid coevolutionary algorithm for designing fuzzy classifiers. Inf Sci 179(12):1970–1983
19. Karr CL (1991) Design of an adaptive fuzzy logic controller using a genetic algorithm. In: Proceedings of the 4th international conference on genetic algorithms, pp 450–457
20. Lin CT, Jou CP (2000) GA-based fuzzy reinforcement learning for control of a magnetic bearing system. IEEE Trans Syst Man Cybern Part B 30(2):276–289
21. Juang CF, Lin JY, Lin CT (2000) Genetic reinforcement learning through symbiotic evolution for fuzzy controller design. IEEE Trans Syst Man Cybern Part B 30(2):290–302
22. Bandyopadhyay S, Murthy CA, Pal SK (2000) VGA-classifier: design and applications. IEEE Trans Syst Man Cybern Part B 30(6):890–895
23. Carse B, Fogarty TC, Munro A (1996) Evolving fuzzy rule based controllers using genetic algorithms. Fuzzy Sets Syst 80(3):273–293
24. Chen CH, Lin CJ, Lin CT (2009) Using an efficient immune symbiotic evolution learning for compensatory neuro-fuzzy controller. IEEE Trans Fuzzy Syst 17(3):668–682
25. Lin CJ, Chen CH, Lin CT (2011) An efficient evolutionary algorithm for fuzzy inference systems. Evol Syst 2(2):83–99
26. Gomez FJ (2003) Robust non-linear control through neuroevolution. Ph.D. Dissertation, The University of Texas at Austin
27. Gomez F, Schmidhuber J (2005) Co-evolving recurrent neurons learn deep memory POMDPs. In: Proceedings of conference on genetic and evolutionary computation, pp 491–498
28. Lin CJ, Hsu YC (2007) Reinforcement hybrid evolutionary learning for recurrent wavelet-based neuro-fuzzy systems. IEEE Trans Fuzzy Syst 15(4):729–745
29. Hsu YC, Lin SF, Cheng YC (2010) Multi groups cooperation based symbiotic evolution for TSK-type neuro-fuzzy systems design. Expert Syst Appl 37(7):5320–5330
30. Lee JT, Wu HW, Lee TY, Liu YH, Chen KT (2009) Mining closed patterns in multi-sequence time-series database. Data Knowl Eng 68:1071–1090
31. Tanbeer SK, Ahmed CF, Jeong BS (2009) Parallel and distributed algorithm for frequent pattern mining in large database. IETE Tech Rev 26:55–65
32. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the international conference on VLDB, pp 487–499
33. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM-SIGMOD, Dallas, TX, pp 1–12
34. Wu YT, An YJ, Geller J, Wu YT (2006) A data mining based genetic algorithm. In: Proceedings of IEEE workshop SEUS-WCCIA, pp 27–28
35. Shankar S, Purusothaman T (2009) Utility sentient frequent itemset mining and association rule mining: a literature survey and comparative study. Int J Soft Comput Appl 4:81–95
36. Sugeno M, Tanaka K (1991) Successive identification of a fuzzy model and its applications to prediction of a complex system. Fuzzy Sets Syst 42(3):315–334
37. Lin SF, Cheng YC (2010) Two-strategy reinforcement evolutionary algorithm using data-mining based crossover strategy with TSK-type fuzzy controllers. Int J Innovat Comput Control 6(9):3863–3885
38. Cordon O, Herrera F, Hoffmann F, Magdalena L (2001) Genetic fuzzy systems: evolutionary tuning and learning of fuzzy knowledge bases. Advances in fuzzy systems: applications and theory, vol 19. World Scientific Publishing, NJ, USA
39. Cox E (2005) Fuzzy modeling and genetic algorithms for data mining and exploration, 1st edn. Morgan Kaufman Publications, San Francisco, USA
40. De Jong KA (1975) Analysis of the behavior of a class of genetic adaptive systems. Ph.D. Dissertation, Dept. Computer and Communication Sciences, Univ. Michigan, Ann Arbor, MI
41. Crowder RS (1990) Predicting the Mackey-Glass time series with cascade-correlation learning. In: Proceedings of the 1990 connectionist models summer school, pp 117–123
42. Moriarty DE, Miikkulainen R (1996) Efficient reinforcement learning through symbiotic evolution. Mach Learn 22:11–32
43. Rabbani T, van den Heuvel FA, Vosselman G (2006) Segmentation of point clouds using smoothness constraint. Proc ISPRS 35:248–253
44. Liu H, Yan J, Zhang D (2006) Three-dimensional surface registration: a neural network strategy. Neurocomputing 70:597–602
45. Zhang J, Ge Y, Ong SH, Chui CK, Teoh SH, Yan CH (2008) Rapid surface registration of 3D volumes using a neural network approach. Image Vis Comput 26:201–210
46. Chetverikov D, Stepanov D, Krsek P (2005) Robust Euclidean alignment of 3D point sets: the trimmed iterative closest point algorithm. Image Vis Comput 23:299–309
47. Liu YH (2004) Improving ICP with easy implementation for free-form surface matching. Pattern Recogn 37:211–226
48. Besl P, McKay N (1992) A method for registration of 3-D shapes. IEEE Trans Pattern Anal Mach Intell 14(2):239–256