An ACS-based Framework for Fuzzy Data Mining


Accepted Manuscript

An ACS-based Framework for Fuzzy Data Mining

Tzung-Pei Hong, Ya-Fang Tung, Shyue-Liang Wang, Min-Thai Wu, Yu-Lung Wu

PII: S0957-4174(09)00350-9

DOI: 10.1016/j.eswa.2009.04.016

Reference: ESWA 3685

To appear in: Expert Systems with Applications

Received Date: 1 September 2008

Accepted Date: 2 April 2009

Please cite this article as: Hong, T-P., Tung, Y-F., Wang, S-L., Wu, M-T., Wu, Y-L., An ACS-based Framework for Fuzzy Data Mining, Expert Systems with Applications (2009), doi: 10.1016/j.eswa.2009.04.016

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


An ACS-based Framework for Fuzzy Data Mining*

Tzung-Pei Hong (1, 2), Ya-Fang Tung (3), Shyue-Liang Wang (4), Min-Thai Wu (2), Yu-Lung Wu (3)

(1) Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, 811, Taiwan

(2) Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, 804, Taiwan

(3) Institute of Information Management, I-Shou University, Kaohsiung, 840, Taiwan

(4) Department of Information Management, National University of Kaohsiung, Kaohsiung, 811, Taiwan

E-MAIL: tphong@nuk.edu.tw, m9522030@stmail.isu.edu.tw, slwang@nuk.edu.tw, d953040015@mail.nsysu.edu.tw, wuyulung@isu.edu.tw

Abstract

Data mining is often used to discover interesting and meaningful patterns in huge databases. It may generate different kinds of knowledge, such as classification rules, clusters, and association rules, among others. Much research has been devoted to data mining, and most of it has focused on mining from binary-valued data. Fuzzy data mining was thus proposed to discover fuzzy knowledge from linguistic or quantitative data. Recently, ant colony systems (ACS) have been successfully applied to optimization problems. However, little work has been done on applying ACS to fuzzy data mining. This paper thus proposes an ACS-based framework for fuzzy data mining. In the framework, the membership functions are first encoded into binary bits and then fed into the ACS to search for the optimal set of membership functions. The problem is thus transformed into a multi-stage graph, with each route representing a possible set of membership functions. When the termination condition is reached, the best membership function set (with the highest fitness value) can then be used to mine fuzzy association rules from a database. Finally, experiments are conducted to compare the proposed framework with other approaches and to show its performance.

Keywords: Ant colony system, Data mining, Fuzzy set, Membership function, Association rule.

---

* This is an extended version of the paper "Extracting membership functions in fuzzy data mining by ant colony systems", presented in The 2008 International Conference on Machine Learning and Cybernetics.


1. Introduction

Data mining is most commonly used in attempts to induce association rules from transaction data. An association rule is an expression X→Y, where X is a set of items and Y is a single item [1]. It means that, in the set of transactions, if all the items in X exist in a transaction, then Y is also in the transaction with a high probability. For example, assume that whenever customers in a supermarket buy bread and butter, they also buy milk. From the transactions kept in the supermarket, an association rule such as "Bread and Butter → Milk" will be mined out. Most previous studies focused on binary-valued transaction data. Transaction data in real-world applications, however, usually consist of quantitative values. Designing a sophisticated data-mining algorithm able to deal with various types of data presents a challenge to workers in this research field.

Recently, fuzzy set theory has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning [12]. The theory has been applied in fields such as manufacturing, engineering, diagnosis, and economics. Several fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect in specific domains. As to fuzzy data mining, Hong et al. [11] proposed a mining approach that integrated fuzzy-set concepts with the Apriori mining algorithm to find interesting fuzzy itemsets and association rules in quantitative transaction data. In that approach, the membership functions used for fuzzy data mining have to be defined in advance. In [8], a GA-based fuzzy data-mining method for extracting both association rules and membership functions from quantitative transactions was thus proposed. The GA-based method was divided into two phases: mining membership functions and mining fuzzy association rules. In the phase of mining membership functions, the GA is used to derive the membership functions suitable for the mining problem. In the phase of mining fuzzy association rules, the best membership functions derived by the genetic algorithm are used to fuzzify the quantitative transactions. Then the fuzzy mining approach proposed in [9] can be used to find fuzzy association rules.

Recently, Ant Colony Systems (ACS) have been successfully applied to optimization problems. They are inspired by the behavior of social insects and are a heuristic approach. Ants deposit chemical trails called "pheromone" on the ground to communicate with one another. By following the pheromone, ants can find the shortest path between the source and the destination. The characteristics of an ant colony include positive feedback and distributed computation. It also uses a constructive greedy heuristic [16] to search for solutions.

Research on data mining based on the ant colony system is still rare. Previous works on ACS-based rule discovery were proposed by Parpinelli [22] and Cordon et al. [4], in which they proposed the mining of classification rules for fuzzy control systems. Very few other studies have explored association rules. Therefore, in this work, we propose an ACS-based framework to extract membership functions from quantitative data for fuzzy data mining. Numerical experiments on the proposed algorithm are also performed to show its effectiveness.

The remaining parts of the paper are organized as follows. Section 2 reviews ACS and fuzzy data mining. An ACS-based mining framework is then presented in Section 3. The details about how to use ACS on fuzzy data mining are explained in Section 4. The proposed algorithm based on the above framework is described in Section 5. An example demonstrating the proposed algorithm is given in Section 6. Numerical simulations are shown in Section 7. Conclusion and future work are given in Section 8.

2. Background

This section reviews some basic concepts related to this paper. They are fuzzy data mining and ant colony systems.


2.1 Fuzzy Data Mining

Fuzzy set theory was proposed by Zadeh [31]. Its primary idea is to use natural language to represent concepts, in which words may have ambiguous meanings. It may be especially useful for quantifying and reasoning. Fuzzy sets can be thought of as an extension of traditional crisp sets, in which each element must either be in or not in a set. In data mining, fuzzy sets help transform quantitative values into linguistic terms, thus reducing the number of possible itemsets in the mining process. Hong et al. proposed a fuzzy mining algorithm to mine linguistic association rules [9][10]. They first transformed each quantitative value into several fuzzy sets labeled with linguistic terms by membership functions. The algorithm then calculated the scalar cardinality of each linguistic term over all the transaction data. The mining process based on fuzzy counts was then performed to find fuzzy association rules. Hong et al. later modified the previous algorithm and proposed a new fuzzy data-mining algorithm for extracting both association rules and membership functions [8].

In addition, Kaya et al. proposed a GA-based clustering method to derive a predefined number of membership functions for getting a maximum profit within an interval of user-specified minimum support values [13]. Later, the idea of a multi-objective GA to find a number of Pareto-optimal rule sets and an automatic method for mining association rules came into existence [14][15]. There is also a fairly large body of further literature on fuzzy data mining [5][17][23][24][26][28][29][30]. In this paper, we attempt to propose a new model based on the ant colony system for fuzzy data mining.

2.2 The Ant Colony System

The ant system was first introduced in 1991 [2][6], and then extended to the ant colony system [7]. The idea of the ant system comes from the observation of real ant colonies searching for food. Ants are capable of cooperating to solve complex problems such as searching for food, carrying food and so on. They can find the shortest path between their nest and food without using vision. They deposit pheromone on the paths for their companions. When the next ants go through the paths, they select the path with a high density of pheromone. Ants thus determine the next node on the route according to the pheromone density. Once all ants have terminated their tours, the amount of pheromone on the tours will have been modified.

Ant algorithms have thus been designed to simulate the above ant behavior for solving optimization problems. In particular, the process of modifying the amounts of pheromone on the tours is called the updating rule, which is designed to give more pheromone to the best path. Ant algorithms have been applied to several difficult NP-hard problems, such as the Traveling Salesman Problem (TSP) [7], the Quadratic Assignment Problem (QAP) [19][25], Vehicle Routing Problems (VRP) [16][21][27], the Job Schedule Problem (JSP) [3], and others [18][20].

The Ant Colony System (ACS) [7] proposed by Dorigo et al. is based on the ant system [6] and is applied to extract membership functions in this paper. The algorithm for finding solutions to an optimization problem is shown in Figure 1 [7].

Initialize
Loop /* at this level each loop is called an iteration */
    Each ant is positioned on a starting node
    Loop /* at this level each loop is called a step */
        Each ant applies a state transition rule to incrementally build a solution
        A local pheromone updating rule is applied
    Until all ants have built a complete solution
    A global pheromone updating rule is applied
Until End_condition

Figure 1. The ACS algorithm

The three rules used in the ACS algorithm are described below.

1. State transition rule: It defines how an ant probabilistically changes its current state to a next state (node) to form a solution.

2. Global updating rule: It defines how the pheromone of the best tour passed by the ants will be updated after all ants have completed their tours.

3. Local updating rule: It defines how the pheromone of a path is updated when an ant constructs the path.

3. The ACS-based fuzzy-mining framework

In this section, a fuzzy mining framework based on the ACS algorithm is proposed to discover both useful association rules and suitable membership functions from quantitative values. The proposed framework is shown in Figure 2 where each item has its own membership function set. These membership function sets are then fed into the ant colony system to search for the final appropriate sets. When the termination condition is reached, the best membership function set (with the highest fitness value) can then be used to mine fuzzy association rules from a database.


Figure 2. The ACS-based framework for fuzzy data mining

The proposed framework modified the GA-based mining framework in [8]. A new encoding scheme is used to replace the chromosomes coding scheme in the GA-based approach. The framework is divided into two phases. The first phase searches for an appropriate set of membership functions for the items by the ACS mining algorithm. After the searching for the solutions in the first phase is finished, the best set of membership functions is used for fuzzy data mining in the second phase.

The ACS algorithm plays an important role in extracting the membership functions in Phase 1. We assume the parameters of a membership function are discrete. We transform the extraction of membership functions into a route-search problem. A route then represents a possible set of membership functions. The artificial ants can then be used to find a nearly optimal solution.

4. Using ACS on fuzzy data mining

This section describes how to use ACS on fuzzy data mining in more detail. It is divided into the following four subsections: representation of coding, pheromone initialization, state transition rule, and pheromone updating rule.

4.1 Representation of coding

Instead of representing the membership functions of all items as one long code, we encode the membership functions of each item into a binary code. Each item has a set of membership functions, which are assumed to have the shape of an isosceles triangle for simplicity. The membership functions stand for linguistic terms, such as low, middle, and high. Each membership function thus has two parameters, the center and half the spread (called the span). First, we use n binary bits to encode each center and each span of a membership function of an item according to the quantity range of the item in the database. For example, if the quantity range of an item is from 0 to 15, we may use four bits to encode it. Note that other shapes of membership functions can also be used here.

Below, an example is given to illustrate the coding scheme. Assume there are three linguistic terms (membership functions) for an item Ij. Let Cj1, Cj2, Cj3 denote the three centers of the linguistic terms and Sj1, Sj2, Sj3 represent their spans. Also assume the center and the span of a linguistic term are each encoded by four bits. The binary string for the centers of the item will thus be represented by 12 bits, such as {(0, 0, 1, 1) (0, 1, 1, 1) (1, 1, 0, 1)} in Figure 3. Similarly, the spans of the linguistic terms will be encoded as {(0, 0, 1, 1) (0, 1, 1, 0) (0, 1, 1, 1)}.

Figure 3. The representation of membership functions for an item

For the above case, the corresponding representation of membership functions for Item Ij and the function shapes are shown in Figure 4, where the three centers are {3, 7, 13} and the three spans are {3, 6, 7} from the coding scheme.

[Figure 4: Membership value (vertical axis) plotted against Quantity (horizontal axis); three isosceles triangles labeled with their (Cj, Sj) pairs, centered at 3, 7, and 13]

Figure 4. The real number of membership functions for an item

After the membership functions are encoded, the ACS algorithm can then be applied to find the (nearly) optimal solution. As can be observed in Figure 3, each position of a string includes two bits, one for the center and the other for the span. Thus there are four cases, namely (0, 0), (0, 1), (1, 0), and (1, 1). If we regard the decision at each bit position as a node, the problem becomes a multi-stage decision problem. Although it can be solved by dynamic programming techniques, it is still NP-hard. The ACS algorithm is thus used here to solve it. As in the TSP (traveling salesman problem), ants can choose one of the four alternatives at each node in every pass.

Therefore, for the above example, there are twelve nodes on a route and four selections for each node. An ant will thus pass through the nodes, each of which is composed of a pair (Cj, Sj). When an ant finishes a route with twelve nodes, one possible set of membership functions of an item is generated. Ants continue repeating this process until the termination condition is reached. The best set of membership functions (with the highest amounts of pheromone) obtained so far is then output for fuzzy data mining. The idea is shown in Figure 5, where there are 12 stages and each stage has four nodes. An ant thus has four alternatives at each stage.

Figure 5. The multi-stage graph formed for the proposed ant-mining algorithm

Let the i-th node chosen be represented as the pair (cji, Sji). Note that for the above example, the pair (cj5, Sj5) will represent the first pair of bits in the second membership function of Item Ij. At the beginning, an ant randomly chooses a node at each stage and forms a complete possible solution. Other ants will follow the trail with high pheromone afterward. For the example in Figure 3, the four nodes (0, 0), (0, 0), (1, 1), (1, 1) will be the path for the first membership function.
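The mapping between the bit strings of Figure 3 and a route through the multi-stage graph can be sketched as follows. This is only an illustration of the coding scheme described above; the function names `bits_to_route` and `route_to_terms` are hypothetical and do not come from the paper.

```python
def bits_to_route(center_bits, span_bits):
    """Pair up center and span bits position by position to form a route."""
    route = []
    for c_term, s_term in zip(center_bits, span_bits):
        route.extend(zip(c_term, s_term))
    return route


def route_to_terms(route, bits_per_term=4):
    """Decode a route back into (center, span) pairs, one per linguistic term."""
    terms = []
    for start in range(0, len(route), bits_per_term):
        chunk = route[start:start + bits_per_term]
        center = int("".join(str(c) for c, _ in chunk), 2)
        span = int("".join(str(s) for _, s in chunk), 2)
        terms.append((center, span))
    return terms


# Bit strings of the example in Figure 3.
center_bits = [(0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 0, 1)]
span_bits = [(0, 0, 1, 1), (0, 1, 1, 0), (0, 1, 1, 1)]

route = bits_to_route(center_bits, span_bits)
print(route[:4])              # [(0, 0), (0, 0), (1, 1), (1, 1)]
print(route_to_terms(route))  # [(3, 3), (7, 6), (13, 7)]
```

The first four nodes reproduce the path given above for the first membership function, and the fifth node is the first pair of bits of the second membership function.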


4.2 Pheromone initialization

The initial amount of pheromone deposited on each path is 0.5. While an ant goes through a path, it deposits pheromone on the edge between two nodes. In this paper, we do not consider the heuristic function η used in the TSP and take only the pheromone remaining on the paths into account. The function η is set at 1 for all the transitions for simplicity.

4.3 State transition rule

In this work, every ant selects the next node with a calculated probability. The state transition rule for an ant at node j to the next node i at the next stage is given as follows:

$$i = \begin{cases} \arg\max_{n \in R(j)} \{\tau_{jn}\}, & \text{if } q \le q_0, \\ n \text{ with a probability } P_{jn}, & \text{if } q > q_0, \end{cases} \tag{1}$$

where R(j) is the set of nodes to which node j is connected, τjn is the pheromone on the edge from node j to node n, q is a random variable uniformly distributed over [0, 1], and q0 (0 ≤ q0 ≤ 1) is a predefined parameter. The probability Pjn is defined as follows:

$$P_{jn}(t) = \frac{\tau_{jn}(t)}{\sum_{n' \in R(j)} \tau_{jn'}(t)}. \tag{2}$$

When the value of q is greater than the predefined parameter value q0, Equation 2 is used to decide the transition probability of each possible next node. The probability is thus based on the amounts of pheromone deposited on the paths.
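As a minimal sketch of the state transition rule, assuming η = 1 as stated in Section 4.2, the choice of the next node could look as follows. The function names and the dictionary representation of the outgoing edges are assumptions made for illustration only.

```python
import random


def transition_probabilities(pheromone):
    """Equation 2 with eta = 1: each edge's share of the total pheromone."""
    total = sum(pheromone.values())
    return {node: tau / total for node, tau in pheromone.items()}


def choose_next_node(pheromone, q0, rng=random):
    """Equation 1: exploit the strongest edge if q <= q0, otherwise sample by P_jn."""
    if rng.random() <= q0:
        return max(pheromone, key=pheromone.get)
    nodes = list(pheromone)
    weights = [pheromone[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


# Four outgoing edges with the initial pheromone value 0.5.
edges = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.5}
probs = transition_probabilities(edges)
print(probs[(0, 0)])  # 0.25
```

With equal pheromone on all four edges, each alternative is chosen with probability 0.25 in the exploration branch, which agrees with the worked example in Section 6.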


4.4 Pheromone updating rule

One of the differences between the ant colony system and the ant system is the pheromone updating rule. In our proposed method, there are two updating rules that help the artificial ants search for optimal solutions and avoid stagnation of the search process. One is the local updating rule and the other is the global updating rule. They are stated as follows.

4.4.1 Local updating rule

The local updating rule prevents ants from falling into local optima while they are searching on the paths. It can appropriately adjust the amount of pheromone while ants construct the path. The local updating rule is given below:

$$\tau_{jn}(t+1) = (1 - \rho) \times \tau_{jn}(t) + \rho \times \tau_0, \tag{3}$$

where (1-ρ) is a parameter used to adjust the pheromone on the constructed edge and τ0 is the initial pheromone value of the edge.
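A short rearrangement may help show why the local rule discourages all ants from crowding onto the same edges. Subtracting τ0 from both sides of Equation 3 gives:

```latex
\tau_{jn}(t+1) - \tau_0
  = (1-\rho)\,\tau_{jn}(t) + \rho\,\tau_0 - \tau_0
  = (1-\rho)\,\bigl(\tau_{jn}(t) - \tau_0\bigr)
```

Each time an ant traverses an edge, the gap between the pheromone of that edge and the initial value τ0 therefore shrinks by the factor (1 − ρ), assuming 0 < ρ < 1. An edge whose pheromone has grown above τ0 becomes slightly less attractive to the ants that follow in the same iteration.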

4.4.2 Global updating rule

The goal of the global updating rule used in this paper is to allow the best paths to be further exploited. When all ants in an iteration have completed their trails, the pheromone of the best path is increased and that of the others is decreased. The global updating rule used in this paper is the iteration-best update, which is shown as follows:

$$\tau_{jn} = (1 - \alpha) \times \tau_{jn} + \alpha \times \Delta\tau_{jn}, \tag{4}$$

where 0<α<1 is the pheromone decay parameter and ∆τjn is calculated as follows:

$$\Delta\tau_{jn} = \begin{cases} \beta \times \text{fitness value}, & \text{if } jn \in \text{iteration-best path}, \\ 0, & \text{if } jn \notin \text{iteration-best path}, \end{cases} \tag{5}$$

where the parameter β is used to adjust the fitness value to the pheromone change. In Formula 5, an iteration-best path is the best path among the ones found by all the ants in each iteration. The fitness values of the tours constructed by the ants will be introduced in the next section. The tour with the highest fitness value in each iteration is taken as the best tour, to which the global updating rule is then applied.
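The global updating rule of Equations 4 and 5 can be sketched as below. The dictionary of edges, the edge labels, and the parameter values in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def global_update(pheromone, best_path_edges, fitness, alpha, beta):
    """Apply Equations 4 and 5 to every edge, in place.

    Edges on the iteration-best path receive beta * fitness as delta;
    all other edges receive a delta of 0 and therefore only decay.
    """
    best = set(best_path_edges)
    for edge in pheromone:
        delta = beta * fitness if edge in best else 0.0
        pheromone[edge] = (1 - alpha) * pheromone[edge] + alpha * delta
    return pheromone


# Two edges with the initial pheromone 0.5; only "e1" lies on the best path.
tau = {"e1": 0.5, "e2": 0.5}
global_update(tau, ["e1"], fitness=2, alpha=0.1, beta=0.5)
print(tau)  # "e1" rises to about 0.55, "e2" decays to about 0.45
```

In this sketch the edge on the best path gains pheromone only because β × fitness exceeds its current value, so the choice of β matters when fitness values are small.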

5. The proposed ACS-based fuzzy data mining algorithm

In this section, we will propose a fuzzy mining algorithm to extract membership functions and fuzzy association rules based on the ACS approach. The proposed algorithm is described below.

5.1 Initial population

As mentioned above, each item will have a set of isosceles-triangular membership functions. The membership function stands for the linguistic terms such as low, middle, high. Transforming these quantitative values into linguistic terms requires a feasible population of database. Therefore, we need to initialize and update a population during the evolution process.

5.2 Fitness functions

In this work, we use the fitness function proposed by Hong et al. [8] to obtain a good set of membership functions. The fitness value of a possible solution is defined as:

$$f = \frac{|L_1|}{suitability}, \tag{6}$$

where |L1| is the number of large 1-itemsets and suitability is the suitability of the membership functions obtained. The suitability factor used in the fitness function is designed to reduce the occurrence of the two bad kinds of membership functions shown in Figure 6, where the first one is too redundant, and the second one is too separate.

[Figure 6: two panels plotting the Low, Middle, and High membership functions against Quantity; panel (a) marks 5, 8, and 9 on the Quantity axis, and panel (b) marks 5, 20, and 25]

Figure 6. Two bad membership functions

The suitability of the membership functions includes two items, the overlap factor designed for avoiding the first bad case and the coverage factor designed for avoiding the second bad case. The calculation for the suitability for Item Ij is thus designed as follows:

$$overlap\_factor(I_j) + coverage\_factor(I_j). \tag{7}$$

The overlap factor is defined as follows:

$$overlap\_factor(I_j) = \sum_{k \neq i} \left[ \max\left( \frac{overlap(R_{jk}, R_{ji})}{\min(S_{jk}, S_{ji})},\ 1 \right) - 1 \right]. \tag{8}$$

Here overlap(Rjk, Rji) denotes the overlap length of the two membership functions Rjk and Rji, and the overlap ratio is this length divided by the minimum span (half the spread) of the two functions. If the overlap length is larger than the span, then these two membership functions are thought of as a little redundant. Appropriate punishment must then be considered in this case. The coverage factor is defined as follows:

$$coverage\_factor(I_j) = \frac{1}{\dfrac{range(R_{j1}, \ldots, R_{jl})}{\max(I_j)}}, \tag{9}$$

where range(Rj1, ..., Rjl) is the coverage range of the membership functions and max(Ij) is the maximum quantity of Ij in the transactions. The coverage ratio of a set of membership functions for an item Ij is thus defined as the coverage range of the membership functions divided by the maximum quantity of that item in the transactions, and the coverage factor is the reciprocal of this ratio. The larger the coverage ratio is, the better the derived membership functions are.
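The two factors can be sketched as follows under several assumptions that the text does not fix: each membership function is an isosceles triangle given as a (center, span) pair with a positive span, the overlap length is the length of the intersection of two supports, each unordered pair of functions is counted once, and the coverage range is the part of [0, max(Ij)] covered by at least one function. All function names are hypothetical.

```python
from itertools import combinations


def support(mf):
    """Interval on which a triangular function (center, span) is nonzero."""
    center, span = mf
    return (center - span, center + span)


def overlap_factor(mfs):
    """Penalty that is positive only when an overlap exceeds the smaller span."""
    total = 0.0
    for a, b in combinations(mfs, 2):
        (lo_a, hi_a), (lo_b, hi_b) = support(a), support(b)
        overlap_len = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
        ratio = overlap_len / min(a[1], b[1])
        total += max(ratio, 1.0) - 1.0
    return total


def coverage_factor(mfs, max_quantity):
    """Reciprocal of the covered share of the range [0, max_quantity]."""
    intervals = sorted(
        (max(lo, 0.0), min(hi, max_quantity)) for lo, hi in map(support, mfs)
    )
    covered, reach = 0.0, 0.0
    for lo, hi in intervals:
        if hi > max(lo, reach):
            covered += hi - max(lo, reach)
            reach = hi
    return 1.0 / (covered / max_quantity)


def suitability(mfs, max_quantity):
    return overlap_factor(mfs) + coverage_factor(mfs, max_quantity)


# Membership functions used for item A in the example of Section 6,
# assuming the largest quantity of A in the transactions is 11.
item_a = [(2, 2), (5, 2), (10, 5)]
print(suitability(item_a, max_quantity=11))  # 1.0
```

Under these assumptions the result agrees with the suitability of 1 reported for item A in Section 6. A heavily overlapping set such as (5, 3), (8, 3), (9, 3) yields a positive overlap factor and therefore a lower fitness.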

5.3 The proposed mining algorithm

In addition to the parameters defined in the previous sections, the following parameters will be used: the number of artificial ants, the minimum pheromone ratio of an ant, the evaporation ratio of pheromone, the local updating ratio, and the global updating ratio. The proposed ACS-based algorithm for mining membership functions and fuzzy association rules is given as follows.

INPUT:

(1) n quantitative transaction data,

(2) a set of m items, each with l predefined linguistic terms,

(3) a support threshold α,

(4) a confidence threshold λ, and

(5) a maximum number of iterations G.

OUTPUT: An appropriate set of membership functions for all items in fuzzy data mining.

STEP 1: Let p = 1, where p is used to keep the identity number of the item to be processed.

STEP 2: Let the multi-stage graph for the fuzzy mining problem be (N, E), where N is the set of nodes and E is the set of edges (Figure 7). Also denote the j-th node in the i-th stage as Nij, and the edge from Nij to N(i+1)k as Eijk. Initially set the pheromone on every edge Eijk as 0.5.

STEP 3: Set the current generation number g at 1.

STEP 4: Set up the complete route for each artificial ant Antq by the following substeps.

STEP 4.1: Select the edges from start to end according to the state transition rule.

STEP 4.2: Update the pheromone of the edges passed through by Antq according to the local updating rule.

STEP 5: Evaluate the fitness value of the solution (membership functions) obtained by each artificial ant according to the following substeps.

STEP 5.1: For each transaction datum Di, i = 1 to n, transfer its quantitative value vp for item Ip into a fuzzy set fp according to the membership functions obtained from the ant. That is, fp is represented as:

$$\frac{f_{p1}}{Region_{p1}} + \frac{f_{p2}}{Region_{p2}} + \cdots + \frac{f_{pk}}{Region_{pk}} + \cdots + \frac{f_{pl}}{Region_{pl}},$$

where Regionpk is the k-th fuzzy term of item Ip, fpk is vp's fuzzy membership value in the region, and l is the number of fuzzy membership functions.

STEP 5.2: The scalar cardinality of each region in the transactions is calculated as follows:

$$count_{pk} = \sum_{i=1}^{n} f_{pk}^{(i)}, \tag{10}$$

where fpk(i) is the fuzzy membership value of region Rpk from the i-th datum.

STEP 5.3: Check for each Rpk whether its countpk /n is larger than or equal to the minimum support threshold α. If Rpk satisfies the above condition, put it in the set of large 1-itemsets (L1).

STEP 5.4: Calculate the fitness value of the solution from the ant by dividing the number of large itemsets in L1 over the suitability. That is,

$$fitness = \frac{|L_1|}{suitability}.$$

STEP 6: Once all the artificial ants find their entire routes, the one holding the highest fitness value will be used to update the pheromone according to the global updating rule.

STEP 7: If the generation g is equal to G, output the current best set of membership functions of item Ip for fuzzy data mining; otherwise, set g = g + 1 and go to STEP 4.

STEP 8: If p < m, set p = p + 1 and go to STEP 2 for another item; otherwise, stop the algorithm.

The final set of membership functions output in STEP 7 and the 1-itemsets obtained are then used to mine fuzzy association rules from the given database. Our fuzzy mining algorithm proposed in [9] is then adopted to achieve this purpose.
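The fuzzification and counting in STEPs 5.1 and 5.2 can be sketched as follows, assuming the usual formula for an isosceles-triangular membership function with a given center and span. The function names and the sample quantities are hypothetical.

```python
def triangular_membership(value, center, span):
    """Membership of value in an isosceles triangle (center, span)."""
    if span <= 0:
        # Degenerate triangle: treated here as a single point (an assumption).
        return 1.0 if value == center else 0.0
    return max(0.0, 1.0 - abs(value - center) / span)


def scalar_cardinality(values, center, span):
    """Equation 10: sum of the membership values over all transactions."""
    return sum(triangular_membership(v, center, span) for v in values)


# Hypothetical quantities of one item, and a region with center 5 and span 2.
quantities = [4, 5, 6, 9]
print(scalar_cardinality(quantities, center=5, span=2))  # 2.0
```

The quantities 4, 5, 6, and 9 contribute 0.5, 1, 0.5, and 0 respectively, so the count of the region is 2.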

Figure 7. The multi-stage graph for the fuzzy mining problem
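STEPs 2 to 7 for a single item can be put together in the following sketch. It is not the authors' implementation: the virtual start node, the random tie-breaking, the default parameter values, and the stand-in `toy_fitness` function are all assumptions made so that the example is self-contained. In the actual algorithm the fitness would be |L1| divided by the suitability, as computed in STEP 5.

```python
import random

NODES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (center bit, span bit) per stage


def run_acs(fitness_fn, n_stages=12, n_ants=10, generations=50,
            q0=0.8, rho=0.1, alpha=0.1, beta=0.1, tau0=0.5, seed=0):
    """Search for a high-fitness route for one item (illustrative sketch)."""
    rng = random.Random(seed)
    # tau[s][j][k]: pheromone on the edge into node k of stage s coming from
    # node j of stage s - 1. Stage 0 is entered from one virtual start node,
    # so only row j = 0 of tau[0] is used (an assumption of this sketch).
    tau = [[[tau0] * 4 for _ in range(4)] for _ in range(n_stages)]
    best_route, best_fit = None, float("-inf")

    for _ in range(generations):
        iter_route, iter_fit = None, float("-inf")
        for _ in range(n_ants):
            route, prev = [], 0
            for s in range(n_stages):
                row = tau[s][prev]
                if rng.random() <= q0:
                    # Exploitation: strongest edge, ties broken at random.
                    top = max(row)
                    k = rng.choice([i for i in range(4) if row[i] == top])
                else:
                    # Exploration: sample in proportion to the pheromone.
                    k = rng.choices(range(4), weights=row, k=1)[0]
                # Local updating rule (Equation 3).
                row[k] = (1 - rho) * row[k] + rho * tau0
                route.append(k)
                prev = k
            fit = fitness_fn([NODES[k] for k in route])
            if fit > iter_fit:
                iter_route, iter_fit = route, fit

        # Global updating rule (Equations 4 and 5) with the iteration-best route.
        best_edges, prev = set(), 0
        for s, k in enumerate(iter_route):
            best_edges.add((s, prev, k))
            prev = k
        for s in range(n_stages):
            for j in range(4):
                for k in range(4):
                    delta = beta * iter_fit if (s, j, k) in best_edges else 0.0
                    tau[s][j][k] = (1 - alpha) * tau[s][j][k] + alpha * delta

        if iter_fit > best_fit:
            best_route, best_fit = [NODES[k] for k in iter_route], iter_fit
    return best_route, best_fit


def toy_fitness(route):
    """Hypothetical stand-in: counts the stages whose center bit is 1."""
    return sum(center_bit for center_bit, _ in route)


route, fit = run_acs(toy_fitness)
print(len(route), fit)
```

Running the loop once per item, as STEP 8 prescribes, keeps the graph at 12 stages regardless of the number of items, which is the design choice behind encoding each item separately rather than as one long code.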

6. An Example

In this section, an example is given to illustrate the proposed mining algorithm. Assume there are four items in the transaction database: A, B, C, D. The data set includes the five transactions shown in Table 1. The number in parentheses represents the quantity of the item. Assume each item has three fuzzy regions: Low, Middle and High. The final set of membership functions for each item can be derived as follows.

Table 1. The transaction database in the example

TID    Transaction
T1     A(11), B(7), C(12)
T3     A(8), B(10)
T4     C(7), D(13)
T5     A(5), C(11)

STEP 1: Let the item A be the first item to be processed.

STEP 2: The initial pheromone of all the edges is set as 0.5.

STEP 3: Assume the maximum generation number G is given as 10. The current generation g is initially set at 1.

STEP 4: The complete route for each artificial ant is formed by the following substeps.

STEP 4.1: All the initial pheromone of each path is set at 0.5. The probability for an ant to choose any node at the next stage for t = 0 is thus the same and is calculated according to the state transition rule as follows:

$$P_{jn}(0) = \frac{\tau_{jn} \times \eta}{\sum_{n} \tau_{jn} \times \eta} = \frac{0.5 \times 1}{0.5 \times 1 \times 4} = 0.25.$$

STEP 4.2: Let the parameter ρ be set at 0.1. Once one edge is chosen by an ant, its pheromone is updated as:

$$\tau_{jn}(1) = (1 - \rho) \times \tau_{jn}(0) + \rho \times \tau_0 = 0.9 \times 0.25 + 0.1 \times 0.5 = 0.275.$$

STEP 5: The fitness value of the solution (membership functions) obtained by each artificial ant is evaluated by the following substeps.

STEP 5.1: The quantitative value of each item in each transaction is transformed into a fuzzy set according to the membership functions obtained from the ACS algorithm. For example, if an ant for item A gets a complete route as shown in Figure 8, the binary codes will be transformed into the three membership functions of (2, 2), (5, 2) and (10, 5), where the first numbers in the brackets are the centers of membership functions and the second ones are the spans.

Figure 8. A solution from an ant

Take the transaction T1 as an example. The contents of T1 include (A, 11), (B, 7) and (C, 12). The amount "11" of item A is then transformed into the following fuzzy set by using the above membership functions:

0/A.Low + 0/A.Middle + 1/A.High.

The results for the other transactions on item A are shown in Table 2.

Table 2. The fuzzy set and the counts for item A

TID      Fuzzy set (A1)
T1       0/Low + 0/Mid + 1/High
T2       0.7/Low + 0/Mid + 0/High
T3       0/Low + 0/Mid + 0.5/High
T4       0/Low + 0/Mid + 0/High
T5       0/Low + 1/Mid + 0/High
count    L(0.7), M(1), H(1.5)

STEP 5.2: The scalar cardinality of each fuzzy region in the transactions is calculated as the count value. For example, the scalar cardinality of A.Low = (0+0.7+0+0+0) = 0.7. The results for item A are shown in the last row of Table 2.

STEP 5.3: The support of each fuzzy region is then compared with the predefined minimum support α. If the support is equal to or larger than α, then the fuzzy region is put in the large 1-itemsets (L1). Assume α is set at 0.18 in the example. For item A, both A.Middle and A.High are large and are thus put in L1.

STEP 5.4: The fitness value of the solution from the ant is calculated by dividing the number of large itemsets in L1 over the suitability. In this example, the suitability of the membership functions for item A is equal to 1. The number of large itemsets for item A is 2 from the above substep. The fitness value is thus 2/1(=2).

Now assume there are five artificial ants in the example and the membership functions constructed by the five artificial ants for item A are shown in Figure 9.

Figure 9. The five membership functions constructed from the five ants

The fitness values of the five sets of membership functions are then evaluated by the above substeps, with the results shown in Table 3. The results for the other items can be similarly derived.

Table 3. The fitness values of the five ants for item A

Each fuzzy-set cell lists the membership values in the order (Low, Mid, High).

            A1                      A2                       A3                      A4                       A5
T1          (0, 0, 1)               (0.1, 0.7, 0)            (0, 0.8, 0.2)           (0, 0, 1)                (0, 0, 0.4)
T2          (0.7, 0, 0)             (0.5, 0, 1)              (0.6, 0, 0)             (0, 0, 1)                (1, 0, 0)
T3          (0, 0, 0.5)             (0.65, 0, 1)             (0.4, 0.6, 0)           (0, 0, 1)                (0, 0.8, 0)
T4          (0, 0, 0)               (0, 0, 0)                (0, 0, 0)               (0, 0, 0)                (0, 0, 0)
T5          (0, 1, 0)               (0.7, 0, 0)              (1, 0, 0)               (0.9, 0, 0)              (0.5, 0.4, 0)
count       L(0.7), M(1), H(1.5)    L(1.95), M(0.7), H(0)    L(2), M(1.4), H(0)      L(2.4), M(0.7), H(0)     L(1.5), M(1.2), H(0.4)
L1          2                       1                        2                       1                        2
Coverage    1                       1                        1                       1                        1
Overlap     0                       0                        0                       0                        0
Fitness     2                       1                        2                       1                        2

STEP 6: Once all the artificial ants find their entire routes, the one holding the highest fitness value will be used to update the pheromone according to the global updating rule. In this example, there are three sets of membership functions (A1, A3, A5) with the highest fitness value. One of the three sets of membership functions is then randomly chosen and the global updating rule is applied to it.


of membership functions for item A is then output for fuzzy data mining.

STEP 8: The next item is handled until all items are processed.

The best set of membership functions (with the highest fitness value) for each item is then produced to derive fuzzy association rules.

7 Numerical Experiments

Experiments were made to show the performance of the proposed algorithm. The algorithm was implemented in C/C++ on a personal computer with an AMD Athlon(tm) 64 Processor 3200+ and 1 GB of RAM. A total of 64 items and 10,000 transactions were used in the experiments. The number of ants was set at 10, and the minimum support was set at 0.04. The parameters of the ACS algorithm were set as follows: the initial ratio of pheromone was 0.5, the evaporation ratio was 0.9, the local updating ratio was 0.1, and the global updating ratio was 0.9. The results were averaged over ten runs. The average fitness values of the artificial ants over different numbers of generations are shown in Figure 10.
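For readers who wish to repeat the setup, the settings above can be collected in one place. The following Python sketch only records the reported values and averages a result over the ten runs. The field names and the `run_once` callable are hypothetical, and the original implementation was written in C/C++.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ExperimentConfig:
    """Experimental settings as reported in the text (names are illustrative)."""
    num_items: int = 64
    num_transactions: int = 10_000
    num_ants: int = 10
    min_support: float = 0.04
    initial_pheromone: float = 0.5
    evaporation_ratio: float = 0.9
    local_update_ratio: float = 0.1
    global_update_ratio: float = 0.9
    num_runs: int = 10


def average_over_runs(run_once, config):
    """Average a scalar result over config.num_runs independent runs.

    run_once(config, seed) -> float is a hypothetical callable that
    performs one complete run and returns, for example, a fitness value.
    """
    return mean(run_once(config, seed) for seed in range(config.num_runs))
```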


It can be observed from Figure 10 that the average fitness values gradually increased and became quite stable after about three thousand generations. The numbers of large 1-itemsets over different numbers of generations are shown in Figure 11. This curve also stabilizes after about three thousand generations.

Figure 11. The numbers of large 1-itemsets along with different numbers of generations

Figure 12 then shows the execution time of the ACS mining algorithm for different numbers of generations. The average execution time increases with the number of generations.


Figure 12. The execution time of the ACS mining algorithm

The proposed approach was then compared with two other approaches under different minimum-support values. One was the GA-based approach proposed by Hong et al. [8], and the other directly used the uniform fuzzy partition. Three thousand generations were used as the principal generation number. The numbers of large 1-itemsets under different minimum-support values are shown in Figure 13. The number of large 1-itemsets decreased as the minimum support increased, which is consistent with the characteristics of mining. In addition, the proposed approach obtained the largest number of large 1-itemsets among the three approaches.
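As a point of reference for the uniform fuzzy partition baseline, the following Python sketch builds equally spaced triangular membership functions over a value range and fuzzifies a quantity. It assumes isosceles triangles whose half-width equals the spacing between centers. The range [0, 12], the three labels, and the function names are illustrative assumptions, not settings taken from the experiments.

```python
def uniform_partition(lo, hi, labels=("Low", "Mid", "High")):
    """Return (label, center, half_width) triples with equally spaced centers."""
    step = (hi - lo) / (len(labels) - 1)
    return [(label, lo + i * step, step) for i, label in enumerate(labels)]


def membership(x, center, half_width):
    """Isosceles triangular membership value of x, clipped at zero."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzify(x, partition):
    """Map a quantity to its membership value in every fuzzy region."""
    return {label: membership(x, c, w) for label, c, w in partition}


# Hypothetical quantity range [0, 12] split into three regions.
partition = uniform_partition(0, 12)
example = fuzzify(3, partition)
```

With this layout, the membership values of any quantity inside the range sum to 1, so no tuning of centers or spans is involved, which is what distinguishes the baseline from the learned membership functions.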

Figure 13. The comparisons of the three approaches on large 1-itemsets

8 Conclusion and Future Work

In this paper, we have examined the application of the ACS algorithm to extracting membership functions for fuzzy data mining and have proposed an algorithm for this purpose. An example has been given to demonstrate the proposed algorithm, and numerical experiments have been conducted to show its performance. Experimental results show that the proposed approach obtains more large 1-itemsets than both the GA-based approach and the uniform fuzzy partition. However, more work remains to be done. For example, the design of other heuristic functions for the state transition rule and the definition of different fitness functions may be studied further.

References

[1] Agrawal, R., & Srikant, R. (1994). Fast algorithm for mining association rules. Proceedings of International Conference on Very Large Databases, 487-499.

[2] Colorni, A., Dorigo, M., & Maniezzo, V. (1991). Distributed optimization by ant colonies. Proceedings of the First European Conference on Artificial Life, 134-142.

[3] Colorni, A., Dorigo, M., Maniezzo, V., & Trubian, M. (1994). Ant system for job-shop scheduling. Operations Research, Statistics and Computer Science, 34, 39-53.

[4] Cordon, J. C., & Herrera, F. (2002). Learning fuzzy rules using ant colony optimization. Proceedings of ANT2000 International Workshop on Ant Algorithms, 13-21.

[5] Cordón, A., Herrera, F., & Villar, P. (2001). Generating the knowledge base of a fuzzy rule-based system by the genetic learning of the data base. IEEE Transactions on Fuzzy Systems, 9(4), 667-674.

[6] Dorigo, M., Maniezzo, V., & Colorni, A. (1996). Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics-Part B, 26(1), 29-41.

[7] Dorigo, M., & Gambardella, L. M. (1997). Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1), 53-66.

[8] Hong, T. P., Chen, C. H., Wu, Y. L., & Lee, Y. C. (2006). A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 1091-1101.

[9] Hong, T. P., Kuo, C. S., & Chi, S. C. (1999). Mining association rules from quantitative data. Intelligent Data Analysis, 3(5), 363-376.

[10] Hong, T. P., Kuo, C. S., & Chi, S. C. (2001). Trade-off between time complexity and number of rules for fuzzy mining from quantitative data. Uncertainty, Fuzziness, and Knowledge-Based Systems, 9(5), 587-601.

[11] Hong, T. P., Kuo, C. S., & Wang, S. L. (2004). A fuzzy AprioriTid mining algorithm with reduced computational time. Applied Soft Computing, 5(1), 1-10.

[12] Kandel, A. (1992). Fuzzy expert systems. CRC Press, (pp. 8-19). Boca Raton.

[13] Kaya, M., & Alhajj, R. (2003). A clustering algorithm with genetically optimized membership functions for fuzzy association rules mining. Proceedings of IEEE International Conference on Fuzzy Systems (pp. 881-886).

[14] Kaya, M., & Alhajj, R. (2004). Integrating multi-objective genetic algorithms into clustering for fuzzy association rules mining. Proceedings of IEEE International Conference on Data Mining (pp. 431-434).

[15] Kaya, M., & Alhajj, R. (2005). Genetic algorithm based framework for mining fuzzy association rules. Fuzzy Sets and Systems, 152, 587-601.

[16] Kuo, R. J., Chiu, C. Y., & Lin, Y. J. (2004). Integration of fuzzy theory and ant algorithm for vehicle routing problem with time window. Fuzzy Information, 2, 925-930.

[17] Lee, Y. C., Hong, T. P., & Lin, W. Y. (2004). Mining fuzzy association rules with multiple minimum supports using maximum constraints. Lecture Notes in Computer Science, 3214, 1283-1290.

[18] Maniezzo, V., & Arbonaro, A. (2000). An ants heuristic for the frequency assignment problem. Future Generation Computer Systems, 16, 927-935.


assignment problem. Technical Report IRIDIA/94-28, IRIDIA, Universite Libre de Bruxelles, Belgium.

[20] Merkle, D., Middendorf, M., & Schmeck, H. (2002). Ant colony optimization for resource-constrained project scheduling. IEEE Transactions on Evolutionary Computation, 6(4), 333-346.

[21] Montemanni, R., Gambardella, M., Rizzoli, A. E., & Donati, A. (2005). Ant colony system for a dynamic vehicle routing problem. Combinatorial Optimization, 10(4), 327-343.

[22] Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2001). An ant colony based system for data mining: application to medical data. Proceedings of Genetic and Evolutionary Computation Conference, 791-798.

[23] Roubos, H., & Setnes, M. (2001). Compact and transparent fuzzy model and classifiers through iterative complexity reduction. IEEE Transactions on Fuzzy Systems, 9(4), 516-524.

[24] Setnes, M., & Roubos, H. (2000). GA-based modeling and classification: complexity and performance. IEEE Transactions on Fuzzy Systems, 8(5), 509-522.

[25] Stutzle, T., & Dorigo, M. (1999). ACO algorithms for the quadratic assignment problem. In D. Corne, M. Dorigo, & F. Glover (Eds.), New Ideas in Optimization. McGraw-Hill.

[26] Subramanyam, R. B. V., & Goswami, A. (2005). A fuzzy data mining algorithm for incremental mining of quantitative sequential patterns. Uncertainty, Fuzziness and Knowledge-Based Systems, 3(6), 633-652.

[27] Wade, A., & Salhi, S. (2004). An ant system algorithm for the mixed vehicle routing problem with backhauls. Metaheuristics: Computer Decision-Making, 699-719. Norwell, MA: Kluwer.

[28] Wang, C. H., Hong, T. P., & Tseng, S. S. (1998). Integrating fuzzy knowledge by genetic


[29] Wang, C. H., Hong, T. P., & Tseng, S. S. (2000). Integrating membership functions and fuzzy rule sets from multiple knowledge sources. Fuzzy Sets and Systems, 112, 141-154.

[30] Yue, S., Tsang, E., Yeung, D., & Shi, D. (2000). Mining fuzzy association rules with weighted items. Proceedings of IEEE International Conference on Systems, Man and Cybernetics, 1906-1911.

