
CHAPTER 1 INTRODUCTION

1.3 Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 reviews the background and related work, including concept drift, the Apriori algorithm, fuzzy data mining, fuzzy C-means and fuzzy membership functions. Chapter 3 presents the proposed mining algorithm for concept-drift patterns of fuzzy membership functions in varying quantitative databases. Chapter 4 then describes the mining algorithm proposed for finding the two kinds of fuzzy concept-drift patterns, together with its experimental evaluation. Chapter 5 states the approach that considers the concept drift of fuzzy association rules and membership functions at the same time. Finally, the conclusion and future work are given in Chapter 6.

CHAPTER 2

REVIEW OF RELATED WORK

In this chapter, some related studies on concept drift, the Apriori algorithm, fuzzy data mining, fuzzy C-means and fuzzy membership functions are briefly reviewed.

2.1 Concept Drift

In recent years, the field of concept drift has become popular. Tsymbal described concept drift as the phenomenon in which patterns change over time in unexpected ways [28]. For example, assume that at time t there is an association rule "if buying milk, then buying bread" and at time t + k another rule "if buying milk, then buying apple" is mined. The latter rule differs from the former in its consequent part over time. This change is a type of concept-drift pattern.

Based on concept-drift patterns, traditional data mining methods have been extended in various research areas [29-32]. When concept drift occurs, a classification model built from an old dataset is no longer suitable for predicting newly arriving data.

However, in real life, users might be very interested in rules about concept drift. For example, doctors would like to know the main causes of disease variation, since such rules would enable them to diagnose patients more correctly and quickly. Lee et al. proposed a decision-tree-based method for mining concept-drift rules [33]. L.C. Cheng et al. then proposed utilizing group information at different time points for consensus sequence mining. The aim is to find the change in a consensus sequence at different times so as to understand changes in group preference and move closer to the authentic idea of that group; their study classifies the changes of a group consensus sequence into five types (emerging patterns, emerging ambiguous pairs, order-change sequences, addition/removal of items, and significant changes) [34].

The continued growth of email usage, which is naturally followed by an increase in unsolicited emails, so-called spam, motivates research in the spam filtering area. In the context of spam filtering systems, the evolving nature of spam renders existing models obsolete. Hayat et al. therefore proposed an adaptive spam filtering system based on a language model, which can detect concept drift by computing the deviation in the distribution of email contents [35].

Concept drift has also been applied to data classification and data streams [36-42] and has become a very important concept in the realm of data streams. Streaming data may consist of multiple drifting concepts, each having its own underlying data distribution. Concept drift occurs when a set of examples has legitimate class labels at one time and different legitimate labels at another time. Padmalatha et al. provided a comprehensive overview of existing concept-evolution and concept-drifting techniques along different dimensions, giving a lucid vision of ensemble behavior when dealing with concept drift.

Song et al. defined three types of concept-drift patterns in association rule mining [24]: emerging patterns, unexpected changes, and added/perished patterns. The different types of concept-drift patterns indicate different meanings of concept drift for association rules. An evaluative function was designed to calculate the degree of concept drift; if the degree of concept drift between two rules is larger than a predefined threshold, the function immediately generates the concept-drift patterns and the related concept-drift rules. Assume there are two rules rit: A→B with sup(A→B) = a and rit+k: C→D with sup(C→D) = b, where rit is a rule of rule set RSt at time t, rit+k is a rule of rule set RSt+k at time t + k, and A, B, C, D are itemsets.

The definitions of the three patterns are given below [24].

Definition 1. (Emerging Patterns) If a rule rit+k is an emerging pattern, then the following two conditions should be satisfied: (1) the conditional and the consequent parts of rules rit and rit+k are the same, that is, A = C and B = D; (2) the supports of rules rit and rit+k are different, that is, sup(A→B) ≠ sup(C→D).

Example 1. rit: Bread = high → Milk = Large (support = 0.2), rit+k: Bread = high → Milk = Large (support = 0.5). In this case, rit+k is an emerging pattern with respect to rit if we set the minimum threshold at 0.2: the two rules have the same structure and the difference between their supports is 0.3, so the emerging pattern is generated immediately.

Definition 2. (Unexpected Change) If a rule rit+k is an unexpected change, then the following two conditions should be satisfied: (1) the conditional parts of rules rit and rit+k are the same, that is, A = C; (2) the consequent parts of rules rit and rit+k are different, that is, B ≠ D.

Example 2. rit: Bread = high → Milk = Large, rit+k: Bread = high → Milk = Low. In this case, rule rit+k is an unexpected consequent change with respect to rit, since the conditional parts of rit and rit+k are the same but the consequent parts of the two rules are quite different.

Definition 3. (Added/Perished Rules) If rit+k is an added rule, it means that the conditional part C and the consequent part D of rit+k are different from those of any rit in RSt. If rit is a perished rule, it means that the conditional part A and the consequent part B of rit are different from those of any rit+k in RSt+k.

Example 3. rit: Bread = high → Milk = Large, rit+k: Vegetable = high → Apple = High. In this case, rit+k is an added rule with respect to rit, since both the conditional and the consequent parts of rit and rit+k are different.
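To make the three definitions concrete, the following Python sketch classifies pairs of rules from two rule sets mined at times t and t+k into emerging, unexpected, and added/perished patterns. It is only an illustration of the definitions above, not Song et al.'s implementation; the representation of rules as (antecedent, consequent, support) triples and the 0.2 emerging threshold are assumptions.

def classify_drift(rules_t, rules_tk, emerging_threshold=0.2):
    # rules are (antecedent, consequent, support) triples; itemsets are frozensets
    patterns = {"emerging": [], "unexpected": [], "added": [], "perished": []}

    for (a, b, sup_t) in rules_t:
        for (c, d, sup_tk) in rules_tk:
            if a == c and b == d and abs(sup_t - sup_tk) >= emerging_threshold:
                patterns["emerging"].append(((a, b), sup_t, sup_tk))   # Definition 1
            elif a == c and b != d:
                patterns["unexpected"].append(((a, b), (c, d)))        # Definition 2

    # Definition 3: both parts differ from every rule in the other rule set
    for (c, d, _) in rules_tk:
        if all(c != a and d != b for (a, b, _) in rules_t):
            patterns["added"].append((c, d))
    for (a, b, _) in rules_t:
        if all(a != c and b != d for (c, d, _) in rules_tk):
            patterns["perished"].append((a, b))
    return patterns

rules_t = [(frozenset(["Bread=high"]), frozenset(["Milk=Large"]), 0.2)]
rules_tk = [(frozenset(["Bread=high"]), frozenset(["Milk=Large"]), 0.5)]
print(classify_drift(rules_t, rules_tk)["emerging"])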

2.2 Apriori Algorithm

The goal of data mining is to discover important associations among items, such that the presence of some items in a transaction implies the presence of some other items. To achieve this purpose, Agrawal and his co-workers proposed several mining algorithms based on the concept of large itemsets to find association rules in transaction data [8, 43-45]. The process of the Apriori algorithm is as follows:

INPUT: D: a quantitative transaction database; α: the minimum support threshold.

OUTPUT: The set of large itemsets L.

STEP 1: Calculate the count of each item in the transaction data. Assume the total number of transactions is n. If an item appears more than once in a transaction, count its occurrence only once. Set the support of each item to count/n.

STEP 2: Check whether the support of each item is larger than or equal to the predefined minimum support value α. If an item satisfies the condition, put it in the set of large 1-itemsets (L1).

STEP 3: If L1 is null, then exit the algorithm; otherwise, do the next step.

STEP 4: Set r = 1, where r is the number of items in the large itemsets currently being processed.

STEP 5: Generate the candidate set Cr+1 by joining Lr.

STEP 6: Calculate the count of each candidate (r+1)-itemset s in Cr+1; set its support to count/n.

STEP 7: Check whether the support of each candidate (r+1)-itemset s is larger than or equal to the predefined minimum support value α. If a candidate satisfies the condition, put it in the set of large (r+1)-itemsets (Lr+1).

STEP 8: If Lr+1 is null, then exit the algorithm; otherwise, set r = r + 1 and repeat Steps 5 to 7.

Return L
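The following Python sketch illustrates the Apriori steps listed above on a small transaction list. It is a minimal illustration rather than the exact algorithm of [8, 43-45]; the sample data and the simple join-and-prune candidate generation via itertools.combinations are simplifications chosen for brevity.

from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    transactions = [set(t) for t in transactions]        # an item counted once per transaction

    # STEPs 1-2: find the large 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c / n for s, c in counts.items() if c / n >= min_support}
    all_large = dict(large)

    r = 1
    while large:                                          # STEPs 3-8
        items = sorted({i for s in large for i in s})
        # STEP 5: candidates whose every r-subset is already large
        candidates = [frozenset(c) for c in combinations(items, r + 1)
                      if all(frozenset(sub) in large for sub in combinations(c, r))]
        # STEPs 6-7: count candidates and keep those reaching the minimum support
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        all_large.update(large)
        r += 1
    return all_large

print(apriori([["A", "B"], ["A", "C"], ["A", "B", "C"]], min_support=0.5))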

2.3 Fuzzy Data Mining

In data mining, the patterns with a high frequency of occurrence are found as association rules, and these association rules can be used to analyze and describe purchase behavior. However, since traditional data mining methods do not take quantitative information into consideration, some valuable rules may be lost.

To solve this problem, Kuok et al. developed a new research topic, fuzzy data mining [46], which applies fuzzy set theory to traditional data mining. The main reason is that fuzzy set theory has been widely used in various applications due to its simplicity and similarity to human reasoning. In the key steps of their approach, the quantitative values in transactions are first converted into linguistic terms through membership functions, and the count of a fuzzy itemset in a transaction is then calculated as the product of the fuzzy values of all the fuzzy terms of the itemset in that transaction. Finally, the fuzzy association rules that satisfy the user-specified minimum fuzzy confidence threshold are derived from the set of fuzzy frequent itemsets with high fuzzy frequency.

Different from the calculation in Kuok et al.'s study, Hong et al. proposed a fuzzy mining algorithm that finds fuzzy association rules by transforming quantitative data into fuzzy values [47]. They applied the fuzzy minimum operator of fuzzy set theory to evaluate the counts of fuzzy itemsets in a set of transactions, and they also proposed an Apriori-based mining algorithm to efficiently find fuzzy association rules. In addition, Hong et al. investigated the trade-off between the number of fuzzy rules and computation time. Hong et al. also proposed a fuzzy weighted data mining approach based on the support-confidence framework to extract weighted association rules with linguistic terms from quantitative transactions [48]. Because of the success of fuzzy mining, many extended approaches have been proposed [49, 50].
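As an illustration of the fuzzy-mining idea reviewed above, the short Python sketch below fuzzifies quantities through triangular membership functions and evaluates the fuzzy support of a 2-itemset with the minimum operator, in the spirit of Hong et al.'s approach [47]. The membership functions, item names, and transactions are made up for this example.

def triangular(x, c, w):
    """Isosceles triangular membership function with center c and half-spread w."""
    return max(0.0, 1.0 - abs(x - c) / w) if w > 0 else 0.0

# illustrative membership functions: (center, half-spread) per linguistic term
mfs = {"Low": (3.0, 3.0), "Middle": (6.0, 3.0), "High": (9.0, 3.0)}

transactions = [{"A": 6, "B": 3}, {"A": 3, "B": 9}]

# fuzzy support of the itemset {A.Middle, B.Low} over the transactions
total = 0.0
for t in transactions:
    deg_a = triangular(t.get("A", 0), *mfs["Middle"])
    deg_b = triangular(t.get("B", 0), *mfs["Low"])
    total += min(deg_a, deg_b)                 # minimum operator for the 2-itemset

print("fuzzy support of (A.Middle, B.Low):", total / len(transactions))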

2.4 Fuzzy C-means

Fuzzy C-means (FCM) is a popular clustering method that applies fuzzy theory and allows one piece of data to belong to two or more clusters [51]. FCM is frequently used in pattern recognition. FCM is based on the minimization of the following objective function:

$J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,\lVert x_i - c_j\rVert^{2}$,   (2-1)

where m is a number greater than 1, $u_{ij}$ is the fuzzy membership value of $x_i$ in cluster j, $x_i$ is the i-th d-dimensional measured datum, $c_j$ is the d-dimensional center of cluster j, and $\lVert\cdot\rVert$ is a norm expressing the similarity between a measured datum and a center; the Euclidean distance is commonly used.

The iteration stops when $\max_{ij}\,|u_{ij}^{(k+1)} - u_{ij}^{(k)}| < \beta$, where β is a termination criterion between 0 and 1 and k is the iteration step. This procedure converges to a local minimum or a saddle point of $J_m$. The process of the fuzzy C-means algorithm is as follows:

INPUT: U: the degree of membership of each individual; c: the number of clusters; m: the fuzziness index; i: the index of individuals; j: the index of clusters; β: the termination criterion.

OUTPUT: U (The degree of membership for each individual).

STEP 1: Initialize U = [uij] matrix, U(0).

STEP 2: At step k, calculate the center vectors C(k) = [c_j] from U(k), where C is the set of centers of the membership functions used to calculate the degree of membership of each individual:

$c_j = \dfrac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$.   (2-2)

STEP 3: Update U(k) to U(k+1):

$u_{ij} = \dfrac{1}{\sum_{l=1}^{C}\left(\dfrac{\lVert x_i - c_j\rVert}{\lVert x_i - c_l\rVert}\right)^{2/(m-1)}}$.   (2-3)

STEP 4: If $\max_{ij}\,|u_{ij}^{(k+1)} - u_{ij}^{(k)}| < \beta$, then STOP; otherwise return to STEP 2.

Return matrix U
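A compact NumPy sketch of the FCM procedure above is given below. It follows Equations (2-1)-(2-3) and the termination test of STEP 4; the data, the fuzziness index m = 2, and the random initialization are illustrative assumptions.

import numpy as np

def fuzzy_c_means(x, c, m=2.0, beta=1e-4, max_iter=100, seed=0):
    """x: (N, d) data, c: number of clusters. Returns (U, centers)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)                   # STEP 1: random fuzzy partition

    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]  # STEP 2: Eq. (2-2)
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-10)                     # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)    # STEP 3: Eq. (2-3)
        if np.max(np.abs(u_new - u)) < beta:            # STEP 4: termination test
            u = u_new
            break
        u = u_new
    return u, centers

data = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])
u, centers = fuzzy_c_means(data, c=2)
print(np.round(centers, 2))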

2.5 Fuzzy Membership Functions

In this part, membership functions in fuzzy set theory are introduced, and the way to find the scalar cardinality values of items according to their corresponding membership functions is described. In fuzzy set theory, a membership function can be represented by a graph that defines how each point in the input space is mapped to a membership value between 0 and 1.

Currently, there are two common methods for encoding the membership functions of items. In the first, the fuzzy regions are encoded as Parodi and Bonelli did [52]: each fuzzy region Rjk is stored as an isosceles-triangle membership function, similar to Figure 2.1, with a (c, w) pair, where c indicates the center abscissa of the membership function and w represents half the spread of the membership function.


Figure 2.1: Membership functions

The second encoding approach is by using the 2-tuple linguistic representation model [53]. Take the set of membership functions MFj for the item Ij as an example.

They are encoded as a substring cj1LRj1…cjkLRjk…cj|Ij|LRj|Ij|, where cjk and LRjk are the center abscissa and lateral displacement of the k-th membership function of item Ij. The scheme is shown in Figure 2.2.

Figure 2.2: 2-tuple linguistic for membership functions
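The following Python sketch shows how the two encodings reviewed above can be evaluated: an isosceles-triangle membership function given by a (c, w) pair, and a 2-tuple style function in which a triangle with a predefined half-spread is shifted by a lateral displacement LR. The interpretation of the displacement as a shift of the center and all the numbers are illustrative assumptions.

def triangle(x, c, w):
    """Degree of x under an isosceles triangle centered at c with half-spread w."""
    return max(0.0, 1.0 - abs(x - c) / w) if w > 0 else 0.0

def triangle_2tuple(x, c, lr, w):
    """2-tuple style: predefined half-spread w, center c moved by displacement lr."""
    return triangle(x, c + lr, w)

print(triangle(5.0, c=6.0, w=3.0))                    # (c, w) encoding
print(triangle_2tuple(5.0, c=6.0, lr=-0.5, w=3.0))    # 2-tuple style encoding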

Note that the half spreads of the membership functions are predefined in the second encoding method. Assume there are m items; the entire membership functions of all items are encoded by concatenating the substrings of MF1, MF2, …, MFj, …, MFm. For example, the membership functions of an item A are shown in Figure 2.3; it has three fuzzy regions: Low, Middle and High.

Figure 2.3: The membership functions of the three items A, B, C for this example

According to the membership functions shown in Figure 2.3, different quantities of items in a transaction database can be represented by different membership degrees in different regions.

Table 2.1 is a transaction database. Item A appears in two transactions, Trans 2 and Trans 3, as shown in Table 2.1, and its quantity values in these transactions are 6 and 3, respectively. The membership functions of item A are shown in Figure 2.3 and include three regions: Low, Middle and High.

Table 2.1: The set of three quantitative transaction data for this example

ID        Expanded Items
Trans 1   (A, 0)(B, 9)(C, 2)
Trans 2   (A, 6)(B, 3)(C, 0)
Trans 3   (A, 3)(B, 0)(C, 5)

According to the membership functions of A in Figure 2.3, its two quantity values are mapped to degrees 0 and 0.5 in region A.Low, to 0.5 and 0.5 in region A.Middle, and to 0 and 0.5 in region A.High, respectively. The quantity value of item A is then converted into a fuzzy set (A.Low, 0.5 + A.Middle, 0.5). All other transactions that include item A are processed similarly, and the corresponding fuzzy sets for the transactions are shown in Table 2.2.

Table 2.2: The fuzzy sets converted for transactions

TID Fuzzy set

CHAPTER 3

CONCEPT DRIFT FOR FUZZY MEMBERSHIP FUNCTIONS

3.1 Definitions and Review of Fuzzy Membership Functions

In this chapter, we will present the concept-drift patterns for fuzzy membership functions (CDMF).

3.1.1 Fuzzy Membership Functions by Fuzzy C-means

In this part, we propose a simple method to generate a set of membership functions by FCM. Each membership function is designed as a triangle and encoded as a pair (c, w), where the peak of the triangle is located at c and the distance between c and the left vertex is w. If we need to generate n membership functions for each item, the proposed algorithm first obtains n cluster centers by using FCM, as described in Section 2.4. Each center is then taken as the location c of the peak of a triangle (membership function), and the span w is calculated as the distance between the peak of this triangle and the peak of the previous one; for the first membership function, w is the distance between its peak and 0. Figure 3.1 shows an example.

In Figure 3.1, the set of membership functions MFj for the item Ij is represented as a substring cj1wj1…cj|Ij|wj|Ij|, where |Ij| is the number of membership functions of Ij.

Figure 3.1: Membership functions of an item Ij
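A minimal Python sketch of this construction is given below: the sorted FCM centers of an item become the peaks c, and each half-spread w is the distance to the previous peak (for the first membership function, the distance to 0). The function name and the example centers are illustrative.

def centers_to_membership_functions(centers):
    """centers: FCM cluster centers for one item -> [(c, w), ...] sorted by c."""
    mfs = []
    previous = 0.0
    for c in sorted(centers):
        w = c - previous          # span to the previous peak (or to 0 for the first)
        mfs.append((c, w))
        previous = c
    return mfs

print(centers_to_membership_functions([9.1, 3.2, 5.8]))
# approximately [(3.2, 3.2), (5.8, 2.6), (9.1, 3.3)]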

Membership functions play the role of converting the quantities of commodity items into terms close to human semantics. Figure 3.2 shows the set of membership functions for the amount of apples purchased in a year. It consists of three membership functions representing low, medium, and high purchase amounts. If we buy five apples, the low fuzzy value is 0.4, the medium fuzzy value is 0.6, and the high fuzzy value is 0.


Figure 3.2: The set of membership functions for the amount of apples purchased in a year.

Additionally, for this example, we can read the status of the concept from the membership functions. For a purchasing amount of less than three, the membership value of the low linguistic term group reaches 1; for a purchasing amount of six, the membership value of the medium linguistic term group reaches 1; and for a purchasing amount greater than nine, the membership value of the high linguistic term group reaches 1. We can thus regard the membership functions as the representative values of the linguistic terms and observe their changes at different times.

3.1.2 Concept-drift Patterns for Fuzzy Membership Functions

In this part, we study three different kinds of concept drift of fuzzy membership functions. The first is the change of the representative value of a linguistic term group (the center of the membership function). The second is the change of the linguistic term range. The third is the change of the fuzzy support of a linguistic term. The variant degree of each kind of concept drift is described below.

(A) The change of the representative value for the linguistic term

Figure 3.2 shows the membership functions of the purchasing amount of apples in the last year, and Figure 3.3 shows the membership functions of the purchasing amount of apples in this year. In the low linguistic term group, the representative value, which is the center of the membership function, decreases from three to two. In the high linguistic term group, the representative value increases from nine to ten. This indicates that the concepts of the low and high linguistic term groups have changed, while the concept of the medium linguistic term group retains its original status.

Figure 3.3: Membership functions for the purchasing amount of apples in this year.

Formula (3-1) shows the variant degree of the representative value of a linguistic term:

$cd_{C} = \dfrac{\left|C_{nm}^{t} - C_{nm}^{t+k}\right|}{\left(\sum_{i=1}^{N} C_{ni}^{t}\right)/N}$,   (3-1)

where Dt and Dt+k are the transaction databases collected at different times or in different places, $C_{nm}^{t}$ is the representative value of linguistic term m of item n in database Dt, n identifies a commodity item, m identifies a linguistic term, and N is the number of linguistic terms.

(B) The change of the linguistic term range

The range of a membership function represents the influence of a linguistic term. For example, although the representative value of the medium linguistic term has not changed between Figure 3.2 and Figure 3.3, the range of its membership function in this year is larger than in the previous year; the influence (scope) of the medium linguistic term group is thus bigger than before.

Formula (3-2) shows the variant degree of the range of a membership function:

$cd_{D} = \dfrac{\left|D_{nm}^{t} - D_{nm}^{t+k}\right|}{\left(C_{nN}^{t} - C_{n1}^{t}\right)/(N-1)}$,   (3-2)

where Dt and Dt+k are the transaction databases, $D_{nm}^{t}$ is the range of linguistic term m of item n in database Dt, $C_{nm}^{t}$ is the representative value of linguistic term m of item n, n identifies a commodity item, m identifies a linguistic term, and N is the number of linguistic terms.

(C) The change of fuzzy support for the linguistic term

A change in fuzzy support indicates that the size of the group corresponding to a linguistic term has changed, which can yield valuable rules about this type of concept change. For example, more people buy expensive mobile phones this year than last year.

Formula (3-3) shows the variant degree of the fuzzy support of a membership function:

$cd_{sup} = \dfrac{\left|sup_{nm}^{t} - sup_{nm}^{t+k}\right|}{sup_{nm}^{t}}$,   (3-3)

where $sup_{nm}^{t}$ is the fuzzy support of a specific membership function (linguistic term m) of item n in database Dt.
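The three variant degrees can be computed directly from the centers, ranges, and fuzzy supports of a linguistic term at times t and t+k, as in the Python sketch below. It follows Formulas (3-1)-(3-3) as written above; the function names and example values are illustrative assumptions.

def drift_of_center(c_t, c_tk, all_centers_t):
    # Formula (3-1): change of the representative value, normalized by the
    # average center of all N linguistic terms of the item at time t
    return abs(c_t - c_tk) / (sum(all_centers_t) / len(all_centers_t))

def drift_of_range(d_t, d_tk, all_centers_t):
    # Formula (3-2): change of the range, normalized by the average spacing
    # between the first and last centers at time t
    n = len(all_centers_t)
    spacing = (max(all_centers_t) - min(all_centers_t)) / (n - 1)
    return abs(d_t - d_tk) / spacing

def drift_of_support(sup_t, sup_tk):
    # Formula (3-3): relative change of the fuzzy support
    return abs(sup_t - sup_tk) / sup_t

centers_t = [3.0, 6.0, 9.0]                    # Low / Medium / High at time t (Figure 3.2)
print(drift_of_center(3.0, 2.0, centers_t))    # Low center moved from 3 to 2
print(drift_of_range(3.0, 4.0, centers_t))     # Medium range widened
print(drift_of_support(0.4, 0.5))              # fuzzy support changed from 0.4 to 0.5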

After calculating the degree of change of a membership function, the proposed algorithm compares it with a predefined threshold. If the degree is larger than the threshold, the related concept-drift patterns are generated immediately. The detailed algorithm is described below.

3.2 The Proposed CDMF Mining Algorithm

In this part, the proposed approach, which combines concept-drift analysis with the FCM algorithm, is described as follows:

INPUT: Dt and Dt+k: the two databases; I: the number of items; S: the set of concept-drift rules; M: the number of linguistic terms; α: the linguistic term threshold; β: the membership function threshold; γ: the support threshold.

OUTPUT: The concept-drift rules for fuzzy membership functions.

STEP 1: Generate the fuzzy membership functions of each item in the two databases Dt and Dt+k via the FCM-based method described in Section 3.1.1.
