
CHAPTER 2 REVIEW OF RELATED WORK

2.5 Fuzzy Membership Functions

In this part, membership functions in fuzzy set theory are introduced, and ways to find the scalar cardinality values of items according to their corresponding membership functions are described. In fuzzy set theory, a membership function can be represented by a graph that defines how each point in the input space is mapped to a membership value between 0 and 1.

Currently, there are two common methods for encoding the membership functions of items. The first encodes the fuzzy regions as Parodi and Bonelli did [52]: each fuzzy region Rjk is stored as an isosceles-triangle membership function, similar to Figure 2.1, using a (c, w) pair, where c indicates the center abscissa of the membership function and w represents half of its spread.
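As a minimal sketch of the (c, w) encoding, the following Python function evaluates an isosceles-triangle membership function. The function name and the three example regions are hypothetical and are not taken from the cited work.

```python
def triangular_membership(x: float, c: float, w: float) -> float:
    """Membership degree of x in an isosceles triangle with center c and half-spread w."""
    if w <= 0:
        raise ValueError("half-spread w must be positive")
    # The degree is 1 at the center and falls linearly to 0 at distance w.
    return max(0.0, 1.0 - abs(x - c) / w)


# Hypothetical fuzzy regions of one item, each encoded as a (c, w) pair.
regions = {"Low": (3.0, 3.0), "Middle": (6.0, 3.0), "High": (9.0, 3.0)}
degrees = {name: triangular_membership(4.5, c, w) for name, (c, w) in regions.items()}
print(degrees)  # {'Low': 0.5, 'Middle': 0.5, 'High': 0.0}
```

With these assumed regions, a quantity of 4.5 lies halfway between the Low and Middle centers, so it belongs to both regions with degree 0.5.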


Figure 2.1: Membership functions

The second encoding approach uses the 2-tuple linguistic representation model [53]. Take the set of membership functions MFj for the item Ij as an example. They are encoded as a substring cj1LRj1…cjkLRjk…cj|Ij|LRj|Ij|, where cjk and LRjk are the center abscissa and lateral displacement of the k-th membership function for item Ij. The scheme is shown in Figure 2.2.

Figure 2.2: 2-tuple linguistic for membership functions

Note that the half spreads of the membership functions are predefined in the second encoding method. Assuming there are m items, the entire set of membership functions for all items is encoded by concatenating the substrings of MF1, MF2, …, MFj, …, MFm. For example, the membership functions of an item A are shown in Figure 2.3; the item has three fuzzy regions: Low, Middle, and High.

[Plot: quantity axis ticks at 3.0, 6.0, and 9.0.]

Figure 2.3: The membership functions of the three items A, B, C for this example

According to the membership functions shown in Figure 2.3, different quantities of items in a transaction database can be represented by different membership degrees in different regions.

Table 2.1 is a transaction database. Item A appears in two transactions, Trans 2 and Trans 3, with quantity values of 6 and 3, respectively. The membership functions of item A, shown in Figure 2.3, include three regions: Low, Middle, and High.

Table 2.1: The set of three quantitative transaction data for this example

ID       Expanded Items
Trans 1  (A, 0)(B, 9)(C, 2)
Trans 2  (A, 6)(B, 3)(C, 0)
Trans 3  (A, 3)(B, 0)(C, 5)

According to the membership functions of A in Figure 2.3, the membership values of its two quantity values are 0 and 0.5 in region A.Low, 0.5 and 0.5 in region A.Middle, and 0 and 0.5 in region A.High, respectively. The quantity value of item A in a transaction is then converted into a fuzzy set such as (A.Low, 0.5 + A.Middle, 0.5). All other transactions that include item A are processed similarly. The corresponding fuzzy sets of item A in the transactions are shown in Table 2.2.

Table 2.2: The fuzzy sets converted for transactions

TID Fuzzy set

CHAPTER 3

CONCEPT DRIFT FOR FUZZY MEMBERSHIP FUNCTIONS

3.1 Definitions and Review of Fuzzy Membership Functions

In this chapter, we will present the concept-drift patterns for fuzzy membership functions (CDMF).

3.1.1 Fuzzy Membership Functions by Fuzzy C-means

In this part, we propose a simple method to generate a set of membership functions by FCM. Each membership function is designed as a triangle and encoded as a pair (c, w). The peak of the triangle is located at c, and the distance between c and the left vertex is w. If n membership functions are needed for an item, the proposed algorithm obtains n cluster centers by using FCM as described in Section 2.5. Each center is used as the location c of the peak of a triangle (membership function). The span w is then calculated as the distance between the peak of this triangle and the peak of the previous one; for the first triangle, w is the distance between its peak and 0. Figure 3.1 shows an example.
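The derivation of the (c, w) pairs from cluster centers can be sketched as follows. This is only an illustration: the function name is hypothetical, and the cluster centers are assumed to have already been produced by an FCM run, which is not shown here.

```python
from typing import List, Tuple


def encode_membership_functions(centers: List[float]) -> List[Tuple[float, float]]:
    """Turn cluster centers into (c, w) pairs, one per triangular membership function."""
    pairs = []
    previous_peak = 0.0  # the span of the first triangle is measured from 0
    for c in sorted(centers):
        # w is the distance between this peak and the previous peak
        pairs.append((c, c - previous_peak))
        previous_peak = c
    return pairs


# Hypothetical FCM centers for one item, given in arbitrary order.
pairs = encode_membership_functions([6.0, 3.0, 9.0])
print(pairs)  # [(3.0, 3.0), (6.0, 3.0), (9.0, 3.0)]
```

Sorting the centers first is a design choice of this sketch, since FCM does not return its clusters in any particular order.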

In Figure 3.1, the set of membership functions MFj for the item Ij is represented as a substring cj1wj1…cj|Ij|wj|Ij|, where |Ij| is the number of membership functions of Ij.

Figure 3.1: Membership functions of an item Ij

Membership functions play the role of converting the quantities of commodity items into terms close to human semantics. Figure 3.2 shows the membership function set for apples purchased in a year. It consists of three membership functions representing low, medium, and high purchase amounts. If we buy five apples, the low fuzzy value is 0.4, the medium fuzzy value is 0.6, and the high fuzzy value is 0.


Figure 3.2: Membership function set for apples purchased in a year.

Additionally, in this example, the membership functions tell us the current status of each concept. For a purchase amount of three or less, the membership value of the low linguistic term group reaches 1. For a purchase amount of six, the membership value of the medium linguistic term group reaches 1. For a purchase amount of nine or more, the membership value of the high linguistic term group reaches 1. We can regard these amounts as the representative values of the linguistic terms and observe how they change over time.

3.1.2 Concept-drift Patterns for Fuzzy Membership Functions

In this part, we study three different concept drifts of fuzzy membership functions. The first concept drift is the change of the representative value of a linguistic term group (the center of the membership function). The second is the change of the linguistic term range. The third is the change of fuzzy support for the linguistic terms. The degree of variation for each kind of concept drift is described below.

(A) The change of the representative value for the linguistic term

Figure 3.2 shows the membership functions of the purchasing amount of apples over the last year, and Figure 3.3 shows those for this year. In the low linguistic term group, the representative value, which is the center of the membership function, is reduced from three to two. In the high linguistic term group, the representative value increases from nine to ten. This means that the concepts of the low and high linguistic term groups have changed, while the concept of the medium linguistic term group retains its original status.

Figure 3.3: Membership functions for the purchasing amount of apples this year.

Formula (3-1) shows the degree of variation of the representative value of a linguistic term.

\[ cdLT = \frac{\left| C_{nm}^{t} - C_{nm}^{t+k} \right|}{\left( \sum_{i=1}^{N} C_{ni}^{t} \right) / N} \tag{3-1} \]

where D^t and D^{t+k} are transaction databases from different times or different places, C_{nm}^{t} is the representative value of the m-th linguistic term of the n-th item in database D^t, n identifies a commodity item, m identifies a linguistic term, and N is the number of linguistic terms.
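As a worked example of Formula (3-1), the sketch below uses the apple centers from Figures 3.2 and 3.3. It assumes that the denominator is the mean of the item's N representative values in D^t; the function and variable names are hypothetical.

```python
from typing import List


def cd_lt(centers_t: List[float], centers_tk: List[float], m: int) -> float:
    """Degree of change of the m-th (0-based) representative value of one item."""
    mean_center = sum(centers_t) / len(centers_t)  # assumed reading of the denominator
    return abs(centers_t[m] - centers_tk[m]) / mean_center


last_year = [3.0, 6.0, 9.0]   # low, medium, high centers in D^t
this_year = [2.0, 6.0, 10.0]  # low, medium, high centers in D^{t+k}
degrees = [cd_lt(last_year, this_year, m) for m in range(3)]
print([round(d, 3) for d in degrees])  # [0.167, 0.0, 0.167]
```

Under this reading, the low and high terms each change by 1/6 of the mean center, while the medium term does not change.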

(B) The change of the linguistic term range

The range of a membership function expresses the influence of a linguistic term. For example, although the representative value of the medium linguistic term is the same in Figure 3.2 and Figure 3.3, the range of its membership function this year is larger than that of the previous year. The influence (scope) of the medium linguistic term group is therefore bigger than before.

Formula (3-2) shows the degree of variation of the range of a membership function.

\[ cdMF = \frac{\left| D_{nm}^{t} - D_{nm}^{t+k} \right|}{\left( C_{nN}^{t} - C_{n1}^{t} \right) / (N - 1)} \tag{3-2} \]

where D^t and D^{t+k} are the transaction databases, D_{nm}^{t} is the range of the m-th linguistic term of the n-th item in database D^t, C_{n1}^{t} and C_{nN}^{t} are the representative values of the first and last linguistic terms of that item, n identifies a commodity item, m identifies a linguistic term, and N is the number of linguistic terms.
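Formula (3-2) can be illustrated with a small sketch. Two assumptions are made here: the denominator is read as the average gap between the first and last representative values in D^t, and the ranges are the spans w obtained from the peak-to-previous-peak rule of Section 3.1.1 applied to the apple centers. All names and range values are hypothetical.

```python
from typing import List


def cd_mf(ranges_t: List[float], ranges_tk: List[float],
          centers_t: List[float], m: int) -> float:
    """Degree of change of the range of the m-th (0-based) linguistic term."""
    n_terms = len(centers_t)
    # Assumed normalization: average distance between adjacent centers in D^t.
    average_gap = (centers_t[-1] - centers_t[0]) / (n_terms - 1)
    return abs(ranges_t[m] - ranges_tk[m]) / average_gap


last_year_centers = [3.0, 6.0, 9.0]
last_year_ranges = [3.0, 3.0, 3.0]  # spans for centers 3, 6, 9
this_year_ranges = [2.0, 4.0, 4.0]  # spans for centers 2, 6, 10
medium_change = cd_mf(last_year_ranges, this_year_ranges, last_year_centers, 1)
print(round(medium_change, 3))  # 0.333
```

In this example the medium term keeps its center at 6 but its range grows from 3 to 4, which gives a change of one third of the average gap.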

(C) The change of fuzzy support for the linguistic term

A change in fuzzy support means that the size of the group belonging to a linguistic term has changed. Value rules can be used to express this type of concept change. For example, more people buy expensive mobile phones this year than last year.

Formula (3-3) shows the degree of variation of the fuzzy support for a membership function.

\[ cdSup = \frac{\left| sup_{nm}^{t} - sup_{nm}^{t+k} \right|}{sup_{nm}^{t}} \tag{3-3} \]

where sup is the fuzzy support for a specific membership function (linguistic term) of some item.
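A short sketch of Formula (3-3) follows. The support values are made up for illustration, and raising an error for a zero support in D^t is an assumption of this sketch, since the text does not say how that case is handled.

```python
def cd_sup(sup_t: float, sup_tk: float) -> float:
    """Relative change of the fuzzy support of one linguistic term."""
    if sup_t <= 0:
        # Assumed handling: the ratio is undefined when the old support is zero.
        raise ValueError("sup_t must be positive")
    return abs(sup_t - sup_tk) / sup_t


# Hypothetical supports of one linguistic term in D^t and D^{t+k}.
change = cd_sup(0.20, 0.30)
print(round(change, 3))  # 0.5
```

Here the support grows from 0.20 to 0.30, a relative change of one half.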

After calculating each degree of change, the proposed algorithm compares it with a predefined threshold. If the degree is larger than or equal to the threshold, the related concept-drift pattern is generated immediately. The detailed algorithm is described below.

3.2 The Proposed CDMF Mining Algorithm

In this part, the proposed approach, which combines concept drift with the FCM algorithm, is described as follows:

INPUT: D^t, D^{t+k}: databases; I: the number of items; S: the set of concept-drift rules; M: the number of linguistic terms; α: linguistic term threshold; β: membership function threshold; γ: support threshold.

OUTPUT: The concept-drift rules for fuzzy membership functions.

STEP 1: Generate fuzzy membership functions for each item in the two databases via the following sub-steps.

(a) Set i = 1, where i is used to keep the identity number of the current item in the database.

(b) Cluster the quantity values of the i-th item into M clusters by FCM (see the related work).

(c) The center points of these M clusters are set as the centers of the fuzzy membership functions for the M linguistic terms.

(d) Set i = i + 1.

(e) If i ≤ I, go to sub-step (b).

STEP 2: Find the concept-drift rules from the fuzzy membership functions of the I items between D^t and D^{t+k}.

STEP 3: Initialize the set of concept-drift rules as S = ∅.

STEP 4: Set n = 1, where n is used to keep the identity number of the current item in the database.

STEP 5: Calculate the degree of change of each linguistic term's representative value and check the concept-drift rules between the two databases D^t and D^{t+k} by the following sub-steps.

(a) Set m = 1, where m is used to keep the identity number of the current linguistic term.

(b) Calculate the degree of change of the representative value of the linguistic term, where C_{nm}^{t} is the representative value of the m-th linguistic term of the n-th item in database D^t.

\[ cdLT = \frac{\left| C_{nm}^{t} - C_{nm}^{t+k} \right|}{\left( \sum_{i=1}^{N} C_{ni}^{t} \right) / N} \tag{3-1} \]

(c) Check the concept-drift rules. If α ≤ cdLT, then put the concept-drift rule “The value of C_{nm} is changed from C_{nm}^{t} to C_{nm}^{t+k}” in S.

(d) Set m = m + 1.

STEP 6: Calculate the degree of change of each linguistic term's range and check the concept-drift rules between the two databases D^t and D^{t+k} by the following sub-steps.

(a) Set m = 1, where m is used to keep the identity number of the current linguistic term.

(b) Calculate the change of the range of the linguistic term, where D_{nm}^{t} is the range of the m-th linguistic term of the n-th item in database D^t.

\[ cdMF = \frac{\left| D_{nm}^{t} - D_{nm}^{t+k} \right|}{\left( C_{nN}^{t} - C_{n1}^{t} \right) / (N - 1)} \tag{3-2} \]

(c) Check the concept-drift rules. If β ≤ cdMF, then put the concept-drift rule “The linguistic term range value D_{nm} is changed from D_{nm}^{t} to D_{nm}^{t+k}” in S.

(d) Set m = m + 1.

STEP 7: Calculate the degree of change of each support value and check the concept-drift rules between the two databases D^t and D^{t+k} by the following sub-steps.

(a) Set m = 1, where m is used to keep the identity number of the current linguistic term.

(b) Calculate the support change of the linguistic term, where sup_{nm}^{t} is the fuzzy support of the m-th linguistic term of the n-th item in database D^t.

\[ cdSup = \frac{\left| sup_{nm}^{t} - sup_{nm}^{t+k} \right|}{sup_{nm}^{t}} \tag{3-3} \]

(c) Check the concept-drift rules. If γ ≤ cdSup, then put the concept-drift rule “The support value Sup_{nm} is changed from sup_{nm}^{t} to sup_{nm}^{t+k}” in S.

(d) Set m = m + 1.

STEP 8: Set n = n + 1.

STEP 9: If n ≤ I, that is, if not all items have been processed, then go to Step 5.

STEP 10: Output the set of concept-drift rules S.
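Steps 3 to 10 can be sketched in Python as below. This is not the thesis implementation (which was written in C#). It assumes that the membership functions and supports of both databases are already available, reads the denominators of Formulas (3-1) and (3-2) as the mean center and the average gap between centers, and skips the support check when the old support is zero. The `ItemModel` type, the function name, and the sample data are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ItemModel(NamedTuple):
    centers: List[float]   # representative values C of the linguistic terms
    ranges: List[float]    # ranges D of the linguistic terms
    supports: List[float]  # fuzzy supports of the linguistic terms


def mine_cdmf(model_t: Dict[str, ItemModel], model_tk: Dict[str, ItemModel],
              alpha: float, beta: float, gamma: float) -> List[str]:
    rules: List[str] = []  # STEP 3: start with an empty rule set S
    for item, old in model_t.items():  # STEPS 4, 8 and 9: loop over the items
        new = model_tk[item]
        n_terms = len(old.centers)
        mean_center = sum(old.centers) / n_terms
        average_gap = (old.centers[-1] - old.centers[0]) / (n_terms - 1)
        for m in range(n_terms):  # STEP 5: representative values
            if abs(old.centers[m] - new.centers[m]) / mean_center >= alpha:
                rules.append(f"{item}: C[{m + 1}] changed from "
                             f"{old.centers[m]} to {new.centers[m]}")
        for m in range(n_terms):  # STEP 6: linguistic term ranges
            if abs(old.ranges[m] - new.ranges[m]) / average_gap >= beta:
                rules.append(f"{item}: D[{m + 1}] changed from "
                             f"{old.ranges[m]} to {new.ranges[m]}")
        for m in range(n_terms):  # STEP 7: fuzzy supports
            if old.supports[m] > 0 and \
                    abs(old.supports[m] - new.supports[m]) / old.supports[m] >= gamma:
                rules.append(f"{item}: Sup[{m + 1}] changed from "
                             f"{old.supports[m]} to {new.supports[m]}")
    return rules  # STEP 10: output S


old_db = {"apple": ItemModel([3.0, 6.0, 9.0], [3.0, 3.0, 3.0], [0.2, 0.5, 0.3])}
new_db = {"apple": ItemModel([2.0, 6.0, 10.0], [2.0, 4.0, 4.0], [0.2, 0.5, 0.3])}
rules = mine_cdmf(old_db, new_db, alpha=0.1, beta=0.5, gamma=0.1)
print(rules)
```

With these assumed thresholds, only the center changes of the low and high terms (each about 0.167) reach α = 0.1, so two rules are produced.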

3.3 Experimental Results

In this part, the experimental results show the performance of the proposed concept-drift for fuzzy membership functions (CDMF) algorithm. We used a single computer with a 3rd-generation Intel Core i5-3230M 2.60 GHz processor (4 cores, 4 threads) and 12 GB of DDR3-1600 MHz random-access memory. The operating system was Microsoft Windows 8.1 Pro, and the algorithm was implemented in C# 5.0 on .NET Framework 4.5.1.

A simulation dataset containing 1559 items and 21,556 transactions was used in the experiments. In the dataset, the number of purchased items in each transaction was first randomly generated, and the purchased items and their quantities in each transaction were then generated. Here, we selected 21,566 transactions from the simulated dataset and divided them into two datasets as databases D^t and D^{t+k}, where each dataset had 10,733 transactions. The initial cluster size C was set at 3, the fuzziness index value m was set at 2, and the linguistic term threshold, the membership function threshold, and the support threshold each varied from 1 to 0.1. Figures 3.4, 3.6, and 3.8 show the results of the proposed approach.

Experiments were first conducted on the databases to evaluate the number of items with linguistic term concept drift under different thresholds.

Figure 3.4: The number of concept-drift items found by the algorithm under different linguistic term thresholds.

Figure 3.4 shows the results of running the proposed algorithm on different pairs of databases: two databases at different locations, the databases of the first half and the second half of a year, the databases of two random months, and the database of a random month with that of the whole year. In the experimental results, we find that the influence of different times on customer behavior was bigger than that of different locations. We observed that short-term databases may contain more special rules, so more concept drifts can be found when these databases are compared. In contrast, since long-term databases tend to be stable, fewer concept drifts occur. As a result, we consider the comparison between short-term databases preferable. Figure 3.5 shows the execution efficiency of the proposed algorithm for linguistic term thresholds varying from 1 to 0.1.

Figure 3.5: The execution efficiency for the four pairs of databases (different times and locations) under different linguistic term thresholds.

Figure 3.5 shows the execution efficiency under different linguistic term thresholds. More concept-drift patterns will be generated when the threshold value increases. Thus, the proposed algorithm will spend more execution time if a higher threshold value is set.

Second, experiments were conducted on the databases to evaluate the number of items with membership function concept drift under different thresholds. Figure 3.6 shows the effect of different threshold values on the number of rules identified.

Figure 3.6: The number of concept-drift items found by the algorithm under different membership function thresholds.

Experiments were also conducted on the databases to evaluate the efficiency of the algorithm. Figure 3.7 shows the execution efficiency for membership function thresholds varying from 1 to 0.1.

[Plot: Membership Functions. x-axis: Thresholds (1 to 0.1); y-axis: Concept-drift Items (0 to 1600); series: Different Location, First Half with Second Half of A Year, Random Months, A Random Month with Whole Year.]

Figure 3.7: The execution efficiency for the four pairs of databases (different times and locations) under different membership function thresholds.

Finally, experiments were conducted on the databases to evaluate the number of items with support concept drift under different thresholds. Figure 3.8 shows the effect of different threshold values on the number of rules identified.

[Plot: Membership Functions. x-axis: Thresholds (1 to 0.1); y-axis: Execution Time (sec.) (0 to 600); series: Different Location, First Half with Second Half of A Year, Random Months, A Random Month with Whole Year.]

Figure 3.8: The number of concept-drift items found by the algorithm under different support thresholds.

Lastly, experiments were conducted on the databases to evaluate the efficiency of the algorithm. Figure 3.9 shows the execution efficiency for support thresholds varying from 1 to 0.1.

[Plot: Support. x-axis: Thresholds (1 to 0.1); y-axis: Concept-drift Items (0 to 1800); series: Different Location, First Half with Second Half of A Year, Random Months, A Random Month with Whole Year.]

Figure 3.9: The execution efficiency for the four pairs of databases (different times and locations) under different support thresholds.

Some concept-drift patterns are still found at higher threshold values. However, these patterns are mostly represented by different types of concept drift. The proposed algorithm should be given a suitable threshold value to attain a reasonable number of patterns that also represent the drift well.

[Plot: Support. x-axis: Thresholds (1 to 0.1); y-axis: Execution Time (sec.) (0 to 800); series: Different Location, First Half with Second Half of A Year, Random Months, A Random Month with Whole Year.]

CHAPTER 4 CONCEPT DRIFT FOR FUZZY ASSOCIATION RULES

4.1 Definitions and Review of Fuzzy Association Rules

In this chapter, we present concept-drift pattern mining for fuzzy association rules (CDFAR).

4.1.1 Fuzzy Membership Functions by Fuzzy C-means

In this part, our previous method for generating the set of membership functions by fuzzy C-means is introduced. Each membership function is designed as a triangle and encoded as a pair (c, w). The peak of the triangle is located at c, and the distance between c and the left vertex is w.

Membership functions play the role of converting the quantities of commodity items into terms close to human semantics. Figure 4.1 shows a membership function set consisting of three membership functions that represent low, medium, and high purchase amounts.

[Plot: Apple. x-axis: Quantity (ticks at 3.0, 7.0, 11.0); y-axis: Membership value (up to 1.0); regions: Low, Medium, High.]

Figure 4.1: Membership function set for apples.

We combine the data of the two databases and generate the fuzzy membership functions from the combined data. The fuzzy membership functions generated by fuzzy C-means are therefore fixed and identical, so that they can be applied to both databases.

4.1.2 Generating Fuzzy Association Rules by Fuzzy Apriori

In this part, the fuzzy membership functions produced by the method in Section 4.1.1 are used as input to the fuzzy Apriori algorithm.

Table 4.1 is a transaction database. In fuzzy association rule mining, the membership functions are applied to turn quantitative information into semantic words.

Table 4.1: An example of a transaction database.

ID  Expanded Items
1   (A, 3)(C, 6)(E, 9)
2   (B, 4)(C, 7)(D, 10)
3   (B, 2)(C, 5)(E, 8)
4   (C, 1)(E, 14)

Using the membership functions, we can get the fuzzy values of the different linguistic terms of each item, so the original transaction database can be converted into a database with fuzzy linguistic terms. An example is shown in Table 4.2.

Table 4.2: The fuzzy database converted from Table 4.1

TID Fuzzy set

2 3 4

Next, fuzzy frequent itemsets are generated from the fuzzy linguistic terms, and the fuzzy association rules are then obtained by the fuzzy Apriori algorithm [50]. The process of the fuzzy Apriori algorithm is as follows:

INPUT: A quantitative database consisting of n transactions; a set of membership functions; α: the minimum support threshold.

OUTPUT: A set of fuzzy frequent itemsets, from which the fuzzy association rules are obtained.

STEP 1: Transform the quantitative value vij of each item Ij in the i-th transaction into a fuzzy set fij represented as (fij1/Rj1 + fij2/Rj2 + … + fijh/Rjh) using the given membership functions, where h is the number of fuzzy regions (linguistic terms) for Ij, Rjl is the l-th fuzzy region of Ij, 1 ≤ l ≤ h, and fijl is vij’s fuzzy membership value in region Rjl.

STEP 2: Calculate the scalar cardinality of each fuzzy region (linguistic term) Rjl in the transaction data as:

\[ count_{jl} = \sum_{i=1}^{n} f_{ijl} \tag{4-1} \]

STEP 3: Check whether the value countjl of each fuzzy region Rjl is larger than or equal to the minimum count n × α. If so, put the fuzzy region in the set of frequent fuzzy regions (L1). That is,

\[ L_{1} = \{ R_{jl} \mid count_{jl} \ge n \times \alpha \} \tag{4-2} \]

STEP 4: If L1 is null, then exit the algorithm; otherwise, do the next step.

STEP 5: Set r = 1, where r is used to represent the number of items kept in the current large itemsets.

STEP 6: Generate the candidate set Cr+1 from Lr. Restated, the algorithm joins Lr and Lr under the condition that r−1 items in the two itemsets are the same and the other one is different. Store in Cr+1 the itemsets which have all their sub-r-itemsets in Lr.

STEP 7: Calculate the following sub-steps for each newly formed (r+1)-itemset s with items (s1, s2, …, sr+1) in Cr+1.

(A) For each transaction datum D^(i), calculate its fuzzy value on s as f_{s}^{(i)} = f_{s_1}^{(i)} ∧ f_{s_2}^{(i)} ∧ … ∧ f_{s_{r+1}}^{(i)}, where f_{s_j}^{(i)} is the membership value of D^(i) in region s_j. If the minimum operator is used for the intersection, then

\[ f_{s}^{(i)} = \min_{j=1}^{r+1} f_{s_j}^{(i)} \tag{4-3} \]

(B) Calculate the count of s in the transactions as:

\[ count_{s} = \sum_{i=1}^{n} f_{s}^{(i)} \tag{4-4} \]

(C) If counts is larger than or equal to the minimum count n × α, put s in Lr+1.

STEP 8: If Lr+1 is null, then do the next step; otherwise, set r = r + 1 and repeat Steps 6 to 7.
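The level-wise search in Steps 2 to 8 can be sketched as follows. The sketch assumes that Step 1 has already been applied, so each transaction is a mapping from fuzzy region names to membership values; it uses the minimum operator for the intersection and the minimum count n × α for every level. It does not filter out itemsets that contain two regions of the same item and does not derive rules. All names and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

FuzzyTransaction = Dict[str, float]  # fuzzy region name -> membership value


def fuzzy_count(transactions: List[FuzzyTransaction], itemset: FrozenSet[str]) -> float:
    """Sum over all transactions of the minimum membership value among the regions."""
    return sum(min(t.get(region, 0.0) for region in itemset) for t in transactions)


def frequent_fuzzy_itemsets(transactions: List[FuzzyTransaction],
                            alpha: float) -> List[FrozenSet[str]]:
    min_count = len(transactions) * alpha  # minimum count n * alpha
    regions = sorted({region for t in transactions for region in t})
    # Steps 2 and 3: frequent fuzzy regions (L1)
    current = [frozenset([r]) for r in regions
               if fuzzy_count(transactions, frozenset([r])) >= min_count]
    frequent = list(current)
    while current:  # Steps 4 and 8: stop when the current level is empty
        r = len(current[0])
        known = set(current)
        # Step 6: join itemsets that share r - 1 regions, then prune by sub-itemsets
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == r + 1}
        current = [s for s in candidates
                   if all(frozenset(sub) in known for sub in combinations(s, r))
                   and fuzzy_count(transactions, s) >= min_count]  # Step 7
        frequent.extend(current)
    return frequent


# Hypothetical fuzzified transactions.
data = [
    {"A.Low": 0.5, "C.Middle": 1.0, "E.High": 0.8},
    {"B.Low": 0.6, "C.Middle": 0.7},
    {"B.Low": 0.9, "C.Middle": 0.8, "E.High": 0.6},
    {"C.Low": 1.0, "E.High": 1.0},
]
result = frequent_fuzzy_itemsets(data, alpha=0.3)
print(sorted(sorted(s) for s in result))
```

For this data the minimum count is about 1.2, so B.Low, C.Middle, and E.High are frequent regions, and the pairs {B.Low, C.Middle} and {C.Middle, E.High} are frequent with counts of about 1.4 each.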
