An Example of the CMFFP-mine Algorithm - Compressed Multiple Fuzzy FP-tree Algorithm

CHAPTER 4 Compressed Multiple Fuzzy FP-tree Algorithm

4.6 An Example of the CMFFP-mine Algorithm

For the built CMFFP tree in Figure 4.4, the proposed CMFFP-mine algorithm is then applied to find the fuzzy frequent itemsets as follows:

STEP 1: The fuzzy regions in the Header_Table are processed one by one from bottom to top. In this example, the fuzzy regions are processed in the order of {B.Low}, {C.High}, {A.Middle}, and {C.Middle}. Take fuzzy region {B.Low} as an example to illustrate the following steps.

STEP 2: The nodes with the currently processed fuzzy region {B.Low} in the CMFFP tree are found. In this example, there are two nodes in the CMFFP tree that contain the fuzzy region {B.Low}; they are shown in Figure 4.5.

[Figure: the Header_Table and the CMFFP tree, showing the two {B.Low} nodes with their attached arrays]

Figure 4.5. Tree nodes associated with {B.Low}

STEP 3: The fuzzy itemsets and their membership values are extracted from the array stored in each extracted node of {B.Low}. For the first node {B.Low} (left side), the fuzzy itemsets and membership values extracted from the attached array are {C.Middle, B.Low: 1.2}, {A.Middle, B.Low: 1.4}, and {C.Middle, A.Middle, B.Low: 1.2}. For the second node {B.Low} (right side), they are {C.Middle, B.Low: 0.4} and {C.High, B.Low: 0.6}.

STEP 4: The membership values of the same fuzzy itemsets are summed. In this example, the associated fuzzy itemsets with {B.Low} are {C.Middle, B.Low: 1.6}, {A.Middle, B.Low: 1.4}, {C.High, B.Low: 0.6}, and {C.Middle, A.Middle, B.Low: 1.2}.
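The summation in STEP 4 can be sketched in Python as follows. This is only an illustration: the list `node_arrays` and the function `sum_itemsets` are hypothetical names, the arrays hold the values quoted in STEP 3, and the rounding to two decimals is added here solely to hide floating-point noise.

```python
from collections import defaultdict

# Arrays attached to the two {B.Low} nodes, as listed in STEP 3.
node_arrays = [
    {
        ("C.Middle", "B.Low"): 1.2,
        ("A.Middle", "B.Low"): 1.4,
        ("C.Middle", "A.Middle", "B.Low"): 1.2,
    },
    {
        ("C.Middle", "B.Low"): 0.4,
        ("C.High", "B.Low"): 0.6,
    },
]


def sum_itemsets(arrays):
    """Sum the membership values of identical fuzzy itemsets across nodes."""
    totals = defaultdict(float)
    for array in arrays:
        for itemset, value in array.items():
            # frozenset makes the key independent of the order of regions.
            totals[frozenset(itemset)] += value
    return {itemset: round(total, 2) for itemset, total in totals.items()}


summed = sum_itemsets(node_arrays)
for itemset, total in summed.items():
    print(sorted(itemset), total)
```

Only {C.Middle, B.Low} occurs in both arrays, so it is the only itemset whose value changes (1.2 + 0.4 = 1.6).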

STEP 5: The above steps are repeated for the other fuzzy regions until all regions in the Header_Table are processed. The results are then stored in the set C and are shown in Table 4.6.

Table 4.6. All derived fuzzy itemsets from the CMFFP tree

1-itemsets    Count    2-itemsets and 3-itemsets       Count
{A.Middle}    3.4      {A.Middle, C.Middle}            2.0
{B.Low}       2.2      {A.Middle, C.High}              1.8
{C.High}      2.8      {B.Low, C.Middle}               1.6
{C.Middle}    3.4      {A.Middle, B.Low}               1.4
                       {B.Low, C.High}                 0.6
                       {A.Middle, B.Low, C.Middle}     1.2

STEP 6: All fuzzy itemsets in Table 4.6 are then checked against the predefined minimum count 1.2. The fuzzy itemsets that satisfy this condition are then output as the fuzzy frequent itemsets. The results are shown in Table 4.7.

Table 4.7. All derived fuzzy frequent itemsets from the CMFFP tree

1-itemsets    Count    2-itemsets and 3-itemsets       Count
{A.Middle}    3.4      {A.Middle, C.Middle}            2.0
{B.Low}       2.2      {A.Middle, C.High}              1.8
{C.High}      2.8      {B.Low, C.Middle}               1.6
{C.Middle}    3.4      {A.Middle, B.Low}               1.4
                       {A.Middle, B.Low, C.Middle}     1.2
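The check in STEP 6 amounts to a simple threshold filter. A minimal Python sketch, assuming the derived itemsets of Table 4.6 are held in a dictionary (the names `derived` and `frequent` are illustrative only):

```python
MIN_COUNT = 1.2  # predefined minimum count used in STEP 6

# Fuzzy itemsets and counts copied from Table 4.6.
derived = {
    ("A.Middle",): 3.4,
    ("B.Low",): 2.2,
    ("C.High",): 2.8,
    ("C.Middle",): 3.4,
    ("A.Middle", "C.Middle"): 2.0,
    ("A.Middle", "C.High"): 1.8,
    ("B.Low", "C.Middle"): 1.6,
    ("A.Middle", "B.Low"): 1.4,
    ("B.Low", "C.High"): 0.6,
    ("A.Middle", "B.Low", "C.Middle"): 1.2,
}

# Keep only the itemsets whose count reaches the minimum count.
frequent = {itemset: count for itemset, count in derived.items() if count >= MIN_COUNT}

removed = sorted(set(derived) - set(frequent))
print(removed)
```

Only {B.Low, C.High}, with a count of 0.6, falls below the minimum count, which is why it is the single row missing from Table 4.7.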

CHAPTER 5 Upper-Bound Multiple Fuzzy FP-tree Algorithm

To avoid the overhead of the attached arrays in the CMFFP tree, we propose an upper-bound multiple fuzzy FP-tree (abbreviated as UBMFFP-tree) algorithm to efficiently derive fuzzy frequent itemsets from quantitative transactions.

It adopts a two-phase approach to derive fuzzy frequent itemsets from the UBMFFP tree. The notation used in the proposed algorithm is first stated as follows.

5.1 Notation

D          the original quantitative database;
n          the number of transactions in D;
T_i        the i-th transaction in D, 1 ≤ i ≤ n;
m          the number of items in D;
I_j        the j-th item, 1 ≤ j ≤ m;
h_j        the number of fuzzy regions for I_j;
R_jl       the l-th fuzzy region of I_j, 1 ≤ l ≤ h_j;
v_ij       the quantitative value of I_j in T_i;
f_ijl      the membership value of v_ij in region R_jl;
count_jl   the count of the fuzzy region R_jl in D;
s          the predefined minimum support threshold;
o(R_jl)    the occurrence frequency of the fuzzy region R_jl;
C          the set of derived candidate fuzzy itemsets from the UBMFFP tree.

5.2 The UBMFFP-tree Construction Algorithm

INPUT: A quantitative database consisting of n transactions, a set of membership functions, and a predefined minimum support threshold s.

OUTPUT: An upper-bound multiple fuzzy FP tree (UBMFFP tree).

STEP 1: Transform the quantitative value v_ij of each item I_j in the i-th transaction into a fuzzy set f_ij represented as (f_ij1/R_j1 + f_ij2/R_j2 + … + f_ijh/R_jh) using the given membership functions, where h is the number of fuzzy regions for I_j, R_jl is the l-th fuzzy region of I_j, 1 ≤ l ≤ h, and f_ijl is v_ij's fuzzy membership value in region R_jl. Note that f_ijl/R_jl means that the membership value of region R_jl is f_ijl.

STEP 2: Calculate the scalar cardinality of each fuzzy region R_jl in the transaction data as:

    count_{jl} = \sum_{i=1}^{n} f_{ijl}

STEP 3: Check whether the value count_jl of the fuzzy region R_jl is larger than or equal to the predefined minimum count n*s. If it is, the fuzzy region is treated as a fuzzy frequent itemset and put in the set L₁. That is:

L₁ = {R_jl | count_jl ≥ n*s, 1 ≤ j ≤ m}.

STEP 4: Calculate the occurrence frequency o(R_jl) of each fuzzy region in L₁.

STEP 5: Build the Header_Table by sorting the fuzzy regions (fuzzy frequent itemsets) in L₁ in descending order of their occurrence frequencies.

STEP 6: Remove the fuzzy regions of the items not existing in L₁ from the transactions of the transformed database. Sort the remaining fuzzy regions in descending order of their occurrence frequencies in each transaction.

STEP 7: Initially set the root node of the UBMFFP tree as {root}.

STEP 8: Insert the transactions of the transformed database into the UBMFFP tree tuple by tuple. The following two cases may exist.

Substep 8-1: If a fuzzy region R_jl in a transaction is at the corresponding branch of the UBMFFP tree, add the fuzzy value f_ijl of R_jl in the processed transaction to the node of R_jl in the branch.

Substep 8-2: Otherwise, add a node of R_jl at the end of the corresponding branch, set the count of the node as the fuzzy value f_ijl of R_jl, and connect the node of R_jl in the last branch with the current node as a sequence. If there is no such branch with the node of R_jl, insert a node-link from the entry of R_jl in the Header_Table to the added node.

In STEP 8, a corresponding branch is the branch built in the UBMFFP tree according to the sorted fuzzy regions in the transformed transaction. After STEP 8, the final UBMFFP tree is thus built.
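The insertion in STEP 8 can be sketched in Python as follows. This is a simplified illustration, not the thesis implementation: the class `Node`, the function `insert_transaction`, and the dictionary `header_links` are hypothetical names, and the node-links of each region are kept as a plain list rather than as a chained sequence. The two transactions are the first two sorted transactions of the example in Section 5.3.

```python
class Node:
    """One tree node: a fuzzy region and its accumulated fuzzy value."""

    def __init__(self, region, parent=None):
        self.region = region
        self.count = 0.0
        self.parent = parent
        self.children = {}  # region name -> child Node


def insert_transaction(root, header_links, transaction):
    """Insert one sorted transaction, given as (region, fuzzy value) pairs."""
    node = root
    for region, value in transaction:
        child = node.children.get(region)
        if child is None:
            # Substep 8-2: the branch has no node for this region yet.
            child = Node(region, parent=node)
            node.children[region] = child
            header_links.setdefault(region, []).append(child)
        # Substeps 8-1 and 8-2: accumulate the fuzzy value in the node.
        child.count += value
        node = child


root = Node("root")
header_links = {}
insert_transaction(root, header_links,
                   [("C.Middle", 0.6), ("A.Low", 0.8), ("C.High", 0.4), ("E.High", 0.8)])
insert_transaction(root, header_links,
                   [("C.Middle", 0.8), ("A.Low", 0.8), ("E.High", 0.8)])

c_middle = root.children["C.Middle"]
a_low = c_middle.children["A.Low"]
print(round(c_middle.count, 2), round(a_low.count, 2), sorted(a_low.children))
```

After the second insertion the shared prefix (C.Middle, A.Low) holds the summed values, and node A.Low has two children, C.High and a new E.High node, so the Header_Table entry of E.High links to two nodes.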

5.3 An Example of the UBMFFP-tree Construction Algorithm

Below, an example is given to illustrate how to construct a UBMFFP tree from the quantitative transaction data shown in Table 5.1. The data consist of 8 transactions and 5 items, denoted A to E. The minimum support threshold s is set to 30%.

Table 5.1. Eight transactions with purchased items and their quantitative values

TID   Items
1     (A:2), (C:8), (E:10)
2     (A:2), (C:5), (E:10)
3     (B:4), (C:8), (D:5)
4     (B:9), (D:4)
5     (A:3), (B:5), (C:5), (D:9)
6     (A:2), (B:7), (C:11)
7     (B:5), (C:3), (D:3), (E:9)
8     (A:3), (B:7), (C:9), (E:9)

Assume that the fuzzy membership functions are the same for all items, as shown in Figure 5.1. In this example, amounts are represented by three fuzzy regions: {Low}, {Middle}, and {High}. Thus, three fuzzy membership values are produced for each item in a transaction according to the predefined membership functions in Figure 5.1.

Note that the proposed approach also works when the membership functions of the amounts for the items are not the same.

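[Figure: membership functions of the regions Low, Middle, and High; horizontal axis: Amount (ticks at 0, 1, 6, 11); vertical axis: Membership value (0 to 1)]

Figure 5.1. Membership functions used in this example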

The UBMFFP tree for this example is thus constructed using the proposed approach as follows.

STEP 1: The quantitative values of the items in the transactions are represented as fuzzy sets using the membership functions shown in Figure 5.1. Take item {A} in transaction 1 as an example to illustrate the procedure. The amount "2" of {A} can be converted into the fuzzy set (0.8/A.Low + 0.2/A.Middle) by the membership functions in Figure 5.1. This step is repeated for the other items in Table 5.1, and the results are shown in Table 5.2.

Table 5.2. Fuzzy sets transformed from Table 5.1
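The conversion in STEP 1 can be sketched in Python. The breakpoints 1, 6, and 11 are read off the axis of Figure 5.1, and the piecewise-linear shapes are an assumption made here; under that assumption the amount 2 of item A yields 0.8 for Low and 0.2 for Middle, consistent with the fuzzy set given for transaction 1. The function names are illustrative only.

```python
def low(x):
    """Assumed Low region: 1 up to amount 1, falling linearly to 0 at amount 6."""
    if x <= 1:
        return 1.0
    if x >= 6:
        return 0.0
    return (6 - x) / 5


def middle(x):
    """Assumed Middle region: a triangle peaking at amount 6, zero at 1 and 11."""
    if x <= 1 or x >= 11:
        return 0.0
    return (x - 1) / 5 if x <= 6 else (11 - x) / 5


def high(x):
    """Assumed High region: 0 up to amount 6, rising linearly to 1 at amount 11."""
    if x <= 6:
        return 0.0
    if x >= 11:
        return 1.0
    return (x - 6) / 5


def fuzzify(item, amount):
    """Return the non-zero fuzzy regions of one purchased item."""
    memberships = {"Low": low(amount), "Middle": middle(amount), "High": high(amount)}
    return {f"{item}.{label}": value for label, value in memberships.items() if value > 0}


print(fuzzify("A", 2))  # {'A.Low': 0.8, 'A.Middle': 0.2}
```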

STEP 2: The scalar cardinality of each fuzzy region in the transactions is calculated as the count value. Take the fuzzy region {A.Low} as an example to explain the procedure. {A.Low} appears in transactions 1, 2, 5, 6, and 8, and its scalar cardinality is calculated as (0.8 + 0.8 + 0.0 + 0.0 + 0.6 + 0.8 + 0.0 + 0.6) (= 3.6). This step is repeated for the other regions; the results are shown in Table 5.3.

Table 5.3. Counts of fuzzy regions

Item        Count    Item        Count    Item        Count
A.Low       3.6      C.Low       1.0      E.Low       0.0
A.Middle    1.4      C.Middle    3.6      E.Middle    1.2
A.High      0.0      C.High      2.4      E.High      2.8
B.Low       0.8      D.Low       1.2
B.Middle    4.2      D.Middle    2.2
B.High      1.0      D.High      0.6

STEP 3: The fuzzy regions in Table 5.3 are then checked against the predefined minimum count, which is calculated as (8 * 0.3) (= 2.4). For example, the counts for {A.Low}, {A.Middle}, and {A.High} are 3.6, 1.4, and 0.0, respectively. Since only the count for {A.Low} reaches the minimum count, {A.Low} is kept for the subsequent mining process, whereas {A.Middle} and {A.High} are discarded. The fuzzy regions that satisfy the condition are considered fuzzy frequent itemsets and are kept in the set L₁ for later building the UBMFFP tree.

Thus, L₁ = {A.Low: 3.6, B.Middle: 4.2, C.Middle: 3.6, C.High: 2.4, E.High: 2.8}.
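STEPs 2 and 3 can be reproduced with a short Python sketch. The membership functions are assumed to be piecewise linear with breakpoints 1, 6, and 11 (read off Figure 5.1), and exact fractions are used so that the comparison at the threshold (C.High is exactly 2.4) is not affected by floating-point rounding. All names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

# Table 5.1: purchased amounts per transaction.
TRANSACTIONS = [
    {"A": 2, "C": 8, "E": 10},
    {"A": 2, "C": 5, "E": 10},
    {"B": 4, "C": 8, "D": 5},
    {"B": 9, "D": 4},
    {"A": 3, "B": 5, "C": 5, "D": 9},
    {"A": 2, "B": 7, "C": 11},
    {"B": 5, "C": 3, "D": 3, "E": 9},
    {"A": 3, "B": 7, "C": 9, "E": 9},
]


def fuzzify(amount):
    """Assumed membership functions with breakpoints 1, 6, and 11."""
    offset = Fraction(amount - 6, 5)
    return {
        "Low": min(1, max(0, -offset)),
        "Middle": max(0, 1 - abs(offset)),
        "High": min(1, max(0, offset)),
    }


def region_counts(transactions):
    """Scalar cardinality of every fuzzy region (STEP 2)."""
    counts = defaultdict(Fraction)
    for transaction in transactions:
        for item, amount in transaction.items():
            for label, value in fuzzify(amount).items():
                counts[f"{item}.{label}"] += value
    return counts


counts = region_counts(TRANSACTIONS)
min_count = len(TRANSACTIONS) * Fraction(30, 100)  # n * s = 8 * 0.3 = 2.4

# STEP 3: keep the regions whose count reaches the minimum count.
l1 = {region: float(count) for region, count in counts.items() if count >= min_count}
print(l1)
```

Under these assumptions the sketch yields the same five regions and counts as the set L₁ stated in STEP 3.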

STEPs 4 & 5: The occurrence frequency of each fuzzy region in L₁ is also calculated while executing the above step. For example, fuzzy region {A.Low} appears in transactions 1, 2, 5, 6, and 8. Its occurrence frequency is thus 5. The results are shown in Table 5.4.

Table 5.4. The fuzzy regions, their counts, and their occurrence frequencies

Item        Count    Occurrence frequency
A.Low       3.6      5
B.Middle    4.2      6
C.Middle    3.6      6
C.High      2.4      4
E.High      2.8      4

The fuzzy regions in L₁ are then sorted in descending order according to their occurrence frequencies to form the Header_Table, which is shown in Figure 5.2.

Figure 5.2. The built Header_Table

STEP 6: The fuzzy regions not existing in L₁ are then removed from each transaction in Table 5.2. The remaining fuzzy regions at each transaction are then sorted according to their occurrence frequencies. The updated transactions of the sorted results are shown in Table 5.5.

Table 5.5. The updated transactions for constructing the UBMFFP tree

TID   Fuzzy regions
1     C.Middle, A.Low, C.High, E.High
4     B.Middle
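The ordering used for the Header_Table and for the updated transactions can be sketched in Python. The occurrence frequencies come from Table 5.4. Breaking ties alphabetically is an assumption made here (the text does not state a tie rule); it happens to reproduce the order B.Middle, C.Middle, A.Low, C.High, E.High implied by the bottom-up processing order in Section 5.5. For brevity, transaction 1 lists only A.Middle as an infrequent region.

```python
# Occurrence frequencies of the regions in L1 (Table 5.4).
occurrence = {"A.Low": 5, "B.Middle": 6, "C.Middle": 6, "C.High": 4, "E.High": 4}

# Header_Table order: descending frequency, ties broken by name (assumption).
header_table = sorted(occurrence, key=lambda region: (-occurrence[region], region))
rank = {region: position for position, region in enumerate(header_table)}

# Fuzzy regions of transaction 1 (other infrequent regions omitted).
transaction_1 = {"A.Low": 0.8, "A.Middle": 0.2, "C.Middle": 0.6, "C.High": 0.4, "E.High": 0.8}

# STEP 6: drop regions outside L1, then sort by Header_Table order.
updated_1 = sorted(
    ((region, value) for region, value in transaction_1.items() if region in rank),
    key=lambda pair: rank[pair[0]],
)

print(header_table)
print(updated_1)
```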

STEP 7: The root of the UBMFFP tree is initially set as {root}.

STEP 8: The updated transactions in Table 5.5 are used to construct the UBMFFP tree tuple by tuple from the first transaction to the last one. Each node consists of a fuzzy frequent 1-itemset and its membership value. Take the first transaction as an example to illustrate the construction process. In this example, the first transaction (0.6/C.Middle + 0.8/A.Low + 0.4/C.High + 0.8/E.High) is processed first. Four new nodes, respectively with items (C.Middle: 0.6), (A.Low: 0.8), (C.High: 0.4), and (E.High: 0.8), are then created and linked sequentially. The results after the first transaction is processed are shown in Figure 5.3.


Figure 5.3. The UBMFFP tree after the first transaction has been processed

The second updated transaction (0.8/C.Middle + 0.8/A.Low + 0.8/E.High) is then processed. It shares the path (C.Middle, A.Low) that already exists in the tree. The value of node (C.Middle) in the path is (0.6 + 0.8) (= 1.4) and the value of node (A.Low) in the path is (0.8 + 0.8) (= 1.6). A new node is created for item (E.High: 0.8) and linked to node (A.Low: 1.6). The second updated transaction is thus inserted into the UBMFFP tree; the results are shown in Figure 5.4.


Figure 5.4. The UBMFFP tree after the second transaction has been processed

The same process is then executed for the other transactions. The results of Header_Table and the built UBMFFP tree are shown in Figure 5.5.


Figure 5.5. The finally constructed UBMFFP tree

After STEP 8, a complete UBMFFP tree has been constructed. The UBMFFP-growth algorithm is then used to derive the upper-bound fuzzy counts of all the fuzzy frequent itemsets; it is described in the next section.

5.4 The UBMFFP-growth Algorithm

The proposed UBMFFP-growth algorithm processes the fuzzy frequent 1-itemsets in the Header_Table one by one from bottom to top to generate the candidate fuzzy itemsets in two phases. In the first phase, the approach finds the upper-bound fuzzy counts of itemsets using the minimum operation. Based on the downward-closure property, it can reduce the search space by pruning unpromising itemsets early. Note that fuzzy regions derived from the same item are not combined into one fuzzy itemset, because such an itemset would be meaningless.

In the second phase, the transformed database is re-scanned to find the actual fuzzy counts of the remaining candidate fuzzy itemsets. Although a rescan of the database is required, its cost is reduced because only the candidates remaining after the first phase need to be counted. The details of the UBMFFP-growth algorithm are described as follows.

INPUT: The built UBMFFP tree, its corresponding Header_Table, the transformed database, and the pre-calculated minimum count n*s.

OUTPUT: The fuzzy frequent itemsets.

STEP 1: Process the fuzzy regions in the Header_Table one by one from bottom to top using the following steps. The currently processed fuzzy region is set as R_jl.

STEP 2: Find all the nodes with the fuzzy region R_jl in the UBMFFP tree through the sequenced connection between nodes.

STEP 3: Trace the prefix paths of the currently processed fuzzy region R_jl in the UBMFFP tree. Merge the extracted paths to recursively form the conditional UBMFFP tree for generating fuzzy itemsets with the currently processed fuzzy region R_jl. The minimum operation is used to get the upper-bound fuzzy counts of the derived fuzzy itemsets. Note that fuzzy regions belonging to the same item I_j as the currently processed region R_jl are not combined with it, because the resulting fuzzy itemsets would be meaningless.

STEP 4: If the upper-bound fuzzy count of the derived fuzzy itemset is larger than or equal to the pre-calculated minimum count (n*s), add it to the candidate set C.

STEP 5: Repeat STEPs 2 to 4 for the other fuzzy regions until all regions in the Header_Table have been processed.

STEP 6: Rescan the transformed database to get the actual fuzzy counts of the fuzzy candidate itemsets in C and check them against the predefined minimum count.

STEP 7: Output the candidate fuzzy itemsets with their fuzzy counts equal to or larger than the predefined minimum count as the fuzzy frequent itemsets.

After STEP 7, the desired fuzzy frequent itemsets are then derived from the built UBMFFP tree. In the next subsection, an example is given to illustrate how the proposed UBMFFP-growth algorithm generates fuzzy frequent itemsets from the UBMFFP tree.

5.5 An Example of the UBMFFP-growth Algorithm

For the built UBMFFP tree in Figure 5.5, the proposed UBMFFP-growth algorithm is then applied to find the fuzzy frequent itemsets as follows.

STEP 1: The fuzzy regions in the Header_Table are processed one by one from bottom to top. In this example, the fuzzy regions are processed in the order of {E.High}, {C.High}, {A.Low}, {C.Middle}, and {B.Middle}. The fuzzy region {E.High} is first processed by the following steps.

STEP 2: The nodes with the currently processed fuzzy region {E.High} in the UBMFFP tree are then found through the node-links (the sequenced connection between nodes). In this example, there are four nodes in the UBMFFP tree containing the fuzzy region {E.High}.

STEPs 3 & 4: The prefix paths of the currently processed nodes of {E.High} are then found for recursively generating fuzzy frequent itemsets. In Figure 5.6, the currently processed nodes of {E.High} are marked in red and their prefix paths are marked in blue.


Figure 5.6. The processed nodes {E.High} with its prefix paths

In this example, four prefix paths are extracted from the UBMFFP tree, and the fuzzy values of the nodes in each path are set to the value of the processed node of {E.High} in that path. Thus, the four extracted paths are {C.Middle: 0.8, A.Low: 0.8, C.High: 0.8}, {C.Middle: 0.8, A.Low: 0.8}, {B.Middle: 0.6, C.Middle: 0.6}, and {B.Middle: 0.6, C.Middle: 0.6, A.Low: 0.6, C.High: 0.6}. The above paths are then merged to form the conditional UBMFFP tree of {E.High}. The results are shown in Figure 5.7.

[Figure: conditional tree of {E.High}, containing the node (C.Middle: 2.8)]

Figure 5.7. The conditional UBMFFP-tree of {E.High}

In Figure 5.7, the fuzzy itemset of {C.Middle} with {E.High} is generated as {(C.Middle, E.High): 2.8}, since the minimum operation is used to find the merged fuzzy value of two nodes. Since no nodes exist in the conditional UBMFFP tree of {C.Middle, E.High}, the recursive process for {E.High} is then terminated. After this step, the fuzzy 2-itemset {(C.Middle, E.High): 2.8} is considered a candidate fuzzy itemset and put into the set C.
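The merging of the four extracted prefix paths can be sketched in Python. This is a simplified, single-level illustration (no recursion into deeper conditional trees); the names `conditional_counts` and `EPS` are hypothetical, and the small tolerance is added only to guard the threshold test against floating-point noise.

```python
from collections import defaultdict

MIN_COUNT = 2.4  # n * s = 8 * 0.3
EPS = 1e-9       # tolerance for floating-point sums


def conditional_counts(processed_region, prefix_paths):
    """Sum, per region, the values carried by the extracted prefix paths."""
    processed_item = processed_region.split(".")[0]
    totals = defaultdict(float)
    for path in prefix_paths:
        for region, value in path.items():
            # Regions of the same item as the processed region are skipped.
            if region.split(".")[0] != processed_item:
                totals[region] += value
    return dict(totals)


# Prefix paths of {E.High}, with values already limited to the E.High node value.
prefix_paths = [
    {"C.Middle": 0.8, "A.Low": 0.8, "C.High": 0.8},
    {"C.Middle": 0.8, "A.Low": 0.8},
    {"B.Middle": 0.6, "C.Middle": 0.6},
    {"B.Middle": 0.6, "C.Middle": 0.6, "A.Low": 0.6, "C.High": 0.6},
]

totals = conditional_counts("E.High", prefix_paths)
candidates = {
    (region, "E.High"): round(total, 2)
    for region, total in totals.items()
    if total >= MIN_COUNT - EPS
}
print(candidates)
```

The summed values are 2.8 for C.Middle, 2.2 for A.Low, 1.4 for C.High, and 1.2 for B.Middle, so only C.Middle reaches the minimum count of 2.4 and remains in the conditional tree of {E.High}.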

STEP 5: The same process is then repeated for the other fuzzy regions until all regions in the Header_Table are processed. After STEP 5, the final candidate fuzzy itemsets with their upper-bound fuzzy counts in C are shown in Table 5.6.

Table 5.6. The final set of candidate fuzzy itemsets

Candidate fuzzy 2-itemsets

Itemset               Upper-bound fuzzy count
(A.Low, C.Middle)     2.6
(C.Middle, E.High)    2.8

STEP 6: The transformed database in Table 5.5 is then re-scanned to find the actual fuzzy counts of the derived candidate fuzzy itemsets in Table 5.6. The actual fuzzy counts of (A.Low, C.Middle) and (C.Middle, E.High) are 2.4 and 2.2, respectively.
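The rescan in STEP 6 can be checked with a short Python sketch: the actual fuzzy count of an itemset is the sum, over all transactions, of the minimum membership value among its regions. The membership functions are again assumed to be piecewise linear with breakpoints 1, 6, and 11, the amounts are taken from Table 5.1, and exact fractions avoid rounding at the threshold. All names are illustrative only.

```python
from fractions import Fraction

# Table 5.1: purchased amounts per transaction.
TRANSACTIONS = [
    {"A": 2, "C": 8, "E": 10},
    {"A": 2, "C": 5, "E": 10},
    {"B": 4, "C": 8, "D": 5},
    {"B": 9, "D": 4},
    {"A": 3, "B": 5, "C": 5, "D": 9},
    {"A": 2, "B": 7, "C": 11},
    {"B": 5, "C": 3, "D": 3, "E": 9},
    {"A": 3, "B": 7, "C": 9, "E": 9},
]


def membership(region, transaction):
    """Membership of a region such as 'A.Low' in one transaction (0 if absent)."""
    item, label = region.split(".")
    if item not in transaction:
        return Fraction(0)
    offset = Fraction(transaction[item] - 6, 5)
    values = {
        "Low": min(1, max(0, -offset)),
        "Middle": max(0, 1 - abs(offset)),
        "High": min(1, max(0, offset)),
    }
    return Fraction(values[label])


def actual_count(itemset, transactions):
    """Sum over transactions of the minimum membership among the regions."""
    return sum(min(membership(region, t) for region in itemset) for t in transactions)


candidates = [("A.Low", "C.Middle"), ("C.Middle", "E.High")]
min_count = len(TRANSACTIONS) * Fraction(30, 100)  # 2.4

actual = {itemset: actual_count(itemset, TRANSACTIONS) for itemset in candidates}
frequent = [itemset for itemset, count in actual.items() if count >= min_count]

for itemset, count in actual.items():
    print(itemset, float(count))
```

Under these assumptions the sketch gives 2.4 for (A.Low, C.Middle) and 2.2 for (C.Middle, E.High), both below their upper bounds of 2.6 and 2.8 in Table 5.6.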

STEP 7: Since the actual fuzzy count of the candidate fuzzy itemset (A.Low, C.Middle), 2.4, is equal to the predefined minimum count, it is output as a fuzzy frequent itemset. The candidate (C.Middle, E.High), whose actual fuzzy count of 2.2 is below the minimum count, is discarded. The final derived fuzzy frequent itemsets are shown in Table 5.7.

Table 5.7. All derived fuzzy frequent itemsets from the UBMFFP tree

1-itemset            Count
{A.Low}              3.6
{B.Middle}           4.2
{C.Middle}           3.6
{C.High}             2.4
{E.High}             2.8

2-itemset            Count
{A.Low, C.Middle}    2.4

CHAPTER 6 Multiple Fuzzy FP-tree Merging Algorithm

Data mining is usually used to find the relationships between purchased items, which indicate the purchasing habits of customers. It is thus a powerful tool that helps the managers of a company make efficient and correct decisions. The algorithms proposed in Chapters 3 to 5 process one whole database to find the desired information. In real-world applications, however, a parent company may own multiple branches, and each branch has its own local database. The manager of the parent company needs to make decisions for the entire company from the databases collected in the different branches. Thus, it is important to efficiently integrate many different databases to support useful decisions.

In this chapter, we propose an MFFP-tree merging algorithm for integrating different databases into one, forming an integrated MFFP (iMFFP) tree. The iMFFP tree inherits the property of the MFFP tree for handling quantitative databases in fuzzy data mining. Each branch of a company thus has its own MFFP tree for making its own decisions. The parent company then integrates those individual MFFP trees for making global decisions for the company.

6.1 Notation

N          the number of quantitative databases;
DB_k       the k-th quantitative database, 1 ≤ k ≤ N;
n          the number of transactions in D;
T_i        the i-th transaction in D, 1 ≤ i ≤ n;
m          the number of items in D;
I_j        the j-th item, 1 ≤ j ≤ m;
h_j        the number of fuzzy regions for I_j;
R_jl       the l-th fuzzy region of I_j, 1 ≤ l ≤ h_j;
t          the level of the processed fuzzy region R_jl, which is increased from bottom to top;
v_ij       the quantitative value of I_j in T_i;
f_ijl      the membership value of v_ij in region R_jl;
count_jl   the count of the fuzzy region R_jl in D;
s          the predefined minimum support threshold.

6.2 The Multiple Fuzzy FP-tree Merging Algorithm

In this section, details of the proposed MFFP-tree merging algorithm are described below. Note that it is unnecessary to create the Header_Table for each sub-MFFP tree since the sub-MFFP trees will be integrated to form an integrated MFFP (iMFFP) tree. Only the iMFFP tree requires the Header_Table as an index to mine fuzzy frequent itemsets for decision making.

INPUT: Multiple quantitative databases DB_k, each consisting of n transactions, a set of membership functions, and a predefined minimum support threshold s.

OUTPUT: An integrated MFFP (iMFFP) tree.

STEP 1: Transform the quantitative value v_ij of each item I_j in the i-th transaction of each database DB_k into a fuzzy set f_ij represented as (f_ij1/R_j1 + f_ij2/R_j2 + … + f_ijh/R_jh) using the given membership functions, where h is the number of fuzzy regions for I_j, R_jl is the l-th fuzzy region of I_j, 1 ≤ l ≤ h, and f_ijl is v_ij's fuzzy membership value in region R_jl. Note that f_ijl/R_jl means that the membership value of region R_jl is f_ijl.

STEP 2: Calculate the scalar cardinality count_jl of each fuzzy region R_jl in the transactions for all DB_k as:

    count_{jl} = \sum_{k=1}^{N} \sum_{i=1}^{n} f_{ijl}

STEP 3: Check whether the value count_jl of the fuzzy region R_jl is larger than or equal to the predefined minimum count n*s. If it is, the fuzzy region is treated as a fuzzy frequent itemset and put in the set L₁. That is:

L₁ = {R_jl | count_jl ≥ n*s, 1 ≤ j ≤ m}.

STEP 4: Build the sub-MFFP tree of each DB_k (1 ≤ k ≤ N), which only keeps the
