

3.2 The Projection-based Weighted Frequent Itemset Mining Algorithm, PWA

3.2.4 An Example of PWA

In this section, an example illustrates how the proposed algorithm finds weighted frequent itemsets in a transaction database. Assume the transaction database contains the five transactions shown in Table 3.1, and the eight items in the transactions are denoted A to H. In addition, assume the individual weights of the items are as given in Table 3.2. The minimum weighted support threshold λ is set at 30%.

Given these data, the proposed algorithm proceeds as follows.

STEP 1: The transaction maximum weight tmw of each transaction in the transaction database TDB is found first. Take the first transaction Trans1 in Table 3.1 as an example. It includes the four items A, C, E and F, whose weights are 0.30, 0.45, 0.40 and 0.50, respectively. The maximum among them, 0.50, is taken as the transaction maximum weight of Trans1. All the other transactions in Table 3.1 are processed in the same way, and the results are shown in Table 3.3.

Table 3.3: Transaction maximum weights of the five transactions in the example.

TID Transactions tmw

Trans1 {ACEF} 0.50

Trans2 {BH} 0.95

Trans3 {BCE} 0.60

Trans4 {ACDEGH} 0.95

Trans5 {ADEF} 0.50

STEP 2: In the example, the transaction maximum weights of the five transactions in Table 3.3 are 0.50, 0.95, 0.60, 0.95 and 0.50, so the total transaction maximum weight ttmw is 0.50 + 0.95 + 0.60 + 0.95 + 0.50 = 3.50.
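STEPs 1 and 2 can be checked with a few lines of Python. Table 3.2 is not reproduced in this section, so the weights below are partly reconstructed and should be read as assumptions: A, C, E and F are stated in STEP 1; B = 0.60, D = 0.20 and H = 0.95 are inferred from Table 3.3 and the weighted supports reported later; G = 0.10 is a hypothetical placeholder (any value up to 0.95 leaves Table 3.3 unchanged). Exact `Fraction` arithmetic is used so that later comparisons against the 30% threshold are not disturbed by floating-point rounding.

```python
from fractions import Fraction

# Item weights. A, C, E, F are stated in the text; B, D, H are inferred from
# Table 3.3 and the reported supports; G = 0.10 is a HYPOTHETICAL placeholder.
WEIGHTS = {k: Fraction(v) for k, v in {
    "A": "0.30", "B": "0.60", "C": "0.45", "D": "0.20",
    "E": "0.40", "F": "0.50", "G": "0.10", "H": "0.95"}.items()}

# The five transactions of Table 3.3, one character per item.
TDB = {"Trans1": "ACEF", "Trans2": "BH", "Trans3": "BCE",
       "Trans4": "ACDEGH", "Trans5": "ADEF"}

# STEP 1: tmw of a transaction is the largest weight among its items.
tmw = {tid: max(WEIGHTS[item] for item in items) for tid, items in TDB.items()}

# STEP 2: ttmw is the sum of all transaction maximum weights.
ttmw = sum(tmw.values())

for tid, w in tmw.items():
    print(tid, f"{float(w):.2f}")
print("ttmw =", f"{float(ttmw):.2f}")  # ttmw = 3.50
```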

STEP 3: The transaction-weighted upper bound (twub) and weighted support (wsup) of each item in TDB are found simultaneously. Take item A in Table 3.3 as an example. Item A appears in three transactions, Trans1, Trans4 and Trans5, whose transaction maximum weights are 0.50, 0.95 and 0.50, respectively. In addition, the weight of item A in Table 3.2 is 0.30, and the total transaction maximum weight ttmw is 3.50. Therefore, the transaction-weighted upper bound twub{A} of item A is (0.50 + 0.95 + 0.50) / 3.50 = 55.71%, and its weighted support wsup{A} is (0.30 + 0.30 + 0.30) / 3.50 = 25.71%. All the other items in TDB are processed in the same way. The transaction-weighted upper bounds and weighted supports of all 1-itemsets in TDB are shown in Table 3.4.
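The single scan of STEP 3 can be sketched as follows: each transaction adds its tmw to the upper-bound sum of every item it contains, and adds the item's own weight to that item's support sum. The weights are partly reconstructed because Table 3.2 is not reproduced here (A, C, E and F are stated in STEP 1; B, D and H are inferred from Table 3.3 and the reported supports; G is a hypothetical placeholder, so its printed wsup is not meaningful).

```python
from fractions import Fraction

# A, C, E, F from the text; B, D, H inferred; G is a HYPOTHETICAL placeholder.
WEIGHTS = {k: Fraction(v) for k, v in {
    "A": "0.30", "B": "0.60", "C": "0.45", "D": "0.20",
    "E": "0.40", "F": "0.50", "G": "0.10", "H": "0.95"}.items()}
TDB = {"Trans1": "ACEF", "Trans2": "BH", "Trans3": "BCE",
       "Trans4": "ACDEGH", "Trans5": "ADEF"}          # Table 3.3

tmw = {tid: max(WEIGHTS[i] for i in items) for tid, items in TDB.items()}
ttmw = sum(tmw.values())

# STEP 3: one scan accumulates both measures for every item.
twub_sum = {}   # sum of tmw over the transactions that contain the item
wsup_sum = {}   # sum of the item's own weight over those transactions
for tid, items in TDB.items():
    for item in items:
        twub_sum[item] = twub_sum.get(item, 0) + tmw[tid]
        wsup_sum[item] = wsup_sum.get(item, 0) + WEIGHTS[item]

twub = {item: s / ttmw for item, s in twub_sum.items()}
wsup = {item: s / ttmw for item, s in wsup_sum.items()}

for item in sorted(twub):
    print(f"{{{item}}}  twub = {float(twub[item]):.2%}  wsup = {float(wsup[item]):.2%}")
```

The `:.2%` format rounds, whereas the text truncates to two decimals, so a few values differ in the last digit (for example, wsup{D} prints as 11.43% rather than 11.42%).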

Table 3.4: The transaction-weighted upper bounds and weighted supports of all 1-itemsets in the example.

STEP 4: All the weighted frequent upper-bound 1-itemsets (WFUB1) and weighted frequent 1-itemsets (WF1) can be found simultaneously from Table 3.4. Take 1-itemset {D} as an example. Item D appears in Trans4 and Trans5, so its transaction-weighted upper bound is (0.95 + 0.50) / 3.50 = 41.42%, and its weighted support is 11.42%. Since twub{D} is larger than or equal to the minimum weighted support threshold of 30%, {D} is a weighted frequent upper-bound 1-itemset; since wsup{D} is below 30%, it is not a weighted frequent 1-itemset. All the other 1-itemsets in Table 3.4 are processed in the same way. After this step, WFUB1 contains the six itemsets {A}, {B}, {C}, {D}, {E} and {H}, and WF1 contains {B}, {C}, {E} and {H}. These two sets are shown in Table 3.5 and Table 3.6, respectively.

Table 3.5: The set of the weighted frequent upper-bound 1-itemsets in the example.

Table 3.6: The set of the weighted frequent 1-itemsets in the example.

Itemset wsup

{B} 34.28%

{C} 38.57%

{E} 45.71%

{H} 54.28%
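STEP 4 applies the same threshold twice: the upper bound decides what survives for later phases, and the actual weighted support decides what is output. A minimal sketch follows, with the same partly reconstructed weights as before (A, C, E, F stated; B, D, H inferred; G a hypothetical placeholder that does not affect the result, because twub{G} depends only on Trans4's tmw).

```python
from fractions import Fraction

# A, C, E, F from the text; B, D, H inferred; G is a HYPOTHETICAL placeholder.
WEIGHTS = {k: Fraction(v) for k, v in {
    "A": "0.30", "B": "0.60", "C": "0.45", "D": "0.20",
    "E": "0.40", "F": "0.50", "G": "0.10", "H": "0.95"}.items()}
TDB = ["ACEF", "BH", "BCE", "ACDEGH", "ADEF"]   # Table 3.3
LAMBDA = Fraction("0.30")                        # λ = 30%

tmw = [max(WEIGHTS[i] for i in t) for t in TDB]
ttmw = sum(tmw)

def twub(item):
    """Sum of tmw over the transactions containing item, divided by ttmw."""
    return sum(w for t, w in zip(TDB, tmw) if item in t) / ttmw

def wsup(item):
    """Sum of the item's own weight over those transactions, divided by ttmw."""
    return sum(WEIGHTS[item] for t in TDB if item in t) / ttmw

items = sorted(set("".join(TDB)))
# STEP 4: twub decides WFUB1; wsup decides WF1. An item's weight never exceeds
# the tmw of a transaction containing it, so WF1 is a subset of WFUB1.
wfub1 = [i for i in items if twub(i) >= LAMBDA]
wf1 = [i for i in wfub1 if wsup(i) >= LAMBDA]
print("WFUB1:", wfub1)  # WFUB1: ['A', 'B', 'C', 'D', 'E', 'H']
print("WF1:", wf1)      # WF1: ['B', 'C', 'E', 'H']
```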

STEP 5: The variable r is set to 1 initially, where r represents the number of items in the processed itemsets.

STEP 6: In the example, the set of weighted frequent upper-bound 1-itemsets contains {A}, {B}, {C}, {D}, {E}, and {H}. The six items A, B, C, D, E, and H are collected from the set of WFUB1, and then they are denoted as PI1 (Possible Items).

STEP 7: For each transaction in Table 3.3, the items not appearing in PI1 are removed from the transaction. Take the last transaction, Trans5, in Table 3.3 as an example. It contains the four items A, D, E and F, while PI1 contains the six items A, B, C, D, E and H. Because item F in Trans5 does not appear in PI1, it is removed, and the transaction becomes {ADE}. The transaction maximum weight of the modified Trans5 remains its original value of 0.50. Next, since the number of items kept in the modified transaction (3) is larger than or equal to two, the modified transaction is kept. The other four transactions in Table 3.3 are processed in the same way. The modified transactions and their transaction maximum weights are shown in Table 3.7.

Table 3.7: All the modified transactions in this example.

TID Transactions tmw

Trans1 {ACE} 0.50

Trans2 {BH} 0.95

Trans3 {BCE} 0.60

Trans4 {ACDEH} 0.95

Trans5 {ADE} 0.50
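STEPs 5 to 7 amount to a filter over Table 3.3. The sketch below takes the transactions and tmw values directly from Table 3.3 and PI1 from STEP 6. It writes the text's "at least two items" rule as r + 1 with r = 1, which is an assumption about how the rule generalizes to later rounds.

```python
# Table 3.3: transactions with their transaction maximum weights (tmw).
TDB = {"Trans1": ("ACEF", 0.50), "Trans2": ("BH", 0.95), "Trans3": ("BCE", 0.60),
       "Trans4": ("ACDEGH", 0.95), "Trans5": ("ADEF", 0.50)}

r = 1                      # STEP 5: length of the itemsets processed so far
PI1 = set("ABCDEH")        # STEP 6: the items of WFUB1

# STEP 7: remove items outside PI1 but keep the ORIGINAL tmw; keep a
# transaction only if at least r + 1 (= 2) items remain (assumed rule).
modified = {}
for tid, (items, tmw) in TDB.items():
    kept = "".join(item for item in items if item in PI1)
    if len(kept) >= r + 1:
        modified[tid] = (kept, tmw)

for tid, (items, tmw) in modified.items():
    print(tid, "{" + items + "}", f"{tmw:.2f}")
```

Keeping the original tmw, rather than recomputing it from the surviving items, keeps every later upper bound at least as large as in the unpruned database.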

STEP 8: Each 1-itemset in WFUB1 is processed in alphabetical order, starting with {A}. In this example, the projected transactions tdb{A} for the prefix item A are the transactions in Table 3.7 that contain item A, namely Trans1:{ACE}, Trans4:{ACDEH} and Trans5:{ADE}, whose transaction maximum weights are 0.50, 0.95 and 0.50, respectively.

Based on this information, the weighted frequent itemsets with prefix itemset {A} are then found by the Finding-WF(X, tdbX, r) procedure with the parameters X = {A}, tdbX = tdb{A} and r = 1, as described below.

PSTEP 1: The temporary itemset table, TI{A}, is initialized as an empty table, in which each tuple consists of three fields: itemset, transaction-weighted upper bound (twub) of the itemset, and actual weighted support (wsup) of the itemset.

PSTEP 2: For each transaction in tdb{A}, all possible 2-itemsets with the prefix {A} are generated. Take the first projected transaction {ACE} in tdb{A} as an example. Since two items follow the prefix {A} in the transaction, the two 2-itemsets {AC} and {AE} are generated, and their weights in the transaction are 0.375 and 0.35, respectively, that is, the averages of their item weights, (0.30 + 0.45) / 2 and (0.30 + 0.40) / 2.

The two 2-itemsets are then put in the TI{A} table, and the transaction maximum weight of {ACE} (0.50) and the weights of the two 2-itemsets are added to the corresponding fields of these itemsets in TI{A}.

The other two projected transactions in tdb{A} can be processed in the same way. The results for the transaction-weighted upper bounds and weighted supports of all the possible 2-itemsets with the prefix {A} are shown in Table 3.8.

Table 3.8: The transaction-weighted upper bounds and the actual weighted supports of all the 2-itemsets with the prefix {A} in this example.

Itemset twub wsup

{AC} 41.42% 21.42%

{AD} 41.42% 14.28%

{AE} 55.71% 30.0%

{AH} 27.14% 17.85%
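PSTEPs 1 and 2 for the prefix {A} can be sketched as a dictionary from itemset to two running sums. The sketch assumes that an itemset's weight is the average of its item weights, which matches the 0.375 and 0.35 quoted in PSTEP 2; D = 0.20, B = 0.60 and H = 0.95 are inferred from the reported supports rather than read from Table 3.2. The transactions and tmw values come from Table 3.7 and ttmw from STEP 2.

```python
from fractions import Fraction

# Table 3.7: modified transactions with their original tmw; ttmw from STEP 2.
MODIFIED = {"Trans1": ("ACE", Fraction("0.50")), "Trans2": ("BH", Fraction("0.95")),
            "Trans3": ("BCE", Fraction("0.60")), "Trans4": ("ACDEH", Fraction("0.95")),
            "Trans5": ("ADE", Fraction("0.50"))}
TTMW = Fraction("3.50")

# A, C, E are stated in the text; B, D, H are inferred (Table 3.2 not shown).
WEIGHTS = {"A": Fraction("0.30"), "B": Fraction("0.60"), "C": Fraction("0.45"),
           "D": Fraction("0.20"), "E": Fraction("0.40"), "H": Fraction("0.95")}

def itemset_weight(itemset):
    """Average item weight; gives 0.375 for {AC} and 0.35 for {AE} as in PSTEP 2."""
    return sum(WEIGHTS[i] for i in itemset) / len(itemset)

prefix = "A"
# STEP 8: tdb{A} holds the modified transactions that contain the prefix item.
tdb_A = [(items, tmw) for items, tmw in MODIFIED.values() if prefix in items]

# PSTEP 1: TI{A} maps an itemset to [sum of tmw, sum of itemset weights].
TI_A = {}
# PSTEP 2: each item after the prefix extends it to a 2-itemset.
for items, tmw in tdb_A:
    for item in items[items.index(prefix) + 1:]:
        itemset = prefix + item
        fields = TI_A.setdefault(itemset, [Fraction(0), Fraction(0)])
        fields[0] += tmw
        fields[1] += itemset_weight(itemset)

for itemset, (tmw_sum, w_sum) in sorted(TI_A.items()):
    print(f"{{{itemset}}}  twub = {float(tmw_sum / TTMW):.2%}  wsup = {float(w_sum / TTMW):.2%}")
```

Dividing both sums by ttmw reproduces Table 3.8; the printed percentages are rounded, so {AC} shows 41.43% and 21.43% where the table truncates to 41.42% and 21.42%.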

PSTEP 3: The weighted frequent upper-bound 2-itemsets with prefix {A} (WFUB2,{A}) and the weighted frequent 2-itemsets with prefix {A} (WF{A}) are found simultaneously from Table 3.8, in the same way as in STEP 4. In this example, the three 2-itemsets {AC}, {AD} and {AE} are put in WFUB2,{A}, since their transaction-weighted upper bounds satisfy the minimum weighted support threshold (30%). However, only {AE}, whose weighted support is exactly 30.0%, is put in WF{A}.

PSTEP 4: In the example, the four items A, C, D and E are collected from the set of the weighted frequent upper-bound 2-itemsets with prefix itemset {A}, and they are then denoted as PI2,{A}.
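PSTEPs 3 and 4 can be checked directly against the percentages printed in Table 3.8. Because {AE} sits exactly on the 30% threshold, the comparison must be "greater than or equal to", and the sketch parses the printed values with `Fraction` so that no floating-point rounding can push 30.0% below 30%. The variable names are illustrative.

```python
from fractions import Fraction

LAMBDA = Fraction("0.30")   # minimum weighted support threshold λ = 30%

# Table 3.8: itemset -> (twub, wsup), as printed in the text, in percent.
TABLE_3_8 = {"AC": ("41.42", "21.42"), "AD": ("41.42", "14.28"),
             "AE": ("55.71", "30.0"), "AH": ("27.14", "17.85")}

def pct(text):
    """Parse a printed percentage exactly, e.g. '30.0' -> 3/10."""
    return Fraction(text) / 100

# PSTEP 3: the upper bound decides WFUB2,{A}; the actual wsup decides WF{A}.
wfub2_A = [s for s, (twub, _) in TABLE_3_8.items() if pct(twub) >= LAMBDA]
wf_A = [s for s in wfub2_A if pct(TABLE_3_8[s][1]) >= LAMBDA]

# PSTEP 4: the possible items PI2,{A} are the items of the WFUB2,{A} itemsets.
pi2_A = sorted(set("".join(wfub2_A)))

print(wfub2_A)  # ['AC', 'AD', 'AE']
print(wf_A)     # ['AE']
print(pi2_A)    # ['A', 'C', 'D', 'E']
```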

PSTEP 5: The value of the variable r is updated to 2.

PSTEP 6: For each projected transaction in tdb{A}, the items not appearing in PI2,{A} are removed, in the same way as in STEP 7. The resulting modified transactions in tdb{A}, with their original transaction maximum weights, are shown in Table 3.9.

Table 3.9: The modified transactions in tdb{A} and their transaction maximum weight values.

TID Transactions tmw

Trans1 {ACE} 0.50

Trans4 {ACDE} 0.95

Trans5 {ADE} 0.50
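PSTEP 6 repeats the STEP 7 filter inside the projected database. The text does not state the length rule for this round, so the sketch assumes, by analogy with STEP 7, that a transaction is kept only if at least r + 1 items remain; for this example every transaction passes either way. tdb{A} is taken from STEP 8 and PI2,{A} from PSTEP 4.

```python
# tdb{A} from STEP 8: rows of Table 3.7 that contain A, with their original tmw.
TDB_A = {"Trans1": ("ACE", 0.50), "Trans4": ("ACDEH", 0.95), "Trans5": ("ADE", 0.50)}
PI2_A = set("ACDE")   # PSTEP 4
r = 2                 # PSTEP 5

# PSTEP 6: drop items outside PI2,{A}; tmw values are not recomputed.
# Assumption: a transaction is kept only if it still has r + 1 items.
tdb_A_pruned = {}
for tid, (items, tmw) in TDB_A.items():
    kept = "".join(item for item in items if item in PI2_A)
    if len(kept) >= r + 1:
        tdb_A_pruned[tid] = (kept, tmw)

for tid, (items, tmw) in tdb_A_pruned.items():
    print(tid, "{" + items + "}", f"{tmw:.2f}")
```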

PSTEP 7: Each itemset in WFUB2,{A} is processed in alphabetical order. The first prefix itemset to be processed is thus {AC}, and the projected transactions {ACE} and {ACDE} in Table 3.9 are put in the set of projected transactions tdb{AC} of the prefix itemset {AC}. The remaining itemsets in WFUB2,{A}, and then the remaining 1-itemsets in WFUB1, are processed recursively in the same way until all the 1-itemsets in WFUB1 have been handled. All the weighted frequent itemsets found are shown in Table 3.10.

Table 3.10: The final set of WF in the example.

Itemsets wsup

{B} 34.28%

{C} 38.57%

{E} 45.71%

{H} 54.28%

{AE} 30.0%

{CE} 36.42%

STEP 9: In this example, all the weighted frequent itemsets in Table 3.10 are output to users.
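The whole walkthrough can be condensed into a short recursive sketch in which a single routine plays the role of both the top-level loop (STEPs 3 to 8, called with an empty prefix) and Finding-WF (PSTEPs 1 to 7). It is not the author's implementation: the names `pwa` and `finding_wf` are hypothetical, the itemset weight is assumed to be the average item weight (consistent with PSTEP 2), the weights are partly inferred (G is a hypothetical placeholder), and STEP 7's length check is omitted because it only saves work and does not change the output.

```python
from fractions import Fraction

# A, C, E, F from the text; B, D, H inferred; G is a HYPOTHETICAL placeholder.
WEIGHTS = {k: Fraction(v) for k, v in {
    "A": "0.30", "B": "0.60", "C": "0.45", "D": "0.20",
    "E": "0.40", "F": "0.50", "G": "0.10", "H": "0.95"}.items()}
TDB = ["ACEF", "BH", "BCE", "ACDEGH", "ADEF"]   # Table 3.3
LAMBDA = Fraction("0.30")

def pwa(tdb, weights, lam):
    """Sketch of PWA: returns {itemset: wsup} for all weighted frequent itemsets."""
    tmw = [max(weights[i] for i in t) for t in tdb]      # STEP 1
    ttmw = sum(tmw)                                       # STEP 2
    found = {}

    def finding_wf(prefix, projected):
        # Extend the prefix by each later item; accumulate twub and wsup sums.
        table = {}
        for items, w in projected:
            start = items.index(prefix[-1]) + 1 if prefix else 0
            for item in items[start:]:
                itemset = prefix + item
                sums = table.setdefault(itemset, [0, 0])
                sums[0] += w
                sums[1] += sum(weights[i] for i in itemset) / len(itemset)
        # Upper-bound survivors are kept; those with enough wsup are output.
        wfub = sorted(s for s, (t, _) in table.items() if t / ttmw >= lam)
        for s in wfub:
            if table[s][1] / ttmw >= lam:
                found[s] = table[s][1] / ttmw
        # Remove items that occur in no surviving itemset (tmw values unchanged).
        possible = set("".join(wfub))
        pruned = [("".join(i for i in items if i in possible), w) for items, w in projected]
        # Recurse on the projected transactions of each surviving itemset.
        for s in wfub:
            sub = [(items, w) for items, w in pruned if set(s) <= set(items)]
            finding_wf(s, sub)

    finding_wf("", [("".join(sorted(t)), w) for t, w in zip(tdb, tmw)])
    return found

result = pwa(TDB, WEIGHTS, LAMBDA)
for itemset, ws in result.items():
    print(f"{{{itemset}}}  wsup = {float(ws):.2%}")
```

On the example data it returns {B}, {C}, {E}, {H}, {AE} and {CE}, that is, the four weighted frequent 1-itemsets of Table 3.6 plus {AE} and {CE}. Pruning by twub is safe here because twub can only shrink as an itemset grows, and wsup never exceeds twub.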

3.3 Projection-based Weighted Mining Algorithm with Improved Strategies, PWAI

In this section, a projection-based weighted mining algorithm with two effective strategies, tightening and filtering, is proposed to improve the efficiency of PWA. The tightening strategy is described in Section 3.3.1, and the filtering strategy in Section 3.3.2.
