CHAPTER 4 Incremental Utility Mining Algorithm for Record Insertion
4.2 The Proposed Incremental Utility Mining Algorithm for Record Insertion
INPUT: The profit values of the items, the minimum average-utility ratio λ, an original database D with its minimum average-utility threshold αD (= total utility × λ), its high upper-bound average-utility itemsets (HUD) and high average-utility itemsets (HD), and a set of new transactions N = {T1, T2, … , Tn}.
OUTPUT: A set of high average-utility itemsets (HU) for the updated database U (= D ∪ N).
STEP 1: Calculate the minimum average-utility thresholds αN and αU for the new transactions N and for the updated database U, respectively, as follows:
αN = αD × n / d and αU = αD × (d + n) / d,
where αD is the minimum average-utility threshold for the original database, d is the number of transactions in the original database, and n is the number of new transactions.
STEP 2: Calculate the utility value ujk of each item Ij in each new transaction Tk as ujk = qjk × pj, where qjk is the quantity of Ij in Tk, pj is the profit value of Ij, j = 1 to m, and k = 1 to n.
STEP 3: Find the maximal item-utility value muk in each new transaction Tk as muk = max{u1k, u2k, … , umk}, k = 1 to n.
STEP 4: Set k = 1, where k records the number of items in the itemsets currently being processed.
STEP 5: Generate the candidate k-itemsets and calculate their average-utility upper bounds from the new transactions. The average-utility upper bound ubN(s) of each candidate k-itemset s is the sum of the maximal item-utilities of the new transactions that include s. That is:
ubN(s) = Σ_{Tk ∈ N, s ⊆ Tk} muk.
STEP 6: Check whether the average-utility upper bound of each candidate k-itemset s from the new transactions is larger than or equal to αN. If it is, put s in the set of high upper-bound average-utility k-itemsets for the new transactions, HUkN.
STEP 7: For each k-itemset s in the set of high upper-bound average-utility itemsets (HUkD) from the original database, if it appears in the set of high upper-bound average-utility k-itemsets (HUkN) in the new transactions, do the following substeps.
Substep 7-1: Set the updated average-utility upper bound of itemset s as:
ubU(s) = ubD(s) + ubN(s).
Substep 7-2: Put s in the set of updated high upper-bound average-utility k-itemsets, HUkU.
STEP 8: For each k-itemset s in the set of high upper-bound average-utility itemsets (HUkD) from the original database, if it does not appear in the set of high upper-bound average-utility k-itemsets (HUkN) in the new transactions, do the following substeps.
Substep 8-1: Set the updated average-utility upper bound of itemset s as:
ubU(s) = ubD(s) + ubN(s).
Substep 8-2: Check whether the updated average-utility upper bound of itemset s is larger than or equal to αU. If it is, put s in the set of updated high upper-bound average-utility k-itemsets, HUkU.
STEP 9: For each k-itemset s in the set of high upper-bound average-utility itemsets (HUkN) in the new transactions, if it does not appear in the set of high upper-bound average-utility k-itemsets (HUkD) in the original database, do the following substeps.
Substep 9-1: Rescan the original database to determine the average-utility upper bound (ubD(s)) of itemset s.
Substep 9-2: Set the updated average-utility upper bound of itemset s as:
ubU(s) = ubD(s) + ubN(s).
Substep 9-3: Check whether the updated average-utility upper bound of itemset s is larger than or equal to αU. If it is, put s in the set of updated high upper-bound average-utility k-itemsets, HUkU.
STEP 10: Generate the candidate (k+1)-itemsets from the set of high upper-bound average-utility k-itemsets (HUkN) in the new transactions. If any k-sub-itemset of a candidate (k+1)-itemset is not contained in the set of updated high upper-bound average-utility k-itemsets (HUkU), remove the candidate from the candidate set.
STEP 11: Set k = k+1.
STEP 12: Repeat STEPs 5 to 11 until no new candidate itemsets are generated.
STEP 13: For each high upper-bound average-utility itemset s in HUU of the updated database, if it appears in the set of high upper-bound average-utility itemsets (HUD) of the original database, do the following substeps.
Substep 13-1: Calculate the actual average-utility value of each itemset s for the new transactions as:
auN(s) = Σ_{Tk ∈ N, s ⊆ Tk} ( Σ_{Ij ∈ s} ujk ) / |s|,
where ujk is the utility value of each item Ij in transaction Tk and |s| is the number of items in s.
Substep 13-2: Set the new actual average-utility value of s in the updated database as:
auU(s) = auD(s) + auN(s).
Substep 13-3: Check whether the actual average-utility value of itemset s is larger than or equal to αU. If it satisfies the above condition, put it in the set of updated high average-utility itemsets, HU.
STEP 14: For each high upper-bound average-utility itemset s in HUU of the updated database, if it does not appear in the set of high upper-bound average-utility itemsets (HUD) of the original database, do the following substeps.
Substep 14-1: Calculate the actual average-utility value of each itemset s for the new transactions as:
auN(s) = Σ_{Tk ∈ N, s ⊆ Tk} ( Σ_{Ij ∈ s} ujk ) / |s|,
where ujk is the utility value of each item Ij in transaction Tk and |s| is the number of items in s.
Substep 14-2: Rescan the original database to determine the actual average-utility value auD(s) of itemset s in the original database.
Substep 14-3: Set the new actual average-utility value of s in the updated database as:
auU(s) = auD(s) + auN(s).
Substep 14-4: Check whether the actual average-utility value of itemset s is larger than or equal to αU. If it satisfies the above condition, put it in the set of updated high average-utility itemsets, HU.
After Step 14, the final updated high average-utility itemsets for the updated database can then be found.
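The case split in Steps 7 to 9 can be motivated by a short inequality argument. The following is a sketch under two assumptions: that the thresholds are additive, i.e. the threshold for the updated database equals the sum of the thresholds for the original database and for the new transactions (this holds for the example values, 45.4 + 13.62 = 59.02), and that the upper bound in the updated database is the sum of the two partial upper bounds, as in Substep 7-1. The superscript notation is only a typesetting choice for this sketch.

```latex
\begin{align*}
ub^{D}(s) \ge \alpha^{D} \ \text{and}\ ub^{N}(s) \ge \alpha^{N}
  &\;\Rightarrow\; ub^{U}(s) = ub^{D}(s) + ub^{N}(s) \ge \alpha^{D} + \alpha^{N} = \alpha^{U}, \\
ub^{D}(s) < \alpha^{D} \ \text{and}\ ub^{N}(s) < \alpha^{N}
  &\;\Rightarrow\; ub^{U}(s) = ub^{D}(s) + ub^{N}(s) < \alpha^{D} + \alpha^{N} = \alpha^{U}.
\end{align*}
```

Under these assumptions, the first line explains why Step 7 needs no threshold check, and the second line explains why itemsets outside both sets never need to be considered. Only the two mixed cases (Steps 8 and 9) require an explicit comparison with the updated threshold, and only Step 9 requires a rescan of the original database.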
4.3 An Example
In this section, an example is given to demonstrate the proposed incremental average-utility mining algorithm for record insertion. The example shows how the proposed algorithm finds high average-utility itemsets from incrementally arriving transaction data with fewer rescans of the original database.
Assume the original database includes 10 transactions, shown in Table 4-1. Each transaction consists of its transaction identification (TID) and the items purchased. The numbers represent the quantities purchased.
Table 4-1: The set of ten transaction data in the original database.
Also assume that the profit value of each item is defined in Table 4-2.
Table 4-2: The predefined profit values of the items.
Item Profit
A 3
B 10
C 1
D 6
E 5
Suppose the minimum average-utility ratio is set at 20%. Thus, the minimum average-utility threshold is calculated as the total utility value multiplied by 20%, which is 45.4. Using the batch mining algorithm for the original database, the set of high upper-bound average-utility itemsets generated in Phase 1 is shown in Table 4-3. The average-utility upper bound and the actual average-utility value of each high upper-bound average-utility itemset are also recorded in Table 4-3.
Table 4-3: The average-utility upper bounds and the actual average-utility values of the high upper-bound average-utility itemsets from the original database.
Itemset    Average-Utility Upper Bound    Average-Utility
A 67 36
B 88 80
C 72 11
D 105 60
E 70 40
AD 67 33
BC 50 29.5
BD 68 51
BE 60 45
CD 51 15.5
DE 50 29.5
Assume the three new transactions shown in Table 4-4 are inserted after the initial data set is processed. The proposed incremental average-utility mining algorithm proceeds as follows.
Table 4-4: The three newly inserted transactions.
TID A B C D E
t11 1 1 1 2 0
t12 0 3 4 1 0
t13 2 0 2 0 1
STEP 1: The minimum average-utility thresholds αN and αU for the newly inserted transactions N and for the updated database U, respectively, are calculated. In this example, there are 3 newly inserted transactions and thus 13 (10+3) transactions in the updated database. According to the formulas, αN and αU are calculated as follows:
αN = 45.4 × 3 / 10 = 13.62 and αU = 45.4 × (10 + 3) / 10 = 59.02.
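The threshold arithmetic of this step can be checked with a short Python sketch. It assumes that both thresholds are obtained by scaling the original threshold by the transaction counts, which is consistent with the values 13.62 and 59.02 used later in this example. The function name `scaled_thresholds` is only illustrative.

```python
def scaled_thresholds(alpha_d, d, n):
    """Return (alpha_n, alpha_u) for n new transactions added to d original ones.

    Assumption: both thresholds are the original threshold alpha_d scaled by
    the number of transactions, which reproduces the values of this example.
    """
    alpha_n = alpha_d * n / d
    alpha_u = alpha_d * (d + n) / d
    return alpha_n, alpha_u


# Original threshold 45.4, 10 original transactions, 3 new transactions.
alpha_n, alpha_u = scaled_thresholds(45.4, d=10, n=3)
print(round(alpha_n, 2), round(alpha_u, 2))
```

Under this assumption the updated threshold is simply the sum of the two partial thresholds, so no pass over the original database is needed to obtain it.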
STEP 2: The utility value of each item occurring in each newly inserted transaction is calculated. Take item {D} in transaction 11 as an example. The quantity of item {D} in transaction 11 is 2, and its profit is 6. The utility value of {D} is thus calculated as 2*6, which is 12. The utility values of all the items in each newly inserted transaction are shown in Table 4-5.
Table 4-5: The utility values of all the items in the newly inserted transactions.
TID A B C D E
t11 3 10 1 12 0
t12 0 30 4 6 0
t13 6 0 2 0 5
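The calculation of Table 4-5 can be reproduced with the following minimal Python sketch. The quantities are those of Table 4-4. Only the profit of item D (6) is quoted in the text; the other profit values are an assumption, inferred by dividing the utilities in Table 4-5 by the quantities in Table 4-4. The function name `item_utilities` is hypothetical.

```python
# Quantities from Table 4-4 (new transactions t11 to t13).
quantities = {
    "t11": {"A": 1, "B": 1, "C": 1, "D": 2, "E": 0},
    "t12": {"A": 0, "B": 3, "C": 4, "D": 1, "E": 0},
    "t13": {"A": 2, "B": 0, "C": 2, "D": 0, "E": 1},
}
# Assumed profits: only D = 6 is quoted in the text; the others are inferred
# from Table 4-5 (utility) divided by Table 4-4 (quantity).
profits = {"A": 3, "B": 10, "C": 1, "D": 6, "E": 5}


def item_utilities(quantities, profits):
    """Compute u_jk = q_jk * p_j for every item in every transaction."""
    return {
        tid: {item: qty * profits[item] for item, qty in row.items()}
        for tid, row in quantities.items()
    }


utilities = item_utilities(quantities, profits)
for tid, row in utilities.items():
    print(tid, row)
```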
STEP 3: The utility values of the items in a transaction are compared and the maximal utility value in the transaction is found. Take transaction 12 as an example. It can be observed from Table 4-5 that the utility value of {B} is 30, which is the maximal in transaction 12. The maximal utility value in each transaction is shown in Table 4-6.
Table 4-6: The maximal utility values in the newly inserted transactions.
TID A B C D E Maximal Utility Value in a Transaction
t11 3 10 1 12 0 12
t12 0 30 4 6 0 30
t13 6 0 2 0 5 6
STEP 4: k is set to 1, where k is used to record the number of items in the itemsets currently being processed.
STEP 5: The average-utility upper bounds of the 1-itemsets in the newly inserted transactions are first calculated. Take item {A} as an example. It appears in transactions 11 and 13. The average-utility upper bound of {A} is thus the sum of the maximal utility values of these transactions, which is 12 + 6 = 18. The upper-bound values of all the items in the new transactions are shown in Table 4-7.
Table 4-7: The average-utility upper bounds of the 1-itemsets in the new transactions.
1-Itemset Average-Utility Upper Bound
A 18
B 42
C 48
D 42
E 6
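The upper-bound calculation of Steps 3 and 5 may be sketched in Python as follows, starting from the utility values of Table 4-5. The sketch assumes that a transaction contains an itemset when every item of the itemset has a non-zero utility in it (which holds here because all profits are positive). The function name `upper_bound` is hypothetical.

```python
# Utility values from Table 4-5.
utilities = {
    "t11": {"A": 3, "B": 10, "C": 1, "D": 12, "E": 0},
    "t12": {"A": 0, "B": 30, "C": 4, "D": 6, "E": 0},
    "t13": {"A": 6, "B": 0, "C": 2, "D": 0, "E": 5},
}


def upper_bound(itemset, utilities):
    """Sum the maximal item utility of every transaction containing itemset.

    Assumption: a transaction contains the itemset when every item of the
    itemset has a non-zero utility in that transaction.
    """
    total = 0
    for row in utilities.values():
        if all(row[item] > 0 for item in itemset):
            total += max(row.values())  # maximal utility of the transaction
    return total


# Itemsets are written as strings of item labels, e.g. "A" or "AC".
bounds = {item: upper_bound(item, utilities) for item in "ABCDE"}
print(bounds)
```

The same function applies unchanged to larger itemsets, for example `upper_bound("AC", utilities)` for the 2-itemset {AC}.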
STEP 6: The average-utility upper bounds of the 1-itemsets are checked against the minimum average-utility threshold αN (which is 13.62) for the new transactions. In this example, the upper bounds of the four 1-itemsets {A}, {B}, {C} and {D} are larger than αN. The four items are then put in the set of high upper-bound average-utility 1-itemsets for the new transactions, HU1N, which is shown in Table 4-8.
Table 4-8: The set of high upper-bound average-utility 1-itemsets for the new transactions, HU1N.
1-Itemset Average-Utility Upper Bound
A 18
B 42
C 48
D 42
STEP 7: For each 1-itemset s in the set of high upper-bound average-utility itemsets (HU1D)
from the original database, if it appears in the set of high upper-bound average-utility 1-itemsets (HU1N) in the new transactions, the following substeps
are done. In this case, the four 1-itemsets {A}, {B}, {C} and {D} are then processed.
Substep 7-1: The newly updated average-utility upper bound of each itemset is calculated. Take {A} as an example. Its average-utility upper bounds in the original database and in the new transactions are 67 and 18, respectively.
As a result, the average-utility upper bound of {A} in the updated database is calculated as 67+18, which is 85. The average-utility upper bounds for the other three items can be easily calculated in the same way.
Substep 7-2: The four 1-itemsets, {A}, {B}, {C} and {D}, are then put into the set of updated high upper-bound average-utility 1-itemsets, HU1U.
STEP 8: For each 1-itemset s in the set of high upper-bound average-utility itemsets (HU1D)
from the original database, if it does not appear in the set of high upper-bound average-utility 1-itemsets (HU1N) in the new transactions, the following substeps
are done. In this case, only {E} satisfies the condition.
Substep 8-1: The updated average-utility upper bound of {E} is calculated as 70+6, which is 76.
Substep 8-2: The updated average-utility upper bound of the itemset {E} is larger than αU, which is 59.02. Itemset {E} is thus put in the set of updated high upper-bound average-utility 1-itemsets, HU1U.
STEP 9: In this case, no 1-itemset in the set of high upper-bound average-utility 1-itemsets (HU1N) in the new transactions is absent from the set of high upper-bound average-utility 1-itemsets (HU1D) in the original database, so this step is skipped.
After Step 9, all the updated high upper-bound average-utility 1-itemsets are shown in Table 4-9.
Table 4-9: The set of all the updated high upper-bound average-utility 1-itemsets, HU1U.
1-Itemset Average-Utility Upper Bound
A 85
B 130
C 120
D 147
E 76
STEP 10: The candidate 2-itemsets are generated from the set of high upper-bound average-utility 1-itemsets (HU1N) in the new transactions. If any 1-sub-itemset of a candidate 2-itemset is not contained in the set of updated high upper-bound average-utility 1-itemsets (HU1U), the candidate is removed from the candidate set. In this case, the candidate 2-itemsets are {AB}, {AC}, {AD}, {BC}, {BD} and {CD}.
STEP 11: k is set to 2, where k is used to record the number of items in the itemsets currently being processed.
STEP 12: The average-utility upper bounds of 2-itemsets in the newly inserted transactions are calculated. The upper-bound values of all the 2-itemsets in the new transactions are shown in Table 4-10.
Table 4-10: The average-utility upper bounds of 2-itemsets in the new transactions.
2-Itemset Average-Utility Upper Bound
AB 12
AC 18
AD 12
BC 42
BD 42
CD 42
STEP 13: The average-utility upper bounds of the 2-itemsets are checked against the minimum average-utility threshold αN (which is 13.62) for the new transactions. In this example, the upper bounds of the four 2-itemsets {AC}, {BC}, {BD} and {CD} are larger than αN. The four itemsets are then put in the set of high upper-bound average-utility 2-itemsets for the new transactions, HU2N, which is shown in Table 4-11.
Table 4-11: The set of high upper-bound average-utility 2-itemsets for the new transactions, HU2N.
2-Itemset Average-Utility Upper Bound
AC 18
BC 42
BD 42
CD 42
STEP 14: For each 2-itemset s in the set of high upper-bound average-utility itemsets (HU2D)
from the original database, if it appears in the set of high upper-bound average-utility 2-itemsets (HU2N) in the new transactions, the following substeps
are done. In this case, the three 2-itemsets {BC}, {BD} and {CD} are then processed.
Substep 14-1: The newly updated average-utility upper bounds of itemsets {BC}, {BD} and {CD} are calculated as 50 + 42 = 92, 68 + 42 = 110 and 51 + 42 = 93, respectively.
Substep 14-2: The three 2-itemsets, {BC}, {BD} and {CD}, are then put into the set of updated high upper-bound average-utility 2-itemsets, HU2U.
STEP 15: For each 2-itemset s in the set of high upper-bound average-utility itemsets (HU2D)
from the original database, if it does not appear in the set of high upper-bound average-utility 2-itemsets (HU2N) in the new transactions, the following substeps
are done. In this case, the three 2-itemsets {AD}, {BE} and {DE} are then processed.
Substep 15-1: The newly updated average-utility upper bounds of itemsets {AD}, {BE} and {DE} are calculated as 67 + 12 = 79, 60 + 0 = 60 and 50 + 0 = 50, respectively.
Substep 15-2: The updated average-utility upper bounds of the itemsets {AD} and {BE} are larger than αU, which is 59.02. Itemsets {AD} and {BE} are thus put in the set of updated high upper-bound average-utility 2-itemsets, HU2U.
STEP 16: For each 2-itemset s in the set of high upper-bound average-utility itemsets (HU2N)
in the new transactions, if it does not appear in the set of high upper-bound average-utility 2-itemsets (HU2D) in the original database, the following substeps
are done. In this case, the 2-itemset {AC} is then processed.
Substep 16-1: The original database is rescanned to determine the average-utility upper bound of itemset {AC}, which is 41.
Substep 16-2: The updated average-utility upper bound of itemset {AC} is calculated.
The average-utility upper bounds of {AC} in the original database and new transactions are 41 and 18, respectively. Thus, the updated average-utility upper bound of {AC} is calculated as 41+18, which is 59.
Substep 16-3: The updated average-utility upper bound of the itemset {AC} is smaller than αU, which is 59.02. Thus, {AC} is not put in HU2U.
After Step 16, all the updated high upper-bound average-utility 2-itemsets are shown in Table 4-12.
Table 4-12: The set of all the updated high upper-bound average-utility 2-itemsets, HU2U.
2-Itemset Average-Utility Upper Bound
AD 79
BC 92
BD 110
BE 60
CD 93
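The three cases applied to the 2-itemsets above can be summarized in a small Python sketch. The original-database upper bounds are taken from Table 4-3, and the new-transaction upper bounds are computed from Table 4-6 with the rule of Step 5. The function `updated_upper_bound` and the `rescan` callback are hypothetical names; the callback merely stands in for a scan of the original database and returns the value 41 quoted for {AC}.

```python
def updated_upper_bound(s, ub_d_known, ub_n, alpha_n, alpha_u, rescan):
    """Apply the three cases to one itemset; return (ub_u, is_high).

    ub_d_known: upper bounds stored for the original high upper-bound itemsets.
    ub_n: upper bounds counted in the new transactions (0 if s is absent).
    rescan: hypothetical callback that scans the original database for s.
    """
    in_d = s in ub_d_known
    in_n = ub_n.get(s, 0) >= alpha_n
    if not in_d and not in_n:
        return None, False  # not considered by the algorithm
    ub_d = ub_d_known[s] if in_d else rescan(s)  # only this case rescans
    ub_u = ub_d + ub_n.get(s, 0)
    if in_d and in_n:
        return ub_u, True  # high in both parts: no threshold check needed
    return ub_u, ub_u >= alpha_u  # mixed cases: compare with alpha_u


ALPHA_N, ALPHA_U = 13.62, 59.02
ub_d_known = {"AD": 67, "BC": 50, "BD": 68, "BE": 60, "CD": 51, "DE": 50}
ub_n = {"AB": 12, "AC": 18, "AD": 12, "BC": 42, "BD": 42, "CD": 42}
rescanned = []


def rescan(s):
    """Stand-in for a database scan; 41 is the value quoted for {AC}."""
    rescanned.append(s)
    return {"AC": 41}[s]


hu2_u = {}
for s in sorted(set(ub_d_known) | set(ub_n)):
    ub_u, is_high = updated_upper_bound(s, ub_d_known, ub_n,
                                        ALPHA_N, ALPHA_U, rescan)
    if is_high:
        hu2_u[s] = ub_u
print(hu2_u, rescanned)
```

In this sketch only {AC} triggers the rescan callback, which illustrates where the saving over batch mining comes from.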
STEP 17: The candidate 3-itemsets are generated from the set of high upper-bound average-utility 2-itemsets (HU2N) in the new transactions. If any 2-sub-itemset of a candidate 3-itemset is not contained in the set of updated high upper-bound average-utility 2-itemsets (HU2U), the candidate is removed from the candidate set. In this case, the only remaining 3-itemset is {BCD}.
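The candidate generation and pruning can be sketched in Python as follows. The text does not state how the candidates are joined, so the sketch assumes an Apriori-style join: two k-itemsets are merged whenever their union has exactly k+1 items. The names `next_candidates` and `fs` are hypothetical.

```python
from itertools import combinations


def next_candidates(hu_k_new, hu_k_updated, k):
    """Join k-itemsets from the new transactions, then prune.

    Itemsets are frozensets of item labels. A (k+1)-candidate survives only
    if every k-item subset is in the updated high upper-bound set.
    Assumption: candidates are formed by an Apriori-style pairwise join.
    """
    joined = {a | b for a, b in combinations(hu_k_new, 2)
              if len(a | b) == k + 1}
    return {
        c for c in joined
        if all(frozenset(sub) in hu_k_updated for sub in combinations(c, k))
    }


def fs(*names):
    """Helper: turn strings such as "BC" into frozensets of items."""
    return {frozenset(name) for name in names}


# 2-itemsets that are high in the new transactions, and the updated set.
candidates_3 = next_candidates(fs("AC", "BC", "BD", "CD"),
                               fs("AD", "BC", "BD", "BE", "CD"), k=2)
print(sorted("".join(sorted(c)) for c in candidates_3))
```

With this join, {ABC} and {ACD} are also formed, but both are pruned because {AB} and {AC} are not in the updated set, so only {BCD} remains.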
STEP 18: k is set to 3, where k is used to record the number of items in the itemsets currently being processed.
STEP 19: Repeat STEPs 5 to 11 until no new candidate itemsets are generated.
The set of all the updated high upper-bound average-utility itemsets is shown in Table 4-13.
Table 4-13: The set of all the updated high upper-bound average-utility
STEP 20: For each high upper-bound average-utility itemset s in HUU of the updated database, if it appears in the set of high upper-bound average-utility itemsets (HUD) of the original database, the following substeps are done.
Substep 20-1: The actual average-utility value of each itemset s for the new transactions is calculated. Take the itemset {BD} as an example. The actual utility values of items {B} and {D} in transaction 11 are 10 and 12, respectively.
Since the itemset {BD} contains 2 items, its actual average-utility value in transaction 11 is calculated as (10 + 12) / 2, which is 11. The itemset {BD} appears in transactions 11 and 12, and its value in transaction 12 is likewise (30 + 6) / 2, which is 18. The actual average-utility value of {BD} is thus the sum of the actual average-utility values in these transactions, which is 11 + 18 = 29.
Substep 20-2: The new actual average-utility value of s in the updated database is set.
Again take the itemset {BD} as an example. The actual average-utility values of itemset {BD} in the original database and in the new transactions are 51 and 29, respectively. Thus, the new actual average-utility value of itemset {BD} in the updated database is calculated as 51+29, which is 80.
The actual average-utility value of each high upper-bound average-utility itemset s in HUU of the updated database appearing in the set of high upper-bound average-utility itemsets (HUD) of the original database is shown in Table 4-14.
Table 4-14: The actual average-utility value of each high upper-bound average-utility itemset s in HUU of the updated database appearing in the set of high upper-bound average-utility itemsets (HUD) of the original database.
Substep 20-3: Check whether the actual average-utility value of each itemset is larger than or equal to αU, which is 59.02. In this example, the average-utility values of itemsets {B}, {D} and {BD} are larger than αU. These itemsets are thus put in the set of high average-utility itemsets, HU.
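The three substeps of this step can be reproduced with the following Python sketch. It uses the utility values of Table 4-5 and the original actual average-utility values of Table 4-3 for the itemsets listed in Tables 4-9 and 4-12. As before, a transaction is assumed to contain an itemset when every item has a non-zero utility in it. The function name `average_utility` is hypothetical.

```python
# Utility values from Table 4-5.
utilities = {
    "t11": {"A": 3, "B": 10, "C": 1, "D": 12, "E": 0},
    "t12": {"A": 0, "B": 30, "C": 4, "D": 6, "E": 0},
    "t13": {"A": 6, "B": 0, "C": 2, "D": 0, "E": 5},
}
# Actual average-utility values in the original database (Table 4-3),
# restricted to the itemsets of Tables 4-9 and 4-12.
au_d = {"A": 36, "B": 80, "C": 11, "D": 60, "E": 40,
        "AD": 33, "BC": 29.5, "BD": 51, "BE": 45, "CD": 15.5}
ALPHA_U = 59.02


def average_utility(itemset, utilities):
    """Sum, over transactions containing itemset, of its mean item utility.

    Assumption: a transaction contains the itemset when every item of the
    itemset has a non-zero utility in that transaction.
    """
    total = 0.0
    for row in utilities.values():
        if all(row[item] > 0 for item in itemset):
            total += sum(row[item] for item in itemset) / len(itemset)
    return total


# Add the new-transaction value to the stored original value, then filter.
au_u = {s: au_d[s] + average_utility(s, utilities) for s in au_d}
high = {s: v for s, v in au_u.items() if v >= ALPHA_U}
print(high)
```

No access to the original database is needed here, since the original values are already stored with the high upper-bound itemsets.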
STEP 21: For each high upper-bound average-utility itemset s in HUU of the updated database, if it does not appear in the set of high upper-bound average-utility itemsets (HUD) of the original database, the following substeps are done. In this case, itemset {BCD} is then processed.
Substep 21-1: The actual average-utility value of itemset {BCD} for the new transactions
is calculated, which is 21.
Substep 21-2: The original database is rescanned to determine the actual average-utility value of itemset {BCD} in the original database, which is 18.
Substep 21-3: The updated actual average-utility value of itemset {BCD} is calculated as 18 + 21, which is 39.
Substep 21-4: The actual average-utility value of itemset {BCD} is smaller than αU, which is 59.02. Thus, itemset {BCD} is not a high average-utility itemset.
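The values quoted for {BCD} can be verified by hand from Table 4-5. Transactions 11 and 12 are the only new transactions that contain all three items, so the worked arithmetic is as follows (the superscript notation is only a typesetting choice for this sketch; the value 18 is the rescanned value quoted in Substep 21-2):

```latex
\begin{align*}
au^{N}(\{BCD\}) &= \frac{10 + 1 + 12}{3} + \frac{30 + 4 + 6}{3}
                 = \frac{23}{3} + \frac{40}{3} = 21, \\
au^{U}(\{BCD\}) &= au^{D}(\{BCD\}) + au^{N}(\{BCD\}) = 18 + 21 = 39 < 59.02 = \alpha^{U}.
\end{align*}
```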
After Step 21, all the high average-utility itemsets for the updated database have been found, as shown in Table 4-15.
Table 4-15: All the high average-utility itemsets for the updated database, HU.
High Average-Utility Itemset    Average-Utility
B 120
D 78
BD 80
CHAPTER 5 Incremental Utility Mining Algorithm for Record Deletion
In this chapter, an incremental utility mining algorithm is proposed to maintain the discovered high average-utility itemsets for record deletion. The proposed algorithm first scans the deleted transactions to obtain all candidate 1-itemsets with their average-utility upper bounds. The candidate 1-itemsets are then checked against the minimum average-utility threshold to decide whether they are high upper-bound average-utility itemsets for the deleted transactions.
For each 1-itemset in the set of high upper-bound average-utility itemsets from the original database, if it appears in the set of high upper-bound average-utility 1-itemsets in the