

3.1 Problem and Definitions

To understand the problem of weighted frequent itemset mining, consider the transaction database given in Table 3.1, in which each transaction consists of two features: the transaction identification (TID) and the items purchased (or event frequencies). Eight items, denoted A to H, appear in the transactions. The predefined weight of each item is shown in Table 3.2.

Table 3.1: Set of five transactions for given example.

Table 3.2: Weights of items given in Table 3.1.

Item Weight

The terms needed to define the problem of weighted frequent itemset mining formally [45] are given below.

Definition 1. Let I = {i1, i2, …, im} be a set of items or events that may appear in transactions. An itemset X is a subset of items or events, X ⊆ I. If |X| = r, the itemset X is called an r-itemset. For example, the itemset {AB} contains two items and is therefore called a 2-itemset.

Note that the items in an itemset are sorted in alphabetical order.

Definition 2. A transaction database TDB is composed of a set of transactions. That is,

TDB = {Trans1, Trans2, …, Transy, …, Transz}, where Transy is the y-th transaction in TDB.

Definition 3. The weight of an item i, wi, ranges from 0 to 1. For example, wA = 0.30 in Table 3.2.

Definition 4. The weight of an itemset X, wX, is the sum of the weights of all items in X divided by the number of items in X. That is:

$$w_X = \frac{\sum_{i \in X} w_i}{l_X},$$

where lX is the number of items in itemset X. For example, in Table 3.2, the weights of the two items in the itemset {AB} are 0.30 and 0.60, respectively, and the number of items in {AB} is 2. Therefore, w{AB} = (0.30 + 0.60) / 2 = 0.45.
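As a minimal sketch of Definition 4, the average-weight function can be written as follows. The function name `itemset_weight` and the dictionary layout are illustrative assumptions; only the two weights quoted in the text (A and B) are used, since the rest of Table 3.2 is not reproduced here.

```python
def itemset_weight(itemset, weights):
    """Average weight of the items in `itemset` (Definition 4)."""
    items = set(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    return sum(weights[i] for i in items) / len(items)


# Only the weights quoted in the text; other entries of Table 3.2 are omitted.
weights = {"A": 0.30, "B": 0.60}

w_ab = itemset_weight({"A", "B"}, weights)
print(round(w_ab, 2))  # 0.45
```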

By Definition 4, the weight of an itemset is the average of the weights of its items. To obtain a common base for calculating the weighted support of an itemset in a database, the maximum item weight in a transaction is regarded as the transaction weight of that transaction. The reason is that the weight of any sub-itemset of a transaction cannot exceed the maximum weight in the transaction. The weighted support of an itemset is further described below.

Definition 5. The transaction maximum weight of a transaction Trans, tmwTrans, is the maximum weight among all items in Trans. For example, in Table 3.1, the second transaction includes two items, B and H, whose weights are 0.60 and 0.95, respectively. Therefore, tmw{BH} = 0.95.
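Definition 5 can be sketched in a few lines. The helper name `transaction_max_weight` is an assumption made for illustration; the data are the two weights and the second transaction quoted in the text. The loop at the end checks, for this one transaction, that no sub-itemset has an average weight above the transaction maximum weight.

```python
from itertools import combinations


def transaction_max_weight(transaction, weights):
    """Largest item weight in a transaction (Definition 5)."""
    return max(weights[item] for item in transaction)


weights = {"B": 0.60, "H": 0.95}      # weights quoted in the text
trans2 = frozenset({"B", "H"})        # second transaction of the example

tmw_trans2 = transaction_max_weight(trans2, weights)
print(tmw_trans2)  # 0.95

# The average weight of every non-empty sub-itemset stays at or below tmw.
for r in range(1, len(trans2) + 1):
    for sub in combinations(sorted(trans2), r):
        avg = sum(weights[i] for i in sub) / r
        assert avg <= tmw_trans2
```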

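Definition 6. The total transaction maximum weight of a transaction database TDB, ttmw, is the sum of the transaction maximum weight values of all transactions in TDB. That is:

$$ttmw = \sum_{Trans_y \in TDB} tmw_{Trans_y}.$$

Definition 7. The weighted support of an itemset X, wsupX, is the sum of the weights of X over the transactions that include X, divided by the total transaction maximum weight ttmw of the transaction database TDB. That is:

$$wsup_X = \frac{\sum_{X \subseteq Trans_y \wedge Trans_y \in TDB} w_X}{ttmw}.$$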

Definition 8. Let λ be a pre-defined minimum weighted support threshold. An itemset X is called a weighted frequent itemset (WF) if wsupX ≥ λ. For example, if λ = 30%, then the itemset {AE} is a weighted frequent itemset, since wsup{AE} = 30% ≥ 30%.
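The weighted support and the threshold test of Definition 8 might be computed as in the sketch below. Because the contents of Table 3.1 are not reproduced here, `toy_db` is a hypothetical four-transaction database invented for illustration. The weights of A, B and H are the ones quoted in the text, and all function names are assumptions.

```python
def transaction_max_weight(transaction, weights):
    """Largest item weight in a transaction."""
    return max(weights[i] for i in transaction)


def total_tmw(db, weights):
    """Sum of the transaction maximum weights over the whole database."""
    return sum(transaction_max_weight(t, weights) for t in db)


def weighted_support(itemset, db, weights):
    """Weight of the itemset, counted once per containing transaction, over ttmw."""
    itemset = frozenset(itemset)
    w_x = sum(weights[i] for i in itemset) / len(itemset)
    containing = sum(1 for t in db if itemset <= t)
    return containing * w_x / total_tmw(db, weights)


def is_weighted_frequent(itemset, db, weights, min_wsup):
    """Threshold test: wsup of the itemset is at least min_wsup."""
    return weighted_support(itemset, db, weights) >= min_wsup


weights = {"A": 0.30, "B": 0.60, "H": 0.95}   # values quoted in the text
# Hypothetical database, NOT Table 3.1.
toy_db = [frozenset({"A", "H"}), frozenset({"A", "H"}),
          frozenset({"A", "B"}), frozenset({"B"})]

for x in ({"A"}, {"A", "H"}):
    print(sorted(x), f"{weighted_support(x, toy_db, weights):.2%}",
          is_weighted_frequent(x, toy_db, weights, 0.30))
```

On this toy data, {A} has a weighted support of about 29% and fails a 30% threshold, while {A, H} reaches about 40% and passes.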

However, the downward-closure property used in association-rule mining does not hold for the problem of weighted frequent itemset mining. Because the weight function is an average, the actual weighted supports of itemsets cannot be used directly to prune the search for weighted frequent itemsets in databases. Take the item A in Table 3.1 as an example. Three transactions in Table 3.1 include this item, and the weight of the item A in Table 3.2 is 0.30. The weighted support of the itemset {A} is therefore (0.30 + 0.30 + 0.30) / 3.50 = 25.71%. If λ is set at 30%, then the itemset {A} is not a weighted frequent itemset, but its super-itemset {AE} is. As this example shows, weighted frequent itemset mining is more difficult than traditional frequent itemset mining.

Yun et al. proposed an upper-bound model to address this, in which the maximum weight in the database is regarded as the upper bound of the weight of each transaction, so that the downward-closure property holds in weighted frequent itemset mining [45]. However, the downward-closure property can also be maintained with a tighter bound, namely the maximum weight in each transaction. We thus propose an effective transaction maximum weight (TMW) model to tighten the upper bounds of the weight values used when mining itemsets. The relevant terms used in the proposed TMW model are defined as follows.

Definition 9. The transaction-weighted upper bound of an itemset X, twubX, is the sum of the transaction maximum weights of the transactions including X in TDB divided by the total transaction maximum weight, ttmw, of TDB. That is:

$$twub_X = \frac{\sum_{X \subseteq Trans_y \wedge Trans_y \in TDB} tmw_{Trans_y}}{ttmw}.$$

For example, in Table 3.1, the item E appears in four transactions, the last of which is Trans5; their transaction maximum weights are 0.50, 0.60, 0.95 and 0.50, respectively. Therefore, twub{E} = 2.55 / 3.50 = 72.85%.
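The sketch below illustrates why the transaction-weighted upper bound is useful for pruning. It shows that the bound never falls below the weighted support and never grows when an itemset is extended, whereas the weighted support itself can grow. The database `toy_db` is hypothetical (it is not Table 3.1), the weights of A, B and H are the ones quoted in the text, and the function names are assumptions.

```python
def transaction_max_weight(transaction, weights):
    """Largest item weight in a transaction."""
    return max(weights[i] for i in transaction)


def weighted_support(itemset, db, weights):
    """Average weight of the itemset, summed over containing transactions, over ttmw."""
    itemset = frozenset(itemset)
    ttmw = sum(transaction_max_weight(t, weights) for t in db)
    w_x = sum(weights[i] for i in itemset) / len(itemset)
    return sum(1 for t in db if itemset <= t) * w_x / ttmw


def twub(itemset, db, weights):
    """Sum of tmw over the transactions containing the itemset, over ttmw."""
    itemset = frozenset(itemset)
    tmws = [transaction_max_weight(t, weights) for t in db]
    covered = sum(m for t, m in zip(db, tmws) if itemset <= t)
    return covered / sum(tmws)


weights = {"A": 0.30, "B": 0.60, "H": 0.95}   # values quoted in the text
# Hypothetical database, NOT Table 3.1.
toy_db = [frozenset({"A", "H"}), frozenset({"A", "H"}),
          frozenset({"A", "B"}), frozenset({"B"})]

for x in ({"A"}, {"A", "H"}):
    print(sorted(x),
          f"wsup={weighted_support(x, toy_db, weights):.4f}",
          f"twub={twub(x, toy_db, weights):.4f}")
```

On this toy data the weighted support rises from {A} to {A, H}, so it cannot be used for downward-closure pruning, while the upper bound falls from about 0.81 to about 0.61 and still dominates both weighted supports.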

Definition 10. Let λ be a pre-defined minimum weighted support threshold. An itemset X is called a weighted frequent upper-bound itemset (WFUB) if twubX ≥ λ. For example, if λ = 30%, then the itemset {E} is a weighted frequent upper-bound itemset, since twub{E} = 72.85% ≥ λ.
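To show how the WFUB notion can drive a search, the following is a naive level-wise sketch: candidates are grown Apriori-style and pruned on twub, and the surviving itemsets are then checked against their actual weighted support. This is not the projection-based algorithm proposed in this work; it is a generic two-phase illustration. The function name `mine_weighted_frequent` and the database `toy_db` are hypothetical, and only the weights of A, B and H come from the text.

```python
from itertools import combinations


def mine_weighted_frequent(db, weights, min_wsup):
    """Level-wise search pruned by twub, followed by a check on actual wsup."""
    tmws = [max(weights[i] for i in t) for t in db]
    ttmw = sum(tmws)

    def twub(x):
        return sum(m for t, m in zip(db, tmws) if x <= t) / ttmw

    def wsup(x):
        w_x = sum(weights[i] for i in x) / len(x)
        return sum(1 for t in db if x <= t) * w_x / ttmw

    # Phase 1: collect all weighted frequent upper-bound (WFUB) itemsets.
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items if twub(frozenset([i])) >= min_wsup]
    wfub = []
    while level:
        wfub.extend(level)
        prev = set(level)
        k = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a candidate only if all (k-1)-subsets are WFUB and its twub passes.
        level = [c for c in candidates
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and twub(c) >= min_wsup]

    # Phase 2: keep the candidates whose actual weighted support passes.
    result = [sorted(x) for x in wfub if wsup(x) >= min_wsup]
    return sorted(result, key=lambda s: (len(s), s))


weights = {"A": 0.30, "B": 0.60, "H": 0.95}   # values quoted in the text
# Hypothetical database, NOT Table 3.1.
toy_db = [frozenset({"A", "H"}), frozenset({"A", "H"}),
          frozenset({"A", "B"}), frozenset({"B"})]

print(mine_weighted_frequent(toy_db, weights, 0.30))
```

On this toy data, {A} survives the first phase as a WFUB itemset, which lets {A, H} be generated, but {A} is discarded in the second phase because its actual weighted support is below the threshold.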

Based on the definitions given above, weighted frequent itemset mining takes the individual weights of items in a transaction database into account. The goal is to find, effectively and efficiently, all the weighted frequent itemsets in a given transaction database, that is, all itemsets whose weighted supports are larger than or equal to a predefined minimum weighted support threshold λ. The details of the proposed PWA are given in the next section.

3.2 The Projection-based Weighted Frequent Itemset
