
The problem of mining association rules in the presence of taxonomy information was first addressed in [5] and [11]. In [11], the problem is named “mining generalized association rules,” which aims to find associations between items at any level of the taxonomy under the minsup and minconf constraints. That work, however, did not recognize the varied support requirements inherent in items at different hierarchy levels.

In [5], the problem was stated somewhat differently from that in [11]. The authors generalized the uniform minimum support constraint into a level-wise assignment, i.e., items at the same level receive the same minimum support. The objective was to mine associations level by level in a fixed hierarchy. That is, only associations between items at the same level were examined, progressively from the top level to the bottom.

Another form of association rule mining, with multiple minimum supports, was proposed in [7]. That method allows users to specify different minimum supports for different items and can find rules involving both frequent and rare items. However, the model considers no taxonomy at all and hence fails to find associations between items at different hierarchy levels.
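To make the idea of per-item minimum supports concrete, the following is a minimal sketch. It assumes, as in the model of [7], that the threshold applied to an itemset is the lowest minimum support among its items. The item names, the transactions, and the threshold values are hypothetical and serve only as an illustration.

```python
def itemset_minsup(itemset, mis):
    """Threshold for an itemset: the lowest minimum support among its items."""
    return min(mis[item] for item in itemset)

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def is_frequent(itemset, transactions, mis):
    """Compare the support count with the itemset's own threshold."""
    return support_count(itemset, transactions) >= itemset_minsup(itemset, mis)

# Hypothetical data: 'caviar' is rare, so it is given a lower threshold.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"bread", "milk"},
    {"bread", "caviar"},
    {"milk"},
]
mis = {"bread": 3, "milk": 3, "caviar": 1}  # absolute counts, assumed values

rare_pair = {"bread", "caviar"}
frequent_with_item_supports = is_frequent(rare_pair, transactions, mis)
frequent_with_uniform_support = support_count(rare_pair, transactions) >= 3
print(frequent_with_item_supports, frequent_with_uniform_support)
```

In this toy data the pair containing the rare item passes its own threshold but would be discarded under a uniform threshold of 3, which is the kind of rule such a model is meant to retain.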

To our knowledge, [5] is the only work considering both item taxonomy and multiple supports. However, its intention was quite different from ours. First, although several variants were proposed, all of them follow a level-wise, progressively deepening strategy that performs a top-down traversal of the taxonomy to generate all frequent itemsets. An Apriori-like algorithm is applied at each level, which leads to $p$ database scans, where $p = \sum_l k_l$ and $k_l$ is the largest $k$ for which a frequent $k$-itemset exists at level $l$.

This is quite a large overhead compared with our algorithm, which requires only $\max_l k_l$ scans. Second, the minimum supports are specified uniformly at each taxonomy level; that is, items at the same taxonomy level receive the same minimum support. This restricts the flexibility and power of association rules. Furthermore, together with the progressively deepening strategy, their approaches fail to discover all frequent itemsets, especially those involving level-crossing associations. Let us illustrate this with an example. To make the demonstration self-explanatory, a generic description of their approaches is given in Figure 25.
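The difference in scan counts can be illustrated with a small calculation. The sketch assumes that the level-wise strategy needs one scan per itemset size at every level (the sum of the largest frequent itemset sizes over all levels), whereas a single pass over itemset sizes needs only the maximum of those sizes. The per-level sizes below are hypothetical.

```python
# Hypothetical size of the largest frequent itemset at each taxonomy level.
largest_itemset_size = {1: 2, 2: 3, 3: 2}

# Level-wise deepening: Apriori is rerun at every level, so scans add up.
scans_level_wise = sum(largest_itemset_size.values())

# One pass over itemset sizes across all levels: bounded by the largest size.
scans_single_pass = max(largest_itemset_size.values())

print(scans_level_wise, scans_single_pass)
```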

Ď: a taxonomy-information-encoded transaction database;
minsup[l]: the minimum support threshold for each concept level l;

for (l = 1; L[l, 1] ≠ ∅ and l < max_level; l++) do
    L[l, 1] = the frequent 1-itemsets at level l;
    for (k = 2; L[l, k-1] ≠ ∅; k++) do
        C_k = apriori-gen(L[l, k-1]);
        for each transaction t ∈ Ď do
            C_t = subset(C_k, t);
            for each candidate A ∈ C_t do increase the count of A;
        endfor
        L[l, k] = {A ∈ C_k | sup(A) ≥ minsup[l]};
    endfor
    LL[l] = ∪_k L[l, k]; /* LL[l]: the set of frequent itemsets at level l */
endfor
Result = ∪_l LL[l];

Fig. 25. A generic description of multi-level association mining algorithms presented in [5].
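The generic description in Figure 25 can be sketched in Python as follows. This is an illustrative reading, not the implementation of [5]: the function names are hypothetical, candidate generation is simplified to a subset check on item combinations, and the rule that only descendants of frequent items from the level above are examined is taken from the discussion of Example 7. Items are assumed to be digit strings, with '*' marking generalized positions. The transactions are those of the encoded table in Figure 26.

```python
from itertools import combinations

def generalize(item, level):
    """Ancestor of an encoded item at a level, e.g. ('211', 2) gives '21*'."""
    return item[:level] + "*" * (len(item) - level)

def apriori_gen(prev_frequent, k):
    """k-candidates whose (k-1)-subsets are all frequent (simplified join and prune)."""
    items = sorted({i for s in prev_frequent for i in s})
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in prev_frequent for sub in combinations(c, k - 1))
    }

def mine_level_wise(transactions, minsup, max_level):
    """Progressive deepening: one Apriori run per level, same-level itemsets only."""
    result = {}
    allowed_parents = None  # frequent items of the level above; None at level 1
    for level in range(1, max_level + 1):
        # View each transaction at this level, keeping only descendants of
        # items that were frequent one level up (assumed filtering rule).
        views = []
        for t in transactions:
            view = set()
            for item in t:
                if allowed_parents is None or generalize(item, level - 1) in allowed_parents:
                    view.add(generalize(item, level))
            views.append(view)

        counts = {}
        for view in views:
            for item in view:
                counts[item] = counts.get(item, 0) + 1
        frequent = {frozenset([i]) for i, c in counts.items() if c >= minsup[level]}
        if not frequent:
            break
        result[level] = set(frequent)
        allowed_parents = {next(iter(s)) for s in frequent}

        k = 2
        while frequent:
            candidates = apriori_gen(frequent, k)
            frequent = {
                c for c in candidates
                if sum(1 for view in views if c <= view) >= minsup[level]
            }
            result[level] |= frequent
            k += 1
    return result

# Encoded transactions of Figure 26 and the thresholds of Example 7.
transactions = [
    {"111", "121", "211", "221"},
    {"111", "211", "222", "323"},
    {"112", "122", "221", "411"},
    {"111", "121"},
    {"111", "122", "211", "221", "413"},
    {"211", "323", "524"},
    {"323", "411", "524", "713"},
]
result = mine_level_wise(transactions, minsup={1: 4, 2: 3, 3: 3}, max_level=3)
for level in sorted(result):
    print(level, sorted(sorted(s) for s in result[level]))
```

Because every itemset is built from the view of a single level, the sketch never produces an itemset that mixes levels, and items whose level-1 ancestor is infrequent are dropped before level 2 is counted.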

Example 7. Consider the example used in [5], as shown in Figure 26, where the minimum support is set to 4 at level 1 and to 3 at levels 2 and 3. Each item is encoded as a sequence of digits representing its position in the taxonomy. For example, the item ‘White Old Mills Bread’ is encoded as ‘211’, in which the first digit, ‘2’, represents ‘bread’ at level 1, the second, ‘1’, represents ‘White’ at level 2, and the third, ‘1’, represents ‘Old Mills’ at level 3. To discover all frequent itemsets, the proposed algorithms first apply the Apriori algorithm to T, generating all level-1 frequent itemsets. The result is

L[1, 1]{{1* *}, {2* *}}, and

L[1, 2]{{1* *, 2* *}}.

According to the level-wise deepening paradigm, only descendants of the frequent itemsets at level 1 are inspected to generate frequent itemsets at level 2. The resulting set of frequent 1-itemsets at level 2 is

L[2, 1]{{11*}, {12*}, {21*}, {22*}}.

Note that {32*} and {41*} are missed in L[2, 1] though they are frequent. For the same reason, {323} is missed in L[3, 1], and so are the level-crossing frequent itemsets {111, 12*}, {12*, 221}, and {11*, 12*, 221}.

Taxonomy (root: Food), by level:

Level 1: Milk, Bread, ...

Level 2: 2%, Chocolate, ... (under Milk); White, Wheat, ... (under Bread)

Level 3: Dairyland, Foremost, ... (milk brands); Old Mills, Wonder, ... (bread brands)

Encoded table Đ

tid Items Purchased

11 {111, 121, 211, 221}

12 {111, 211, 222, 323}

13 {112, 122, 221, 411}

14 {111, 121}

15 {111, 122, 211, 221, 413}

16 {211, 323, 524}

17 {323, 411, 524, 713}

Fig. 26. The example of taxonomy and encoded transaction table in [5].
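The supports of the itemsets reported as missed in Example 7 can be checked directly against the encoded table. The sketch below is a brute-force count, not one of the algorithms discussed in the paper; the helper names are hypothetical, and a generalized item such as '12*' is assumed to match any encoded item that starts with its digit prefix.

```python
# Encoded transaction table of Figure 26, keyed by tid.
transactions = {
    11: {"111", "121", "211", "221"},
    12: {"111", "211", "222", "323"},
    13: {"112", "122", "221", "411"},
    14: {"111", "121"},
    15: {"111", "122", "211", "221", "413"},
    16: {"211", "323", "524"},
    17: {"323", "411", "524", "713"},
}

def matches(pattern, item):
    """True if an encoded item falls under a pattern such as '12*' or '323'."""
    return item.startswith(pattern.rstrip("*"))

def support(itemset, transactions):
    """Count transactions holding at least one matching item for every pattern."""
    return sum(
        1
        for items in transactions.values()
        if all(any(matches(p, i) for i in items) for p in itemset)
    )

missed = [
    ("32*",),
    ("41*",),
    ("323",),
    ("111", "12*"),
    ("12*", "221"),
    ("11*", "12*", "221"),
]
supports = {m: support(m, transactions) for m in missed}
for itemset, count in supports.items():
    print(itemset, count)
```

Each of these itemsets is contained in three transactions, which meets the minimum support of 3 set for levels 2 and 3 in Example 7.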

6. Conclusions

We have investigated in this paper the problem of mining generalized association rules in the presence of a taxonomy and multiple minimum support specifications. The classic Apriori itemset generation works in the presence of a taxonomy but fails in the case of non-uniform minimum supports. We presented two algorithms, MMS_Cumulate and MMS_Stratify, for discovering these generalized frequent itemsets. Empirical evaluation showed that both algorithms are very effective and exhibit good linear scale-up. Between the two, MMS_Stratify performed slightly better than MMS_Cumulate, with the gap increasing with the problem size, such as the number of transactions and/or candidate itemsets. For the specification of non-uniform, multiple item supports, we also presented a confidence-lift specification, which is beneficial for discovering less-supported but perceptive rules without suffering from combinatorial explosion.

References

[1] R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases, in: Proc. 1993 ACM-SIGMOD Int. Conf. on Management of Data, (Washington, D.C., 1993) 207-216.

[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in: Proc. 20th Int. Conf. on Very Large Data Bases, (Santiago, Chile, 1994) 487-499.

[3] S. Brin, R. Motwani, and C. Silverstein, Beyond market baskets: generalizing association rules to correlations, in: Proc. 1997 ACM-SIGMOD Int. Conf. on Management of Data, (1997) 207-216.

[4] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, Dynamic itemset counting and implication rules for market-basket data, in: Proc. 1997 ACM-SIGMOD Int. Conf. on Management of Data, (1997) 207-216.

[5] J. Han and Y. Fu, Discovery of multiple-level association rules from large databases, in: Proc. 21st Int. Conf. on Very Large Data Bases, (Zurich, Switzerland, 1995) 420-431.

[6] W. Y. Lin, M. C. Tseng, and J. H. Su, A confidence-lift support specification for interesting associations mining, in: Proc. 6th Pacific Area Conference on Knowledge Discovery and Data Mining (PAKDD-2002), Taipei, Taiwan, R.O.C., May 2002.

[7] B. Liu, W. Hsu, and Y. Ma, Mining association rules with multiple minimum supports, in: Proc. 1999 Int. Conf. on Knowledge Discovery and Data Mining, (San Diego, CA, 1999) 337-341.

[8] B. Liu, W. Hsu, and Y. Ma, Pruning and summarizing the discovered associations, in: Proc. 1999 ACM-SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, (San Diego, CA, 1999) 125-134.

[9] J. S. Park, M. S. Chen, and P. S. Yu, An effective hash-based algorithm for mining association rules, in: Proc. 1995 ACM-SIGMOD Int. Conf. on Management of Data, (San Jose, CA, 1995) 175-186.

[10] A. Savasere, E. Omiecinski, and S. Navathe, An efficient algorithm for mining association rules in large databases, in: Proc. 21st Int. Conf. on Very Large Data Bases, (Zurich, Switzerland, 1995) 432-444.

[11] R. Srikant and R. Agrawal, Mining generalized association rules, Future Generation Computer Systems, Volume 13, Issues 2-3, November 1997, 161-180.
