Experimental Results - Effective Utility Mining with the Measure of Average Utility

Experiments were conducted to evaluate the performance of the proposed approach.
All experiments were run on an Intel Core 2 Duo E6550 (2.33 GHz) PC with 2 GB of
main memory, running the Windows XP Professional operating system. The proposed
algorithm was implemented in Visual C# 9.0.

A real data set from a major grocery chain store in America was used for the
experiments. The database contained 21,556 transactions and 1,559 distinct items.
Each transaction consisted of the products sold and their quantities. The average
transaction length was 4.03 items, and the total utility of all transactions in the
data set was 104,450,739. Figure 1 shows the number of candidate itemsets generated
by the proposed approach (TPAU) and by Liu’s two-phase approach (TP). The minimum
utility threshold was varied from 0.008% to 0.012%. The figure shows that TPAU
generated far fewer candidate itemsets than TP at every threshold; at the lowest
threshold of 0.008%, for example, TPAU generated 2,288 candidates against 197,251
for TP (Table 16). Because fewer candidates have to be generated and checked, the
computation time can be greatly reduced.
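To give a sense of scale for the thresholds above, the short sketch below converts each percentage into an absolute utility value. It assumes that the minimum utility threshold is expressed as a percentage of the total utility of the data set (104,450,739); this section does not state that explicitly, so the interpretation and the helper name `absolute_threshold` are assumptions made for illustration.

```python
# Total utility of all transactions, as reported for the data set.
TOTAL_UTILITY = 104_450_739


def absolute_threshold(percent, total_utility=TOTAL_UTILITY):
    """Convert a percentage threshold into an absolute utility value.

    Assumption: the threshold is a percentage of the total utility.
    """
    return total_utility * percent / 100.0


# Thresholds used in the experiments, in percent.
for percent in (0.008, 0.009, 0.010, 0.011, 0.012):
    print(f"{percent:.3f}% -> {absolute_threshold(percent):,.2f}")
```

Under this assumption, the tested thresholds correspond to absolute utility values between roughly 8,356 and 12,534.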

[Figure 1: plot of the number of candidate itemsets (y-axis, 0 to 250,000) against the minimum utility threshold (x-axis, 0.008% to 0.012%) for TPAU and TP]

Figure 1. Numbers of candidate itemsets along with different minimum utility thresholds for the two approaches

Table 16 summarizes the numbers of candidate itemsets (CI) generated by both
approaches, the high average-utility itemsets (HAUI) found by our approach, and the
high utility itemsets (HUI) found by Liu’s two-phase approach. In Phase I, TPAU
generated far fewer candidate itemsets than TP. In Phase II, the number of high
average-utility itemsets (HAUI) was much smaller than the number of high utility
itemsets (HUI). Compared with the high utility itemsets, the high average-utility
itemsets discovered by TPAU had utility values much closer to the minimum utility
threshold.

Table 16: Comparison of the numbers of candidate itemsets (CI), high average-utility itemsets (HAUI) and high utility itemsets (HUI) of the two approaches.

                  Phase I            Phase II
Threshold        TPAU       TP       TPAU       TP
                 (CI)     (CI)     (HAUI)    (HUI)
0.012%           1583    37707       1556     3497
0.011%           1614    53324       1557     4557
0.010%           1677    80735       1565     6486
0.009%           1896   125920       1579     9997
0.008%           2288   197251       1605    18005
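The size of the Phase I gap in Table 16 is easier to judge as a ratio. The sketch below copies the table rows and computes how many times more candidates TP generated than TPAU at each threshold. The names `TABLE_16` and `candidate_ratio` are introduced here for illustration only.

```python
# Rows copied from Table 16: threshold -> (TPAU CI, TP CI, TPAU HAUI, TP HUI).
TABLE_16 = {
    "0.012%": (1583, 37707, 1556, 3497),
    "0.011%": (1614, 53324, 1557, 4557),
    "0.010%": (1677, 80735, 1565, 6486),
    "0.009%": (1896, 125920, 1579, 9997),
    "0.008%": (2288, 197251, 1605, 18005),
}


def candidate_ratio(threshold):
    """Return the number of TP candidates divided by the number of TPAU candidates."""
    tpau_ci, tp_ci, _, _ = TABLE_16[threshold]
    return tp_ci / tpau_ci


for threshold in TABLE_16:
    ratio = candidate_ratio(threshold)
    print(f"{threshold}: TP generated {ratio:.1f} times as many candidates as TPAU")
```

The ratio grows as the threshold decreases, from about 24 at 0.012% to about 86 at 0.008%.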

Figure 2 shows the execution time of the two approaches. The execution time of
TPAU was much shorter than that of TP, especially when the minimum utility
threshold was small.

[...] two-phase mining algorithm to discover high average-utility itemsets. The
proposed mining algorithm is divided into two phases. In Phase I, the algorithm
overestimates the utility of each itemset so that the “downward closure” property
holds. This property is then used to prune unpromising itemsets efficiently, level
by level. In Phase II, one additional database scan determines the actual high
average-utility itemsets among the candidate itemsets generated in Phase I. Because
far fewer candidate itemsets are generated than with traditional approaches,
considerable computation time can be saved. In traditional approaches, the length of
an itemset is a major factor in its utility value, since longer itemsets tend to
accumulate higher utility. The “average-utility” measure removes this influence by
dividing the utility of an itemset by its length, and thus offers a trade-off
between high utility and time complexity. The experimental results support these
points.
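The two phases summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which was written in C#). In particular, this section does not say how the Phase I overestimate is computed; the sketch assumes it is the sum, over the transactions containing an itemset, of the largest single-item utility in each transaction. That bound is never smaller than the average utility and cannot increase when an itemset grows, so downward closure holds. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def mine_high_average_utility(transactions, profits, min_utility):
    """Return (Phase I candidates, {itemset: average utility} from Phase II).

    transactions: list of dicts mapping item -> quantity sold.
    profits: dict mapping item -> unit profit.
    """
    # Utility of each item in each transaction (quantity * unit profit).
    utils = [{i: q * profits[i] for i, q in t.items()} for t in transactions]
    # Assumed overestimate: largest single-item utility per transaction.
    tmax = [max(u.values(), default=0) for u in utils]

    def upper_bound(itemset):
        # Sum the per-transaction maxima over transactions containing the itemset.
        return sum(m for u, m in zip(utils, tmax) if itemset.issubset(u))

    # Phase I: level-wise candidate generation using the upper bound.
    items = sorted({i for u in utils for i in u})
    level = [frozenset([i]) for i in items
             if upper_bound(frozenset([i])) >= min_utility]
    candidates = list(level)
    k = 1
    while level:
        k += 1
        prev = set(level)
        # Join step: unions of two (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must have survived the previous level,
        # and the candidate's own upper bound must reach the threshold.
        level = [c for c in joined
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and upper_bound(c) >= min_utility]
        candidates.extend(level)

    # Phase II: one more pass to compute the actual average utilities.
    result = {}
    for c in candidates:
        au = sum(sum(u[i] for i in c) / len(c) for u in utils if c.issubset(u))
        if au >= min_utility:
            result[c] = au
    return candidates, result


# Toy data (hypothetical): unit profits and four transactions.
profits = {"A": 3, "B": 10, "C": 1}
transactions = [
    {"A": 1, "B": 1},
    {"A": 2, "C": 4},
    {"B": 1, "C": 2},
    {"A": 1, "B": 2, "C": 1},
]

candidates, result = mine_high_average_utility(transactions, profits, 15)
print(len(candidates), "candidates")
for itemset, au in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), au)
```

With the toy data and a threshold of 15, Phase I keeps all seven itemsets as candidates, and Phase II retains {B}, {A, B}, and {B, C} with average utilities of 40, 18, and 16.5. The itemset {A, B, C} has a total utility of 24 in the only transaction that contains it, but its average utility is 8, which illustrates how the measure stops longer itemsets from qualifying on length alone.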

References

[1] R. Agarwal, C. Aggarwal, and V. Prasad, "A Tree Projection Algorithm

for Generation of Frequent Itemsets," Journal of Parallel and Distributed

Computing, vol. 61, pp. 350-371, 2001.

[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules

between Sets of Items in Large Databases," The 1993 ACM SIGMOD

International Conference on Management of Data, pp. 207-216, 1993.

[3] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association

Rules," The 20th International Conference on Very Large Data Bases, pp.

487-499, 1994.

[4] R. Agrawal and R. Srikant, "Mining Sequential Patterns," The 11th

International Conference on Data Engineering, pp. 3-14, 1995.

[5] R. Agrawal and R. Srikant, "Mining Sequential Patterns: Generalizations

and Performance Improvements," The 5th International Conference on

Extending Database Technology, pp. 3-17, 1996.

[6] B. Barber and H. J. Hamilton, "Algorithms for Mining Share Frequent

Itemsets Containing Infrequent Subsets," Lecture Notes in Computer

Science, vol. 1910, pp. 76-99, 2000.

[7] B. Barber and H. Hamilton, "Extracting Share Frequent Itemsets with

Infrequent Subsets," Data Mining and Knowledge Discovery, vol. 7, pp.

153-185, 2003.

[8] F. Berzal, J. Cubero, N. Marin, and J. Serrano, "TBAR: An Efficient

Method for Association Rule Mining in Relational Databases," Data &

Knowledge Engineering, vol. 37, pp. 47-64, 2001.

[9] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic Itemset

Counting and Implication Rules for Market Basket Data," The 1997 ACM

SIGMOD International Conference on Management of Data, pp. 255-264,

1997.

[10] C. Chang and C. Lin, "Perfect Hashing Schemes for Mining Association

Rules," The Computer Journal, vol. 48, pp. 168-179, 2005.

[11] G. Grahne and J. Zhu, "Fast Algorithms for Frequent Itemset Mining

Using FP-Trees," IEEE Transactions on Knowledge and Data

Engineering, vol. 17, pp. 1347-1362, 2005.

[12] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu,

"FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining," The

6th ACM SIGKDD International Conference on Knowledge Discovery

and Data Mining, pp. 355-359, 2000.

[13] J. Han, J. Pei, Y. Yin, and R. Mao, "Mining Frequent Patterns without

Candidate Generation: A Frequent-Pattern Tree Approach," Data Mining

and Knowledge Discovery, vol. 8, pp. 53-87, 2004.

[14] K. Hu, Y. Lu, L. Zhou, and C. Shi, "Integrating Classification and

Association Rule Mining: A Concept Lattice Framework," Lecture Notes

in Computer Science, vol. 1711, pp. 443-447, 2004.

[15] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without

Candidate Generation," The ACM SIGMOD International Conference on

Management of Data, pp. 1-12, 2000.

[16] J. Liu, Y. Pan, K. Wang, and J. Han, "Mining Frequent Item

Sets by Opportunistic Projection," The 8th ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining, pp. 229-238,

2002.

[17] Y.-C. Li and C.-C. Chang, "A New FP-Tree Algorithm for Mining

Frequent Itemsets," Lecture Notes in Computer Science, vol. 3309, pp.

266-277, 2004.

[18] Y. Li, J. Yeh, and C. Chang, "Direct Candidates Generation: A Novel

Algorithm for Discovering Complete Share-Frequent Itemsets," Lecture

Notes in Computer Science, vol. 3614, p. 551, 2005.

[19] Y. Li, J. Yeh, and C. Chang, "Efficient Algorithms for Mining

Share-Frequent Itemsets," The 11th World Congress of International

Fuzzy Systems Association, pp. 543-539, 2005.

[20] Y. Li, J. Yeh, and C. Chang, "A Fast Algorithm for Mining

Share-Frequent Itemsets," The 7th Asia Pacific Web Conference, pp.

417-428, 2005.

[21] Y. Liu, W.-k. Liao, and A. Choudhary, "A Two-Phase Algorithm for Fast

Discovery of High Utility Itemsets," Lecture Notes in Computer Science,

vol. 3518, pp. 689-695, 2005.

[22] Y. Liu, W. Liao, and A. Choudhary, "A Fast High Utility Itemsets Mining

Algorithm," The 1st International Workshop on Utility-Based Data

Mining, pp. 90-99, 2005.

[23] J. Park, M. Chen, and P. Yu, "An Effective Hash-Based Algorithm for

Mining Association Rules," The 1995 ACM SIGMOD International

Conference on Management of Data, pp. 175-186, 1995.

[24] Y. G. Sucahyo and R. P. Gopalan, "Building a More Accurate Classifier

Based on Strong Frequent Patterns," Lecture Notes in Computer Science,

vol. 3339, pp. 1036-1042, 2005.

[25] H. Yao, H. Hamilton, and C. Butz, "A Foundational Approach to Mining

Itemset Utilities from Databases," The 4th SIAM International

Conference on Data Mining, pp. 211-225, 2004.

[26] H. Yao and H. Hamilton, "Mining Itemset Utilities from Transaction

Databases," Data & Knowledge Engineering, vol. 59, pp. 603-626, 2006.

[27] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New Algorithms for

Fast Discovery of Association Rules," The 3rd International Conference

on Knowledge Discovery and Data Mining, pp. 283-286, 1997.
