CHAPTER 4 Multi-criteria Utility Mining with Maximum Constraints
4.5.3 Evaluation on Efficiency
Experiments were then conducted on the synthetic datasets to evaluate the execution efficiency of the two proposed approaches, TPMmin (using the minimum constraint) and TPMmax (using the maximum constraint), against the traditional two-phase mining approach, TP.
Figure 4.3 and Figure 4.4 show the efficiency of the three compared approaches on the datasets under different settings of the parameters λmin and D, respectively.
Figure 4.3: Performance comparison of the three approaches under different λmin. (D10.I4.N4K.D200K dataset; x-axis: λmin, the minimum value of all minimum utilities, from 0.20% to 1.00%; y-axis: execution time in seconds; series: TPMmax, TPMmin, TP.)
Figure 4.4: Performance comparison of the three approaches under different D. (D10.I4.N4K.DxK datasets; x-axis: D, the number of transactions, from 100K to 500K; y-axis: execution time in seconds; series: TPMmin, TPMmax, TP.)
As can be seen in these figures, the proposed TPMmax approach was better than the other two approaches, TPMmin and TP, in terms of execution efficiency when λmin decreased or D increased. The reason is the same as that mentioned previously in Section 4.5. That is, the proposed TPMmax approach adopted the maximum constraint to effectively avoid generating a huge number of candidate itemsets in mining. In addition, different from the traditional TP approach, the proposed TPMmin approach adopted the minimum constraint, which takes the minimum value of the minimum utility thresholds of all items in an itemset as the criterion of the itemset, to cope with the problem of utility mining. Accordingly, the two proposed approaches, TPMmax and TPMmin, outperformed the traditional TP approach in execution efficiency under various parameter settings.
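The two criteria described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the item names, the threshold values, and the function `itemset_threshold` are hypothetical, and the maximum constraint is assumed to mirror the minimum constraint by taking the largest item threshold in the itemset.

```python
from typing import Dict, Iterable


def itemset_threshold(itemset: Iterable[str],
                      min_utils: Dict[str, float],
                      constraint: str = "min") -> float:
    """Return the minimum utility threshold used as the criterion of an itemset.

    constraint="min": smallest item threshold (minimum constraint).
    constraint="max": largest item threshold (maximum constraint, assumed
    here to be the mirror image of the minimum constraint).
    """
    thresholds = [min_utils[item] for item in itemset]
    if not thresholds:
        raise ValueError("itemset must not be empty")
    if constraint == "min":
        return min(thresholds)
    if constraint == "max":
        return max(thresholds)
    raise ValueError(f"unknown constraint: {constraint!r}")


# Hypothetical per-item minimum utility thresholds (as ratios).
MIN_UTILS = {"A": 0.002, "B": 0.006, "C": 0.010}

threshold_min = itemset_threshold(["A", "B"], MIN_UTILS, "min")
threshold_max = itemset_threshold(["A", "B"], MIN_UTILS, "max")
```

Because the maximum constraint never gives an itemset a lower threshold than the minimum constraint does, it can only prune more candidates, which is consistent with the efficiency gap reported above.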
CHAPTER 5
Conclusions and Future Work
In this thesis, we propose a new research issue, called multi-criteria utility mining (abbreviated as MUM). Two viewpoints, the minimum and the maximum constraints, are presented for defining the minimum utilities of itemsets in a database when the items in that database have different minimum utilities. Through these two viewpoints, the nature of individual items can be reflected by their different minimum utilities, in contrast to traditional utility mining, which uses only a single minimum utility.
To the best of our knowledge, this is the first work on mining high utility itemsets with different minimum utilities of items in the field of utility mining.
Under the minimum constraint, more interesting utility itemsets can be discovered in databases. In particular, this work also presents an effective strategy to keep the downward-closure property in mining, such that a two-phase mining approach (TPMmin) can effectively avoid any information loss in mining. The experimental results show that the proposed TPMmin approach performs well in execution efficiency on several synthetic datasets under various parameter settings.
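For reference, the downward-closure property used by the traditional two-phase approach rests on the transaction-weighted utility (TWU) of an itemset, that is, the sum of the utilities of all transactions containing it. The sketch below is illustrative only: the toy transactions and the function names are hypothetical, and it shows the standard TWU bound rather than the specific strategy TPMmin uses to preserve the property under the minimum constraint.

```python
from typing import Dict, FrozenSet, List

# A transaction maps each item to its utility in that transaction
# (for example, quantity times unit profit). Values are hypothetical.
Transaction = Dict[str, float]

DB: List[Transaction] = [
    {"A": 5, "B": 10},
    {"A": 5, "C": 20},
    {"A": 10, "B": 10, "C": 20},
]


def transaction_utility(t: Transaction) -> float:
    """Total utility of all items in one transaction."""
    return sum(t.values())


def utility(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Actual utility of an itemset over the transactions that contain it."""
    return sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))


def twu(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Transaction-weighted utility: an upper bound on the actual utility."""
    return sum(transaction_utility(t) for t in db if itemset.issubset(t))


a, ab, abc = frozenset("A"), frozenset("AB"), frozenset("ABC")
# TWU never increases when an itemset grows, unlike the actual utility.
chain = [twu(a, DB), twu(ab, DB), twu(abc, DB)]
```

In this toy database the actual utility of {A, B} exceeds that of {A}, so the utility itself is not downward-closed, whereas the TWU values in `chain` are non-increasing.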
In the second part of this thesis, we have also introduced another issue, multi-criteria utility mining with the maximum constraint. Different from the minimum constraint, the maximum constraint, which can be well explained and may suit some mining domains, yields fewer discovered high utility itemsets than the minimum constraint. Under this constraint, the downward-closure property of traditional utility mining can be kept, such that the original two-phase approach can be easily extended to find high utility itemsets in databases when items have different minimum utilities. Experimental results show that the mining time consumed by the TPMmax algorithm is much less than that consumed by the TPMmin algorithm, and that the mined set of high utility itemsets is more compact and has higher average utility values.
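The claim that the maximum constraint yields a more compact result can be checked on a toy example. The following brute-force sketch is not the TPMmax or TPMmin algorithm: it simply enumerates every itemset, and the database, the absolute thresholds, and the function names are hypothetical. It assumes that an itemset's threshold is the smallest (minimum constraint) or largest (maximum constraint) of its items' thresholds.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Set, FrozenSet

# Toy database: each transaction maps an item to its utility (hypothetical).
DB: List[Dict[str, float]] = [
    {"A": 5, "B": 10},
    {"A": 5, "C": 20},
    {"A": 10, "B": 10, "C": 20},
]
# Hypothetical absolute minimum utility threshold of each item.
MIN_UTILS: Dict[str, float] = {"A": 15, "B": 30, "C": 45}


def utility(itemset: Iterable[str], db: List[Dict[str, float]]) -> float:
    """Actual utility of an itemset over the transactions that contain it."""
    items = frozenset(itemset)
    return sum(sum(t[i] for i in items) for t in db if items.issubset(t))


def high_utility_itemsets(db: List[Dict[str, float]],
                          min_utils: Dict[str, float],
                          pick: Callable[[Iterable[float]], float]
                          ) -> Set[FrozenSet[str]]:
    """Enumerate all itemsets whose utility reaches the picked threshold.

    pick=min models the minimum constraint, pick=max the maximum constraint.
    """
    items = sorted(min_utils)
    result: Set[FrozenSet[str]] = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            threshold = pick(min_utils[i] for i in combo)
            if utility(combo, db) >= threshold:
                result.add(frozenset(combo))
    return result


under_min = high_utility_itemsets(DB, MIN_UTILS, min)
under_max = high_utility_itemsets(DB, MIN_UTILS, max)
```

Since the threshold under the maximum constraint is never lower than under the minimum constraint, every itemset in `under_max` also appears in `under_min`, so the result under the maximum constraint cannot be larger.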
In the future, we plan to apply the proposed algorithms to other applications, such as data stream mining and multi-source data mining, among others. In addition, we will also attempt to handle the maintenance problem of multi-criteria utility mining under the two constraints when transactions are inserted, deleted, or modified.