
Conclusions and Discussion

From the experiments above we can indeed observe that the CARC algorithm achieves better accuracy than the CBA algorithm. Experiment 1 shows that once MinSup exceeds 5%, the accuracy of CBA drops noticeably: when the support threshold is set this high, fewer association rules are generated and many important rules are never produced, so classification errors occur when the test data are classified.

Besides the shortage of rules caused by setting the support threshold too high, the choice of the default class is another important source of classification errors. Experiment 1 shows that CARC does not lose accuracy when MinSup becomes large; it still maintains a very high accuracy rate. Experiments 2 and 3 likewise show that the accuracy of CARC is higher than, or comparable to, that of CBA. Regarding the parameter settings, a MinSup that is too low generates too many meaningless rules, while one that is too high prevents many rules from being generated; for this reason, the MinSup value in the experiments was set between 0.01% and 30%. In addition, Experiment 4 shows that when k is set to its smallest value of 5, the accuracy is generally lower, which confirms that the choice of the default class is also a factor affecting accuracy.

The experimental results demonstrate that the CARC algorithm proposed in this thesis indeed achieves good accuracy and finds the correct rules: it discovers not only rules with high support but also rules whose support is low yet whose condenseness is high, and it provides a good default-class selection method, so that the test data are classified correctly and the overall accuracy of the algorithm is improved. In its rule-generation phase, CARC uses Condenseness as the metric for filtering rules, and the formula for Condenseness is in turn related to the value of Lift. Future work could therefore explore this part further: besides Lift, which measures the relationship between the items of a rule, other interestingness measures (such as the various rule-evaluation criteria introduced in Section 3.1) could be integrated to achieve even more accurate classification results.
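As a concrete illustration of the measures mentioned above, the sketch below shows how Lift, together with two other common rule-interestingness measures, can be computed from the relative supports of a rule. The Condenseness formula itself is not reproduced here; the helper functions, the choice of additional measures, and the example support values are illustrative assumptions rather than part of the CARC implementation.

```python
# Illustrative sketch only (assumption, not the CARC implementation):
# computing Lift and two other common rule-interestingness measures from
# the relative supports of a rule A -> B.
#   supp_a  = support(A)       (antecedent)
#   supp_b  = support(B)       (consequent / class label)
#   supp_ab = support(A ∪ B)   (all supports are fractions of |D|)

def confidence(supp_ab: float, supp_a: float) -> float:
    """conf(A -> B) = supp(A ∪ B) / supp(A)."""
    return supp_ab / supp_a


def lift(supp_ab: float, supp_a: float, supp_b: float) -> float:
    """lift(A -> B) = supp(A ∪ B) / (supp(A) * supp(B)).

    Values greater than 1 indicate that A and B occur together more often
    than expected if they were independent.
    """
    return supp_ab / (supp_a * supp_b)


def conviction(supp_ab: float, supp_a: float, supp_b: float) -> float:
    """conviction(A -> B) = (1 - supp(B)) / (1 - conf(A -> B))."""
    conf = confidence(supp_ab, supp_a)
    return float("inf") if conf == 1.0 else (1.0 - supp_b) / (1.0 - conf)


if __name__ == "__main__":
    # Hypothetical rule with low support but a strong association:
    # supp(A) = 2%, supp(B) = 5%, supp(A ∪ B) = 1.8%.
    print(lift(0.018, 0.02, 0.05))        # 18.0 -> far above 1, strong rule
    print(confidence(0.018, 0.02))        # 0.9
    print(conviction(0.018, 0.02, 0.05))  # 9.5
```

A rule filter built on such a measure keeps a rule whenever the chosen measure exceeds a threshold, regardless of how low its support is, which matches the goal stated above of retaining low-support but strongly associated rules.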


