
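This thesis proposes a method for mining state-changing itemsets under the sliding-window model of data streams. The method uses two tree structures that share the same prefix-tree layout, called Base-Tree and Delta-Tree. At any time point t, the Base-Tree stores all transactions in the sliding window, and the Delta-Tree stores the transactions that are newly added and those that expire during the one time point that follows t. The thesis proposes the CV-SCD algorithm, which uses the information in Base-Tree and Delta-Tree to identify state-changing itemsets and, at the same time, recursively builds conditional trees containing a specific item on both trees in order to mine longer state-changing itemsets.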
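The role of the two trees can be illustrated with a minimal sketch. It is not the CV-SCD algorithm itself: it omits the recursive conditional-tree construction and simply queries supports for a given list of candidate itemsets. Several points are assumptions made only for illustration: the class `PrefixTree`, the function `state_changes`, the use of signed counts in the Delta-Tree (+1 for a new transaction, -1 for an expired one), and the reading of "state change" as crossing a minimum support count between consecutive windows.

```python
from itertools import combinations


class PrefixTree:
    """Count-annotated prefix tree; items on every path are kept in sorted order."""

    def __init__(self):
        self.children = {}  # item -> PrefixTree
        self.count = 0      # signed number of transactions passing through this node

    def add(self, transaction, weight=1):
        node = self
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, PrefixTree())
            node.count += weight

    def support(self, itemset):
        """Sum the counts of all paths that contain every item of a non-empty itemset."""
        return self._support(tuple(sorted(set(itemset))))

    def _support(self, remaining):
        total = 0
        for item, child in self.children.items():
            if item == remaining[0]:
                if len(remaining) == 1:
                    total += child.count
                else:
                    total += child._support(remaining[1:])
            elif item < remaining[0]:
                total += child._support(remaining)
            # item > remaining[0]: the sorted order rules out a match further down
        return total


def state_changes(base, delta, itemsets, min_count):
    """Assumed definition: an itemset changes state when it crosses min_count."""
    changes = {}
    for itemset in itemsets:
        before = base.support(itemset)
        after = before + delta.support(itemset)
        if (before >= min_count) != (after >= min_count):
            label = "became frequent" if after >= min_count else "became infrequent"
            changes[frozenset(itemset)] = label
    return changes


# Window at time t (hypothetical data).
base = PrefixTree()
for tx in (["a", "b"], ["a", "c"], ["b", "c"]):
    base.add(tx)

# One step later: ["a", "b"] expires and ["a", "c"] arrives.
delta = PrefixTree()
delta.add(["a", "b"], weight=-1)
delta.add(["a", "c"], weight=+1)

items = ["a", "b", "c"]
candidates = [c for k in (1, 2) for c in combinations(items, k)]
changes = state_changes(base, delta, candidates, min_count=2)
```

In this toy example the Delta-Tree holds only two transactions, so the change in support for each candidate is read from a much smaller structure than the full window. This matches the setting in which the new and expired transactions are few relative to the window.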

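In addition, this thesis proposes storing the mined state-changing itemsets as state-changing itemset snapshots, so that a user can specify an interval and classify how each state-changing itemset changed within it. To maintain the historical snapshots, they are stored in a pyramidal time frame structure, which reduces the storage space the snapshots require.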
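The space saving of a pyramidal time frame can be sketched as follows. The sketch follows the general scheme in which a snapshot taken at clock time t belongs to every order i such that alpha^i divides t, and only the most recent alpha^l + 1 snapshots of each order are retained. The class name `PyramidalStore`, the parameters `alpha` and `l`, and the placeholder payloads are assumptions for illustration; the parameter values used in the thesis are not stated in this section.

```python
from collections import deque


class PyramidalStore:
    """Keeps recent snapshots densely and older snapshots at coarser granularity."""

    def __init__(self, alpha=2, l=2):
        self.alpha = alpha                 # base of the time granularity (must be >= 2)
        self.capacity = alpha ** l + 1     # snapshots retained per order
        self.orders = {}                   # order -> deque of retained timestamps
        self.snapshots = {}                # timestamp -> snapshot payload

    def add(self, t, snapshot):
        """Record the snapshot taken at clock time t (t >= 1)."""
        self.snapshots[t] = snapshot
        order, step = 0, 1
        while t % step == 0:
            # deque(maxlen=...) silently evicts the oldest timestamp of this order
            self.orders.setdefault(order, deque(maxlen=self.capacity)).append(t)
            order, step = order + 1, step * self.alpha
        # Drop payloads that no order retains any longer.
        live = {ts for times in self.orders.values() for ts in times}
        for ts in list(self.snapshots):
            if ts not in live:
                del self.snapshots[ts]

    def closest(self, t):
        """Return the retained timestamp nearest to t."""
        return min(self.snapshots, key=lambda ts: abs(ts - t))


store = PyramidalStore(alpha=2, l=2)
for t in range(1, 56):
    store.add(t, {"time": t})  # placeholder payload; a real one would hold itemset states

retained = sorted(store.snapshots)
```

After 55 time points only 14 snapshots remain. The most recent time points are all kept, while older ones are kept at spacings of 2, 4, 8, and so on, so a query over a past interval can use `closest` to find a nearby retained snapshot.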

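The experimental results show that, when the newly added and expired transactions are few relative to the data in the sliding window, CV-SCD mines state-changing itemsets much more efficiently than the baseline method. The efficiency gain is especially clear when the support threshold is small. When the transactions or the frequent itemsets in the dataset have length 5 or less, CV-SCD also saves at least half of the mining time. In addition, adding the storage and analysis of state-changing itemset snapshots has almost no effect on the mining time of CV-SCD.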

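Our experiments show that the performance of CV-SCD depends strongly on the size of the itemsets in the changed data. Future work could examine how to choose different mining strategies according to the characteristics of the data. Moreover, CV-SCD currently assumes that every transaction block contains a fixed number of transactions. How to extend it to blocks with varying numbers of transactions while still mining state-changing itemsets efficiently is a direction for further research.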

