
Chapter 7 Conclusions and Future Work

7.2 Future Work

Building on the mining capabilities of the proposed algorithms, several extensions of frequent pattern mining and change mining are worth pursuing, as listed below.

Resource-aware mining of frequent patterns over data streams.

Resources such as CPU time, memory space, and sometimes energy are precious in a stream mining environment. They are likely to be exhausted when processing data streams that arrive rapidly and in huge volume. How to use these resources efficiently when applying the proposed algorithms to mine frequent itemsets and changes is an important research issue for our future work.


Online mining of sequential patterns over data streams with a sliding window.

Online mining of sequential patterns in data streams is more complicated than mining frequent itemsets. Mining sequential patterns from data streams poses several challenges, such as how to define the meaning of a sequential pattern in a stream environment, how to define the sliding window model for mining sequential patterns over data streams, and how to design an efficient single-pass algorithm for mining the set of sequential patterns from data streams.

Online mining of high utility itemsets over data streams with a sliding window.

Although mining itemset correlations is important in some applications, in many applications people are more interested in finding sets of items that are useful by some other measure, such as utility. Frequent itemsets reflect no factor other than the frequency with which items are present or absent. Frequent itemsets may contribute only a small portion of the overall profit, whereas infrequent itemsets may contribute a large portion. Hence, utility mining is likely to be useful in a wide range of practical applications. Mining high utility itemsets over data streams poses several challenges, such as how to define the sliding window model for mining high utility itemsets over data streams, how to define the meaning of a high utility itemset in a stream environment, and how to design an efficient one-pass algorithm for discovering the set of high utility itemsets from data streams with a sliding window.
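The contrast between frequency and utility can be illustrated with a minimal sketch. It is not one of the proposed algorithms. It assumes a commonly used definition of utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset) and a count-based sliding window; the item names, the profit table `UNIT_PROFIT`, and the helpers `window_stats` and `slide` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical unit profits per item (illustrative values only).
UNIT_PROFIT = {"bread": 1, "milk": 2, "wine": 40}


def window_stats(window, itemset):
    """Return (support, utility) of `itemset` over the transactions in `window`.

    Each transaction maps item -> purchased quantity. Assumed definition:
    the utility of an itemset in a transaction is the sum of
    quantity * unit profit over its items, counted only when the
    transaction contains every item of the itemset.
    """
    support = 0
    utility = 0
    for tx in window:
        if all(item in tx for item in itemset):
            support += 1
            utility += sum(tx[item] * UNIT_PROFIT[item] for item in itemset)
    return support, utility


def slide(stream, window_size):
    """Yield the window content after each arriving transaction."""
    window = deque(maxlen=window_size)
    for tx in stream:
        window.append(tx)  # the oldest transaction is dropped automatically
        yield list(window)


stream = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"wine": 1},
    {"bread": 3, "milk": 1},
]

final_window = list(slide(stream, window_size=4))[-1]
frequent = window_stats(final_window, ("bread", "milk"))  # (3, 14)
rare = window_stats(final_window, ("wine",))              # (1, 40)
```

In this toy window the itemset {bread, milk} appears in three of four transactions but yields a utility of 14, while {wine} appears once and yields 40. A real stream algorithm would have to maintain such values incrementally in a single pass rather than rescanning the window, which is the open problem described above.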



Publication List

Journal Papers

1. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan (2006), "DSM-PLW: Single-Pass Mining of Path Traversal Patterns over Streaming Web Click-Sequences," Computer Networks: Special Issue on Web Dynamics, Volume 50, Issue 10, July 2006, pp. 1474-1487. (SCI-E, JCR 2004 IF = 1.226)

2. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan (2005), "Online Mining Changes of Items over Continuous Append-only and Dynamic Data Streams," Journal of Universal Computer Science: Special Issue on Knowledge Discovery in Data Streams, Volume 11, No. 8, 2005, pp. 1411-1425. (SCI-E, JCR 2004 IF = 0.456)

3. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan (2005), "Online Mining Maximal Frequent Structures in Continuous Landmark Melody Streams," Pattern Recognition Letters, Volume 26, Issue 11, August 2005, pp. 1658-1674. (SCI-E, JCR 2004 IF = 0.576)

4. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee (2006), "DSM-FI: An Efficient Algorithm for Mining Frequent Itemsets in Data Streams," Knowledge and Information Systems: An International Journal, under revision. (SCI-E & EI)

5. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee (2006), "DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams," in preparation.

6. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee (2006), "Efficient Maintenance and Mining of Frequent Itemsets over Stream Sliding Windows," in preparation.

7. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee, "Online Mining of Frequent Query Trees over Data Streams," in preparation.

8. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee, "Mining and Detecting Changes in User-Centered Music Query Streams," in preparation.

Conference Papers

1. Hua-Fu Li, Chin-Chuan Ho, Man-Kwan Shan, and Suh-Yin Lee, "Efficient Maintenance and Mining of Frequent Itemsets over Stream Sliding Windows," in Proc. of IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC-2006), Taipei, Taiwan, October 8-10, 2006. (EI)

2. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee, "Detecting Changes in User-Centered Music Query Streams," in Proc. of IEEE International Conference on Multimedia and Expo (ICME-2006), Toronto, Ontario, Canada, July 9-12, 2006. (EI)

3. Hua-Fu Li, Chin-Chuan Ho, Man-Kwan Shan, and Suh-Yin Lee, "Online Mining of Recent Music Query Streams," in Proc. of IEEE International Conference on Multimedia and Expo (ICME-2006), Toronto, Ontario, Canada, July 9-12, 2006. (EI)

4. Hua-Fu Li, Man-Kwan Shan, and Suh-Yin Lee, "Online Mining of Frequent Query Trees over Data Streams," in Proc. of the 15th World Wide Web Conference (WWW-2006), Edinburgh, Scotland, May 23-26, 2006. (EI) (poster)

5. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan, "DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams," in Proc. of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005), France, September 19-22, 2005. (EI)

6. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan, "Online Mining (Recently) Maximal Frequent Itemsets over Data Streams," in Proc. of the 15th IEEE International Workshop on Research Issues on Data Engineering (RIDE2005), Tokyo, Japan, April 3-4, 2005. (EI)

7. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan, "Mining Maximal Frequent Itemsets in Data Streams," in Proc. of 2004 International Computer Symposium (ICS2004), Taipei, Taiwan, December 15-17, 2004. (Best Paper Award)

8. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan, "An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams," in Proc. of the First International Workshop on Knowledge Discovery in Data Streams, held in conjunction with the 15th European Conference on Machine Learning (ECML 2004) and the 8th European Conference on the Principles and Practice of Knowledge Discovery in Databases (PKDD 2004), Pisa, Italy, September 20-24, 2004.

9. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan, "On Mining Webclick Streams for Path Traversal Patterns," in Proc. of the 13th World Wide Web Conference (WWW2004), New York, May 17-22, 2004. (EI)

10. Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan, "Mining Frequent Closed Structures in Streaming Melody Sequences," in Proc. of IEEE International Conference on Multimedia and Expo (ICME 2004), Taipei, Taiwan, 2004. (EI)

11. Hua-Fu Li and Suh-Yin Lee, "Single-Pass Algorithms for Mining Frequency Change Patterns with Limited Space in Evolving Append-only and Dynamic Transaction Data Streams," in Proc. of the 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE-04), Taipei, Taiwan, 2004.

12. Man-Kwan Shan and Hua-Fu Li, "Fast Discovery of Structure Navigation Patterns from Web User Traversals," in Proc. of SPIE Conference on Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, Orlando, Florida, USA, 2002. (EI)

13. Hua-Fu Li and Man-Kwan Shan, "PNP: Mining of Profile Navigational Patterns," in Proc. of SPIE Conference on Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, Orlando, Florida, USA, 2002. (EI)

14. Hua-Fu Li and Man-Kwan Shan, "Mining Non-Simple Traversal Paths from Web Access Logs," in Proc. of 2000 Workshop on Internet and Distributed Systems, Tainan, Taiwan, 2000.


Vita

Hua-Fu Li (李華富) was born on February 24, 1976 in Taoyuan, Taiwan, Republic of China. He received the BS degree in Computer Science and Engineering from Tatung Institute of Technology and the MS degree in Computer Science from National Chengchi University in 1998 and 2000, respectively. He is currently working toward the Ph.D. degree at National Chiao-Tung University. With his advisor, Dr. Suh-Yin Lee, he coauthored the work that received the 2004 ICS (International Computer Symposium) Best Paper Award.

His research interests include data mining, data stream management, multimedia information systems and bioinformatics.
