Prediction of User Navigation Patterns by Mining the Temporal Web Usage Evolution


Soft Comput (2008) 12:157–163 DOI 10.1007/s00500-007-0190-y

FOCUS

Vincent S. Tseng · Kawuu Weicheng Lin · Jeng-Chuan Chang

Published online: 23 May 2007 © Springer-Verlag 2007

Abstract Advances in data mining technologies have enabled intelligent Web capabilities in various applications by utilizing the hidden user behavior patterns discovered from Web logs. Intelligent methods for discovering and predicting users’ patterns are important in supporting intelligent Web applications such as personalized services. Although numerous studies have been done on Web usage mining, few of them consider temporal evolution when discovering Web users’ patterns. In this paper, we propose a novel data mining algorithm named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation by considering the temporality property in Web usage evolution. Moreover, three new kinds of measures are proposed for evaluating the temporal evolution of navigation patterns under different time periods. Through experimental evaluation on both real-life and simulated datasets, the proposed TN-Gram model is shown to outperform other approaches such as N-gram modeling in terms of prediction precision, in particular when Web users’ navigation behavior changes significantly with temporal evolution.

Keywords Temporal patterns · Navigation patterns · Data mining · Personalized services

V. S. Tseng (✉) · K. W. Lin · J.-C. Chang

Institute of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC

e-mail: [email protected]

K. W. Lin e-mail: [email protected]

J.-C. Chang e-mail: [email protected]

1 Introduction

Advances in data mining technologies have enabled intelligent Web capabilities in various applications such as page recommendation, page prefetching and personalized navigation by utilizing the hidden user behavior patterns discovered from Web logs (Borges and Levene 1999; Nanopoulos et al. 2003; Padmanabhan and Mogul 1996). These behavior patterns contain a lot of useful information because they directly reflect how users use the Web site, and thus form the basis of intelligent Web development. However, discovering the patterns from the large volume of Web logs is challenging, and this problem, known as Web Usage Mining, has recently become an important research topic in data mining.

In Web mining research, numerous studies have addressed the discovery of users’ behavior patterns from various aspects. Tan and Kumar (2002) apply association rules to the discovery of associated page-views. An intuitive application, for example, is using the discovered associated pages to improve the Web site structure. Concerning the linking characteristic of Web sites, several studies (Tan et al. 2000) discussed the indirect association relation. The sequential pattern (Agrawal and Srikant 1995), which reveals the sequential page-views of users, has been a widely discussed topic. Moreover, some studies focused on developing clustering methods (Frias-Martinez and Karamcheti 2002; Wang and Zaiane 2002) to cluster users with similar behavior or to cluster Web pages.

Most past studies assumed that Web usage patterns are invariant with time (Frias-Martinez and Karamcheti 2002; Gündüz and Özsu 2003a, b; Pitkow and Pirolli 1999; Srivastava et al. 2000; Su et al. 2000; Tan and Kumar 2002), and few of them took into account the temporal characteristic or temporal evolution of Web usage. In fact, users’ Web usage patterns may change with time, i.e., a Web visitor may


This is because the number of rules with high confidence in the “Weekdays” case is greater than in the “All” case. In other words, the “Weekdays” model can capture an inherent property that remains implicit when temporality is ignored. The recall illustrated in Fig. 5 clarifies this feature.
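The interplay between rule confidence, accuracy and recall can be illustrated with a small sketch. The definitions of accuracy and recall used in the experiments are not part of this excerpt, so the code assumes a common reading: accuracy is the share of correct predictions among the predictions actually made, and recall is the share of correct predictions among all test cases. The function `evaluate`, the rule format and the toy data are hypothetical.

```python
def evaluate(rules, test_cases, min_conf):
    """Score a rule set under a confidence threshold.

    rules: {context: (predicted_page, confidence)}  (assumed format)
    test_cases: list of (context, actual_next_page)
    Returns (accuracy, recall) under the assumed definitions above.
    """
    made = correct = 0
    for context, actual in test_cases:
        rule = rules.get(context)
        if rule is None or rule[1] < min_conf:
            continue  # no sufficiently confident rule, so no prediction
        made += 1
        correct += rule[0] == actual
    accuracy = correct / made if made else 0.0
    recall = correct / len(test_cases) if test_cases else 0.0
    return accuracy, recall


# Toy rule set and test cases (invented for illustration).
rules = {
    ("home",): ("news", 0.6),
    ("news",): ("mail", 0.9),
}
tests = [
    (("home",), "news"),
    (("home",), "sports"),
    (("news",), "mail"),
    (("mail",), "home"),
]
low = evaluate(rules, tests, min_conf=0.5)   # both rules fire
high = evaluate(rules, tests, min_conf=0.8)  # only the 0.9 rule fires
```

In this toy case, raising the threshold from 0.5 to 0.8 increases accuracy but lowers recall, because fewer rules are confident enough to fire. A model with more high-confidence rules can therefore keep its recall up at a given threshold.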

Although we have investigated the accuracy for all page-views, some page-views may lack the property of temporality. For example, users may exhibit temporal behavior when they visit a general homepage covering various products. However, this may not be the case when users visit a detail page-view of a particular product. Hence, we are interested in the accuracy on the page-views with prediction rule changes. Figure 6 shows this accuracy; the difference between the models is more distinct than that for all page-views.

Finally, we simulate a special dataset with an evident “Hours” property in order to test different kinds of temporal properties. Figure 7 shows the accuracy on the simulated dataset under different settings of confidence. Although the simulated dataset carries the “Hours” property, Fig. 7 does not make clear whether the “Hours” model is a good model. This is because the page-views with the “Hours” property make up only two percent of the total data in our simulated dataset. However, as shown in Fig. 8, the proposed TN-Gram model substantially outperforms the traditional N-gram model in terms of accuracy if we consider only the temporal page-views. The experimental results show that the average value of prediction rule changes is 4% and 48% for “Weekdays” and “Hours”, respectively (the figures are not shown here due to space limitation).

Fig. 5 The recall on the Clarknet log (x-axis: confidence; y-axis: recall; series: All, Hours, Weekdays)

Fig. 6 The accuracy of the temporal model and “All” (y-axis: accuracy; groups: NASA-Weekdays, NASA-Hours, Clarknet-Weekdays, Clarknet-Hours; series: All, Temporal Model)

Fig. 7 The accuracy on simulated dataset by varying confidence (x-axis: confidence; y-axis: accuracy; series: all, 3-interval, weekdays)

Fig. 8 The accuracy on simulated data for different models (y-axis: accuracy; groups: Sim.-Weekdays, Sim.-Hours; series: All, Temporal Model)

6 Conclusions

Our work aims at exploring the temporality property to identify the time periods in which users’ navigation patterns change significantly, so as to improve prediction accuracy. This can provide useful insight for intelligent websites in strategy planning such as personalized services and marketing promotion. In this paper, we have proposed a novel method named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation. After the prediction model is constructed, three new kinds of measures, namely Support-based Fundamental Rule Changes, Confidence-based Fundamental Rule Changes, and Changes of Prediction Rules, are used to evaluate the temporal evolution of navigation patterns. For empirical evaluation, we adopted two real datasets and also designed a simulator to generate a dataset that carries the temporal navigation characteristics of users. Through experimental evaluation on both the real-life and simulated datasets, the proposed TN-Gram method is shown to outperform other existing approaches such as N-gram modeling in terms of prediction precision.
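The general idea summarized above can be sketched in a few lines of code. The formal definitions of TN-Gram and of the three measures are not included in this excerpt, so the following is only a rough illustration under stated assumptions: sessions are split into a hypothetical weekday/weekend partition, one N-gram rule set is trained per period, and a simplified stand-in for "Changes of Prediction Rules" is computed as the fraction of shared contexts whose predicted page differs between two periods. All function names (`period_of`, `build_rules`, `build_temporal_rules`, `prediction_rule_changes`) and the toy log are invented for illustration and are not the authors' implementation.

```python
from collections import Counter, defaultdict
from datetime import datetime


def period_of(ts: datetime) -> str:
    """Hypothetical temporal partition: weekday vs. weekend."""
    return "weekday" if ts.weekday() < 5 else "weekend"


def build_rules(sessions, n=2, min_conf=0.0):
    """Map each (n-1)-page context to (most frequent next page, confidence)."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - n + 1):
            context = tuple(pages[i:i + n - 1])
            counts[context][pages[i + n - 1]] += 1
    rules = {}
    for context, nxt in counts.items():
        page, hits = nxt.most_common(1)[0]
        conf = hits / sum(nxt.values())
        if conf >= min_conf:
            rules[context] = (page, conf)
    return rules


def build_temporal_rules(timed_sessions, n=2, min_conf=0.0):
    """Train one rule set per period, plus an 'all' set that ignores time."""
    by_period = defaultdict(list)
    for ts, pages in timed_sessions:
        by_period[period_of(ts)].append(pages)
        by_period["all"].append(pages)
    return {p: build_rules(s, n, min_conf) for p, s in by_period.items()}


def prediction_rule_changes(rules_a, rules_b):
    """Assumed measure: share of common contexts with a different prediction."""
    shared = rules_a.keys() & rules_b.keys()
    if not shared:
        return 0.0
    changed = sum(rules_a[c][0] != rules_b[c][0] for c in shared)
    return changed / len(shared)


# Toy log: home -> news on weekdays, home -> sports on weekends.
log = [
    (datetime(2024, 1, 1, 9), ["home", "news", "mail"]),  # Monday
    (datetime(2024, 1, 2, 9), ["home", "news"]),          # Tuesday
    (datetime(2024, 1, 3, 9), ["home", "news"]),          # Wednesday
    (datetime(2024, 1, 6, 9), ["home", "sports"]),        # Saturday
    (datetime(2024, 1, 7, 9), ["home", "sports"]),        # Sunday
]
models = build_temporal_rules(log)
change = prediction_rule_changes(models["weekday"], models["weekend"])
```

In this toy log the time-agnostic "all" model always predicts "news" after "home", which is wrong for every weekend session, whereas the per-period rule sets predict the observed page in each period. This is the kind of behavior change the temporal model is meant to capture.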

In future work, we will apply the TN-Gram model to different kinds of web sites, such as popular auction sites, to evaluate its performance and effectiveness in more detail. Moreover, we will also consider the user group issue and integrate it with TN-Gram to discover more interesting patterns. Besides, since the discovered temporal evolution can be exploited in a wide range of applications, we will apply the TN-Gram method to applications such as personalized services, with the aim of enhancing the richness and quality of applications in web systems.

Acknowledgments This research was supported by Ministry of Economic Affairs, Taiwan, ROC, under grant no. 93-EC-17-A-02-51-024, and by National Science Council, Taiwan, ROC, under grant no. NSC 95-2422-H-006-001.

References

Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the international conference on data engineering (ICDE), Taipei, Taiwan, March 1995

Borges J, Levene M (1999) Data mining of user navigation patterns. In: Proceedings of the workshop on web usage analysis and user profiling (WEBKDD’99), San Diego, CA, August 15, 1999, pp 31–36

Frias-Martinez E, Karamcheti V (2002) A prediction model for user access sequences. In: Proceedings of the WEBKDD workshop: web mining for usage patterns and user profiles, ACM SIGKDD international conference on knowledge discovery and data mining, July 2002

Gündüz Ş, Özsu MT (2003a) A user interest model for web page navigation. In: Proceedings of international workshop on data mining for actionable knowledge (DMAK), Seoul, Korea, April 2003, pp 46–57

Gündüz Ş, Özsu MT (2003b) A web page prediction model based on click-stream tree representation of user behavior. In: Proceedings of the ninth ACM international conference on knowledge discovery and data mining (KDD), Washington, DC, August 2003, pp 535–540

Li Y, Ning P, Wang XS, Jajodia S (2002) Discovering calendar-based temporal association rules. J Data Knowl Eng (DKE) 44(2): 193–218

Liu B, Hsu W, Ma Y (2001) Discovering the set of fundamental rule changes. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2001), San Francisco, CA, August 20–23, 2001

Nanopoulos A, Katsaros D, Manolopoulos Y (2003) A data mining algorithm for generalized web prefetching. IEEE Transactions on Knowledge and Data Engineering

Nicholson E, Zukerman I, Albrecht DW (1998) A decision-theoretic approach for pre-sending information on the WWW. In: Proceedings of the fifth Pacific Rim international conference on artificial intelligence, 1998, pp 575–586

Padmanabhan V, Mogul J (1996) Using predictive prefetching to improve world wide web latency. ACM SIGCOMM Computer Comm Rev 26(3)

Palpanas T, Mendelzon A (1999) Web prefetching using partial match prediction. In: Proceedings of the fourth web caching workshop (WCW ’99), March 1999

Papoulis A (1991) Probability, random variables, and stochastic processes. McGraw Hill, New York

Pitkow J, Pirolli P (1999) Mining longest repeating subsequences to predict world wide web surfing. In: Proceedings of the USENIX symposium on Internet technologies and systems (USITS ’99), October 1999

Srivastava J, Cooley R, Deshpande M, Tan P (2000) Web usage mining: discovery and applications of usage patterns from web data. In: SIGKDD Explorations, ACM SIGKDD, January 2000

Su Z, Yang Q, Lu Y, Zhang H (2000) Whatnext: a prediction system for web requests using n-gram sequence models. In: Proceedings of the first international conference on web information systems and engineering conference, Hong Kong, June 2000, pp 200–207

Tan P, Kumar V (2002) Mining association patterns in web usage data. In: Proceedings of the international conference on advances in infrastructure for e-business, e-education, e-science, and e-medicine on the Internet

Tan P, Kumar V, Srivastava J (2000) Indirect association: mining higher order dependencies. In: Proceedings of the fourth European conference on principles and practice of knowledge discovery in databases, Lyon, France, pp 632–637

Wang W, Zaiane OR (2002) Clustering web sessions by sequence alignment. In: Proceedings of the third international workshop on management of information on the web in conjunction with 13th international conference on database and expert systems applications DEXA’2002, Aix en Provence, France, September 2–6, pp 394–398

Yang Q, Li T, Wang K (2004) Building association rule based sequential classifiers for web document prediction. J Data Min Knowl Discov 8(3):253–273

Zukerman I, Albrecht DW, Nicholson AE (1999) Predicting user’s request on the WWW. In: Proceedings of the seventh international conference on user modeling, 1999
