Efficient mining and prediction of user behavior patterns in mobile web systems


Vincent S. Tseng *, Kawuu W. Lin

Institute of Computer Science and Information Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan 701, Taiwan, ROC

Received 28 October 2005; accepted 2 December 2005

Available online 23 January 2006

Abstract

The development of wireless and web technologies allows mobile users to request various kinds of services from mobile devices anytime and anywhere. Helping users obtain needed information effectively is an important issue in mobile web systems. Discovering user behavior can greatly benefit system performance and quality of service. The behavior patterns of mobile users, in which location and service are inherently coexistent, are more complex than those in traditional web systems. In this paper, we propose a novel data mining method, SMAP-Mine, that efficiently discovers mobile users’ sequential movement patterns associated with requested services. Corresponding prediction strategies are also proposed. Through empirical evaluation under various simulation conditions, SMAP-Mine is shown to deliver excellent performance in terms of accuracy, execution efficiency and scalability. The proposed prediction strategies are also verified to be effective in terms of precision, hit ratio and applicability.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Location-based services; Location prediction; Mobility prediction; Mobile web system; Data mining

1. Introduction

Effectively modeling the behavior patterns of users in mobile web systems benefits not only the users, through smart access by caching or prefetching, but also the mobile service (m-service) providers, through financial profit from activities such as advertising. In mobile web environments, users may request various kinds of services and applications by cellular phone, PDA, or notebook from arbitrary locations at any time via GSM, GPRS or wireless networks. The behavior pattern of mobile users, in which location and service are inherently coexistent, is therefore more complex than that of users of traditional web systems. Helping the user get desired information in a short time is one of the promising applications, especially in mobile environments, where users do not have much time to surf web pages. To accelerate resource access, several researchers studied the problem of caching and prefetching [5,16] to improve system performance in WWW environments, and for mobile systems, the resource allocation problem was widely explored [6,11,21,22,24,25]. However, the deficiency of the existing studies is that they considered only one of the characteristics, i.e. location or requested service. Both movement and service requests should be considered simultaneously in order to discover complete information about user behavior patterns.

In mobile web environments, each service request submitted by a user can be associated with the current location acquired by positioning techniques like GPS embedded in mobile devices. A sequence of requests from a user forms a location-service stream, and we call this kind of stream the user’s behavior pattern. This research aims at mining the behavior patterns so that suitable services can be predicted and recommended for users. Note that the behavior patterns discussed in this paper contain not only the traversal path but also the sequentially requested services. Consider the following scenario as an example. Suppose we discover from the server log that a considerable number of people in New York exhibit the following similar behavior: if a person is in the Manhattan area and queries the theater information in Broadway, then he/she moves to the Soho area and queries information about nearby restaurants for dining. Therefore, if a person is currently in the Manhattan area and has just performed the action ‘Browse the theater information in Broadway’, we can recommend the restaurant information in the Soho area to that person via the mobile device with strong confidence. In this way, intelligent location-based services can be achieved.

www.elsevier.com/locate/infsof

* Corresponding author. Tel.: +886 6 2757575x62536; fax: +886 6 2747076.

In this paper, we propose a novel data mining algorithm for discovering user’s behavior patterns in mobile web systems. Moreover, some corresponding prediction strategies are also proposed for predicting user’s behavior in terms of location associated with services. The main contributions of this paper are as follows:

1. A novel data mining mechanism is proposed for efficiently discovering the user’s behavioral patterns, namely sequential mobile access patterns (SMAP), which are composed of the user’s sequential movement associated with requested services. Over the past few years, a considerable number of studies have used data mining techniques to discover interesting rules/patterns from the WWW [4,7,8,20,27] or large databases [1,2,9]. Although some recent studies explored the issue of analyzing mobility data, most of them focused only on movement behavior analysis [12,13,28,31], and some other researchers studied the problem of location tracking [3,14] or resource allocation. However, the existing studies did not consider the patterns of movement and service request at the same time. Tseng et al. [29,30] first studied the problem of mining associated service patterns in mobile web environments. In this paper, we propose a novel data mining algorithm named SMAP-Mine that can efficiently discover sequential mobile access patterns that contain both movement and service requests simultaneously. To the best of our knowledge, this is the first work on this research issue.

2. An effective prediction mechanism is proposed by modeling the user’s behavior via the Markov model [19,23]. The proposed prediction mechanism provides appropriate recommendations for users in terms of service, location, and location with service. In this work, we first generate three categories of rules, which can be used to predict the next location, the next requested service, and the next location with its associated service, respectively. Afterwards, the generated rules are stored by category in a rule repository for online recommendation according to the historical behavior of the user. There are several variations of Markov models, such as Dependency Graph (DG) [17], Prediction-by-Partial-Match (PPM) [18], and the N-gram model [26]. In this paper, we extend the N-gram model [26] to support these three categories of predictions.

3. The effectiveness of the proposed data mining algorithm and prediction strategies is validated through experimental evaluation. The proposed data mining algorithm, SMAP-Mine, was evaluated under various experimental conditions and is shown to deliver excellent performance in terms of execution efficiency, scalability and accuracy. The precision, applicability and hit ratio of the proposed prediction strategies were also studied in detail to show their effectiveness.

The rest of this paper is structured as follows. In Section 2, we describe the system architecture in detail. In Section 3, the proposed data mining method and the underlying data structures are introduced. The prediction strategies are proposed in Section 4. The empirical evaluation results for the performance study are given in Section 5. Section 6 gives an overview of related research work. The conclusions and future directions are given in Section 7.

2. System architecture

Fig. 1 shows the system architecture for the proposed data mining and prediction mechanisms. The workflow of the system consists of three phases: Data Integration Phase, Mining Phase and Prediction Phase. Due to the distributed nature of mobile web systems, the logs of users’ movement and the logs of users’ service requests may be stored in different databases. Hence, the Data Integration Phase collects and integrates these logs into one dataset for efficient access. For the collection work, these logs can be obtained from distributed sources like the Home Location Register (HLR) and the Visitor Location Register (VLR). The HLR stores the permanent subscriber information in a mobile network, while the VLR maintains temporary user information, such as the current location, to manage requests from subscribers who are out of the home area. For the integration work, the attributes related to the user’s service requests are extracted from these dispersed log files and joined to form an integrated log file by using the user’s identifier as the key. For current mobile systems, the integrated log file may include important attributes like user ID, service request time, service ID, caller location, duration, etc.
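The integration step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the record layouts `(user_id, time, location)` and `(user_id, time, service_id)` and the function name `integrate_logs` are hypothetical, and the rule of attaching each request to the user's latest known location at or before the request time is an assumption. The text above only states that the logs are joined on the user's identifier.

```python
from bisect import bisect_right
from collections import defaultdict


def integrate_logs(movement_log, service_log):
    """Join movement and service-request records per user.

    movement_log: iterable of (user_id, time, location)      # assumed layout
    service_log:  iterable of (user_id, time, service_id)    # assumed layout
    Returns {user_id: [((location, service_id), time), ...]} in time order.
    """
    # Group the location records by user and sort them by time.
    fixes_by_user = defaultdict(list)
    for user, t, location in movement_log:
        fixes_by_user[user].append((t, location))
    for fixes in fixes_by_user.values():
        fixes.sort()
    times_by_user = {u: [t for t, _ in f] for u, f in fixes_by_user.items()}

    integrated = defaultdict(list)
    for user, t, service in sorted(service_log, key=lambda r: (r[0], r[1])):
        times = times_by_user.get(user, [])
        # Assumption: use the latest location recorded at or before time t.
        i = bisect_right(times, t) - 1
        if i < 0:
            continue  # no location known yet for this request; skip it
        location = fixes_by_user[user][i][1]
        integrated[user].append(((location, service), t))
    return dict(integrated)


movement_log = [(1, 0, "a"), (1, 10, "b"), (2, 5, "c")]
service_log = [(1, 3, 1), (1, 12, 2), (2, 1, 9), (2, 7, 6)]
integrated = integrate_logs(movement_log, service_log)
```

Each value in the result is a time-ordered list of location-service pairs with their request times, which is the shape of input the Mining Phase works on.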

For the Mining Phase, a novel data mining method is deployed to discover the frequent Sequential Mobile Access Patterns (SMAP, described in detail in Section 3) from the integrated log dataset. For the Prediction Phase, whenever a user submits a service request at some place, the user’s information, including the current location, the currently requested service and the recent behavior (sequential movement associated with requested services), is input to the prediction component. The prediction component then retrieves the matched rules from the repository of sequential mobile access rules (generated from the mined SMAPs and described in detail in Section 4) according to the user’s current behavior. Note that the prediction mechanism provides appropriate recommendations for users in terms of service, location, and location with service. Subsequently, the best recommended results are returned to the service agent, which in turn incorporates the recommendation results into the requested service page as a newly rendered page to be sent to the user. In this way, intelligent mobile web services can be provided by considering both the movement and the service request patterns of users.
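To make the rule-retrieval step concrete, here is a hypothetical Python sketch of a lookup against a rule repository. It is not the SMAR-N-gram algorithm of Section 4. The repository layout (a dict from an antecedent tuple of location-service pairs to scored consequents), the longest-suffix-first matching, the ranking by confidence, the function name `recommend`, and all rule values are assumptions made purely for illustration.

```python
def recommend(rules, recent, top_n=3):
    """Return up to top_n consequents for the user's recent behavior.

    rules:  {antecedent: [(consequent, confidence), ...]}, where an
            antecedent is a tuple of (location, service) pairs (assumed layout)
    recent: list of the user's most recent (location, service) pairs
    """
    # Try the longest suffix of the recent behavior first, then back off
    # to shorter suffixes, in the spirit of N-gram matching.
    for start in range(len(recent)):
        antecedent = tuple(recent[start:])
        if antecedent in rules:
            ranked = sorted(rules[antecedent], key=lambda r: r[1], reverse=True)
            return [consequent for consequent, _ in ranked[:top_n]]
    return []


# Illustrative rules only; the confidences are made up for this example.
rules = {
    (("Manhattan", "theater"),): [
        (("Soho", "restaurant"), 0.6),
        (("Midtown", "parking"), 0.2),
    ],
    (("Queens", "news"), ("Manhattan", "theater")): [
        (("Soho", "bar"), 0.7),
    ],
}

full_match = recommend(rules, [("Queens", "news"), ("Manhattan", "theater")])
backoff = recommend(rules, [("Bronx", "map"), ("Manhattan", "theater")], top_n=1)
no_match = recommend(rules, [("Bronx", "map")])
```

In a system with three rule categories, one such repository per category (next location, next service, next location with service) would be a natural arrangement, though that is also an assumption here.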

3. Mining of sequential mobile access patterns

In this section, we first give the formal definition for this mining problem and then propose a novel algorithm named SMAP-Mine that can discover the sequential mobile access patterns efficiently.

3.1. Problem definition

Consider two sets $L$ and $S$, namely the location set and the service set, respectively. For each element $l$ in $L$ and each element $s$ in $S$, we form an ordered pair $p = (l, s)$, where $l$ and $s$ are taken as the first and second element of $p$, respectively. Two ordered pairs $(l_1, s_1)$ and $(l_2, s_2)$ are said to be equivalent if and only if $l_1 = l_2$ and $s_1 = s_2$. Let $P$ be the set of all ordered pairs and we write it as

$$P = L \times S = \{(l, s) \mid l \in L \text{ and } s \in S\}.$$

Let $T = \langle (p_1, t_1)(p_2, t_2) \ldots (p_m, t_m) \rangle$, where element $(p_i, t_i)$ is composed of an ordered pair $p_i$ and a time point $t_i$. We say $T$ is a mobile access pattern with length $m$, namely an $m$-pattern. Meanwhile, $(p_i, t_i)$ is defined as earlier than $(p_j, t_j)$ if and only if $t_i < t_j$, and it is written as $(p_i, t_i) < (p_j, t_j)$ or simply $p_i < p_j$. Note that the value of each time point $t$ is unique in an access pattern, i.e. $t_i$ will never be equal to $t_j$. The elements of an access pattern are sorted in ascending order by using $t$ as the key. Considering only the ordered pairs $p_i$ in $T$, we define the pattern $T_s = \langle (p_1)(p_2) \ldots (p_m) \rangle$ as a sequential mobile access pattern (SMAP).
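As an illustration of these definitions, the following Python sketch represents a mobile access pattern $T$ as a list of `((location, service), time)` elements and derives the SMAP $T_s$ by ordering on the time points and dropping them. The representation and the name `to_smap` are assumptions made for this example; the uniqueness check mirrors the requirement that no two time points in a pattern are equal.

```python
def to_smap(pattern):
    """Derive the SMAP T_s from a mobile access pattern T.

    pattern: list of ((location, service), time) elements, in any order.
    Returns the list of (location, service) pairs in ascending time order.
    """
    times = [t for _, t in pattern]
    # Time points must be unique within one access pattern.
    if len(set(times)) != len(times):
        raise ValueError("time points must be unique within an access pattern")
    # Sort by time, then keep only the ordered pairs p_i.
    return [pair for pair, _ in sorted(pattern, key=lambda e: e[1])]


T = [(("b", 2), 20), (("a", 1), 10), (("c", 5), 35)]
Ts = to_smap(T)
```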

Definition 1. An access pattern $T_s' = \langle (p_1')(p_2') \ldots (p_m') \rangle$ is a sub-pattern of another access pattern $T_s = \langle (p_1)(p_2) \ldots (p_n) \rangle$, written as $T_s' \subseteq T_s$, if $m \le n$ and there exists a strictly increasing sequence of indices, namely $i_1, i_2, \ldots, i_m$, such that $p_j' = p_{i_j}$ for all $j = 1, 2, \ldots, m$. Here, $T_s$ is called the super-pattern of $T_s'$.
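Definition 1 is the usual subsequence test applied to ordered pairs. A minimal Python sketch follows, assuming an SMAP is stored as a list of `(location, service)` tuples; the name `is_subpattern` is hypothetical. A single left-to-right scan is enough, because matching each element of the candidate at its earliest possible position never rules out a later match.

```python
def is_subpattern(sub, pattern):
    """Return True if `sub` is a sub-pattern of `pattern` (Definition 1).

    Both arguments are lists of (location, service) pairs. Pairs must match
    exactly (same location and same service) and keep their relative order.
    """
    j = 0  # index of the next element of `sub` still to be matched
    for pair in pattern:
        if j < len(sub) and pair == sub[j]:
            j += 1
    return j == len(sub)


Ts = [("a", 1), ("b", 2), ("c", 5), ("d", 8)]
```

For example, `[("a", 1), ("c", 5)]` is a sub-pattern of `Ts`, whereas `[("c", 5), ("a", 1)]` is not (wrong order) and `[("a", 3)]` is not (same location, different service).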

Definition 2. Given a database $D = \{T_{s1}, T_{s2}, \ldots, T_{sN}\}$ that contains $N$ access patterns, the support of pattern $T_s$ is defined as

$$\mathrm{sup}(T_s) = \frac{|\{T_{si} \mid T_s \subseteq T_{si} \text{ and } 1 \le i \le N\}|}{N}.$$

Definition 3. An access pattern $T_s$ is called a frequent sequential mobile access pattern (F-SMAP) if $\mathrm{sup}(T_s)$ is greater than the user-specified support threshold $\delta$.
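Definitions 2 and 3 can be checked by brute force on a small database. The sketch below is only an illustration of the definitions, not the SMAP-Mine algorithm: it scans every access pattern for each query. The database `D` reuses the six access patterns of Table 1, the names `support` and `is_frequent` are hypothetical, and exact fractions are used so that the strict comparison with the threshold is not affected by floating-point rounding.

```python
from fractions import Fraction


def is_subpattern(sub, pattern):
    """Subsequence test on lists of (location, service) pairs (Definition 1)."""
    j = 0
    for pair in pattern:
        if j < len(sub) and pair == sub[j]:
            j += 1
    return j == len(sub)


def support(ts, database):
    """Fraction of access patterns in `database` that contain `ts` (Definition 2)."""
    hits = sum(1 for tsi in database if is_subpattern(ts, tsi))
    return Fraction(hits, len(database))


def is_frequent(ts, database, threshold):
    """F-SMAP test: support strictly greater than the threshold (Definition 3)."""
    return support(ts, database) > threshold


# The six access patterns of Table 1, as (location, service) pairs.
D = [
    [("a", 1), ("b", 2), ("c", 5), ("d", 8)],
    [("a", 1), ("b", 3), ("c", 5), ("d", 8)],
    [("a", 3), ("b", 2), ("d", 7)],
    [("c", 6), ("b", 2), ("d", 7)],
    [("c", 8), ("b", 1)],
    [("a", 3), ("b", 6), ("c", 8), ("d", 7)],
]

sup_acd = support([("a", 1), ("c", 5), ("d", 8)], D)
sup_bd = support([("b", 2), ("d", 7)], D)
```

Both example patterns occur in two of the six access patterns, so each has support 1/3 and counts as an F-SMAP for a threshold of 0.3 but not for a threshold of 0.5.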

With the above definitions, the problem of mining sequential mobile access patterns is defined as follows. Given a database D containing the logs of the mobile users’ access patterns and a user-specified support threshold $\delta$, the problem is to discover all the F-SMAPs existing in the database. Afterwards, the discovered F-SMAPs can serve as the basis of behavior-aware recommendations. Furthermore, the service providers can also do resource prefetching or precaching more efficiently and effectively according to the F-SMAPs. We will also discuss how to generate the rules from F-SMAPs in Section 4.

3.2. Proposed data mining method: SMAP-Mine

In this section, we describe the proposed data mining method, namely SMAP-Mine. The input to the SMAP-Mine algorithm is the log of mobile access patterns, which is obtained by integrating the movement log and the service request log. The SMAP-Mine algorithm consists of two phases, namely (i) construction of the SMAP-Tree, and (ii) mining of sequential mobile access patterns. In the following, we describe the SMAP-Mine algorithm in detail.

3.2.1. Construction of SMAP-Tree

The purpose of constructing SMAP-Tree is to aggregate the access patterns into the memory in a compact form so that the mining of frequent patterns can be done efficiently. The main merits of SMAP-Tree are (1) only one physical database scan is needed to mine all of the frequent patterns, and (2) the SMAP-Tree is compact so that it can be loaded into memory for efficient processing.

Before giving the detailed algorithm, we illustrate how the SMAP-Tree is constructed by an example. Given the access patterns listed in Table 1, we construct the SMAP-Tree shown in Fig. 2. Taking the first access pattern as an example, the movement sequence ⟨a, b, c, d⟩ is first extracted from ⟨(a, 1) (b, 2) (c, 5) (d, 8)⟩ and inserted into the SMAP-Tree. The count of each node’s label traversed by the sequence is increased by 1. At the tail node of this sequence in the SMAP-Tree, we construct a Service Request Tree (SR-Tree). The SR-Tree records the different sequences of requested services under the same movement behavior. In this paper, the movement behavior is called the movement sequence of this SR-Tree. Therefore, each tail node in each sequence has its own SR-Tree. Then, we extract the requested service sequence ⟨1, 2, 5, 8⟩ from ⟨(a, 1) (b, 2) (c, 5) (d, 8)⟩ and insert it into the corresponding SR-Tree, as shown in Fig. 2.

Table 1
An integrated mobility log

User ID    Access pattern
1          ⟨(a,1)(b,2)(c,5)(d,8)⟩
2          ⟨(a,1)(b,3)(c,5)(d,8)⟩
3          ⟨(a,3)(b,2)(d,7)⟩
4          ⟨(c,6)(b,2)(d,7)⟩
5          ⟨(c,8)(b,1)⟩
6          ⟨(a,3)(b,6)(c,8)(d,7)⟩

In a SMAP-Tree structure, a node table is maintained to record the first-occurrence address and total count for distinct labels. Each node of the SMAP-Tree (denoted as ST-node) is of the following structure:

ST-node := {label, parent-link, next-link, children table}

The parent-link and next-link are two pointers linking to the parent node and next node for the corresponding label. All of the children nodes for the node are tabulated in the children table.

For each SR-Tree, we keep a head table to record the level structure, where the nth element in the table links to the first node on level n in the SR-Tree. In this paper, we use the term level to denote the height of a node in the tree. Each node in an SR-Tree has a link that points to the next peer node. The node structure in an SR-Tree is shown below:

SR-node := {label, parent-link, peer-link, children table}

Therefore, given the access patterns in Table 1, four SR-Trees will be constructed with the detailed structure as illustrated in Fig. 2. Fig. 3 shows the algorithm for constructing the SMAP-Tree.
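The construction described above can be sketched in Python as follows. This is a simplified illustration, not the algorithm of Fig. 3: the class and attribute names are hypothetical, the children table is a plain dict, the node table keeps only the total count per label, and the next-link, peer-link and head table are omitted for brevity.

```python
class SRNode:
    """Node of a Service Request Tree (peer-link omitted in this sketch)."""
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}   # children table: service -> SRNode


class STNode:
    """Node of the SMAP-Tree (next-link omitted in this sketch)."""
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}   # children table: location -> STNode
        self.sr_tree = None  # SR-Tree root, present only at tail nodes


class SMAPTree:
    def __init__(self):
        self.root = STNode(None)
        self.label_count = {}  # node table, reduced to total count per label

    def insert(self, pattern):
        """Insert one access pattern given as a list of (location, service)."""
        node = self.root
        for location, _ in pattern:          # movement sequence
            if location not in node.children:
                node.children[location] = STNode(location, node)
            node = node.children[location]
            self.label_count[location] = self.label_count.get(location, 0) + 1
        if node.sr_tree is None:             # tail node gets its SR-Tree
            node.sr_tree = SRNode(None)
        sr = node.sr_tree
        for _, service in pattern:           # requested service sequence
            if service not in sr.children:
                sr.children[service] = SRNode(service, sr)
            sr = sr.children[service]


def sr_trees(node, prefix=()):
    """Yield (movement sequence, SR-Tree root) for every tail node."""
    for label, child in node.children.items():
        path = prefix + (label,)
        if child.sr_tree is not None:
            yield path, child.sr_tree
        yield from sr_trees(child, path)


table1 = [
    [("a", 1), ("b", 2), ("c", 5), ("d", 8)],
    [("a", 1), ("b", 3), ("c", 5), ("d", 8)],
    [("a", 3), ("b", 2), ("d", 7)],
    [("c", 6), ("b", 2), ("d", 7)],
    [("c", 8), ("b", 1)],
    [("a", 3), ("b", 6), ("c", 8), ("d", 7)],
]
tree = SMAPTree()
for access_pattern in table1:
    tree.insert(access_pattern)
trees = dict(sr_trees(tree.root))
```

With the Table 1 data, `trees` holds four SR-Trees, one for each distinct movement sequence, which matches the count stated above.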

Some properties of SMAP-Tree are also given in the following:

Definition 4. Two nodes A and B are in peer relation if these two nodes are on the same level of the SR-Tree.

Definition 5. Given a node A in a SMAP-Tree and a node B in a SR-Tree, called bst, we say that A and B are cross-peer nodes if the level of A is equal to the level of B and the bst can be reached through any path containing node A.

3.2.2. SMAP-Mine algorithm

Fig. 4 shows the detailed algorithm for SMAP-Mine, which is based on the depth-first search (DFS) approach. It recursively constructs the SMAP-Trees and mines the trees until the termination condition is met. First, we list all the labels with count greater than the support threshold $\delta$ by scanning the node table of the current SMAP-Tree. The labels are stored in a temporary set M_L1. If M_L1 is empty, the prefix pattern of the current SMAP-Tree is output and returned. The output prefix pattern will be one of the F-SMAPs. Otherwise, for each label l in M_L1, all the nodes labeled l are stored into a temporary set l_tmp. For each node N in l_tmp, we sum the

Fig. 2. The SMAP-Tree constructed for the log data in Table 1.


user’s behavior patterns in terms of the location, services and location associated with services. The proposed data mining method, namely SMAP-Mine, can efficiently discover the patterns of sequential movement associated with service requests for mobile users since only one physical scan of the database is needed. Our study on the integrated analysis of both mobility and service requests complements past studies, which focused only on mobility analysis. To the best of our knowledge, this is the first work on mining the patterns of sequential movement associated with service requests. Through empirical evaluation and sensitivity analysis under various system conditions, SMAP-Mine is shown to perform excellently in terms of execution efficiency, accuracy and scalability.

As to the prediction strategies, we generate three categories of rules, which can be used to predict the next location, the next requested service, and the next L&S of mobile users, respectively. The generated rules are stored in a rule repository and the prediction function is achieved by using the proposed SMAR-N-gram algorithm, which is based on the N-gram model. In the experiments, we measured the precision, applicability and hit ratio in order to provide the system administrator or service providers with more information for improving the system performance or service quality. We also demonstrated that ranking by strength instead of confidence delivers a higher hit ratio when the TOP-N constraint is specified.

In future work, we will apply SMAP-Mine to more real datasets and evaluate its performance under different system conditions. Moreover, we will also consider the temporal issue and integrate it with SMAP-Mine to discover more interesting patterns. Besides, since the discovered F-SMAPs can be exploited in a wide range of applications, we will apply the SMAP-Mine method to applications like data allocation, data replication and location-based services, with the aim of enhancing the richness and quality of new applications in mobile web systems.

Acknowledgements

This research was partially supported by National Science Council, Taiwan, ROC, under grant number NSC 93-2213-E006-030.

References

[1] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proceedings of the 20th International Conference Very Large Data Bases, 1994, pp. 487–499.

[2] R. Agrawal, R. Srikant, Mining sequential patterns, in: Proceedings of the 11th International Conference Data Engineering, 1995, pp. 3–14.

[3] F. Akyildiz, J. Menair, J.S.M. Ho, H. Uzunalioglu, W. Wang, Mobility management in next-generation wireless system, Proc. IEEE 87 (8) (1999) 1347–1384.

[4] J. Borges, M. Levene, Data mining of user navigation patterns, in: Proceedings of the Workshop on Web Usage Analysis and User Profiling (WEBKDD’99), 1999, pp. 31–36.

[5] C.Y. Chang, M.S. Chen, Integrating web caching and web prefetching in client-side proxies, in: Proceedings of the ACM 11th International Conference Information and Knowledge Management, 2002.

[6] J.L. Chen, Resource allocation for cellular data services using multiagent schemes, IEEE Trans. Syst. Man Cybern. 31 (6) (2001) 864–869.

[7] M.S. Chen, J.S. Park, P.S. Yu, Efficient data mining for path traversal patterns, IEEE Trans. Knowl. Data Eng. 2 (1998) 209–221.

[8] Z. Chen, A.W.C. Fu, Linear time algorithms for finding maximal forward references, in: Proceedings of the 2003 IEEE International Conference on Information Technology: Coding and Computing, 2003.

[9] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proceedings of the ACM International Conference Management of Data, 2000.

[10] G. Hooker, M. Finkelman, Sequential analysis for learning modes of browsing, in: Proceedings of the WebKDD 2004: KDD Workshop on Web Mining and Web usage Analysis, 2004.

[11] J.L. Huang, M.S. Chen, W.C. Peng, Exploring group mobility for replica data allocation in a mobile environment, in: Proceedings of the ACM International Conference Information and Knowledge Management, 2003, pp. 161–168.

[12] M. Kyriakakos, S. Hadjiefthymiades, N. Frangiadakis, L.F. Merakos, Multi-user driven path prediction algorithm for mobile computing, in: Proceedings of the 14th International Workshop on Database and Expert Systems Applications, 2003, pp. 191–195.

[13] J.T. Lee, Y.T. Wang, Efficient data mining for calling path patterns in GSM networks, Inf. Syst. 28 (8) (2003) 929–948.

[14] G.H. Li, K.Y. Lam, T.W. Kuo, Location update generation in cellular mobile computing systems, in: Proceedings of the International Workshop on Parallel and Distributed Real-time Systems, 2001.

[15] K.W. Lin, S.M. Tseng, A data generator for mobile web environments, Technical Report. CSIE Department, National Cheng Kung University, Taiwan, 2002.

[16] Nanopoulos, D. Katsaros, Y. Manolopoulos, Exploiting web log mining for web cache enhancement, in: Proceedings of the WebKDD 2001: KDD Workshop on Web Mining and Web Usage Analysis, 2001, pp. 68–87.

[17] V. Padmanabhan, J. Mogul, Using predictive prefetching to improve world wide web latency, ACM SIGCOMM Comput. Commun. Rev. 26 (3) (1996).

[18] T. Palpanas, A. Mendelzon, Web prefetching using partial match prediction, in: Proceedings of the Fourth Web Caching Workshop, 1999.

[19] Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1991.

[20] J. Pei, J. Han, B. Mortazavi-Asl, H. Zhu, Mining access patterns efficiently from web logs, in: Proceedings of the Fourth Pacific Asia Conference Knowledge Discovery and Data Mining, 2000, pp. 396–407.

[21] W.C. Peng, M.S. Chen, Mining user moving patterns for personal data allocation in a mobile computing system, in: Proceedings of the International Conference Parallel Processing, 2000, pp. 573–580.

[22] W.C. Peng, M.S. Chen, Allocation of shared data based on mobile user movement, in: Proceedings of the Third International Conference Mobile Data Management, 2002, pp. 105–112.

[23] J. Pitkow, P. Pirolli, Mining longest repeating subsequences to predict world wide web surfing, in: Proceedings of the USENIX Symposium Internet Technologies and Systems, 1999.

[24] Pramudiono, T. Shintani, K. Takahashi, M. Kitsuregawa, User behavior analysis of location aware search engine, in: Proceedings Third International Conference Mobile Data Management, 2002, pp. 139–145.

[25] V. Saygin, O. Ulusoy, Exploiting data mining techniques for broadcasting data in mobile computing environments, IEEE Trans. Knowl. Data Eng. 14 (6) (2002) 1387–1399.

[26] Z. Su, Q. Yang, Y. Lu, H. Zhang, WhatNext: a prediction system for web requests using N-gram sequence models, in: Proceedings of the First International Conference Web Information Systems and Engineering, 2000, pp. 200–207.


[27] P.N. Tan, V. Kumar, Mining indirect associations in web data, in: Proceedings of the WebKDD 2001: KDD Workshop on Web Mining and Web Usage Analysis, 2001, pp. 145–166.

[28] S. Tseng, W.C. Chan, Mining complete user moving paths in a mobile environment, in: Proceedings of the International Workshop on Databases and Software Engineering (held with ICS), 2002.

[29] S. Tseng, C.F. Tsui, An efficient method for mining associated service patterns in mobile web environments, in: Proceedings of the ACM Symposium on Applied Computing, 2003, pp. 455–459.

[30] S. Tseng, C.F. Tsui, Mining multi-level and location-aware associated service patterns in mobile web environments, IEEE Trans. Syst. Man Cybern. 34 (6) (2004).

[31] Y. Wang, E.P. Lim, S.Y. Hwang, Efficient group pattern mining using data summarization, in: Proceedings of the Database Systems for Advances applications, 2004, pp. 895–907.

[32] Q. Yang, T. Li, K. Wang, Building association rule based sequential classifiers for web document prediction, J Data Min. Knowl. Discovery 8 (3) (2004) 253–273.