
Mining Mobile Sequential Patterns in a Mobile Commerce Environment


Academic year: 2021


How to strike a compromise among the use of various knowledge to solve the mining on mobile sequential patterns is a challenging issue. We devise three algorithms (algorithm TJLS, algorithm TJPT, and algorithm TJPF) for determining the frequent sequential patterns, which are termed large sequential patterns in this paper, from the mobile transaction sequences. Algorithm TJLS is devised in light of the concept of association rules and is used as the basic scheme. Algorithm TJPT is devised by taking both the concepts of association rules and path traversal patterns into consideration and gains performance improvement by path trimming. Algorithm TJPF is devised by utilizing the pattern family technique, which is developed to exploit the relationship between moving and purchase behaviors, and is thus able to generate the large sequential patterns very efficiently. A simulation model for the mobile commerce environment is developed, and a synthetic workload is generated for performance studies. Our experimental results show that, in mining mobile sequential patterns, algorithm TJPF significantly outperforms the others in both execution efficiency and memory saving, indicating the usefulness of the pattern family technique devised in this paper. Our results also show that by taking both moving and purchase patterns into consideration, one can have a better model for a mobile commerce system and is thus able to exploit the intrinsic relationship between these two important factors for the efficient mining of mobile sequential patterns.

Index Terms—Data mining, mobile computing, mobile sequential patterns, user behavior.

I. INTRODUCTION

THE EMERGENCE of powerful portable devices, along with advances in wireless communication technologies, has made mobile services available. In the near future, it is expected that tens of millions of users will carry mobile phones or portable devices that use wireless connections to access a worldwide information network for business or personal use from anywhere at any time, making mobile commerce (MC) a reality [1], [57], [58]. For example, eNetwork Web Express [19] enables mobile users to use commercial Web applications over wide-area wireless networks (WANs). Bluetooth technology [20] allows terminals and cash registers to talk directly to each other for the purpose of mobile commerce. The Wireless Application Protocol (WAP) [21] brings the MC environment a worldwide standard for providing Internet communications to digital mobile phones. In an MC environment, customers can make any transaction from anywhere at any time with the payment mechanism provided by banks or credit card companies [58]. In addition, some Nokia mobile phones provide a wallet application that enables customers to get easy access to mobile services and to make convenient online mobile transactions [2]. In the wallet, customers can store sensitive personal information, such as payment and loyalty card details, delivery addresses, and notes, as well as service profiles. In addition, with the wallet application, the Nokia mobile phones have the capability of storing the transactions with the moving patterns and purchasing patterns of customers.

Manuscript received May 2, 2003; revised July 9, 2004. The work was supported in part by the National Science Council of Taiwan, R.O.C., under Contract NSC93-2752-E-002-006-PAE.

The authors are with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C. (e-mail: mschen@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/TSMCC.2005.855504

Fig. 1. Illustrative example for a mobile transaction sequence where cells are underlined if items are purchased there.

Example 1.1: One example scenario envisioned for a mobile transaction sequence is shown in Fig. 1, where a customer moves in the mobile commerce environment and makes transactions in the corresponding cell through the mobile device. Fig. 1(a) shows the moving patterns of this customer, and the mobile transaction sequence data are recorded in Fig. 1(b), where, for example, item $i_1$ was purchased when the customer moved to cell A.

It is important to note that since customers move through an MC environment searching for desired items to purchase, the implications of moving patterns and purchase patterns are in fact entangled, and both are of great importance for studying customer behavior. Clearly, the distinctive features of knowledge discovery in an MC environment increase the difficulty of extracting information from the mobile transaction sequences. However, as these mobile commerce services are becoming increasingly popular, it is imperative to devise efficient algorithms for deriving customer buying behavior to improve the quality of these services. As a result, the objective of this paper is the design and development of efficient mining algorithms for knowledge discovery in an MC environment that fully explore the intrinsic relationship between moving and purchase patterns. Mining the moving and purchase patterns of customers in an MC environment is called the mining of mobile sequential patterns (i.e., large sequential patterns) in this paper. In addition, a novel kind of knowledge, called mobile sequential rules, can be derived from the mobile sequential patterns to measure the association between customer purchase behaviors.

Example 1.2: For the example shown in Fig. 1, the customer has one kind of moving pattern $ABC$ and two kinds of purchasing patterns, $\langle A; t_1 \rangle$ and $\langle C; t_9 \rangle$, where itemset $t_1 = \{i_1\}$ and itemset $t_9 = \{i_2, i_3\}$. If there are sufficient customers having the same patterns, the mobile sequential pattern is an implication of the form $\{\langle A; t_1 \rangle, \langle C; t_9 \rangle\}: ABC$, which means that most customers usually purchase itemset $t_1$ in cell A and then purchase itemset $t_9$ in cell C along the specific path $ABC$. In addition, the mobile sequential rule is an implication of the form $\{\langle A; t_1 \rangle\} \Rightarrow \{\langle C; t_9 \rangle\}: ABC$, which means that customers purchasing itemset $t_1$ in cell A usually move along path $ABC$ to cell C to purchase itemset $t_9$. With the mobile sequential rule, when a customer purchases itemset $t_1$ in cell A, the cellular phone company could send coupons for the products in itemset $t_9$ (i.e., items $i_2$ and $i_3$) through the base stations in cells A, B, or C, in accordance with their broadcasting schedules, to boost sales. More description of mobile commerce is available in [1].

The details of related works are given in Section II-B. Despite some efforts devoted to examining user behavior, none of the prior work, to the best of our knowledge, has taken both moving and purchase patterns into consideration together to model customer behavior in a mobile commerce environment. This can in part be explained by the fact that it is currently expensive to track and log the detailed movements of mobile users.¹ However, it is expected that this cost will decrease soon and that the cellular phone will become a popular interface to the interconnection networks for accessing various services [57], thus justifying the practicality and necessity of conducting mobile sequential pattern mining. It is understood that the records of cells visited and items purchased, required for mining mobile sequential patterns, may belong to different companies, and these companies may have different considerations on using their data to improve the mobile commerce services provided. It should go without saying that such data analysis should be done solely for the purposes of system and service improvements and should be conducted in such a way that no law is violated and the privacy of customers is not intruded upon. Nevertheless, with the legality and privacy issues considered, knowledge discovery from the MC data is believed to be an increasingly challenging technical problem which is of great practical importance for the evolving MC techniques.

¹ The cost to locate a mobile user is estimated to be about USD 0.01 to USD 0.05 each time according to a major mobile phone service provider.

Fig. 2. Notion of mining mobile sequential patterns.

Consequently, to better reflect customer buying behavior in the MC environment, we propose an innovative mining model that takes both the moving patterns and purchase patterns into consideration. In essence, the mining of mobile sequential patterns aggregates the concepts of mining association rules, mining path traversal patterns, and mining sequential patterns, and thus requires a combined use of the corresponding techniques. The notion of mining mobile sequential patterns is shown in Fig. 2, where the relationship among these mining capabilities is depicted. How to strike a compromise among the use of various knowledge to solve the mining on mobile sequential patterns is a challenging issue. As an effort to solve this problem, we devise a procedure, namely mobile sequential patterns (MSP), to conduct the mining of mobile sequential patterns. With the details described in Section II-C, procedure MSP splits the problem of mining mobile sequential patterns into four phases, namely: 1) the large-transaction generation phase; 2) the large-transaction transformation phase; 3) the sequential-pattern generation phase; and 4) the sequential-rule generation phase.

In this paper, the performance bottleneck is in phase 3), i.e., the sequential-pattern generation phase. By assigning different priorities to the factors involving large itemsets, traversal paths, and orders of purchases, we devise three algorithms (algorithm TJLS, algorithm TJPT, and algorithm TJPF) to determine mobile sequential patterns. First, algorithm TJLS is devised in light of the concept of itemset joining in association rule mining [6]. However, as will be seen later, without fully utilizing the traversal paths of mobile sequential patterns, algorithm TJLS tends to count the supports of many out-of-path sequential patterns (i.e., sequential patterns which do not stay within a path), thus degrading the performance. Next, to eliminate the out-of-path sequential patterns, algorithm TJPT is devised by taking both the concepts of association [...] mobile sequential pattern mining. For better readability, we defer the detailed description of the pattern family technique and the corresponding theoretical properties to Section III. It will be shown that, by utilizing the information of the pattern family, algorithm TJPF compares paths with time complexity $O(1)$ and generates fewer uncertain candidate sequential patterns, thus reducing the corresponding computational overhead and memory consumption.

After all mobile sequential patterns are obtained, the mobile sequential rules can be derived in a straightforward way, as described in Section II-C4. A simulation model for the MC environment is developed and a synthetic workload is generated for performance studies. By utilizing the pattern family technique, TJPF is shown to be able to determine large sequential patterns very efficiently. As validated by the synthetic workload, our experimental results show that algorithm TJPF significantly outperforms the others in both execution efficiency and memory saving. Our results also show that by taking both moving patterns and purchase patterns into consideration, one can have a better model for an MC system and is thus able to exploit the intrinsic relationship between these two customer behaviors.

This paper is organized as follows. Preliminaries are given in Section II. In Section III, three algorithms (TJLS, TJPT, and TJPF) are devised for determining large sequential patterns. Experimental studies are conducted in Section IV. This paper concludes with Section V.

II. PRELIMINARIES

In this section, the problem of mining mobile sequential patterns is described in Section II-A, the related works are described in Section II-B, and the procedure of mining mobile sequential rules is outlined in Section II-C.

A. Problem Formulation

In the mobile commerce environment where items are sold in various cells, customers may move among the cells to purchase items of interest with either traditional or electronic commerce trading mechanisms. In either case, customers pay for items through their mobile devices and the purchasing records are logged. Let $N = \{n_1, n_2, \ldots, n_g\}$ be the set of cells in the MC environment and $I = \{i_1, i_2, \ldots, i_h\}$ be the set of items sold in that environment. We are given a database of mobile transaction sequences, where each mobile transaction sequence consists of a sequence-id, the cells visited, and a list of itemsets purchased in the corresponding cells, ordered by customer movements among cells.

A path is denoted by $n_1 n_2 \ldots n_y$, where $n_j \in N$ for $1 \le j \le y$. Thus, the sequence of cells visited implicitly forms the path of the mobile transaction sequence. In this paper, the discovered patterns and their notation are given in Table I. A transaction, denoted as $\langle C; \{i_1, i_2, \ldots, i_p\} \rangle$, means that itemset $\{i_1, i_2, \ldots, i_p\}$ was bought in cell $C$, where $C \in N$ and $\{i_1, i_2, \ldots, i_p\} \subseteq I$. Thus, the list of itemsets purchased in the corresponding cells implicitly forms the list of transactions of the mobile transaction sequence. As a result, each mobile transaction sequence contains the information of a path and a list of transactions. Given a database $D_M$ of mobile transaction sequences, the problem of mining mobile sequential patterns is to discover the frequent sequential patterns among all mobile transaction sequences. A sequential pattern is represented in the form list of transactions: path, where the transactions are made along the path. The support for a sequential pattern is defined as the number of mobile transaction sequences which support this sequential pattern. A large sequential pattern is a sequential pattern with the minimum support (i.e., a sequential pattern that appears in a sufficient number of mobile transaction sequences).
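The definitions above can be made concrete with a small sketch. The Python below is illustrative only: the class and function names are hypothetical, the toy database is invented, and the containment test (the pattern path must occur as a contiguous piece of the sequence path, and the pattern transactions must be matched in order by purchases in the same cells) is an assumed reading of what it means for a sequence to "support" a pattern, not a rule stated in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# A transaction is a (cell, itemset) pair; all names here are hypothetical.
Transaction = Tuple[str, FrozenSet[str]]


@dataclass(frozen=True)
class MobileTransactionSequence:
    sid: int
    path: str                              # single-letter cells in visit order
    transactions: Tuple[Transaction, ...]  # purchases in movement order


def supports(seq: MobileTransactionSequence,
             pattern: Tuple[Transaction, ...],
             pattern_path: str) -> bool:
    """Assumed containment test for one sequence (a simplification)."""
    if pattern_path not in seq.path:       # path must appear contiguously
        return False
    remaining = iter(seq.transactions)
    for cell, items in pattern:
        # Consume purchases until one in the same cell covers the itemset.
        for bought_cell, bought in remaining:
            if bought_cell == cell and items <= bought:
                break
        else:
            return False                   # ran out of purchases
    return True


def support(db: List[MobileTransactionSequence],
            pattern: Tuple[Transaction, ...],
            pattern_path: str) -> int:
    """Number of sequences in db that support the pattern."""
    return sum(supports(seq, pattern, pattern_path) for seq in db)


def is_large(db, pattern, pattern_path, min_support: int) -> bool:
    """A pattern is large when its support reaches the minimum support."""
    return support(db, pattern, pattern_path) >= min_support


def t(cell, *items):
    """Helper for building a transaction."""
    return (cell, frozenset(items))


# Invented toy database (not the one in Fig. 5).
toy_db = [
    MobileTransactionSequence(1, "ABC", (t("A", "i1"), t("C", "i2", "i3"))),
    MobileTransactionSequence(2, "ABCD", (t("A", "i1", "i4"), t("C", "i2", "i3"))),
    MobileTransactionSequence(3, "ACB", (t("A", "i1"), t("C", "i2", "i3"))),
]
pattern = (t("A", "i1"), t("C", "i2", "i3"))
print(support(toy_db, pattern, "ABC"))
```

In this toy data the third customer buys the same itemsets but travels along a different path, so only the first two sequences count toward the support of the pattern along path ABC.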

The length of a large sequential pattern is the number of transactions in that large sequential pattern. A large sequential pattern of length $k$ is called a large $k$-sequential pattern. Thus, a large 1-sequential pattern can be represented in the form transaction: cell, where the transaction is made in the cell. Note that each transaction in a large $k$-sequential pattern must meet the minimum support. In [7], an itemset with minimum support is called a large itemset or litemset. Similarly, we call a transaction with minimum support a large transaction or L-transaction, which can be represented as $\langle C; t_j \rangle$, where $t_j$ represents a litemset in cell $C$. Thus, if the transaction $\langle C; \{i_1, i_2, \ldots, i_p\} \rangle$ has the minimum support, the litemset $\{i_1, i_2, \ldots, i_p\}$ will be represented as $t_j$ and the L-transaction $\langle C; \{i_1, i_2, \ldots, i_p\} \rangle$ will be represented as $\langle C; t_j \rangle$. Since each transaction in a large $k$-sequential pattern will have the minimum support, a large $k$-sequential pattern can be represented as $\{x_1, x_2, \ldots, x_k\}: n_1 n_2 \ldots n_y$, where $x_j$ is an L-transaction made along the path $n_1 n_2 \ldots n_y$.

Recall that in association rules [6], a large itemset is a frequently purchased itemset. In sequential patterns [7], a large sequence is a frequently purchased set of itemsets ordered by the purchase time. In traversal patterns [12], a large reference is a frequently traveled path. In this paper, a large sequential pattern is a pattern containing: 1) the frequently purchased itemset (meaning that the itemset in an L-transaction must have the minimum support); 2) the frequently purchased set of itemsets ordered by the purchase time (meaning that the set of L-transactions in a large sequential pattern must have the minimum support); and 3) the frequently traveled path (meaning that the path in a large sequential pattern must have the minimum support).

B. Related Work

Recently, mining of databases has attracted a growing amount of attention in database communities due to its wide applicability to studying the buying behaviors of customers [11], [18]. Mining association rules is employed to discover the important associations among items such that the presence of some items in a transaction will imply the presence of other items in the same transaction [5]. After that, several technologies on association rule mining have been developed including: 1) algorithm improvements [3], [6], [10], [15], [27], [32], [43], [68]; 2) constraint-based [25], [28], [46]; 3) incremental updating [9], [14], [30]; 4) multiple minimum supports [35], [60]; 5) frequent closed itemsets [45], [47], [67]; and 6) generalized [53], multilevel [23], intertransaction [56], quantitative [54], and multidimensional [61], [62].

Mining sequential patterns was first introduced in [7] for finding the intertransaction patterns in the traditional retailing environments. After that, several technologies on sequential pattern mining have been developed, including: 1) algorithm improvements [26], [39], [48], [66]; 2) constraint-based [36], [55], [65]; 3) incremental updating [33], [44], [69]; and 4) generalized [55].

Several temporal association rule mining techniques are addressed in [8], [13], [29], and [41]. Episode mining has been studied in [31] and [38] for discovering frequent patterns in a sequence of time events. Das et al. [17] investigated the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. Mining series of interval events was discussed in [59] for discovering the temporal containment relationships of event sequences. A temporal logic approach is proposed in [42] for finding temporal patterns. Mining partial orders from the sequential data is explored in [37]. Mining segmentwise periodic patterns is discussed in [24]. Mining asynchronous periodic patterns is investigated within a subsequence shifted by disturbance [63]. Searching for partial periodic patterns in time-series databases is discussed in [22].

A study on efficient mining of path traversal patterns for capturing Web user behavior was conducted in [12]. WEBMINER [16] was designed for mining Web usage association rules and sequential patterns. Several WWW server logs are analyzed in [50] for deriving the path distribution patterns of the Web users. By mining longest repeated subsequences, a robust method was proposed in [51] for reducing the complexity while preserving the predictability of the Web user surfing paths. Recently, several EC Web sites have been using recommender systems for analyzing customer behaviors to help their customers find products for possible purchases [4], [52]. For capturing the user behavior in the EC environment, a study on efficient mining of Web transaction patterns was reported in [64].

Fig. 3. Flowchart of the whole procedure of mining mobile sequential patterns.

Note that by treating the cells as other items in the patterns, one may extend algorithm GSP, which was designed for mining conventional sequential patterns in [55], to find mobile sequential patterns. However, such an extension to algorithm GSP is not deemed ideal for mining mobile sequential patterns for two reasons. First, since our approaches process the cells and the items individually, whereas the modified GSP treats the cell as another item in the patterns, it is expected that the former will have a smaller domain to process, and thereby better efficiency, than the latter. More importantly, by simply treating the cell as another attribute, GSP is not able to utilize the intrinsic relationship between the moving and purchase behaviors of customers, and thus does not attain the mining efficiency that the nature of this problem makes possible.

C. Procedure for Mining Mobile Sequential Patterns

With the aggregate concept of mining on association rules, path traversal patterns, and sequential patterns, the problem of mining mobile sequential patterns cannot be solved by a simple addition of prior techniques since the factors in these companion mining capabilities are in fact entangled. This fact justifies the necessity of devising a new mining procedure for mobile sequential patterns. As the mobile commerce business has been identified by several leading industrial companies as the key direction to move in for years to come, it is believed that mining mobile sequential patterns has become a very timely and important issue to address.

The flowchart for the whole procedure is shown in Fig. 3 and the meanings of symbols are given in Fig. 4. The overall procedure proposed for mining mobile sequential patterns is outlined as follows.


Fig. 4. Meanings of symbols used in mining mobile sequential patterns.

Procedure MSP (Mobile Sequential Patterns):

1) Large-Transaction Generation Phase: Determine the large transactions (L-transactions) from the mobile transaction sequences.

2) Large-Transaction Transformation Phase: Employ algorithm Large-Transaction Transformation with Sequence-Trimming (LTTST) to transform all mobile transaction sequences into the maximal L-transaction sequences.

3) Sequential-Pattern Generation Phase: Employ one of the following three algorithms [TJLS (Transactionset Join with Large-Transaction Set), TJPT (Transactionset Join with Path Trimming), and TJPF (Transactionset Join with Pattern Family)] to determine the large sequential patterns from the maximal L-transaction sequences.

4) Sequential-Rule Generation Phase: Derive mobile sequential rules from the large sequential patterns.

1) Large-Transaction Generation Phase: For each cell, we apply a modified algorithm DHP [43] for finding the set of all L-transactions $T_L$. Similarly to the approach taken by [7], the set of litemsets is mapped to a set of contiguous integers to reduce the time required to check whether a mobile sequential pattern is contained in a mobile transaction sequence. Note that we are able to simultaneously discover the set of all large 1-sequential patterns, since this set is mainly $\{x: C \mid x \in T_L,\ C \text{ is the cell containing itemset } x\}$.
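As a rough illustration of what this phase produces, the sketch below counts, for every cell, the itemsets bought there and keeps those that reach the minimum support. It is a brute-force stand-in, not the modified DHP algorithm of [43], whose details are not given here; the function name, the data layout, and the toy data are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def large_transactions(db, min_support):
    """Return {(cell, itemset_tuple): support} for pairs meeting min_support.

    db is a list of sequences; each sequence is a list of (cell, items) pairs.
    Every non-empty subset of a purchased itemset is enumerated, which is
    exponential in the itemset size and only suitable for tiny examples.
    """
    counts = Counter()
    for sequence in db:
        seen = set()  # count each (cell, itemset) once per sequence
        for cell, items in sequence:
            ordered = sorted(items)
            for size in range(1, len(ordered) + 1):
                for subset in combinations(ordered, size):
                    seen.add((cell, subset))
        counts.update(seen)
    return {key: n for key, n in counts.items() if n >= min_support}


# Invented toy data: three customers, cells named by single letters.
toy_db = [
    [("A", {"i1"}), ("C", {"i2", "i3"})],
    [("A", {"i1", "i4"}), ("C", {"i2", "i3"})],
    [("A", {"i4"}), ("C", {"i2"})],
]
T_L = large_transactions(toy_db, min_support=2)
for (cell, itemset), n in sorted(T_L.items()):
    print(cell, itemset, n)
```

Support is counted per mobile transaction sequence rather than per purchase, matching the definition of support in Section II-A.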

Example 2.1: An illustrative database for this problem is shown in Fig. 5, where Sequence IDentification (SID) 100 is the mobile transaction sequence shown in Fig. 1. For the example database in Fig. 5, the L-transactions are shown in Fig. 6(a). For the L-transactions shown in Fig. 6(a), after the mapping shown in Fig. 6(b), the set of large 1-sequential patterns is shown in Fig. 6(c).

2) Large-Transaction Transformation (LTT) Phase: As will be seen in Section III, we need to repeatedly determine which part of a given set of large sequential patterns will appear in the mobile sequential patterns. To mine the patterns efficiently, we employ algorithm LTTST to transform each mobile transaction sequence into a maximal L-transaction sequence in this phase.

Fig. 5. Illustrative example database $D_M$ that stores six mobile transaction sequences.

Fig. 6. Mapping table shown in (b) maps the large transactions in (a) to the large 1-sequential patterns in (c).

Example 2.2: With the mobile transaction sequence shown in Fig. 1 and the mapping table shown in Fig. 6(b), Fig. 7 illustrates the operations in algorithm LTTST. In Fig. 7, the first column corresponds to the sequence of movements, the second column contains the nodes visited, and the third column has the items purchased in SID 100. The fourth column gives the ongoing L-transaction in the buffer and the fifth column gives the ongoing string in the buffer. The sixth column shows the L-transaction set and the seventh column shows the path of the maximal L-transaction sequence generated by LTTST.

Note that the same itemsets in different cells are viewed as different transactions. Thus, the same litemsets sold in different cells will be transformed to different integers.
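The cell-specific mapping can be sketched in a few lines. This is a minimal illustration under assumptions: `build_mapping` and `encode` are hypothetical helpers, the integers are assigned in sorted order purely for determinism, and the sketch covers only the numbering step, not algorithm LTTST itself.

```python
def build_mapping(l_transactions):
    """Give each (cell, litemset) pair its own integer, starting at 1."""
    return {pair: number
            for number, pair in enumerate(sorted(l_transactions), start=1)}


def encode(cell, items, mapping):
    """Return the integer for a purchase, or None if it is not an L-transaction."""
    return mapping.get((cell, tuple(sorted(items))))


# Hypothetical L-transactions: {ig, ih} is large in cells A and B only.
l_transactions = [("A", ("ig", "ih")), ("B", ("ig", "ih")), ("A", ("ig",))]
mapping = build_mapping(l_transactions)

print(encode("A", {"ig", "ih"}, mapping))  # an integer specific to cell A
print(encode("B", {"ig", "ih"}, mapping))  # a different integer for cell B
print(encode("C", {"ig", "ih"}, mapping))  # None: not large in cell C
```

Keying the table on the (cell, itemset) pair rather than on the itemset alone is what makes the same litemset in two cells map to two different integers.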

Example 2.3: For example, transactions $\langle A; \{i_g, i_h\} \rangle$, $\langle B; \{i_g, i_h\} \rangle$, and $\langle C; \{i_g, i_h\} \rangle$ all have itemset $\{i_g, i_h\}$. In addition, $\{i_g, i_h\}$ is a litemset in both cells A and B but is not in cell C. After this phase, $\{i_g, i_h\}$ in cell A and $\{i_g, i_h\}$ in cell B are transformed to different integers (say, $t_y$ and $t_z$) whereas [...] in Fig. 5, the transformed database $D_T$, storing maximal L-transaction sequences, is shown in the first table of Fig. 8 for illustrative purposes.

Fig. 7. Example for producing the maximal large transaction sequences.

3) Sequential-Pattern Generation Phase: After all the mobile transaction sequences are transformed to maximal L-transaction sequences, three algorithms (algorithm TJLS, algorithm TJPT, and algorithm TJPF) are devised for mining large sequential patterns from the transformed database $D_T$. A large $k$-sequential pattern is represented as $\{x_1, x_2, \ldots, x_k\}: n_1 n_2 \ldots n_y$, where $x_j$ is an L-transaction made along the path $n_1 n_2 \ldots n_y$. The details of the algorithms of this phase will be described in Section III.

Example 2.4: The large sequential patterns, generated in the sequential-pattern generation phase from the example database $D_T$, are shown in Fig. 8. For example, $\{\langle A; t_1 \rangle, \langle C; t_3 \rangle, \langle F; t_4 \rangle\}: ABCDEF$ is one large 3-sequential pattern, whose L-transaction set and path appear in SID 100, SID 200, SID 400, and SID 500. The support is thus 4.

4) Sequential-Rule Generation Phase: After the sequential-pattern generation phase, we can find the mobile sequential rules from the large sequential patterns in this phase in a straightforward manner. Unlike the association rule [6], the mobile sequential rule, derived from mobile sequential patterns in this paper, is an implication of the form $X \Rightarrow Z: n_1 n_2 \ldots n_y$, where $X$ and $Z$ are both sets of L-transactions, $X \cap Z = \emptyset$, and $\{n_1, n_2, \ldots, n_y\} \subseteq N$. The rule $X \Rightarrow Z: n_1 n_2 \ldots n_y$ has support $s$ if the number of mobile transaction sequences in $D_M$ containing $X \cup Z: n_1 n_2 \ldots n_y$ is $s$. Also, the rule $X \Rightarrow Z: n_1 n_2 \ldots n_y$ holds with confidence $c$ if $c\%$ of the mobile transaction sequences in $D_M$ that contain $X$ also contain $Z$ along the path $n_1 n_2 \ldots n_y$. Explicitly,

$$\mathrm{support}(X \Rightarrow Z: n_1 n_2 \ldots n_y) = \mathrm{support}(X \cup Z: n_1 n_2 \ldots n_y)$$

and

$$\mathrm{confidence}(X \Rightarrow Z: n_1 n_2 \ldots n_y) = \frac{\mathrm{support}(X \cup Z: n_1 n_2 \ldots n_h \ldots n_y)}{\mathrm{support}(X: n_1 n_2 \ldots n_h)},$$

where $n_1 n_2 \ldots n_h$, with $h \le y$, is the part of the path along which the L-transactions in $X$ are made.

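Example 2.5: For example, suppose that $\{\langle A; t_1 \rangle, \langle C; t_3 \rangle, \langle F; t_4 \rangle\}: ABCDEF$ is one large 3-sequential pattern with support = 4 and $\{\langle A; t_1 \rangle, \langle C; t_3 \rangle\}: ABC$ is one large 2-sequential pattern with support = 5. Then, we can derive one mobile sequential rule $\{\langle A; t_1 \rangle, \langle C; t_3 \rangle\} \Rightarrow \{\langle F; t_4 \rangle\}: ABCDEF$ with the support

$$\mathrm{support}(\{\langle A; t_1 \rangle, \langle C; t_3 \rangle\} \Rightarrow \{\langle F; t_4 \rangle\}: ABCDEF) = \mathrm{support}(\{\langle A; t_1 \rangle, \langle C; t_3 \rangle, \langle F; t_4 \rangle\}: ABCDEF) = 4$$

and the confidence

$$\mathrm{confidence}(\{\langle A; t_1 \rangle, \langle C; t_3 \rangle\} \Rightarrow \{\langle F; t_4 \rangle\}: ABCDEF) = \frac{\mathrm{support}(\{\langle A; t_1 \rangle, \langle C; t_3 \rangle, \langle F; t_4 \rangle\}: ABCDEF)}{\mathrm{support}(\{\langle A; t_1 \rangle, \langle C; t_3 \rangle\}: ABC)} = \frac{4}{5} = 80\%.$$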
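The arithmetic of this example can be reproduced with a short sketch. The support values are the ones quoted in Example 2.5; the function name `rule_confidence` and the dictionary layout (patterns keyed by a tuple of L-transactions and a path string) are assumptions made for illustration.

```python
def rule_confidence(supports, antecedent, consequent, rule_path, antecedent_path):
    """confidence(X => Z: path) = support(X u Z: path) / support(X: antecedent path)."""
    whole = supports[(antecedent + consequent, rule_path)]
    part = supports[(antecedent, antecedent_path)]
    return whole / part


# L-transactions written as (cell, litemset) pairs.
X = (("A", "t1"), ("C", "t3"))
Z = (("F", "t4"),)

# Supports taken from Example 2.5.
supports = {
    (X + Z, "ABCDEF"): 4,
    (X, "ABC"): 5,
}

rule_support = supports[(X + Z, "ABCDEF")]
confidence = rule_confidence(supports, X, Z, "ABCDEF", "ABC")
print(rule_support, f"{confidence:.0%}")
```

Because both quantities are supports of large sequential patterns already found in phase 3, deriving a rule needs only two lookups and no further database scan.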

III. ALGORITHMS FOR MINING MOBILE SEQUENTIAL PATTERNS

Once the database contains maximal L-transaction sequences for all mobile users, we can derive the large sequential patterns by identifying the frequently occurring transaction sequences. Let $S_k$ be the set of large $k$-sequential patterns, $R_k$ be the set of candidate $k$-L-transaction sets, and $C_k$ represent the set of candidate $k$-sequential patterns. $R_k$ is the transaction component of $C_k$, and $S_k$ is a subset of $C_k$. By assigning different priorities to the factors involving large itemsets, traversal paths, and orders of purchases, we devise three algorithms (algorithm TJLS, algorithm TJPT, and algorithm TJPF) to determine mobile sequential patterns. Because both algorithm TJPT and algorithm TJPF generate $S_k$ along with the generation of $C_{k+1}$, we use round $k$ to refer to the procedure performed to obtain $(S_k, C_{k+1})$. For algorithm TJLS, we use round $k$ to refer to the procedure performed to obtain $(S_k, R_{k+1})$. Note that because $S_1$ is obtained in the large-transaction generation phase, we use round one to refer to the procedure performed to obtain $R_2$. These algorithms are devised step by step in light of the features of the candidate generation of sequential patterns and are outlined as follows.

Fig. 8. Large sequential patterns generated in the sequential-pattern generation phase from the example database $D_T$.

Generalized Descriptions of Algorithms:

1) Algorithm TJLS: By deriving a straightforward extension from prior works, algorithm TJLS is devised as a variant of algorithm Apriori in [6] by using a two-level hash tree in mining large sequential patterns.

2) Algorithm TJPT: In light of the concept of the path trimming technique, algorithm TJPT is devised by taking the path into consideration in generating the candidate patterns.

3) Algorithm TJPF: In light of the concept of the pattern family technique, algorithm TJPF is devised by using the shared-path tree in generating the candidate patterns.

A. Algorithm TJLS (Transactionset Join With Large-Transaction Set)

Algorithm TJLS is a variant of algorithm Apriori in [6]. Algorithm TJLS essentially utilizes the concept of joining itemsets in association rule mining [6], [55] while resolving the discrepancy between large sequential patterns and large itemsets. Similarly to algorithm Apriori [6], TJLS joins the L-transaction sets of large $(k-1)$-sequential patterns to generate candidate $k$-L-transaction sets in the procedure to discover large sequential patterns. However, unlike algorithm Apriori, TJLS employs a two-level hash tree, called the mobile sequence tree, to store the candidate sequential patterns. By utilizing the two-level hashing technique, TJLS can join the L-transaction sets to construct the transaction component of the mobile sequence tree in the candidate generation. Then, in the database scan for counting the support, TJLS constructs the path component by extracting the corresponding path from the maximal L-transaction sequences whose L-transaction sets contain the corresponding candidate L-transaction sets.

In the two-level hash tree, a node either contains a list of patterns (a leaf node) or a hash table (an internal node). In an internal node, each bucket of the hash table points to another node. The patterns are stored in the leaf nodes. The root of the hash tree is defined to be at depth 1. An internal node at depth $d$ points to nodes at depth $d + 1$. When TJLS adds a pattern $p$, it starts from the root and goes down the tree until reaching a leaf. At an internal node at depth $d$ in the transaction component, TJLS decides which branch to follow by applying a hash function to the $d$th transaction of the L-transaction set of pattern $p$. Similarly, at an internal node at depth $g$ in the path component, TJLS decides which branch to follow by applying a hash function to the $g$th cell of the path of pattern $p$.

When hashing a maximal L-transaction sequence $m$, TJLS finds all the candidate sequential patterns contained in $m$ as follows. If TJLS reaches an internal node by hashing the L-transaction $l$ (cell $c$), it hashes on each L-transaction (cell) that comes after $l$ ($c$) in $m$ and recursively applies this procedure to the node in the corresponding bucket. If TJLS reaches a leaf node, it finds which of the patterns in the leaf node are contained in $m$ and adds support counts to them.
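The traversal just described can be sketched with nested dictionaries standing in for the hash tables. This is a simplified illustration under stated assumptions rather than the data structure of the paper: the names `Node`, `add_candidate`, `count_sequence`, and `lookup` are hypothetical; the path component is flattened into a counter keyed by the whole path string instead of being hashed cell by cell; and the "corresponding path" is assumed to be the stretch of the sequence path from the cell of the first matched L-transaction to the cell of the last one. The sequences are invented.

```python
from collections import Counter


class Node:
    """Transaction-component node; `paths` stands in for the path component."""

    def __init__(self):
        self.children = {}   # next L-transaction -> Node
        self.paths = None    # Counter(path -> support) where a candidate ends


def add_candidate(root, l_transactions):
    """Store one candidate L-transaction set (kept as an ordered tuple)."""
    node = root
    for t in l_transactions:
        node = node.children.setdefault(t, Node())
    if node.paths is None:
        node.paths = Counter()


def count_sequence(root, path, purchases):
    """Add one support count for each candidate contained in a sequence.

    path is a string of single-letter cells; purchases is a list of
    (index_into_path, l_transaction) pairs in movement order.
    """
    hits = set()  # (node, subpath) pairs, so a sequence counts at most once

    def walk(node, start, first_index):
        # Try every later L-transaction, as in the recursive hashing step.
        for j in range(start, len(purchases)):
            index, t = purchases[j]
            child = node.children.get(t)
            if child is None:
                continue
            begin = index if first_index is None else first_index
            if child.paths is not None:
                hits.add((child, path[begin:index + 1]))
            walk(child, j + 1, begin)

    walk(root, 0, None)
    for node, subpath in hits:
        node.paths[subpath] += 1


def lookup(root, l_transactions):
    """Return the path counter stored for a candidate."""
    node = root
    for t in l_transactions:
        node = node.children[t]
    return node.paths


root = Node()
candidate = (("A", "t1"), ("C", "t3"), ("F", "t4"), ("G", "t5"))
add_candidate(root, candidate)

# Invented maximal L-transaction sequences.
full = [(0, ("A", "t1")), (2, ("C", "t3")), (5, ("F", "t4")), (6, ("G", "t5"))]
count_sequence(root, "ABCDEFG", full)
count_sequence(root, "ABCDEFG", full)
count_sequence(root, "ABCDEF", full[:3])   # lacks the purchase in cell G

print(dict(lookup(root, candidate)))
```

Collecting hits in a set before incrementing keeps a single sequence from contributing more than one count to the same candidate and path, which matches the definition of support as a number of sequences.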

Example 3.1: Fig. 8 shows the large sequential patterns generated in the sequential-pattern generation phase from the example database $D_T$, and Fig. 9 shows the mobile sequence tree storing $S_4$ in Fig. 8.

For mining mobile sequential patterns, the first round is executed with the large-transaction generation phase to obtain $S_1$, the set of large 1-sequential patterns, as shown in Fig. 6(c). In addition, algorithm TJLS utilizes the L-transactions in $S_1$ as the seed set for generating $R_2$, the set of candidate 2-L-transaction sets, which is stored in the transaction component of a mobile sequence tree. In the second round, TJLS constructs the complete mobile sequence tree by hashing each 2-L-transaction set contained in each maximal L-transaction sequence into the transaction component and hashing the corresponding path to construct the path component, so as to count the support of each candidate sequential pattern. Then, TJLS destructs the mobile sequence tree to derive $S_2$, the set of large 2-sequential patterns, and utilizes the L-transaction sets in $S_2$ for generating $R_3$. In each subsequent round, TJLS starts with the candidate L-transaction sets found in the previous round, counts the supports of the candidate sequential patterns, and then identifies the large sequential patterns. TJLS then proceeds to generate new candidate L-transaction sets and stores them in the mobile sequence tree. The procedure continues until no large sequential patterns are derived.

Example 3.2: To illustrate the operations of algorithm TJLS, it can be seen from Fig. 9 that {A; t1, C; t3, F; t4, G; t5} is a candidate 4-L-transaction set generated by joining the L-transaction sets {A; t1, C; t3, F; t4} and {A; t1, C; t3, G; t5}, respectively, from the large 3-sequential patterns {A; t1, C; t3, F; t4}: ABCDEF and {A; t1, C; t3, G; t5}: ABCDEFG. In the database scanning phase, algorithm TJLS constructs the path component of the mobile sequence tree and counts the supports of the candidate sequential patterns. For example, after the transaction component of the tree is constructed as in Fig. 9(a), TJLS scans the database DT in Fig. 8 to obtain the path component in Fig. 9(b) while also counting supports. In SID 100, when the support for candidate L-transaction set {A; t1, C; t3, F; t4, G; t5} is being counted, the corresponding path ABCDEFG is generated in the path component of the mobile sequence tree to account for one support count of {A; t1, C; t3, F; t4, G; t5}: ABCDEFG. Hence, the final support of {A; t1, C; t3, F; t4, G; t5}: ABCDEFG is 2, i.e., from SID 100 and SID 200. Explicitly, the corresponding path is divided into several subpaths by identifying the cells of the L-transactions. For the example shown in Fig. 10, algorithm TJLS counts the support of L-transaction set {A; t1, C; t3, P; t7} in SID 100. TJLS first locates A; t1 at position (1) and C; t3 at position (2) in the L-transaction set, and the corresponding subpath ABC is extracted from the subpath ABC shown in (3). Then, TJLS locates C; t3 at position (2) and P; t7 at position (4) in the L-transaction set, and the corresponding subpath CDEFGLP is extracted from the subpath CDEFGHQGLP shown in (5). Note that the redundant cells HQG are eliminated because they form a cycle between L-transaction C; t3 and L-transaction P; t7.
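
The subpath extraction in Fig. 10 can be illustrated with a short sketch. It assumes that each cell is a single character, that an L-transaction is located at the first occurrence of its cell after the previous one, and that a cycle is removed by cutting the path back to the earlier visit of the repeated cell. The function names `remove_cycles` and `pattern_path` are hypothetical, and the full path ABCDEFGHQGLP for SID 100 is pieced together from the two subpaths quoted in the example.

```python
def remove_cycles(cells):
    """Drop every detour that returns to an already visited cell."""
    out = []
    for c in cells:
        if c in out:
            # Revisit: cut the path back to the earlier visit of this cell.
            del out[out.index(c) + 1:]
        else:
            out.append(c)
    return out


def pattern_path(path, purchase_cells):
    """Concatenate the cycle-free subpaths between consecutive purchase cells."""
    result = [purchase_cells[0]]
    pos = path.index(purchase_cells[0])
    for nxt in purchase_cells[1:]:
        end = path.index(nxt, pos + 1)
        # Skip the first cell of each subpath; it is already in the result.
        result += remove_cycles(path[pos:end + 1])[1:]
        pos = end
    return "".join(result)


subpath = "".join(remove_cycles("CDEFGHQGLP"))
full = pattern_path("ABCDEFGHQGLP", "ACP")
```

Here `subpath` reproduces the step from CDEFGHQGLP to CDEFGLP, and `full` gives the path ABCDEFGLP listed in the caption of Fig. 10.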

After scanning the database to count the supports of candidate sequential patterns, TJLS obtains large sequential patterns in the procedure of destructing the mobile sequence tree. Each large sequential pattern is generated when its support exceeds the minimum support. For example, one can destruct the mobile sequence tree in Fig. 9 to determine $S_4$ in Fig. 8.

Fig. 9. Data structure of a mobile sequential tree for storing candidate 4-sequential patterns in algorithm TJLS.

Fig. 10. Procedure of counting support of L-transaction set

{A; t1, C; t3, P; t7}: ABCDEFGLP in SID 100.

B. Algorithm TJPT (Transactionset Join With Path Trimming)

Without exploiting the paths of large sequential patterns, algorithm TJLS tends to count the supports of many out-of-path sequential patterns (i.e., sequential patterns that do not stay within the path), which degrades its performance. In light of the concept of path trimming, algorithm TJPT is designed to take both the L-transaction sets and the paths of large sequential patterns into consideration when generating candidate sequential patterns. Explicitly, while generating large sequential patterns by destructing the mobile sequence tree, TJPT not only determines large sequential patterns but also maintains a buffer that contains the leaf nodes in the transaction component and the corresponding paths in the path component so as to classify the patterns. The purpose of classifying the patterns is that patterns whose paths do not contain each other need not be considered together when generating candidate sequential patterns. Thus, TJPT can trim the generation of candidate sequential patterns according to the paths. This is referred to as the path trimming technique. As a result, TJPT utilizes large sequential patterns to generate candidate sequential patterns in the candidate generation phase, which solves the out-of-path sequential pattern problem of TJLS mentioned above.

Example 3.3: For the example shown in Fig. 8, in SID 300, when the support for candidate 4-L-transaction set {A; t1, C; t3, F; t4, G; t5} is being counted, the corresponding path AWBCEFG is generated in the path component of the mobile sequence tree to account for one support count for {A; t1, C; t3, F; t4, G; t5}: AWBCEFG. Note that {A; t1, C; t3, F; t4, G; t5}: AWBCEFG has four subpatterns: {A; t1, C; t3, F; t4}: AWBCEF, {A; t1, C; t3, G; t5}: AWBCEFG, {A; t1, F; t4, G; t5}: AWBCEFG, and {C; t3, F; t4, G; t5}: CEFG. However, none of them is a large 3-sequential pattern. Instead, they are out-of-path 3-sequential patterns in round 3 in the sense that not all of their subpatterns are large 2-sequential patterns. Explicitly, only {F; t4, G; t5}: FG is a large 2-sequential pattern in this case. Nevertheless, algorithm TJLS still counts their supports in round 3. In algorithm TJLS, out-of-path sequential patterns are generated in each round if the candidate L-transaction sets are contained in the L-transaction sets of maximal L-transaction sequences in DT.


Fig. 11. Example for describing the out-of-path sequential pattern problem in SID 300 caused by algorithm TJLS.

SID 300 are shown in Fig. 11. Such an out-of-path sequential pattern problem will happen in round k for k > 2. This in turn implies that one can trim the support counting of the redundant sequential patterns according to the paths traversed.

Recall that $S_k$ represents the set of large k-sequential patterns and $C_k$ is the set of candidate k-sequential patterns. In the candidate generation phase, TJPT constructs both the transaction and path components of the mobile sequence tree for storing $C_k$. In the candidate generation, TJPT joins the L-transaction sets of large (k − 1)-sequential patterns to generate a candidate k-L-transaction set and compares the paths of these large (k − 1)-sequential patterns. If neither path contains the other, the generated candidate k-L-transaction set is trimmed. If one path p contains the other path q, TJPT generates a candidate k-sequential pattern consisting of the candidate k-L-transaction set and p.
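
The join-and-trim rule above can be sketched in a few lines. The sketch rests on assumptions that the text does not spell out: two L-transaction sets are joinable when they agree on all but their last L-transaction, "contains" means contiguous containment of one path string in the other, and the two differing L-transactions are ordered by the position of their cells in the longer path. The function name `join_with_path_trimming` and the `"cell;transaction"` string encoding are hypothetical.

```python
def join_with_path_trimming(p1, p2):
    """Join two large (k-1)-sequential patterns; return None when trimmed.

    A pattern is a pair (L-transaction tuple, path string), and each
    L-transaction is written as "cell;transaction", e.g. "G;t5".
    """
    (tset1, path1), (tset2, path2) = p1, p2
    if tset1[:-1] != tset2[:-1] or tset1[-1] == tset2[-1]:
        return None  # the L-transaction sets cannot be joined
    if path2 in path1:
        path = path1
    elif path1 in path2:
        path = path2
    else:
        return None  # neither path contains the other: trimmed
    # Order the two differing L-transactions by where their cells occur.
    tail = sorted((tset1[-1], tset2[-1]),
                  key=lambda lt: path.index(lt.split(";")[0]))
    return tset1[:-1] + tuple(tail), path


sub1 = (("A;t1", "C;t3", "F;t4", "G;t5"), "ABCDEFG")
sub2 = (("A;t1", "C;t3", "F;t4", "Q;t6"), "ABCDEFGHQ")
out_of_path = (("A;t1", "C;t3", "F;t4", "G;t5"), "AWBCEFG")

candidate = join_with_path_trimming(sub1, sub2)
trimmed = join_with_path_trimming(out_of_path, sub2)
```

With these inputs, `candidate` pairs the joined 5-L-transaction set with path ABCDEFGHQ, while `trimmed` is `None` because AWBCEFG and ABCDEFGHQ do not contain each other.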

Example 3.4: Consider the example scenario shown in Fig. 12. In algorithm TJPT, the candidate 5-sequential pattern {A; t1, C; t3, F; t4, G; t5, Q; t6}: ABCDEFGHQ in Fig. 12(a) is generated by joining L-transaction set {A; t1, C; t3, F; t4, G; t5} in subpattern 1 and L-transaction set {A; t1, C; t3, F; t4, Q; t6} in subpattern 2 with the path trimming technique, which identifies that path ABCDEFGHQ contains path ABCDEFG. Finally, {A; t1, C; t3, F; t4, G; t5, Q; t6}: ABCDEFGHQ is qualified as a candidate 5-sequential pattern after TJPT identifies that the other subpatterns (i.e., subpatterns 3, 4, and 5) are large 4-sequential patterns in Fig. 12(b).

By classifying the large k-sequential patterns, TJPT can efficiently generate candidate k-sequential patterns. Particularly, by classifying the patterns in $S_k$ for k ≥ 2, TJPT will not generate any out-of-path (k + 1)-sequential pattern. This demonstrates the advantage of the path trimming technique that TJPT employs.

Example 3.5: For the example shown in Fig. 9, TJPT generates the complete mobile sequence tree by hashing not only L-transaction sets but also paths in candidate generation, so that the path AWBCEFG will not be counted for the support. Note that out-of-path sequential patterns such as the one shown in Fig. 11 no longer occur, which gives TJPT a significant performance improvement over TJLS.

C. Algorithm TJPF (Transactionset Join With Pattern Family)

Algorithm TJPF is similar to algorithm TJPT in that it employs the concept of utilizing large sequential patterns for generating candidate sequential patterns to reduce the computational overhead caused by out-of-path sequential patterns. It differs from the latter in that, by utilizing the information in patterns, TJPF is able to reduce the number of uncertain candidate sequential patterns and to store candidate sequential patterns compactly, thus further reducing the corresponding overhead. Recall that algorithm TJPT utilizes the path trimming technique for the generation of candidate sequential patterns by comparing the paths of the subpatterns to identify whether one path contains another.

Example 3.6: For the example in Fig. 12, algorithm TJPT generates the candidate 5-sequential pattern in Fig. 12(a) by joining L-transaction set {A; t1, C; t3, F; t4, G; t5} in subpattern 1 and L-transaction set {A; t1, C; t3, F; t4, Q; t6} in subpattern 2 with the path trimming technique, which identifies that path ABCDEFGHQ contains path ABCDEFG.

Comparing the paths of subpatterns incurs O(|P|) computation, where |P| is the average path length of large sequential patterns. In addition, algorithm TJPT is required to store the same paths as branches in different subtrees of the transaction component in the mobile sequence tree, which incurs excessive memory use. Note that even by treating the cells as other items in the patterns, a modified GSP algorithm still needs to compare all the cells in the path and incurs O(|P|) computation. Hence, algorithm TJPF surpasses algorithm TJPT and the modified GSP in that, with the pattern family technique, TJPF is able to generate a more compact tree to store the patterns and thus minimize the corresponding overhead.

1) Remarks of Algorithm TJPF: Algorithm TJPF is devised in light of the pattern family technique. To facilitate our description of algorithm TJPF, some theoretical properties of the pattern family are derived below.

Definition 1: A maximal sequential pattern is a large sequential pattern that is not contained in any other large sequential pattern. For each maximal sequential pattern, its pattern family consists of the pattern itself and all its subpatterns generated in each round.

Example 3.7: For the example shown in Fig. 8, one of the maximal sequential patterns is {A; t1, C; t3, F; t4, G; t5, Q; t6}: ABCDEFGHQ, which is also a large 5-sequential pattern. The corresponding pattern family is shown in Fig. 13.

Definition 2: For a pattern family whose maximal sequential pattern is $s_k$, which consists of L-transactions $\{x_1, x_2, \ldots, x_k\}$ and path $m_1 m_2 \ldots m_q$, a maximal-path large 2-sequential pattern (abbreviated as $MS_2$), $\{x_1, x_k\}: m_1 m_2 \ldots m_q$, is a large 2-sequential pattern which has the same path as the maximal sequential pattern of this pattern family.

Example 3.8: For each pattern family, it is noted that $MS_2$ is the large 2-sequential subpattern with the maximal path. For the example shown in Fig. 13, the patterns marked gray are the patterns with $MS_2$ = {A; t1, Q; t6}: ABCDEFGHQ, which is the large 2-sequential pattern with path length equal to 9, larger than those of the other large 2-sequential patterns.

Fig. 12. Candidate 5-sequential pattern shown in (a) is generated by identifying the existence of its five large 4-sequential subpatterns shown in (b).

Fig. 13. One pattern family example. Patterns marked gray are the patterns having the maximal-path large 2-sequential pattern.

Definition 3: Suppose $s_k$, $k \ge 3$, is the maximal sequential pattern of a pattern family, and $s_k$ consists of L-transactions $\{x_1, x_2, \ldots, x_k\}$ and path $m_1 m_2 \ldots m_q$. The centro-subtransactionset of a pattern in this pattern family is $\{x_2, \ldots, x_{k-1}\}$.

Example 3.9: For example, the large 4-sequential pattern {A; t1, C; t3, F; t4, Q; t6}: ABCDEFGHQ can be viewed as two parts, i.e., $MS_2$ = {A; t1, Q; t6}: ABCDEFGHQ and the centro-subtransactionset {C; t3, F; t4}. Note that a large k-sequential pattern is a pattern consisting of k L-transactions and a path. For each large k-sequential pattern, all k of its (k − 1)-sequential subpatterns are large. Explicitly, for a large k-sequential pattern with L-transaction set $\{x_1, x_2, \ldots, x_k\}$, the L-transaction sets of its k (k − 1)-sequential subpatterns can be represented by $\{x_2, x_3, \ldots, x_k\}$, $\{x_1, x_3, \ldots, x_k\}$, $\ldots$, and $\{x_1, x_2, \ldots, x_{k-1}\}$. Then, we have the following remarks.

Remark 1: For a large k-sequential pattern $p_k$, $k \ge 3$, there exist at least k − 2 large (k − 1)-subpatterns whose paths are identical to that of $p_k$.

Example 3.10: For the large 5-sequential pattern {A; t1, C; t3, F; t4, G; t5, Q; t6}: ABCDEFGHQ shown in Fig. 13(a), there exist three large 4-subpatterns, {A; t1, C; t3, F; t4, Q; t6}: ABCDEFGHQ, {A; t1, C; t3, G; t5, Q; t6}: ABCDEFGHQ, and {A; t1, F; t4, G; t5, Q; t6}: ABCDEFGHQ, shown in Fig. 13(b), whose paths are the same as that of the large 5-sequential pattern.

Remark 2: Note that a maximal sequential pattern is also a large sequential pattern. Thus, for a maximal sequential pattern $s_k$ with L-transactions $\{x_1, x_2, \ldots, x_k\}$ and path $m_1 m_2 \ldots m_q$, there exist k − 2 large (k − 1)-sequential subpatterns which share the identical maximal-path large 2-sequential pattern $\{x_1, x_k\}: m_1 m_2 \ldots m_q$.
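
Definitions 2 and 3 and the two remarks can be checked on the running example with a small sketch. The function names `decompose` and `subsets_sharing_ms2` are hypothetical, and the sketch only manipulates L-transaction sets: it assumes, as in the remarks, that every subpattern keeping both $x_1$ and $x_k$ keeps the full path.

```python
def decompose(tset, path):
    """Split a pattern into its MS2 part and its centro-subtransactionset."""
    ms2 = ((tset[0], tset[-1]), path)   # {x1, xk} with the full path
    centro = tset[1:-1]                 # {x2, ..., x(k-1)}
    return ms2, centro


def subsets_sharing_ms2(tset):
    """(k-1)-L-transaction sets that keep both x1 and xk.

    Dropping any one of the k-2 inner L-transactions leaves the first and
    last L-transactions, and hence the path, unchanged.
    """
    return [tset[:i] + tset[i + 1:] for i in range(1, len(tset) - 1)]


maximal = ("A;t1", "C;t3", "F;t4", "G;t5", "Q;t6")
ms2, centro = decompose(maximal, "ABCDEFGHQ")
shared = subsets_sharing_ms2(maximal)
```

For the 5-sequential pattern of Example 3.10, `shared` holds the k − 2 = 3 L-transaction sets listed there, each beginning with A; t1 and ending with Q; t6.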


Fig. 14. Algorithm TJPF hashes the last L-transaction first so that the centro-subtransactionset is identified and compares the individual integers stored in the shared-path tree.

2) Algorithm TJPF Using Shared-Path Tree in Candidate Generation: Algorithm TJPF is able to generate a maximal sequential pattern $s_k$ with L-transactions $\{x_1, x_2, \ldots, x_k\}$ and path $m_1 m_2 \ldots m_q$ as a candidate sequential pattern by joining the previous large sequential patterns $\{x_1, (x_2, x_3, \ldots, x_{k-3}, x_{k-2}), x_k\}: m_1 m_2 \ldots m_q$ and $\{x_1, (x_2, x_3, \ldots, x_{k-3}, x_{k-1}), x_k\}: m_1 m_2 \ldots m_q$ with the pattern family technique. Explicitly, TJPF obtains: 1) $\{x_2, x_3, \ldots, x_{k-3}, x_{k-2}, x_{k-1}\}$ by joining the centro-subtransactionsets $\{x_2, x_3, \ldots, x_{k-3}, x_{k-2}\}$ and $\{x_2, x_3, \ldots, x_{k-3}, x_{k-1}\}$ and 2) the new $MS_2 = \{x_1, x_k\}: m_1 m_2 \ldots m_q$ by comparing the $MS_2$'s in the previous large sequential patterns. By hashing the last L-transaction first and storing an integer for each path, TJPF constructs the mobile sequence tree in a form such that, for each candidate sequential pattern, the two L-transactions of $MS_2$ come first, the centro-subtransactionset is in the middle, and an integer, which is returned from the shared-path tree to index the corresponding path, is in the leaf.

Example 3.11: For example, TJPF stores the candidate sequential patterns {A; t1, C; t3, F; t4, Q; t6}: ABCDEFGHQ and {A; t1, C; t3, G; t5, Q; t6}: ABCDEFGHQ in Fig. 12(b) into the mobile sequence tree as in Fig. 14(a). TJPF hashes the last L-transaction, Q; t6, in the first position of the mobile sequence tree. Note that path ABCDEFGHQ is represented by the integer p3, derived from the mapping of the shared-path tree in Fig. 14(b). Then, TJPF can join the L-transaction sets of these two large 4-sequential patterns, i.e., patterns 2 and 3 shown in Fig. 15(b), with a buffer to keep a block that contains the leaf nodes in the transaction component and the corresponding integers in the path component to classify the patterns for generating the candidate 5-sequential pattern in Fig. 12(a) efficiently.

From Remark 2, we know that TJPF joins $S_k$ for generating $C_{k+1}$ with the pattern family technique, and the paths of all large sequential patterns to be joined are identical to one another in round k, k ≥ 3. Thus, TJPF joins the centro-subtransactionsets by comparing individual integers to achieve a performance improvement.

Fig. 15. Algorithm TJPF joins the centro-subtransactionsets with integer comparison.

Example 3.12: For the example shown in Fig. 15, TJPF joins the centro-subtransactionsets in patterns 4 and 5 shown in Fig. 15(c) by comparing integers which are equal to each other (i.e., p3) to generate pattern 2 in Fig. 15(b). Similarly, TJPF joins the centro-subtransactionsets in patterns 2 and 3 shown in Fig. 15(b) by comparing integers to generate pattern 1 in Fig. 15(b).

Algorithm TJPF reduces computational overhead and memory consumption as follows. In the first round, algorithm TJPF also joins the L-transactions in $S_1$ for generating $R_2$ to be stored in the transaction component of a mobile sequence tree. However, in the second round, TJPF hashes each combination of 2-L-transaction sets in each maximal L-transaction sequence into the transaction component by hashing the last L-transaction first. Then, TJPF hashes the corresponding path into the shared-path tree, which has an assigned integer in each leaf node for representing the path from the root node to the parent node of that leaf node. TJPF next returns the integer for constructing the path component of the mobile sequence tree while continuing to count the support. After the candidate 2-sequential patterns that meet the minimum support are identified as the large 2-sequential patterns, algorithm TJPF joins the L-transaction sets with the path trimming technique to generate candidate 3-sequential patterns. Note that the shared-path tree constructed in round two will be used for the mapping between paths and integers in the following rounds.

Example 3.13: For example, the shared-path tree shown in Fig. 14(b) maps the paths of the large 4-sequential patterns in Fig. 12(b) into the integers {p1, p2, p3, p4, p5}.
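
The mapping from paths to integers can be sketched as a small trie. This is an illustrative stand-in for the shared-path tree, assuming one character per cell and consecutive integers in insertion order; the class name `SharedPathTree`, the method `path_id`, and the use of a `None` key for the leaf are hypothetical choices, not details from the paper.

```python
class SharedPathTree:
    """Trie over cells; the leaf under a node stores the integer of the path
    that runs from the root to that node."""

    def __init__(self):
        self.root = {}
        self.next_id = 1

    def path_id(self, path):
        node = self.root
        for cell in path:
            node = node.setdefault(cell, {})
        if None not in node:      # the None key plays the role of the leaf
            node[None] = self.next_id
            self.next_id += 1
        return node[None]


paths = SharedPathTree()
p_short = paths.path_id("ABCDEFG")
p_long = paths.path_id("ABCDEFGHQ")
p_again = paths.path_id("ABCDEFGHQ")
```

Once every path has an integer, checking whether two patterns have the same path is a single integer comparison (`p_long == p_again`) instead of a cell-by-cell comparison, and common prefixes such as ABCDEFG are stored only once.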

In the third round, TJPF counts the supports of candidate 3-sequential patterns by hashing into the mobile sequence tree the L-transaction sets and the integers returned from the shared-path tree. In destructing the mobile sequence tree, TJPF not only [...] compares the individual integers to trim the generation of candidate sequential patterns according to the paths. In each subsequent round, TJPF constructs the mobile sequence tree by hashing the last L-transaction first and utilizing the shared-path tree for mapping, so that TJPF compares the individual integers in trimming the generation of candidate sequential patterns. Algorithm TJPF thus has O(1) execution time complexity in this step, better than the O(|P|) of algorithm TJPT, where |P| is the average path length of large sequential patterns. In addition, unlike algorithm TJPT, algorithm TJPF utilizes $MS_2$ to filter out some uncertain candidate sequential patterns before subpattern identification. A candidate k-sequential pattern should have k large (k − 1)-sequential subpatterns. An uncertain candidate k-sequential pattern is generated by joining two large (k − 1)-sequential patterns. Thus, to prove that an uncertain candidate k-sequential pattern is qualified as a candidate k-sequential pattern, k − 2 subpattern identifications need to be conducted.

Example 3.14: For example, taking the large 4-sequential patterns shown in Fig. 8, TJPT generates four uncertain candidate 5-sequential patterns shown in Fig. 16(a). However, by utilizing the pattern family technique, TJPF generates only one uncertain candidate 5-sequential pattern shown in Fig. 16(b). Thus, TJPT conducts 12 subpattern identifications and TJPF conducts three subpattern identifications. This demonstrates the advantage of the pattern family technique that TJPF employs.
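
The two counts in Example 3.14 follow from the k − 2 subpattern identifications needed per uncertain candidate. With k = 5, the arithmetic can be written out as:

$$
\underbrace{4}_{\text{uncertain candidates (TJPT)}} \times (5 - 2) = 12,
\qquad
\underbrace{1}_{\text{uncertain candidate (TJPF)}} \times (5 - 2) = 3 .
$$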

IV. EXPERIMENTAL RESULTS

To assess the performance of TJLS, TJPT, and TJPF, we conducted several experiments to determine large sequential patterns. These experiments were performed on a computer with a 1-GHz Intel CPU and 512 MB of memory. The method used to generate synthetic data is described in Section IV-A. In Section IV-B, the performance of TJLS, TJPT, and TJPF is comparatively studied.

A. Generation of Synthetic Mobile Transaction Sequences

In the experiments, the moving scenario with transactions made in a mobile commerce environment is simulated. Since the mobile commerce service is an emerging application, we assume that customers, when they first use this service, behave similarly to how they behave in the current data network. After customers have used this service, their behaviors will change according to their usage experiences. Currently, there is no real scenario that we can mimic. Thus, in this paper, the simulation model for generating synthetic mobile transaction sequences is similar to that in the companion papers [43], [49]. Explicitly, the method for generating moving patterns is similar to that in [49] and the method for generating transactions is similar to that in [43].

Fig. 18. Mesh network to simulate mobile commerce environment.

Fig. 17 summarizes the meanings of various parameters used in the experiments. First, we construct an $n \times n$ mesh network [40] with a modification that takes the geographic boundary into consideration to limit the number of neighbors so as to mimic the mobile environment, where each node represents one cell [49]. The number of items in each cell is determined from a uniform distribution within a given range, denoted by $n_I$. For each cell, the advancing probability $P_a$ of each neighbor is the probability for a customer to move to that neighboring cell to purchase the items sold there. In essence, each directed edge from one cell A to another cell B is assigned a weight, corresponding to the advancing probability of B for A. In the model, the advancing probability of a neighbor is the ratio of the number of items sold in that neighbor to the total number of items sold in all neighbors. For the 3 × 3 mesh network example shown in Fig. 18(a), there are four neighbors $N_1, N_2, N_3, N_4$ for cell Y with the corresponding advancing probabilities $P_{a1}, P_{a2}, P_{a3}, P_{a4}$. In addition, $i_{N_1}, i_{N_2}, i_{N_3}, i_{N_4}$ are the numbers of items sold in cells $N_1, N_2, N_3, N_4$, and we have $P_{a1} = i_{N_1} / (i_{N_1} + i_{N_2} + i_{N_3} + i_{N_4})$ and $P_{a2} = i_{N_2} / (i_{N_1} + i_{N_2} + i_{N_3} + i_{N_4})$.
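
The advancing probabilities are a simple normalization of the item counts, as the following sketch shows. The function name `advancing_probabilities` and the item counts 10, 20, 30, and 40 are made up for illustration; they are not values from the experiments.

```python
def advancing_probabilities(items_per_neighbor):
    """Advancing probability of each neighbor: its share of all items sold."""
    total = sum(items_per_neighbor.values())
    return {cell: count / total for cell, count in items_per_neighbor.items()}


# Hypothetical item counts for the four neighbors of cell Y.
p_a = advancing_probabilities({"N1": 10, "N2": 20, "N3": 30, "N4": 40})
```

With these counts, neighbor N1 receives 10/100 of the probability mass and N4 receives 40/100, so cells that sell more items attract more customers.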

Fig. 19. (a) Execution time of algorithms TJLS, TJPT, and TJPF when minimum support varies and (b) execution time of algorithms TJLS, TJPT, and TJPF when number of mobile transaction sequences varies.

In the experiments, $|D|$ is the number of mobile transaction sequences generated. When a customer moves among cells for shopping in the MC environment, the mobile transaction sequence completed by this customer consists of a moving path and a set of transactions made in the corresponding cells. The starting position of each mobile sequential pattern can be either visitor location register (VLR) or home location register (HLR) and is randomly selected among these cells [34]. A moving path consists of the cells traversed by a user. The size of each moving path is determined from a Poisson distribution with mean equal to $|P|$. When a customer moves to a cell, the probability that this customer makes a transaction in this cell is denoted by $P_b$.

Note that the number of items in each cell is determined from a uniform distribution within a given range $n_I$. For each cell, once the number of items is determined, the items that could be purchased in that cell are fixed. The method for generating transaction data in each cell is similar to the one in the prior work [43]. In the mobile commerce environment, people tend to buy sets of items together, which are also called potential maximal frequent sets. The size of the maximal elements is clustered around a mean with a few long itemsets. A transaction may contain one or more of such frequent sets. The transaction size is also clustered around a mean, which is denoted by $|T|$. The probability that a user will move from the current cell back to the cell from which he/she came, called the backward weight, is denoted by $P_0$, which is equal to $P_a \times P_d$, where $P_d$ is a damping factor for the backward movement. Without loss of generality, $P_d$ is set to 0.8 in our experiments. The probability $P_m$ of moving to each of the other neighbors is also determined by the advancing probability, and the sum of the weights for all these cells is equal to $1 - P_0$. For the mesh network shown in Fig. 18(a), when one user visits cell Y from cell $N_1$, the probabilities of the neighbors that this user will move to are shown in Fig. 18(b).
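
A sketch of the movement probabilities follows. The text fixes the backward weight $P_0 = P_a \times P_d$ and the total $1 - P_0$ for the remaining neighbors; how that remainder is split is an assumption here, namely in proportion to the advancing probabilities of those neighbors. The function name `move_probabilities` and the input values are hypothetical.

```python
def move_probabilities(advancing, came_from, damping=0.8):
    """Next-cell probabilities for a user who arrived from `came_from`.

    advancing: advancing probability P_a of every neighbor (sums to 1).
    The remaining mass 1 - P_0 is split in proportion to the advancing
    probabilities of the other neighbors (assumed, not stated in the text).
    """
    p_back = advancing[came_from] * damping            # P_0 = P_a * P_d
    others = {c: p for c, p in advancing.items() if c != came_from}
    scale = (1.0 - p_back) / sum(others.values())
    probs = {c: p * scale for c, p in others.items()}  # P_m of each neighbor
    probs[came_from] = p_back
    return probs


# Hypothetical advancing probabilities for the four neighbors of cell Y.
moves = move_probabilities({"N1": 0.1, "N2": 0.2, "N3": 0.3, "N4": 0.4},
                           came_from="N1")
```

In this example, the probability of returning to N1 drops from 0.1 to 0.08, and the other three neighbors share the remaining 0.92.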

B. Performance Comparison

In the following experiments, we construct an 8 × 8 mesh network and set $|D|$ = 200 K, $s$ = 0.5%, $n_I$ = 200, $P_b$ = 0.5, $P_d$ = 0.8, $|T|$ = 4, and $|P|$ = 20.

1) Experiment One: When the Minimum Support Varies: In this experiment, $s$ varies from 1.5% to 0.25%. Fig. 19(a) shows that TJPT and TJPF, in general, outperform TJLS for various minimum supports. With the path trimming and the pattern family techniques, both TJPT and TJPF generate fewer candidate sequential patterns than TJLS, which suffers from many out-of-path sequential patterns in every round. As the minimum support decreases, the execution times of all the algorithms increase because of the increases in the total number of candidate and large sequential patterns.

2) Experiment Two: When the Number of Mobile Transaction Sequences Varies: In this experiment, $|D|$ varies from 200 K to 1000 K. Fig. 19(b) shows that the execution times of TJPT and TJPF increase linearly as the database size increases, indicating the good scale-up feature of TJPT and TJPF.

3) Experiment Three: When Purchasing Probability Varies: Note that algorithm TJLS suffers from the out-of-path sequential pattern problem. To examine this problem, we conduct this experiment with the purchase probability $P_b$ varying from 0.5 to 0.3, and the result is shown in Fig. 20(a). For each algorithm, its execution time at $P_b$ = 0.5 is taken as the base point, and Fig. 20(a) shows the execution time when $P_b$ varies. When the purchase probability decreases, the execution times of all the algorithms decrease because of the decreases in the total number of candidate and large sequential patterns. However, the path lengths of the out-of-path sequential patterns increase because the average number of cells visited per transaction increases. Note that although the total number of candidate and large sequential patterns decreases, the out-of-path sequential pattern problem causes algorithm TJLS to still count the supports of nonlarge sequential patterns. As a result, when $P_b$ decreases, the decrease of the execution time of TJLS is not as prominent as those of TJPT and TJPF. To provide more insight into the performance comparisons of the algorithms, it is shown in Fig. 21 that TJPT and TJPF outperform TJLS for different database sizes, which indicates that TJPT and TJPF are robust in the sensitivity analysis of the purchasing probability.

4) Experiment Four: When the Average Path Length Varies: To examine the sensitivity to the average path length, $|P|$ varies from 10 to 30. The result is shown in Fig. 20(b). For each algorithm, its execution time at $|P|$ = 10 is taken as the base point, and Fig. 20(b) shows the execution time when $|P|$ varies. It can be seen that TJPF is less sensitive to the variation of path length than TJPT. This agrees with the fact that TJPF has O(1) execution time for comparing the path in the candidate generation stage, whereas the corresponding complexity of TJPT is O(|P|). In addition, it is also shown in Fig. 22 that TJPF outperforms TJPT in the sensitivity analysis of the average path length with different database sizes. To provide more insight into the candidate generation stage of TJPT and TJPF, it is shown in Fig. 23 that the ratio TJPT/TJPF of the execution time incurred by comparing the path is almost equal to O(|P|)/O(1).

Fig. 20. (a) Execution time of algorithms TJLS, TJPT, and TJPF when purchasing probability varies and (b) execution time of algorithms TJPT and TJPF when average path length of mobile transaction sequences varies.

Fig. 21. Execution time of algorithms TJLS, TJPT, and TJPF when purchasing probability varies in different database sizes.

5) Experiment Five: Performance Comparison Between TJPT and TJPF in Each Round: To provide more insight into the shared-path tree feature exploited by the pattern family technique, we set $|D|$ = 200 000, $s$ = 0.5%, $n_I$ = 200, $P_b$ = 0.5, $P_d$ = 0.8, $|T|$ = 4, and $|P|$ = 20 and compare the performance of TJPT and TJPF in each round. Because $S_1$ is obtained in the large-transaction generation phase, we use round one to refer to the procedure performed to obtain ($R_2$) and use round two to refer to the procedure performed to obtain ($C_2$, $S_2$, $C_3$). Because TJPT and TJPF generate $S_k$ along with the generation of $C_{k+1}$, we use round k, k ≥ 3, to refer to the procedure performed to obtain ($S_k$, $C_{k+1}$). As shown in Fig. 24(a), TJPF consistently outperforms TJPT in all rounds, except round one. This agrees with our intuition. Note that in round one, without any path information, both TJPT and TJPF join the L-transactions in $S_1$ for generating $R_2$ to be stored in the transaction component of a mobile sequence tree. In round two, TJPT constructs the path component of the mobile sequence tree for storing $C_2$. In the following rounds, when TJPT stores the path information of $C_k$, k ≥ 3, TJPT still needs to construct the path component of the mobile sequence tree for storing $C_k$. However, by utilizing the pattern family relationship, TJPF can use the shared-path tree generated for $C_2$ to index the path information of $C_k$, k ≥ 3, in the following rounds, leading to more efficient execution. In addition, to provide more insight into TJPF and TJPT, the numbers of branches for storing the path information of $C_k$ are shown in Fig. 24(b). In TJPT, these branches are stored in the mobile sequence trees for all rounds. In TJPF, these branches are stored in the shared-path tree generated in round two, and thus, the amount of memory savings is 25.8 MB.

Fig. 22. Execution time of algorithms TJPT and TJPF when average path length of mobile transaction sequences varies in different database sizes.

Fig. 23. Ratio of execution time which is incurred by comparing the path.

Fig. 24. Performance comparison between TJPT and TJPF in each round. (a) Execution time and (b) the number of paths stored.

V. CONCLUSION

In this paper, we explored a data mining capability which involves mining mobile sequential patterns for an MC environment. In essence, the mining of mobile sequential patterns aggregates the concepts of mining association rules (mining path traversal patterns and mining sequential patterns) and thus


Fig. 1. Illustrative example for a mobile transaction sequence where cells are underlined if items are purchased there.
Fig. 2. Notion of mining mobile sequential patterns.
Fig. 3. Flowchart of the whole procedure of mining mobile sequential patterns.
Fig. 6. Mapping table shown in (b) maps the large transactions in (a) to the large 1-sequential patterns in (c).