
A Unified Intentional-Based Web Transaction Mining


Chiu, W.-Y., Tao, Y.-H., Hong, T. and Lin, W.-Y., A Unified Intentional-Based Web Transaction Mining, Multi-Conference of Systemics, Cybernetics and Informatics, Orlando, Florida, July 18-21, 2004.


Wen-Yuan CHIU
I-Shou U., 1, Section 1, Hsiueh-Chen Rd., Ta-Hsu Hsiang, Kaohsiung County

Yu-Hui TAO
National Pingtung U. of Technology and Science, 1, Hsueh-Fu Rd., Pingtung County

Tzung-Pei HONG
National U. of Kaohsiung, No. 700, Kaohsiung University Rd., Nan-Tzu Dist., Kaohsiung

Wen-Yang LIN
I-Shou U., 1, Section 1, Hsiueh-Chen Rd., Ta-Hsu Hsiang, Kaohsiung County

Taiwan, R.O.C.

Abstract

Intentional browsing data (IBD) is a new data ingredient proposed by Tao et al. (2002) to improve Web usage mining, which commonly uses Web log files as its primary data source. As an illustration, Yun and Chen's (2000) Web transaction mining (WTM) algorithm was used to demonstrate how IBD enhances WTM on pages with item purchases, through IWTMp (intentional-based WTM with purchase), and complements WTM on pages without item purchases, through IWTMnp (IWTM with no purchase). Although IWTMp and IWTMnp satisfactorily illustrated the benefits of IBD for the WTM algorithm, some technical improvements remain to be addressed. The most obvious question is why the source data should be separated into purchased-item and not-purchased-item segments that are processed by two different IWTM algorithms. Accordingly, we propose a unified IWTMu algorithm that processes all pages, with and without item purchases, simultaneously, and then discuss the comparative implications of IWTMp, IWTMnp and IWTMu.

1. Introduction

Web mining is defined as applying data mining techniques to Web data [2], and is further classified into Web usage mining (WUM), Web content mining and Web structure mining [1]. WUM mainly adopts Web log records, which typically include the host name or IP address, remote user name, login name, date stamp, retrieval method, HTTP completion code, and number of bytes in the retrieved file. Log file content has been shown to be valuable, but it does not include all the user interactions that WUM can utilize. As a result, intentional browsing data (IBD) [3], such as scroll-bar, select, or save-as user interactions, was formally defined as a new data ingredient for WUM. The benefits of IBD were illustrated through the Web transaction mining (WTM) algorithm [5], which explores the relationship between traveling and purchasing behaviour. One intentional-based WTM (IWTM) algorithm focusing on the purchased items (IWTMp) and another focusing on the not-purchased items (IWTMnp) were used to demonstrate the potential benefits of enhancement and complement, respectively. These modified algorithms met their intended purpose satisfactorily, but from a technical standpoint, why should the data source be separated into purchased and not-purchased segments for two algorithms? It would be more practical to have one unified IWTM algorithm that processes the whole data set at once. Consequently, this paper addresses this technical issue by proposing a unified IWTMu algorithm and discusses the comparative implications of IWTMp, IWTMnp and IWTMu.

2. Preliminaries

The original WTM algorithm assumes one merchandise item per Web page, represented as B{i1}, meaning Web page B with item i1. The purpose of IWTMp is to show that a user's interest level in an item can be represented by the occurrences of a certain IBD, which can then enhance the predictive power of the original WTM association rules. Here, bx^o denotes o occurrences of the bx-type IBD. For example, the IWTMp association rule <ABFG: B{i1, b1^4}.G{i6, b6^1}> on the browsing path A-B-F-G carries additional information: 4 occurrences of the b1-type IBD on Web page B and 1 occurrence of the b6-type IBD on Web page G. One implication is that among users who have purchased i1 on page B, those with more b1-type IBD are more likely to purchase i6 on page G, especially if this is accompanied by any b6-type IBD. Therefore, more resources and promotion strategies can be directed to users with higher b1 and b6 occurrences. On the other hand, IWTMnp uses IBD to probe Web pages without any purchase. For example, <ABFG: F{0, b5^5}.G{0, b6^1}> implies that a user with browsing path A-B-F-G may not purchase anything on Web pages F and G, but shows strong potential interest in page F, with 5 occurrences of the b5-type IBD, and some interest in page G, with 1 occurrence of the b6-type IBD. Therefore, a proper promotion effort may stimulate users who have purchased nothing on page F or G but show higher occurrences of the corresponding b5- and b6-type IBDs.
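Each entry of the bracket notation packs four facts: the page, the purchased item (0 for none), the IBD type, and its occurrence count. As a minimal sketch, assuming the plain-text form B{i1,b1^4} used here, the entries of a pattern can be decoded as shown below; the regular expression and the `PageBehaviour` record are hypothetical names, not part of the original IWTM work.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical grammar for one entry such as "B{i1, b1^4}" or "F{0, b5^5}":
# page letter, item ("0" = no purchase), IBD type index, occurrence count.
ENTRY = re.compile(r"([A-Z])\{\s*(0|i\d+)\s*,\s*b(\d+)\^(\d+)\s*\}")


@dataclass(frozen=True)
class PageBehaviour:
    page: str              # Web page name, e.g. "B"
    item: Optional[str]    # purchased item, or None when the item field is 0
    ibd_type: int          # x in "bx-type IBD"
    occurrences: int       # number of occurrences of that IBD


def parse_pattern(text: str) -> List[PageBehaviour]:
    """Decode every page entry of a pattern like "<ABFG: B{i1, b1^4}.G{i6, b6^1}>"."""
    return [
        PageBehaviour(page, None if item == "0" else item, int(ibd), int(occ))
        for page, item, ibd, occ in ENTRY.findall(text)
    ]


print(parse_pattern("<ABFG: F{0, b5^5}.G{0, b6^1}>"))
```

Mapping the item field 0 to `None` keeps "no purchase" distinct from any real item identifier.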

3. The Unified IWTM Algorithm (IWTMu)

From a practical viewpoint, the purpose of the unified IWTMu algorithm is to process the whole intended data set at once with a single algorithm. We modified and enhanced the original notations, definitions and implication rules of IWTM [4] as follows. Let $N = \{n_1, n_2, \ldots, n_p\}$ be the set of Web pages of a Web site; let $I = \{i_1, i_2, \ldots, i_m\}$ be the merchandise items sold on the Web site, assuming that one Web page offers only one merchandise item for sale; and let $B = \{b_1^{o_1}, b_2^{o_2}, \ldots, b_x^{o_x}\}$ be the IBDs, assuming that each Web page x has only one $b_x$-type IBD with $o_x$ occurrences, where p, m, and x are positive integers that need not be equal. Figure 1 illustrates a Web transaction tree with associated IBDs, where A, B, ..., L are Web page names and A is the entry page, which usually contains no merchandise item.

[Figure 1(A): Web transaction tree over Web pages A to L]

[Figure 1(B): items and IBD types per node]
Node   Items   IBD
A      -       -
B      i1      b1
C      i2      b2
D      i3      b3
E      i4      b4
F      i5      b5
G      i6      b6
H      i7      b7
I      i8      b8
J      i9      b9
K      i10     b10
L      i11     b11

Figure 1. A Web transaction tree and corresponding transaction data

Definition 1: Let $\langle s_1, s_2, \ldots, s_y \rangle \in N^y$ be a path sequence of y Web pages.

Definition 2: Let $\langle s_1, s_2, \ldots, s_y : n_1\{i_1, b_1^{o_1}\}, n_2\{i_2, b_2^{o_2}\}, \ldots, n_x\{i_x, b_x^{o_x}\} \rangle$ be a transaction pattern, where $i_m \in I$ and $b_m \in B$ for $1 \le m \le x$, and $\{n_1, n_2, \ldots, n_x\} \subseteq \{s_1, s_2, \ldots, s_y\}$.

Definition 3: Let <sy: X Y> be an association rule, where X and Y are both subsets of (I, B) and X∩Y =ψ.

Similar to IWTMp and IWTMnp, the unified IWTMu algorithm mines transaction patterns with IBDs as follows:

Step 1: Sort all transaction records in ascending order of user ID.

Step 2: Generate a set of 1-transaction candidate patterns C1 from the records sorted in Step 1.

Step 2-1: For Web pages with item purchases, first count the occurrences of each purchased item on each Web page without repetition: for each user, a repeated purchase of the same item counts only once, whereas IBD occurrences are recorded exactly. If one user shows the same IBD type in different patterns, take the minimum occurrence value for that user. Then take the maximum of these per-user values over all users sharing the same page and item.

Step 2-2: Repeat the procedure of Step 2-1 for the counterpart Web pages without item purchases.

Step 3: Set two minimum support values, one for patterns with item purchases and one for patterns without. Save all C1 patterns whose supports are greater than or equal to the corresponding minimum support value into the large 1-transaction pattern set T1, which represents the browsing paths on which purchasing, or not purchasing, one item reaches the threshold.

Step 4: According to the sequential Web browsing paths, generate a set of candidate 2-transaction patterns C2 by joining patterns in T1.

Step 5: Set the minimum support values, and save all C2 patterns whose supports are greater than or equal to the minimum support value into the large 2-transaction pattern set T2.

Step 6: Repeat steps 4 and 5 until no large k-transaction sets can be generated.
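Steps 1 to 6 follow a level-wise, Apriori-style loop. The sketch below is a simplification under stated assumptions, not the authors' implementation: IBD occurrences are ignored (only (page, item) keys are counted), a user supports a pattern if any one of that user's paths contains all of its keys, and joining is done by pairwise union followed by support counting rather than by an explicit path test. `mine`, `support` and `two_thresholds` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Key = Tuple[str, str]      # (page, item); item "0" means no purchase
Pattern = FrozenSet[Key]
Users = Dict[int, List[Dict[str, str]]]  # user ID -> browsing paths (page -> item)


def support(pattern: Pattern, users: Users) -> int:
    """Count users (once each) having a path that contains every (page, item) key."""
    return sum(
        any(all(path.get(page) == item for page, item in pattern) for path in paths)
        for paths in users.values()
    )


def mine(users: Users, min_sup: Callable[[Pattern], int]) -> List[Set[Pattern]]:
    # Step 2: 1-transaction candidates are all (page, item) keys seen on any path.
    c1 = {frozenset([key]) for paths in users.values() for path in paths for key in path.items()}
    # Step 3: each candidate is checked against its own threshold.
    level = {p for p in c1 if support(p, users) >= min_sup(p)}
    large: List[Set[Pattern]] = []
    while level:                        # Step 6: stop when no large patterns remain
        large.append(level)
        k = len(next(iter(level))) + 1
        # Step 4: join two large patterns into a k-candidate with one key per page.
        cands = {a | b for a, b in combinations(level, 2)
                 if len(a | b) == k and len({page for page, _ in a | b}) == k}
        # Step 5: keep the candidates that meet their minimum support.
        level = {c for c in cands if support(c, users) >= min_sup(c)}
    return large


def two_thresholds(p: Pattern) -> int:
    """Hypothetical thresholds: 6 for no-purchase 1-patterns, otherwise 2."""
    return 6 if len(p) == 1 and all(item == "0" for _, item in p) else 2
```

Counting users rather than paths mirrors the rule that a repeated purchase by one user counts only once.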

IWTMu differs from IWTMp and IWTMnp in two main aspects. First, all records enter IWTMu in Step 1 and are processed separately in Step 2: records with purchases as in IWTMp and records without purchases as in IWTMnp. Second, patterns with and without purchases are joined together from Step 4 onward. Therefore, the minimum support values in Steps 3 and 5 can be set separately to the values used in IWTMp and IWTMnp, so that IWTMu is guaranteed to cover whatever outcomes IWTMp and IWTMnp produce.

4. An Example of IWTMu

We illustrate IWTMu using the example depicted in Figure 1 as follows:

Step 1: Order the user Web browsing transaction records in ascending order of ID. Table 1 lists the user IDs, sequential Web browsing paths, and sets of (item, IBD) pairs, which represent purchase states and the corresponding IBD occurrences. Notice that a path may contain both items with purchase and items without purchase; for example, B{0,b1^5}, C{i2,b2^3} and E{i4,b4^1} all appear on path ABCE for user ID 1. Item zero represents no purchase.

Table 1. An Example of General Transaction Patterns

ID   Path    Item Purchase and IBD
1    ABCE    B{0,b1^5}, C{i2,b2^3}, E{i4,b4^1}
1    ABFGH   B{i1,b1^8}, F{0,b5^4}, G{0,b6^3}, H{i7,b7^5}
1    AIJK    I{0,b8^2}, J{i9,b9^9}, K{0,b10^4}
2    ABCE    B{i1,b1^5}, C{i2,b2^5}, E{0,b4^3}
2    ABFGH   B{i1,b1^3}, F{i5,b5^8}, G{0,b6^4}, H{0,b7^5}
3    ABCE    B{0,b1^4}, C{0,b2^2}, E{i4,b4^1}
3    ABCD    B{i1,b1^4}, C{0,b2^5}, D{i3,b3^2}
3    AIL     I{0,b8^2}, L{0,b11^8}
4    ABCE    B{i1,b1^5}, C{i2,b2^5}, E{i4,b4^3}
4    ABFGH   B{i1,b1^9}, F{i5,b5^2}, G{i6,b6^1}, H{0,b7^3}
4    AIJK    I{i8,b8^7}, J{i9,b9^8}, K{0,b10^2}
5    ABCE    B{0,b1^4}, C{0,b2^2}, E{0,b4^1}
5    ABFGH   B{0,b1^2}, F{0,b5^2}, G{0,b6^3}, H{0,b7^1}
5    AIL     I{0,b8^1}, L{0,b11^3}
6    ABCE    B{0,b1^1}, C{0,b2^5}, E{0,b4^2}
6    AIJK    I{0,b8^4}, J{0,b9^7}, K{0,b10^2}
6    AIL     I{0,b8^6}, L{0,b11^6}
7    ABCE    B{0,b1^7}, C{0,b2^3}, E{0,b4^1}
7    ABFGH   B{0,b1^2}, F{0,b5^4}, G{0,b6^3}, H{0,b7^1}
7    AIJK    I{0,b8^2}, J{0,b9^7}, K{0,b10^4}
8    ABCD    B{0,b1^3}, C{0,b2^5}, D{0,b3^2}
8    AIL     I{0,b8^7}, L{0,b11^6}
9    ABFGH   B{0,b1^2}, F{0,b5^3}, G{0,b6^3}, H{0,b7^1}
9    AIJK    I{0,b8^2}, J{0,b9^5}, K{0,b10^4}
10   AIJK    I{0,b8^2}, J{0,b9^1}, K{0,b10^3}
10   AIL     I{0,b8^5}, L{0,b11^2}

Step 2: Generate the candidate set C1 from all pages, with or without purchases. For the with-purchase illustration, user ID 2 in Table 1 purchased i1 on page B twice, namely B{i1,b1^5} and B{i1,b1^3}, which is counted only once. Among the ten users with IDs 1-10, only IDs 1-4 purchased i1 on page B, which leads to a support value of 4 for the purchase of i1 on page B. For the IBD, the b1 occurrence count is calculated first as the minimum for each user and then as the maximum among all users. That is, b1 is calculated as min(b1^5, b1^3) = b1^3 for user ID 2, and as max{b1^8, b1^3, b1^4, b1^5} = b1^8 over the four users with IDs 1-4 who purchased i1 on page B. Accordingly, path AB in Table 2 has the result B{i1, b1^8}. Its counterpart without purchase, B{0, b1^5} on path AB, is obtained similarly from IDs 1, 3, 5, 6, 7, 8 and 9, with max{b1^5, b1^4, b1^2, b1^1, b1^2, b1^3, b1^2} = b1^5 and a support of 7. The resulting 1-transaction pattern candidate set C1 is shown in Table 2.
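The counting rule of Step 2 (one count per user, the minimum IBD occurrences within a user, the maximum across users) can be checked against the page-B figures above. This is a sketch: the data list is transcribed from the page-B entries of Table 1, and `candidate_stats` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Page-B entries transcribed from Table 1: (user ID, item, b1 occurrences).
# Item "0" means no purchase on page B.
B_ENTRIES: List[Tuple[int, str, int]] = [
    (1, "0", 5), (1, "i1", 8),
    (2, "i1", 5), (2, "i1", 3),
    (3, "0", 4), (3, "i1", 4),
    (4, "i1", 5), (4, "i1", 9),
    (5, "0", 4), (5, "0", 2),
    (6, "0", 1),
    (7, "0", 7), (7, "0", 2),
    (8, "0", 3),
    (9, "0", 2),
]


def candidate_stats(entries: List[Tuple[int, str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return item -> (support, IBD occurrences) following Step 2-1."""
    per_user: Dict[str, Dict[int, int]] = defaultdict(dict)
    for user, item, occ in entries:
        # A repeated item for the same user counts once; keep that user's minimum IBD.
        per_user[item][user] = min(occ, per_user[item].get(user, occ))
    # Support = number of distinct users; IBD = maximum of the per-user minima.
    return {item: (len(users), max(users.values())) for item, users in per_user.items()}


for item, (sup, occ) in sorted(candidate_stats(B_ENTRIES).items()):
    print(f"B{{{item},b1^{occ}}} sup={sup}")
# B{0,b1^5} sup=7
# B{i1,b1^8} sup=4
```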

Table 2. 1-Transaction Pattern Candidate Set (C1)

Path    Behaviour       Sup
AB      B{0,b1^5}       7
AB      B{i1,b1^8}      4
ABC     C{0,b2^5}       5
ABC     C{i2,b2^5}      3
ABCD    D{0,b3^2}       1
ABCD    D{i3,b3^2}      1
ABCE    E{0,b4^3}       4
ABCE    E{i4,b4^3}      3
ABF     F{0,b5^4}       4
ABF     F{i5,b5^8}      2
ABFG    G{0,b6^4}       5
ABFG    G{i6,b6^1}      1
ABFGH   H{0,b7^5}       5
ABFGH   H{i7,b7^5}      1
AI      I{0,b8^7}       8
AI      I{i8,b8^7}      1
AIJ     J{0,b9^7}       4
AIJ     J{i9,b9^9}      2
AIJK    K{0,b10^4}      6
AIJK    K{i10,b10^0}    0
AIL     L{0,b11^8}      5
AIL     L{i11,b11^0}    0

Step 3: Assume the minimum support value is 2 for item purchase and 6 for no item purchase. The minimum support values differ because a practical Web site typically has many more no-purchase transactions than purchase transactions, so a higher initial minimum support value for no-purchase transactions quickly filters out insignificant no-purchase items. Save the transaction patterns with purchases whose support values are greater than or equal to 2 into the large 1-transaction pattern set T1. Similarly, save the transaction patterns without purchases whose support values are greater than or equal to 6 into T1, as seen in Table 3.

Table 3. Large 1-Transaction Pattern Set (T1)

Path    Behaviour       Sup
AB      B{0,b1^5}       7
AB      B{i1,b1^8}      4
ABC     C{i2,b2^5}      3
ABCE    E{i4,b4^3}      3
ABF     F{i5,b5^8}      2
AI      I{0,b8^7}       8
AIJ     J{i9,b9^9}      2
AIJK    K{0,b10^4}      6
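The two-threshold filter of Step 3 is a one-line rule per row. A minimal sketch, assuming a subset of the C1 rows of Table 2 with the IBD occurrences dropped for brevity; `large_patterns` and its parameter names are hypothetical.

```python
from typing import List, Tuple

# (path, page, item, support); item "0" = no purchase.
# A subset of the C1 rows of Table 2; IBD occurrences are omitted for brevity.
C1: List[Tuple[str, str, str, int]] = [
    ("AB", "B", "0", 7), ("AB", "B", "i1", 4),
    ("ABC", "C", "0", 5), ("ABC", "C", "i2", 3),
    ("ABF", "F", "0", 4), ("ABF", "F", "i5", 2),
    ("AI", "I", "0", 8), ("AI", "I", "i8", 1),
    ("AIJ", "J", "0", 4), ("AIJ", "J", "i9", 2),
]


def large_patterns(rows, min_sup_purchase: int = 2, min_sup_no_purchase: int = 6):
    """Step 3: purchase rows and no-purchase rows are tested against separate thresholds."""
    kept = []
    for path, page, item, sup in rows:
        threshold = min_sup_no_purchase if item == "0" else min_sup_purchase
        if sup >= threshold:
            kept.append((path, page, item, sup))
    return kept


for path, page, item, sup in large_patterns(C1):
    print(path, f"{page}{{{item}}}", sup)
# AB B{0} 7
# AB B{i1} 4
# ABC C{i2} 3
# ABF F{i5} 2
# AI I{0} 8
# AIJ J{i9} 2
```

Lowering `min_sup_no_purchase` to 5 would also admit C{0}, which shows how the higher no-purchase threshold prunes that class.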

Step 4: According to the sequential Web browsing paths, generate the 2-transaction pattern candidate set C2 by joining patterns in T1, as shown in Table 4.

Table 4. 2-Transaction Pattern Candidate Set (C2)

Path    Behaviour                   Sup
ABC     B{0,b1^5} C{i2,b2^5}        2
ABC     B{i1,b1^8} C{i2,b2^5}       2
ABCE    B{0,b1^5} E{i4,b4^3}        2
ABCE    B{i1,b1^8} E{i4,b4^3}       1
ABCE    C{i2,b2^5} E{i4,b4^3}       3
ABF     B{0,b1^5} F{i5,b5^8}        2
ABF     B{i1,b1^8} F{i5,b5^8}       2
AIJ     I{0,b8^2} J{i9,b9^9}        1
AIJK    I{0,b8^7} K{0,b10^4}        5
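One plausible reading of "joining according to the browsing paths" is that two 1-patterns can be joined when one path is a prefix of the other, so that both pages lie on a single browsing path, and the patterns concern different pages. The sketch below assumes this reading and single-letter page names, and is restricted to the B, C, E and F patterns of T1; `join_candidates` is a hypothetical name. Under these assumptions it yields the same seven page/item pairs as the ABC, ABCE and ABF rows of Table 4 (supports are not computed here).

```python
from itertools import combinations
from typing import List, Tuple

# (path ending at the page, page, item); single-letter page names assumed.
Pat = Tuple[str, str, str]

# The B, C, E and F patterns of T1 (Table 3); IBD occurrences omitted.
T1_SUBSET: List[Pat] = [
    ("AB", "B", "0"), ("AB", "B", "i1"),
    ("ABC", "C", "i2"), ("ABCE", "E", "i4"), ("ABF", "F", "i5"),
]


def join_candidates(large: List[Pat]) -> List[Tuple[str, Pat, Pat]]:
    """Pair two 1-patterns when one path is a prefix of the other, i.e. both pages
    lie on one browsing path, and the patterns concern different pages."""
    out = []
    for a, b in combinations(large, 2):
        short, longer = (a, b) if len(a[0]) <= len(b[0]) else (b, a)
        if longer[0].startswith(short[0]) and short[1] != longer[1]:
            out.append((longer[0], short, longer))
    return out


for path, first, second in join_candidates(T1_SUBSET):
    print(path, f"{first[1]}{{{first[2]}}}", f"{second[1]}{{{second[2]}}}")
```

The same-page check excludes joining B{0} with B{i1}, since one user visit records either a purchase or no purchase on a page.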

Step 5: Assuming both minimum support values are set to 2, only those patterns whose support values are greater than or equal to 2 are kept in large 2-transaction pattern set T2 as seen in Table 5.

Table 5. Large 2-Transaction Pattern Set (T2)

Path    Behaviour                   Sup
ABC     B{0,b1^5} C{i2,b2^5}        2
ABC     B{i1,b1^8} C{i2,b2^5}       2
ABCE    B{0,b1^5} E{i4,b4^3}        2
ABCE    C{i2,b2^5} E{i4,b4^3}       3
ABF     B{0,b1^5} F{i5,b5^8}        2
ABF     B{i1,b1^8} F{i5,b5^8}       2
AIJK    I{0,b8^7} K{0,b10^4}        5

Step 6: According to the sequential Web browsing paths, generate the 3-transaction pattern candidate set C3 by joining patterns from T2, as seen in Table 6. Because the support value of the only candidate (1) is less than the minimum support value of 2, the algorithm ends here.

Table 6. 3-Transaction Pattern Candidate Set (C3)

Path    Behaviour                              Sup
ABCE    B{0,b1^5} C{i2,b2^5} E{i4,b4^3}        1

Accordingly, the final results yield the seven association rules listed in Table 5. The rules derived from IWTMu, IWTMp and IWTMnp with the same data set are listed in Table 7. As shown, the unified IWTMu algorithm not only covers all (the first three) rules derived from IWTMp and IWTMnp, but also generates new rules by rejoining the split data sets. In other words, IWTMu obtains the same results as IWTMp and IWTMnp in a single run while enriching the rule base of the previous IWTM algorithms. For instance, the new rules <ABC: B{0,b1^5}C{i2,b2^5}> and <ABCE: B{0,b1^5}E{i4,b4^3}> imply that users who did not purchase on page B but showed a high interest level, with 5 occurrences of the b1-type IBD, may purchase items on Web page C or E with high levels of b2-type or b4-type IBD (b2^5 or b4^3), respectively. In practice, more resources can be allocated to promoting the items on page C or E to any user who shows a high frequency of b1-type IBD on page B. Furthermore, more dedicated strategies can be deployed based on the interest levels b2^5 and b4^3, so that more accurate customer targeting and better marketing performance can be achieved.
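The coverage claim can be checked mechanically by treating each rule as a string and comparing sets. A minimal sketch, assuming the rules are transcribed from Table 7 in the b1^8-style notation; the variable names are illustrative.

```python
# Rules as listed in Table 7, written in the plain-text bx^o notation.
IWTM_P = {"<ABC: B{i1,b1^8}C{i2,b2^5}>", "<ABF: B{i1,b1^8}F{i5,b5^8}>"}
IWTM_NP = {"<AIJK: I{0,b8^7}K{0,b10^4}>"}
IWTM_U = {
    "<ABC: B{i1,b1^8}C{i2,b2^5}>", "<ABF: B{i1,b1^8}F{i5,b5^8}>",
    "<AIJK: I{0,b8^7}K{0,b10^4}>", "<ABC: B{0,b1^5}C{i2,b2^5}>",
    "<ABCE: B{0,b1^5}E{i4,b4^3}>", "<ABCE: C{i2,b2^5}E{i4,b4^3}>",
    "<ABF: B{0,b1^5}F{i5,b5^8}>",
}

# IWTMu should contain every rule of the two split algorithms ...
covers_both = (IWTM_P | IWTM_NP) <= IWTM_U
# ... plus the rules that only appear once the data sets are rejoined.
new_rules = sorted(IWTM_U - IWTM_P - IWTM_NP)
print(covers_both)      # True
print(len(new_rules))   # 4
```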

Table 7. A comparison of the implications by IWTMu, IWTMp & IWTMnp

Algorithm                   Transaction Behaviour Rules       Implications to WTM
IWTMp (with purchase)       <ABC: B{i1,b1^8}C{i2,b2^5}>       Enhancement
                            <ABF: B{i1,b1^8}F{i5,b5^8}>
IWTMnp (without purchase)   <AIJK: I{0,b8^7}K{0,b10^4}>       Complement
IWTMu (unified)             <ABC: B{i1,b1^8}C{i2,b2^5}>       Complete enhancement & complement
                            <ABF: B{i1,b1^8}F{i5,b5^8}>
                            <AIJK: I{0,b8^7}K{0,b10^4}>
                            <ABC: B{0,b1^5}C{i2,b2^5}>
                            <ABCE: B{0,b1^5}E{i4,b4^3}>
                            <ABCE: C{i2,b2^5}E{i4,b4^3}>
                            <ABF: B{0,b1^5}F{i5,b5^8}>

6. Conclusions

This paper addresses the technical improvements that were not covered in the illustrative work [4], which showed how a new data ingredient, IBD, could bring potential benefits to the WTM algorithm. We have shown how a unified IWTMu algorithm produces the same outcomes as the IWTMp and IWTMnp algorithms while also deriving additional association rules. In other words, IWTMu can replace the IWTMp and IWTMnp algorithms with a single algorithm and, in the example, increases the number of useful association rules for practical business applications from three to seven.

References

[1] Madria, S.K., Bhowmick, S.S., Ng, W.K. and Lim, E.P., "Research issues in Web data mining", The First International Conference on Data Warehousing and Knowledge Discovery, 1999, pp. 303-312.

[2] Etzioni, O., "The World-Wide Web: Quagmire or gold mine?", Communications of the ACM, 39(11), 1996, pp. 65-68.

[3] Su, Y.M. and Tao, Y.H., "Classification of intentional behavior and mechanism for online data collection", The Thirteenth International Conference on Information Management, Taiwan, R.O.C., 2002.

[4] Tao, Y.H., Su, Y.M. and Hong, T.P., "Web transaction mining algorithm with intentional behaviour", The Sixth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, Italy, 2002.

[5] Yun, C.H. and Chen, M.S., "Using pattern-join and purchase-combination for mining transaction patterns in an electronic commerce environment", The 24th Annual International Conference on Computer Software and Applications, Taiwan, 2000, pp. 99-104.
