• 沒有找到結果。

A generic approach for mining indirect association rules in data streams

N/A
N/A
Protected

Academic year: 2021

Share "A generic approach for mining indirect association rules in data streams"

Copied!
10
0
0

加載中.... (立即查看全文)

全文

(1)

A Generic Approach for Mining Indirect Association

Rules in Data Streams

Wen-Yang Lin1, You-En Wei2 and Chun-Hao Chen3

1,2Dept. of Computer Science and Information Engineering, National University of Kaohsi-ung, Taiwan

3Dept. of Computer Science and Information Engineering, Tamkang University, Taiwan 1wylin@nuk.edu.tw; 2waiewing@gmail.com; 3chchen@mail.tku.edu.tw

Abstract. An indirect association refers to an infrequent itempair, each item of

which is highly co-occurring with a frequent itemset called “mediator”. Al-though indirect associations have been recognized as powerful patterns in re-vealing interesting information hidden in many applications, such as recom-mendation ranking, substitute items or competitive items, and common web navigation path, etc., almost no work, to our knowledge, has investigated how to discover this type of patterns from streaming data. In this paper, the problem of mining indirect associations from data streams is considered. Unlike con-temporary research work on stream data mining that investigates the problem individually from different types of streaming models, we treat the problem in a generic way. We propose a generic window model that can represent all classi-cal streaming models and retain user flexibility in defining new models. In this context, a generic algorithm is developed, which guarantees no false positive rules and bounded support error as long as the window model is specifiable by the proposed generic model. Comprehensive experiments on both synthetic and real datasets have showed the effectiveness of the proposed approach as a ge-neric way for finding indirect association rules over streaming data.

1 Introduction

Recently, the problem of mining interesting patterns or knowledge from large volumes of continuous, fast growing datasets over time, so-called data streams, has emerged as one of the most challenging issues to the data mining research community [1, 3]. Al-though over the past few years there is a large volume of literature on mining frequent patterns, such as itemsets, maximal itemsets, closed itemsets, etc., no work, to our knowledge, has endeavored to discover indirect associations, a recently coined new type of infrequent patterns. The term indirection association, first proposed by Tan et

al. in 2000 [21], refers to an infrequent itempair, each item of which is highly

co-occurring with a frequent itemset called “mediator”. Indirect associations have been recognized as powerful patterns in revealing interesting information hidden in many applications, such as recommendation ranking [14], common web navigation path [20], and substitute items (or competitive items) [23], etc. For example, Coca-cola and Pepsi are competitive products and could be replaced by each other. So it is very likely that

(2)

there is an indirect association rule revealing that consumers buy a kind of cookie tend to buy together with either Coca-cola or Pepsi but not both (Coca-cola, Pepsi | cookie). In this paper, the problem of mining indirect associations from data streams is con-sidered. Unlike contemporary research work on stream data mining that investigates the problem individually from different types of streaming models, we treat the prob-lem in a unified way. A generic streaming window model that can encompass contem-porary streaming window models and is endowed with user flexibility for defining specific models is proposed. In accordance with this model, we develop a generic algo-rithm for mining indirect associations over the generic streaming window model, which guarantees no false positive patterns and a bounded error on the quality of the discovered associations. We further demonstrate an efficient implementation of the generic algorithm. Comprehensive experiments on both synthetic and real datasets showed that the proposed algorithm is efficient and effectiveness in finding indirect association rules.

The remainder of this paper is organized as follows. Section 2 introduces contem-porary stream window models and related work conducted based on these models. Our proposed generic window model, system framework and algorithm GIAMS for mining indirect association rules over streaming data are presented in Section 3. Some proper-ties of the proposed algorithm are also discussed. The experimental results are pre-sented in Section 4. Finally, in Section 5, conclusions and future work are described.

2 Related Work

Suppose that we have a data stream S = (t0, t1,… ti, …), where ti denotes the transaction arriving at time i. Since data stream is a continuous and unlimited incoming data along with time, a window W is specified, representing the sequence of data arrived from ti to

tj, denoted as W[i, j] = (ti, ti+1, …, tj). In the literature [1], there are three main different types of window models for data stream mining, i.e., landmark window, time-fading window, and sliding window models.

 Landmark model: The landmark model monitors the entire history of stream data from a specific time point called landmark to the present time. For example, if win-dow W1 denotes the stream data from time ti to tj, then windows W2 and W3 will span stream data from ti to tj+1 and ti to tj+2, respectively.

 Time-fading model: The time-fading model (also called damped model) assigns more weights to recently arrived transactions so that new transactions have higher weights than old ones. At every moment, based on a fixed decay rate d, a transac-tion processed n time steps ago is assigned a weight dn, where 0 < d < 1, and the occurrence of a pattern within that transaction is decreased accordingly.

 Sliding window model: A sliding window model keeps a window of size ω, moni-toring the data within a fixed time [18] or a fixed number of transactions [8]. Only the data kept in the window is used for analysis; when a new transaction arrives, the oldest resident in the window is considered obsolete and deleted to make room for the new one.

The first work on mining frequent itemsets over data stream with landmark win-dow model was proposed by Manku et al. [19]. They presented an algorithm, namely Lossy Counting, for computing frequency counts exceeding a user-specified threshold

(3)

over data streams. Although the derived frequent itemsets are approximate, the pro-posed algorithm guarantees that no false negative itemsets are generated. However, the performance of Lossy Counting is limited due to memory constraints. Since then, considerable work has been conducted for mining different frequent patterns over data streams, including frequent itemsets [5, 6, 12, 15, 17, 18, 25], maximal frequent item-sets [16], and closed frequent itemitem-sets [8, 11]. Each method, however, is confined to a specific type of window model.

Existing researches on indirect association mining can be divided into two catego-ries, either focusing on developing efficient algorithms [7, 13, 24] or extending the definition of indirect association for different applications [14, 20, 23].

The first indirect association mining approach was proposed by Tan et al. [21], called “INDIRECT”. However, it is time-consuming for generating all frequent item-sets before mining indirect association. Wan and An [24] proposed an approach, called HI-mine, for improving the efficiency of the INDIRECT algorithm. Chen et al. [7] also proposed an indirect association mining approach that was similar to HI-mine.

3 A Generic Framework for Indirect Associations Mining

3.1 Proposed Generic Window Model

Definition 1. Given a data stream S = (t0, t1,… ti, …) as defined before, a generic

window model  is represented as a four-tuple specification, (l, ω, s, d), where l

denotes the timestamp at which the window starts, ω the window size, s the stride the window moves forward, and d is the decay rate.

The stride notation s is introduced to allow the window moving forward in a batch of transactions, i.e., a block of size s. That is, if the current window under concern is (tj1, tj2, …, tj), then the next window will be (tjs1, tjs2, …, tjs), and the weight of a transaction within (tjs1,tjs2, …, tj), say , is decayed to d, and the weight of a transaction within (tj+1, …, tjs) is 1.

Example 1. Let  = 4, s = 2, l = t1, and d = 0.9. An illustration of the generic streaming window model is depicted in Figure 1. The first window W1 = W[1, 4] = (t1, t2, t3, t4) consists of two blocks, B1 = {AH, AI} and B2 = {AH, AH}, for B1 receiving weight 0.9 while B2 receiving 1. Next, the window moves forward with stride 2. That means B1 is outdated and a new block B3 is added, resulting in a new window W2 = W[3, 6].

Below we show that this generic window model can be specified into any one of the contemporary models described in Section 2.

 Landmark model: (l, , 1, 1). Since ω  , there is no limitation on the window size and so the corresponding window at timestamp j is (tl, tl+1, …, tj) and is (tl,

tl+1, …, tj, tj+1) at timestamp j1.

 Time-fading model: (l, , 1, d). The parameter setting for this model is similar to landmark except that a decay rate less than 1 is specified.

 Sliding window model: (l, ω, 1, 1). Since the window size is limited to ω, the corresponding window at timestamp j is (tjω1, tjω2, …, tj) and is (tjω2, …, tj, tj+1) at timestamp j1.

(4)

Fig. 1. An illustration of the generic window

model. Fig. 2. A generic framework for indirect association mining.

3.2 Generic Framework for Indirect Association Mining

Definition 2. An itempair {a, b} is indirectly associated via a mediator M, denoted as

a, b|M if the following conditions hold:

1. sup({a, b}) < s (Itempair support condition);

2. sup({a}  M) ≥ f and sup({b}  M) ≥ f (Mediator support condition); 3. dep({a}, M) ≥ d and dep({b}, M) ≥ d (Mediator dependence condition); where

sup(A) denotes the support of an itemset A, and dep(P, Q) is a measure of the depend-ence between itemsets P and Q.

In this paper, we follow the suggestion in [20, 21], adopting the well-known de-pendence function, IS measure IS(P, Q) (= sup(P, Q) / sqrt (sup(P)  sup(Q))).

According to the paradigm in [21], the work of indirect association mining can be divided into two subtasks: First, discovers the set of frequent itemsets with support higher than f, and then generates the set of qualified indirect associations from the frequent itemsets. Our framework adopts this paradigm, working in the following sce-nario: (1) The user sets the streaming window model to his need by specifying the pa-rameters described previously; (2) The framework then executes the process for dis-covering and maintaining the set of potential frequent itemsets PF as the data continu-ously stream in; (3) At any moment once the user issues a query about the current indi-rect associations the second process for generating the qualified indiindi-rect associations is executed to generate from PF the set of indirect associations IA. Figure 2 depicts the generic streaming framework for indirect associations mining.

3.3 The Proposed Generic Algorithm GIAMS

Based on the framework in Figure 2, our proposed GIAMS (Generic Indirect Associa-tion Mining on Streams) algorithm consists of two concurrent processes:

PF-Process 1: Discover & maintain PF

PF model setting & adjusting Process 2: Generate IA query result data stream Access & update Access t1 A H t2 A I W1 ti t3 A H t4 A H t5 A I t6 A B D t7 B C D W2 t8 C D  = 4 A A H I 0.9 0.9 A A H H 1 1 A A s = 2 l= t1 d = 0.9 A A I B 1 D 1 H H 0.9 0.9 W3 A A I B B C C D 0.9 D D 1 1 0.9

(5)

monitoring and IA-generation. The first process is set off when the users specifies the

window parameters to set the required window model, responsible for generating item-sets from the incoming block of transactions and inserting those potentially frequent itemsets into PF. The second process is activated when the user issues a query about the current indirect associations, responsible for generating the qualified patterns from the frequent itemsets maintained by process PF-monitoring. Algorithm 1 presents a sketch of GIAMS.

Algorithm 1 GIAMS

Input: Itempair support threshold s, association support threshold f, dependence threshold d, stride s,

decay rate d, window size , and support error ε.

Output: Indirect associations IA. Initialization:

1: Let N be the accumulated number of transactions, N  0;

2: Let  be the decayed accumulated number of transaction,   0;

3 Let cbid be the current block id, cbid = 0, sbid the starting block id of window, sbid = 1;

3: repeat

4: Process 1;

5: Process 2;

6: until terminate;

Process 1: PF-monitoring

1: Reading the new coming block;

2: N  N + s;  d  s;

3: cbid  cbid + 1; 4: if ( N > ω) then

5: Block_delete(sbid, PF); // Delete outdated block Bsbid

6: sbid  sbid + 1;

7: N  N – s; // Decrease the transaction size in current window

8:    – sdcbid–sbid+1; // Decrease the decayed transaction size in current window

9: endif

10: Insert(PF, f, cbid, ); // Insert potential frequent itemsets in block cbid into PF 11: Decay&Pruning(d, s, ε, cbid, PF); //Remove infrequent itemsets from PF Process 2: IA-generation

1: if user query request  true then

2: IAgeneration(PF, f, d, s, N); //Generate all indirect associations from PF

The most challenging and critical issue to the effectiveness of GIAMS is bounding the error. That is, under the constraints of only one pass of data scan and limited mem-ory usage, how can we assure the error of the generated patterns is always bounded by a user specified range? Our approach to this end is, during the process of PF-monitoring, eliminating those itemsets with the least possibility to become frequent afterwards. More precisely, after processing the insertion of the new arriving block, we decay the accumulated count of each maintained itemset, and then prune any itemset X whose count is below a threshold indicated as follows:

d d s d d d s X.count sbid cbid sbid cbid                1 1 ) ( 1 1 2   (1)

where cbid and sbid denote the identifiers of current block and the first block that X appears in PF, respectively, s is the stride, and ε is a user-specified tolerant error. Note that the term s  (d + d2 + … + dcbid sbid + 1) equals to the decayed amount of transac-tions between the first block that X appears and the current block within the current

(6)

window. Therefore, we delete X when its count is far less than ε  s  (d + d2 + … +

dcbid sbid + 1). Later, we will prove that this pruning condition guarantees that the error of the itemset support generated by our algorithms is always bounded by ε.

Another critical issue is the generation of indirect associations from frequent item-sets. A naïve approach is adopting the method used in the INDIRECT algorithm [21]. A novel and more efficient method is proposed based on the following theorem. Due to space limit, all of the proof is omitted.

Theorem 1. The support of a mediator M should be no less than m = 2f  s, i.e.,

sup(M) ≥ m.

First, the set of frequent 1-itemsets and the set of 1-mediators M1 (with support larger than m) are generated. Then, it generates 2-itemsets from those frequent 1-itemsets; simultaneously, those 2-itemsets generated with threshold less than s form the indirect itempair set (IIS). The procedure then proceeds to generate all qualified indirect associations. First, a candidate indirect association rule is formed by simply combining an itempair from IIS and a 1-mediator from M1; the rule is output as a qualified association if it satisfies the dependence condition (d) and mediator support threshold (f). Next, the set of 2-mediators M2 is generated by performing apriori-gen on M2 and checking against m. The above steps are repeated until no new mediator is generated.

3.4 Theoretical Analyses

Consider a generated frequent itemset X by GIAMS. Theorem 2 shows that GIAMS always guarantees a bound on the accuracy as long as the window model of concern can be specified by the proposed generic model.

Theorem 2. Let the true support of itemset X, called Tsup(X), be the fraction of trans-actions so far containing X, and the estimated support of an itemset X, called Esup(X), is the fraction of transactions containing X accumulated by the proposed GIAMS

algo-rithm. Then Tsup(X) – Esup(X)  .

Recall that an indirect association is an itempair {a, b} indirectly associated via a mediator M, denoted as a, b|M, if it satisfies the three conditions. We show that if the mediator dependence threshold is set smaller than a specific value, then all indi-rect association patterns generated by our algorithms that satisfy the mediator support condition also satisfy the mediator dependence condition.

Lemma 1. Let a, b|M be a candidate indirect association satisfies the mediator

sup-port condition. Then f  dep({a}, M), dep({b}, M)  1.

Corollary 1. If the mediator dependence threshold is set as d  f, then an

indi-rect association satisfying the itempair support condition and mediator support condi-tion also satisfies the mediator dependence condicondi-tion. That is, it is a qualified indirect association.

Although Corollary 1 suggests the appropriate setting of d from the viewpoint of retaining all high mediator supported associations, it can be regarded as an alternative bound for pruning to take effect. That is, given f and , we have to specify d larger

(7)

5 Experimental Results

A series of experiments were conducted to evaluate the efficiency and effectiveness of the GIAMS algorithm. Our purpose is to inspect: (1) As a generic algorithm, how GIAMS performs in various streaming models, especially the three classical models; (2) How is the influence to GIAMS of each parameter engaged in specifying the win-dow model? Each evaluation was inspected from three aspects, including execution time, memory usage, and pattern accuracy.

All experiments were done on an AMD X3-425(2.7 GHz) PC with 3GB of main memory, running the Windows XP operating system. All programs were coded in Vis-ual C++ 2008. A synthetic dataset, T5.I5.N0.1K.D1000K, generated by the program in [2] as well as a real web-news-click stream [4] extracted from msn.com for the entire day of September 28, 1999 were tested. Since similar phenomena on the synthetic data were observed, we only show the results on the real dataset.

Effect of Mediator Support Threshold: We first examine the effect of varying

mediator support thresholds, which is ranging from 0.01 to 0.018. It can be seen from Figure 3(a) that in the case of landmark model, the smaller f is, less execution time is spent due to smaller f resulting in less number of itemsets. And, the overall trend is linearly proportional to the transaction size. The memory usage exhibits a similar phe-nomenon. The situation is different in the case of time-fading model. From Figure 3(d) we can observe both the execution time and memory usage are not affected by the me-diator support threshold. This is because most of the time for time-fading model, com-pared with the other two models, is spent on the insertion of itemsets and the benefit of pruning fades away. The result for sliding model is omitted because it resembles that for landmark model.

Effect of Window Stride: The stride value is ranging from 10000 to 80000. Two

noticeable phenomena are observed in the case of landmark model. First, the execu-tion time decreases as the stride increases because larger strides encourage analogical transactions; more transactions can be merged together. Second, a larger stride also is helpful in saving the memory usage; as indicated in (1) the pruning threshold becomes stricter, so more itemsets will be pruned. Effect of larger strides, however, is on the contrary in the case of sliding model. The execution time increases in proportional to the stride because larger strides imply larger transaction block to be inserted and de-leted in maintaining the PF. The memory usage also increases, but is not significant.

Effect of Decay Rate: Note that only the time-fading model depends on this

fac-tor. As exhibited in Figure 3(c), a smaller decay rate contributes to a longer execution time. This is because a smaller decay rate makes the support count decay more quickly, so the itemset lifetime becomes shorter, leading to more itemset insertion and deletion operations. This is also why the memory usage, though not significant, is smaller than that for larger decay rates.

Effect of Window Size: Not surprisingly the memory increases as the window size

increases, as shown in Figure 3(f), while the performance gap is not significant. This is because the stride (block size) is the same, and most of the time for process

FP-monitoring is spent on the block insertion and deletion.

Performance of Process IA-generation: We compare the performance of the

pro-posed two methods for implementing process IA-generation. We only show the re-sults for landmark model because the other models exhibit similar behavior. GIAMS-IND denotes the approach modified from algorithm GIAMS-INDIRECT while GIAMS-MED

(8)

represents the more efficient method utilizing qualified mediator threshold. As illus-trated in Figure 3(g), GIAMS-MED is faster than GIAMS-IND. The reason is that as shown in Figure 3(h), the number of candidate rules generated by GIAMS-IND is much more than that by GIAMS-MED. Both methods consume approximately the same amount of memory.

0 0.1 0.2 0.3 0.4 0.5 80K160K240K320K400K480K560K640K720K800K880K960K Transaction size Ti m e( se c) 0 1 2 3 4 5 6 7 Mem or y(M B )

Mem 0.01 Mem 0.012 Mem 0.014

Mem 0.016 Mem 0.018 Time 0.01

Time 0.012 Time 0.014 Time 0.016

Time 0.018 0 0.1 0.2 0.3 0.4 0.5 80K 160K240K320K400K480K 560K640K720K800K880K 960K Transaction size Ti m e( se c) 4 4.5 5 5.5 6 6.5 7 M emo ry (M B )

Mem stride=10000 Mem stride=20000

Mem stride=40000 Mem stride=80000

Time stride=10000 Time stride=20000

Time stride=40000 Time stride=80000

(a) Landmark: Varying mediator supports (b) Landmark: Varying strides

0 50 100 150 200 250 80K 160K240K320K400K480K560K640K720K800K880K960K Transaction size T im e(sec) 19 21 23 25 27 29 Me m or y(MB )

Mem. d=0.9 Mem. d=0.8 Mem. d=0.7

Time d=0.9 Time d=0.8 Time d=0.7

0 50 100 150 80K160K 240K 320K 400K480K560K 640K 720K 800K880K 960K Transaction size T im e( sec) 20 22 24 26 Mem ory (MB )

Mem. 0.01 Mem. 0.012 Mem. 0.014

Mem. 0.016 Mem. 0.018 Time 0.01

Time 0.012 Time 0.014 Time 0.016

Time 0.018

(c) Time-fading: Varying decaying rates (d) Time-fading: Varying mediator supports

0 0.05 0.1 0.15 0.2 0.25 0.3 80K 160K240K320K400K480K 560K640K720K800K 880K960K Transaction size Ti m e( se c) 0 1 2 3 4 5 M em ory (MB )

Mem s=10000 Mem s=20000 Mem s=40000

Mem s=80000 Time s=10000 Time s=20000

Time s=40000 Time s=80000 0 0.2 0.4 0.6 0.81 1.2 1.4 80K160K240K320K400K480K560K640K720K800K880K960K Transaction size Ti m e( se c) 0 5 10 15 20 Me m or y( MB )

Mem. w=10000 Mem. w=20000 Mem. w=40000

Mem. w=80000 Time w=10000 Time w=20000

Time w=40000 Time w=80000

(e) Sliding: Varying strides (f) Sliding: Varying window sizes

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.01 0.012 0.014 0.016 0.018

Mediator support threshold

Ti m e( se c) 0 1 2 3 4 5 6 7 Me m or y( M B )

Mem. GIAMS-IND Mem. GIAMS-MED

Time GIAMS-IND Time GIAMS-MED

0 200 400 600 800 1000 1200 1400 0.01 0.012 0.014 0.016 0.018

mediator support threshold

N um ber of c andi da te r ul es GIAMS-IND GIAMS-MED

(g) IA-generation: Execution time and memory (h) IA-generation: # of candidate rules

(9)

Accuracy: First, we check the difference between the true support and estimated

support, measured by ASE (Average Support Error) = xF(Tsup(x) – Esup(x))/|F|, where F denotes the set of all frequent itemsets w.r.t. f. The ASEs for all test cases with varying strides between 10000 and 80000 and fs ranging from 0.01 to 0.018 were recorded. All of them are zero except the case for time-fading model with d = 0.9; all values are less than 310-7. We also measured the accuracy of discovered indi-rect association rules by inspecting how many rules generated are corindi-rect patterns, i.e., recall. All the test cases exhibit 100% recalls.

6. Conclusions

In this paper, we have investigated the problem of indirect association mining from a generic viewpoint. We have proposed a generic stream window model that can encom-pass all classical streaming models and a generic mining algorithm that guarantees no false positive rules and bounded support error. An efficient implementation of GIAMS also was presented. Comprehensive experiments on both synthetic and real datasets have showed that the proposed generic algorithm is efficient and effectiveness in find-ing indirect association rules.

Recently, the design of adaptive data stream mining methods that can perform adaptively under constrained resources has emerged into an important and challenging research issue to the stream mining community [9, 22]. In the future, we will study how to apply or incorporate some adaptive technique such as load shedding [1] into our approach.

Acknowledgements

This work is partially supported by National Science Council of Taiwan under grant No. NSC97-2221-E-390-016-MY2.

References

1. Aggarwal, C.: Data Streams: Models and Algorithms, Springer (2007)

2. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: 20th Int. Conf. Very Large Data Bases, pp. 487--499 (1994)

3. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems. In: 21st ACM Symp. Principles of Database Systems, pp. 1--16 (2002) 4. Cadez, I., Heckerman, D., Meek, C., Smyth, P., White, S.: Visualization of Navigation

Patterns on a Web Site Using Model-based Clustering. In: 6th ACM Int. Conf. Knowledge Discovery and Data Mining, pp. 280--284 (2000)

5. Chang, J.H., Lee, W.S.: Finding Recent Frequent Itemsets Adaptively over Online Data Streams. In: 9th ACM Int. Conf. Knowledge Discovery and Data Mining, pp. 487--492 (2003)

(10)

6. Chang, J.H., Lee, W.S.: estWin: Adaptively Monitoring the Recent Change of Frequent Itemsets over Online Data Streams. In: 12th ACM Int. Conf. Information and Knowledge Management, pp. 536--539 (2003)

7. Chen, L., Bhowmick, S.S., Li, J.: Mining Temporal Indirect Associations. In: 10th Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 425--434 (2006)

8. Chi, Y. Wung, H., Yu, P.S., Muntz, R.R.: Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. In: 4th IEEE Int. Conf. Data Mining, pp. 59--66 (2004) 9. Gaber, M.M., Zaslavsky, A., Krishnaswamy, S.: Towards an Adaptive Approach for

Mining Data Streams in Resource Constrained Environments. In: 6th Int. Conf. Data Warehousing and Knowledge Discovery, pp. 189--198 (2004)

10. Hidber, C.: Online Association Rule Mining. ACM SIGMOD Record 28(2), pp. 145--156 (1999)

11. Jiang, N., Gruenwald, L.: CFI-stream: Mining Closed Frequent Itemsets in Data Streams. In: Proc. 12th ACM Int. Conf. Knowledge Discovery and Data Mining, pp. 592-597 (2006) 12. Jin, R., Agrawal, G.: An Algorithm for In-core Frequent Itemset Mining on Streaming

Data. In: 5th IEEE Int. Conf. Data Mining, pp. 210--217 (2005)

13. Kazienko, P.: IDRAM—Mining of Indirect Association Rules. In: Int. Conf. Intelligent Information Processing and Web Mining, pp. 77--86 (2005)

14. Kazienko, P., Kuzminska, K.: The Influence of Indirect Association Rules on Recommendation Ranking Lists. In: 5th Int. Conf. Intelligent Systems Design and Applications, pp. 482--487 (2005)

15. Koh, J.L., Shin, S.N.: An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams. In: 8th Int. Conf. Data Warehousing and Knowledge Discovery, pp. 352--362 (2006)

16. Lee, D., Lee, W.: Finding Maximal Frequent Itemsets over Online Data Streams Adaptively. In: 5th IEEE Int. Conf. Data Mining, pp. 266--273 (2005)

17. Li, H.F., Lee, S.Y., Shan,M.K.: An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams. In: 1st Int. Workshop Knowledge Discovery in Data Streams, pp. 20--24 (2004)

18. Lin, C.H., Chiu, D.Y., Wu, Y.H., Chen, A.L.P.: Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window. In: 5th SIAM Data Mining Conf., pp. 68--79 (2005)

19. Manku, G.S., Motwani, R.: Approximate Frequency Counts over Data Streams. In: 28th Int. Conf. Very Large Data Bases, pp. 346--357 (2002)

20. Tan, P.N., Kumar, V.: Mining Indirect Associations in Web Data. In: 3rd Int. Workshop Mining Web Log Data Across All Customers Touch Points, pp.145--166 (2001)

21. Tan, P.N., Kumar, V, Srivastava, J.: Indirect Association: Mining Higher Order Dependencies in Data. In: 4th European Conf. Principles of Data Mining and Knowledge Discovery, pp. 632--637 (2000)

22. Teng, W.G., Chen, M.S., Yu, P.S.: Resource-Aware Mining with Variable Granularities in Data Streams. In: 4th SIAM Conf. Data Mining, pp. 527--531 (2004)

23. Teng, W.G., Hsieh, M.J., Chen, M.S.: On the Mining of Substitution Rules for Statistically Dependent Items. In: 2nd IEEE Int. Conf. Data Mining, pp. 442--449 (2002)

24. Wan, Q., An, A.: An Efficient Approach to Mining Indirect Associations. Journal of Intelligent Information System 27(2), pp. 135--158 (2006)

25. Yu, J.X., Chong, Z., Lu, H., Zhou, A.: False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams. In: 30th Int. Conf. Very Large Data Bases, pp. 204--215 (2004)

數據

Fig. 3. Experimental results on evaluating GIAMS over a real web-news-click stream.

參考文獻

相關文件

The relationship between these extra type parameters, and the types to which they are associated, is established by parameteriz- ing the interfaces (Java generics, C#, and Eiffel)

Discovering the City by Mining Diverse and Multimodal Data Streams – IBM Grand Challenge: New York City 360. §  Exploring and Integrating Multiple Contents and Sources for

We showed that the BCDM is a unifying model in that conceptual instances could be mapped into instances of five existing bitemporal representational data models: a first normal

2 machine learning, data mining and statistics all need data. 3 data mining is just another name for

This bioinformatic machine is a PC cluster structure using special hardware to accelerate dynamic programming, genetic algorithm and data mining algorithm.. In this machine,

We try to explore category and association rules of customer questions by applying customer analysis and the combination of data mining and rough set theory.. We use customer

In this thesis, we have proposed a new and simple feedforward sampling time offset (STO) estimation scheme for an OFDM-based IEEE 802.11a WLAN that uses an interpolator to recover

It is concluded that the proposed computer aided text mining method for patent function model analysis is able improve the efficiency and consistency of the result with