Conclusions and Future Directions - Chung Hua University

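Combining the two points above, the SIT algorithm performs best when it finds frequent itemsets among the candidate 2-itemsets with the SIT+MPHP method, and among candidate 3-itemsets and beyond with the SI+Apriori method.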

5.2 Future Directions

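This thesis focuses on making the algorithm faster. For that reason the database is sorted in advance, so that indexing and pruning are as effective as possible. Some problems that arise in real-world environments, however, remain unaddressed. This section therefore describes possible improvements for updated data in 5.2.1. In 5.2.2 it describes a research direction for improving the SIT algorithm with a different kind of index when transactions are long.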

5.2.1 Future Improvements for Data Updates

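As the data in a data warehouse grows over time, mining association rules again means re-sorting the database and then re-running the SIT algorithm. For data warehouses that are updated very frequently, this is highly inconvenient. Suppose instead that the algorithm stored the count of every candidate itemset produced during execution. Then, when new data arrives, only the added portion would need to be counted [18], which would greatly reduce execution time. SIT's sorting and pruning could no longer be used in this mode. Overall, however, the time saved by counting only the updated data is expected to far exceed the time that sorting and pruning save.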
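The incremental idea can be shown in a minimal Python sketch. It assumes that candidate k-itemsets are stored as sorted tuples of integer item IDs. The sketch keeps the candidate counts from a previous run and, when transactions are appended, scans only the new ones. The names `count_candidates` and `IncrementalCounts` are hypothetical and are not taken from SIT or from [18].

One limitation applies: the sketch only tracks itemsets that were already candidates. An itemset that was not stored but becomes a candidate after the update would still require a further scan of the full database.

```python
from collections import Counter
from itertools import combinations


def count_candidates(transactions, candidates, k):
    """For each candidate k-itemset (a sorted tuple), count the transactions containing it."""
    counts = Counter({c: 0 for c in candidates})
    for t in transactions:
        for subset in combinations(sorted(set(t)), k):
            if subset in counts:
                counts[subset] += 1
    return counts


class IncrementalCounts:
    """Hypothetical store of candidate counts kept between mining runs."""

    def __init__(self, transactions, candidates, k):
        self.k = k
        self.n_transactions = len(transactions)
        self.counts = count_candidates(transactions, candidates, k)

    def add(self, new_transactions):
        """Scan only the appended transactions and add their counts to the stored ones."""
        delta = count_candidates(new_transactions, self.counts.keys(), self.k)
        self.counts.update(delta)  # Counter.update adds counts and keeps zero entries
        self.n_transactions += len(new_transactions)

    def frequent(self, min_support):
        """Stored candidates whose support ratio (0..1) reaches min_support."""
        threshold = min_support * self.n_transactions
        return {c for c, n in self.counts.items() if n >= threshold}
```

Usage: build `store = IncrementalCounts(old_db, candidates, 2)` once. After each batch of new transactions, call `store.add(new_db)` and read `store.frequent(0.5)`. The old transactions are never rescanned.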

5.2.2 Future Improvements for Long Transactions

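The indexing technique in the SIT algorithm maps candidate itemsets through an index. When transactions are long, though, even the indexed approach still takes a great deal of time. An alternative is to build an index map over all possible items for every transaction. For example, if the database contains 1,000 distinct items, each transaction stores an array of 1,000 entries. The entry for each item that appears in the transaction is set to "1", and every other entry is set to "0". If memory cost is ignored, this approach should run far faster than both SIT and MPHP.

If memory is severely insufficient, the Partition [7] approach can be considered. The database is divided into several equal parts, and the parts are counted simultaneously on different computers, each holding an identical copy of the candidate itemsets. Once every computer has finished counting the candidate itemsets, the final counts are obtained by summing the candidate counts across all computers. This approach spreads the memory load over several machines, and it also gains the benefits of parallel computation.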
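The following minimal Python sketch combines the per-transaction 0/1 array with partitioned counting. It assumes that items are integer IDs in `0..N_ITEMS-1` and stores each transaction's array as a `bytearray`. Threads from `concurrent.futures.ThreadPoolExecutor` only stand in for the separate computers. All function names are hypothetical. Each worker counts the same candidate set on its own partition, and the partial counts are summed at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

N_ITEMS = 1000  # number of distinct items, as in the example above


def to_bitmap(transaction, n_items=N_ITEMS):
    """0/1 array: position i is 1 iff item i occurs in the transaction."""
    bits = bytearray(n_items)  # all entries start at 0
    for item in transaction:
        bits[item] = 1
    return bits


def count_with_bitmaps(partition, candidates, n_items=N_ITEMS):
    """Support counts of the candidates within one partition, using bitmap lookups."""
    bitmaps = [to_bitmap(t, n_items) for t in partition]
    counts = Counter({c: 0 for c in candidates})
    for bits in bitmaps:
        for c in candidates:
            if all(bits[i] for i in c):  # every item of the candidate is present
                counts[c] += 1
    return counts


def split(db, n_parts):
    """Cut the database into at most n_parts contiguous, roughly equal pieces."""
    size = max(1, -(-len(db) // n_parts))  # ceiling division, never zero
    return [db[i:i + size] for i in range(0, len(db), size)]


def partitioned_count(db, candidates, n_parts=4, n_items=N_ITEMS):
    """Each worker counts the same candidate set on its own partition; the results are summed."""
    parts = split(db, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(lambda p: count_with_bitmaps(p, candidates, n_items), parts))
    total = Counter({c: 0 for c in candidates})
    for partial in partials:
        total.update(partial)  # add this worker's counts to the running total
    return total
```

A `bytearray` uses one byte per item, which is about 1 KB per transaction for 1,000 items. Packing the array into bits would reduce this to 125 bytes per transaction, at the cost of a shift and mask on every lookup.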

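Computerization is now widespread among enterprises, and their data is preserved very completely. At the same time, they have a strong need to turn that data efficiently into useful information, and data mining techniques supply it. If data mining requires too much time or expensive equipment, however, most enterprises will not adopt it. The various experimental results for the SIT algorithm in this study show that it still runs quickly on an ordinary PC. This makes it a worthwhile investment for typical enterprises looking for useful information in large volumes of data.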

References

[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD, pp. 207-216, May 1993.

[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th VLDB Conference, Santiago, Chile, 1994.

[3] M.S. Chen, J. Han, and P.S. Yu, “Data Mining: An Overview from a Database Perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.

[4] J.S. Park, M.S. Chen, and P.S. Yu, “Using a Hash-Based Method with Transaction Trimming and Database Scan Reduction for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, pp. 813-825, 1997.

[5] M.J.A. Berry and G. Linoff, “Data Mining Techniques: For Marketing, Sales, and Customer Support,” Wiley Computer Publishing, 1997.

[6] D. Lin and Z.M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set,” Proc. Sixth Int’l Conf. on Extending Database Technology, March 1998.

[7] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. 21st VLDB, pp. 432-444, 1995.

[8] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD 2000, pp. 1-12, Dallas, TX, May 2000.

[9] “IBM QUEST Group,” http://www.almaden.ibm.com/cs/quest/HOME.html

[10] M. Houtsma and A. Swami, “Set-Oriented Mining for Association Rules in Relational Databases,” Proc. 11th International Conference on Data Engineering, pp. 25-33, Mar. 1995.

[11] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, “Discovering Frequent Closed Itemsets for Association Rules,” Proc. 7th International Conference on Database Theory, 1999.

[12] S. Sarawagi, S. Thomas, and R. Agrawal, “Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications,” Proc. 1998 ACM SIGMOD International Conference on Management of Data, ACM SIGMOD Record, Vol. 27, No. 2, 1998.

[13] W. Wang, J. Yang, and P.S. Yu, “Efficient Mining of Weighted Association Rules (WAR),” Proc. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000.

[14] “ARMiner Project,” http://www.cs.umb.edu/~laur/ARMiner/

[15] H.H. Aly, A.A. Amr, and Y. Taha, “Fast Mining of Association Rules in Large-Scale Problems,” Proc. Sixth IEEE Symposium on Computers and Communications, pp. 107-113, 2001.

[16] C.C. Chang, “The Study of an Ordered Minimal Perfect Hashing Scheme,” Communications of the ACM, Vol. 27, No. 4, pp. 384-387, 1984.

[17] H. Mannila and P. Ronkainen, “Similarity of Event Sequences (Revised Version),” Proc. Fourth International Workshop on Temporal Representation and Reasoning (TIME’97), Daytona Beach, Florida, USA, pp. 136-139, May 1997.

[18] 蔡紋富, “A Minimal Perfect Hashing and Pruning Approach for Mining Association Rules,” Master’s thesis, Department of Information Management, Chi Nan University, 2002.
