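This work proposes the gSFNP algorithm for efficiently mining frequent neighborhood patterns from a single graph, together with a parallel version, gSFNP_MR. The design uses the minimum DFS code from the gSpan algorithm to avoid reporting isomorphic patterns, and uses the ordering of edge codes to avoid generating isomorphic graphs in the first place. During mining, an embedding structure stores the actual subgraphs that correspond to each neighborhood pattern in a neighborhood pattern mapping table. Candidate neighborhood patterns are enumerated by pattern growth, as child patterns of frequent neighborhood patterns, which avoids the many nonexistent candidates that pattern joining would produce. Because the method enumerates neighborhood subgraph patterns in depth-first order, it stores less graph information than the FNP algorithm. For large input graphs, gSFNP_MR can be used to address problems such as insufficient memory. In the experiments, both gSFNP and gSFNP_MR run more efficiently than FNP, and as the data graph grows, their memory requirements and execution time increase more gently than those of FNP.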
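The interplay between the embedding table and depth-first pattern growth can be illustrated with a deliberately simplified sketch. It is not the gSFNP implementation: it handles only path-shaped patterns anchored at a pivot node, so a label sequence is already canonical and no minimum DFS code is needed. The function name `mine_path_patterns`, the pivot-based support count, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict

def mine_path_patterns(labels, adj, min_support, max_len):
    """Hypothetical sketch: labels maps node -> label, adj maps node -> set of neighbours."""
    frequent = {}

    def support(embeddings):
        # Assumed support measure: number of distinct pivot nodes,
        # where the pivot is the first node of each embedding.
        return len({emb[0] for emb in embeddings})

    def grow(pattern, embeddings):
        frequent[pattern] = support(embeddings)
        if len(pattern) == max_len:
            return
        # Extend every stored embedding by one edge. Candidates come only
        # from subgraphs that really occur in the data graph.
        table = defaultdict(list)
        for emb in embeddings:
            for nxt in adj[emb[-1]]:
                if nxt not in emb:
                    table[pattern + (labels[nxt],)].append(emb + (nxt,))
        for child in sorted(table):
            if support(table[child]) >= min_support:
                grow(child, table[child])  # depth-first: finish one branch first

    # Seed the search with single-node patterns, one per label.
    seeds = defaultdict(list)
    for node, lab in labels.items():
        seeds[(lab,)].append((node,))
    for pattern in sorted(seeds):
        if support(seeds[pattern]) >= min_support:
            grow(pattern, seeds[pattern])
    return frequent

# Toy graph (assumed data): edges 1-2, 3-4, 2-5.
labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
adj = {1: {2}, 2: {1, 5}, 3: {4}, 4: {3}, 5: {2}}
result = mine_path_patterns(labels, adj, min_support=2, max_len=3)
```

Only the embeddings on the current depth-first branch are alive at any time, which is the intuition behind the lower storage requirement noted above. A join-based approach would instead have to combine frequent patterns and then check whether each combination exists at all.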
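Because gSFNP still needs a considerable amount of memory to store graph structure information, future work could study how to reduce the stored graph data structures in order to improve efficiency. In addition, when gSFNP_MR runs in a MapReduce environment, it produces neighborhood patterns along with a large volume of corresponding graph structure data, so data transfer takes up too much of the execution time. As a result, distributing the work over more machines across the network makes the parallel run slower rather than faster. Future research could consider how to reduce the storage of graph data structures, how to balance the workload generated by subgraph pattern growth as evenly as possible, or how to apply compression to reduce the transferred data, so that frequent neighborhood pattern mining can be processed in parallel more quickly.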
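As a rough illustration of the compression idea, the following sketch serializes one pattern's embedding list and compresses it before it would be sent between workers. It is only an assumption about how such a step might look: the helper names `pack_embeddings` and `unpack_embeddings`, the JSON record layout, and the sample data are hypothetical, and a real MapReduce deployment would more likely rely on the framework's own intermediate-data compression settings.

```python
import json
import zlib

def pack_embeddings(pattern, embeddings):
    """Serialize one pattern's embedding list and compress it for transfer."""
    payload = json.dumps({"pattern": pattern, "embeddings": embeddings})
    return zlib.compress(payload.encode("utf-8"))

def unpack_embeddings(blob):
    """Reverse pack_embeddings: decompress and parse the record."""
    record = json.loads(zlib.decompress(blob).decode("utf-8"))
    return record["pattern"], record["embeddings"]

# Assumed sample data: 1000 embeddings of a three-node pattern.
pattern = ["A", "B", "A"]
embeddings = [[i, i + 1, i + 2] for i in range(1000)]

raw_size = len(json.dumps({"pattern": pattern, "embeddings": embeddings}).encode("utf-8"))
blob = pack_embeddings(pattern, embeddings)
print(raw_size, len(blob))  # uncompressed vs. compressed size in bytes
```

Embedding lists tend to be highly repetitive, so general-purpose compression can shrink them noticeably. Whether the CPU cost pays off depends on the network and cluster in question and would need to be measured.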