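This thesis applies sequential pattern mining to the dynamic execution traces of OpenSSH, supplemented by the changes observed across OpenSSH versions, to describe the relationships among program elements. Support is used to select patterns that occur often enough, and Confidence then determines the quality of each pattern, which falls into one of three classes: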
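• Frequent: the pattern holds without any exception during execution and can serve as a usage recommendation.
• Potential Error: the pattern has a few exceptions during execution; inspecting these exceptions can help with debugging.
• Unlikely: the pattern has many exceptions during execution, so it may not actually exist.
Finally, the Variant measure is used to compare the importance of different patterns and the importance of each element within a pattern.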
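The three-way classification can be sketched in C as follows. This is a minimal sketch, not the thesis's implementation: it assumes that confidence is the number of traces containing the whole ordered pattern divided by the number of traces containing its antecedent, and the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE` are hypothetical values, since the text does not state the ones used. `quality_t` and `classify_pattern` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Pattern quality classes described in the text, plus "filtered out by Support". */
typedef enum { BELOW_SUPPORT, FREQUENT, POTENTIAL_ERROR, UNLIKELY } quality_t;

/* HYPOTHETICAL thresholds: the thesis does not give the values it used. */
#define MIN_SUPPORT    10    /* minimum number of traces matching the whole pattern */
#define MIN_CONFIDENCE 0.80  /* below this ratio, exceptions count as "many" */

/*
 * antecedent_count: traces containing the pattern's antecedent (e.g. <lock>)
 * pattern_count:    traces containing the whole ordered pattern (e.g. <lock, unlock>)
 * An exception is a trace with the antecedent but without the full pattern.
 */
quality_t classify_pattern(size_t antecedent_count, size_t pattern_count)
{
    /* every full match is also an antecedent match */
    assert(pattern_count <= antecedent_count);

    if (pattern_count < MIN_SUPPORT)
        return BELOW_SUPPORT;            /* not frequent enough to report */
    if (pattern_count == antecedent_count)
        return FREQUENT;                 /* confidence == 1: no exceptions */

    double confidence = (double)pattern_count / (double)antecedent_count;
    if (confidence >= MIN_CONFIDENCE)
        return POTENTIAL_ERROR;          /* a few exceptions worth inspecting */
    return UNLIKELY;                     /* many exceptions */
}
```

For example, if 20 traces contain the antecedent and 18 of them also contain the full pattern, confidence is 0.9 and, under these assumed thresholds, the pattern is classed as a Potential Error; the two remaining traces are the exceptions to inspect.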
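This work makes two main contributions. First, unlike the apriori-based methods used in other work, the sequential pattern approach determines the order of the elements within a pattern. Second, by introducing the notion of how code changes across versions, it describes patterns more precisely and reveals error-prone locations.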
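The difference between unordered and ordered matching can be illustrated with a short C sketch. It assumes a trace is an array of event names and that the apriori-style baseline only checks whether every item of a pattern occurs somewhere in a trace; the function names and the `open`/`read`/`close` events used below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Unordered (itemset-style) containment: each pattern item occurs somewhere in the trace. */
bool contains_as_set(const char *trace[], size_t n, const char *pat[], size_t m)
{
    assert(trace != NULL || n == 0);
    for (size_t j = 0; j < m; j++) {
        bool found = false;
        for (size_t i = 0; i < n && !found; i++)
            found = strcmp(trace[i], pat[j]) == 0;
        if (!found)
            return false;
    }
    return true;
}

/* Ordered (sequential) containment: pattern items occur in order, gaps allowed. */
bool contains_as_sequence(const char *trace[], size_t n, const char *pat[], size_t m)
{
    assert(trace != NULL || n == 0);
    size_t j = 0;
    /* greedily matching the earliest occurrence is sufficient for subsequence tests */
    for (size_t i = 0; i < n && j < m; i++)
        if (strcmp(trace[i], pat[j]) == 0)
            j++;
    return j == m;
}
```

For the trace `close, read, open`, the unordered test accepts the pattern `<open, close>` while the ordered test rejects it; for `open, read, close` both accept. Only the ordered test separates correct use from a close issued before the matching open.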
5.1. Comparison with Related Work
| Work | Analyzed entity | Analysis scope | Characteristics |
|---|---|---|---|
| Guide Software Changes (2004) [6] | | Each revision diff | Less dedicated program analysis |
| System-Specific Rules (2005) [5] | Procedure call | 10 lines in a revision | Pairwise pattern; static analysis |
| Matching Method Calls (2005) [3] | Procedure call | Each revision diff | Pairwise pattern; dynamic analysis |
| PR-Miner (2005) [2] | Procedure call and variable | Intra-procedure | Static analysis |
| Our work | Procedure call and variable | Intra-procedure | Dynamic analysis; ordered pattern |
5.2. Directions for Future Improvement
5.2.1. Detecting Aliased Variables
Pointers make C powerful because they are easy to use, but they also cause considerable difficulty for program analysis. Consider the following three lines of C:
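int value = 3, *flag1 = &value;  /* flag1 points to a memory cell holding 3 */
int *flag2;
flag2 = flag1;                   /* flag2 now points to the same cell */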
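After execution, one memory cell holds the value 3, and both flag1 and flag2 point to that address. Modifying the cell through flag1 also changes the value seen through flag2, and vice versa, so flag1 and flag2 mean the same thing here. In this thesis, however, flag1 and flag2 are treated as distinct; for example,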
foo(flag1);
and
foo(flag2);
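would be counted as different patterns. Because the occurrences of what is really a single usage are split between the two names, each variant may fall below the Support threshold, so this problem can cause false negatives (i.e., a rule that exists is overlooked and not reported). Since the underlying problem is NP-complete, incorporating an approximate pointer alias analysis algorithm should partially solve it.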
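Because this work analyzes execution traces, one possible mitigation, different from the static approximate alias analysis suggested above, is to key pointer arguments by the runtime address they hold rather than by the variable name. The C sketch below assumes a hypothetical tracer that calls `object_id` for each pointer argument it records; the table type, its capacity, and the function name are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 256  /* HYPOTHETICAL capacity of the per-trace table */

/* Per-trace table mapping runtime addresses to small canonical object ids. */
typedef struct {
    const void *addr[MAX_OBJECTS];
    size_t count;
} alias_table;

/*
 * Return the canonical id of the object at address p, assigning the next id
 * the first time p is seen. Pointers holding the same address (aliases) get
 * the same id, whatever variable name is used at the call site.
 * Returns -1 if the table is full.
 */
int object_id(alias_table *t, const void *p)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->addr[i] == p)
            return (int)i;
    if (t->count == MAX_OBJECTS)
        return -1;
    t->addr[t->count] = p;
    return (int)t->count++;  /* returns the id just assigned */
}
```

With flag1 and flag2 from the example above, both arguments map to object 0, so `foo(flag1)` and `foo(flag2)` would be recorded as the same event. One caveat of this design: an address that is freed and later reused would be mistaken for the same object, so a real tracer would also need to drop table entries when memory is released.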
5.2.2. Improving Attribution of Version Changes
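When processing version data, this study uses only one-line check-ins. If an important set of function usages has never been involved in a fault, or its fixes all happened to span two or more lines, our approach cannot reveal the importance of those functions. A finer-grained algorithm that characterizes the nature of each version change would allow the variability and importance of each pattern to be described more precisely.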
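One way to select one-line check-ins is to count the changed lines in each check-in's unified diff. The C sketch below rests on an assumed criterion, not necessarily the one used in this study: a check-in counts as one-line when its diff removes at most one line and adds at most one line, with the `---`/`+++` file headers skipped. `is_one_line_checkin` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * HYPOTHETICAL criterion: a check-in is "one-line" if its unified diff removes
 * at most one line and adds at most one line, and changes at least one line.
 * Lines starting with "---" or "+++" are treated as file headers and skipped;
 * as a known limitation, a removed line whose own text starts with "--" is
 * skipped too.
 */
bool is_one_line_checkin(const char *diff)
{
    assert(diff != NULL);
    int added = 0, removed = 0;
    const char *line = diff;
    while (*line != '\0') {
        if (strncmp(line, "+++", 3) != 0 && strncmp(line, "---", 3) != 0) {
            if (line[0] == '+')
                added++;
            else if (line[0] == '-')
                removed++;
        }
        const char *nl = strchr(line, '\n');  /* advance to the next line */
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return added + removed > 0 && added <= 1 && removed <= 1;
}
```

A diff that replaces line `b` with `c` passes this test, while one that rewrites two lines does not; a finer-grained attribution would replace this boolean with a description of what each changed line touches.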
5.2.3. Exploiting Data Dependencies
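This study expresses data dependency only by renaming variables, and this simple method falls short of our needs. If the relationships among functions could be expressed through the variables they share, combining them with the sequential mining method described above would help find patterns that occur only rarely and would reduce the chance of reporting false patterns.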
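A possible starting point is to project each trace onto the objects it manipulates, so that sequential mining sees only calls linked through the same data. The C sketch below assumes a hypothetical event record holding a callee name and a canonical object id (for instance, one assigned per runtime address); `event` and `project_by_object` are illustrative names, not part of the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL trace event: the called function and the id of the object it touched. */
typedef struct {
    const char *callee;
    int obj;
} event;

/*
 * Copy into out[] (capacity cap) the callees of all events that touch object
 * `obj`, preserving trace order. Returns the number of callees written.
 * Calls on other objects that are interleaved in the trace are excluded, so
 * the resulting subsequence links functions only through the shared object.
 */
size_t project_by_object(const event *trace, size_t n, int obj,
                         const char *out[], size_t cap)
{
    assert(trace != NULL || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++)
        if (trace[i].obj == obj)
            out[k++] = trace[i].callee;
    return k;
}
```

For the trace open(0), open(1), read(0), close(1), close(0), projecting on object 0 yields open, read, close, and projecting on object 1 yields open, close; each projection can then be mined as its own sequence.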
References
[1] S. Yamaguchi, "GNU GLOBAL source code tag system," http://www.gnu.org/software/global/
[2] Z. Li and Y. Zhou, "PR-miner: Automatically extracting implicit programming rules and detecting violations in large software code," in ESEC/FSE-13: Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, pp. 306-315.
[3] B. Livshits and T. Zimmermann, "Locating matching method calls by mining revision history data," in Workshop on the Evaluation of Software Defect Detection Tools, 2005.
[4] I. Neamtiu, J. S. Foster and M. Hicks, "Understanding source code evolution using abstract syntax tree matching," in MSR '05: Proceedings of the 2005 International Workshop on Mining Software Repositories, 2005.
[5] C. C. Williams and J. K. Hollingsworth, "Recovering system specific rules from software repositories," in MSR '05: Proceedings of the 2005 International Workshop on Mining Software Repositories, 2005.
[6] T. Zimmermann, P. Weisgerber, S. Diehl and A. Zeller, "Mining version histories to guide software changes," in ICSE '04: Proceedings of the 26th International Conference on Software Engineering, 2004, pp. 563-572.
[7] R. Purushothaman and D. Perry, "Towards understanding the rhetoric of small changes," 2004, pp. 90-94.
[8] "OpenSSH," http://www.openssh.com/, May 2006.
[9] D. Kramer, "API documentation from source code comments: A case study of javadoc," in SIGDOC '99: Proceedings of the 17th Annual International Conference on Computer Documentation, 1999, pp. 147-153.
[10] A. Zeller, "Configuration Management with Version Sets," Abteilung Softwaretechnologie, Technische Universität Braunschweig, Braunschweig, 1997.
[11] CollabNet, "Subversion," http://subversion.tigris.org/
[12] S. Huang and K. Liu, "Mining version histories to verify the learning process of legitimate peripheral participants," in MSR '05: Proceedings of the 2005 International Workshop on Mining Software Repositories, 2005.
[13] D. Cubranic and G. C. Murphy, "Hipikat: Recommending pertinent software development artifacts," in ICSE '03: Proceedings of the 25th International Conference on Software Engineering, 2003, pp. 408-418.
[14] T. Zimmermann and P. Weisgerber, "Preprocessing CVS data for fine-grained analysis," in MSR 2004: International Workshop on Mining Software Repositories, 2004.
[15] I. Sommerville, Software Engineering, 7th ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2004.
[16] J. Whaley, M. C. Martin and M. S. Lam, "Automatic extraction of object-oriented component interfaces," in ISSTA '02: Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis, 2002, pp. 218-228.
[17] S. Konrad and B. H. C. Cheng, "Requirements patterns for embedded systems," in Proceedings of the IEEE Joint International Conference on Requirements Engineering, 2002, pp. 127-136.
[18] Z. Balanyi and R. Ferenc, "Mining design patterns from C++ source code," in Proceedings of the International Conference on Software Maintenance, 2003, pp. 305-314.
[19] J. W. Nimmer and M. D. Ernst, "Automatic generation of program specifications," ACM SIGSOFT Software Engineering Notes, vol. 27, pp. 229-239, 2002.
[20] R. Kollmann, P. Selonen, E. Stroulia, T. Systa and A. Zundorf, "A study on the current state of the art in tool-supported UML-based static reverse engineering," in Proceedings of the Ninth Working Conference on Reverse Engineering, 2002, pp. 22-32.
[21] L. C. Briand, Y. Labiche and Y. Miao, "Towards the reverse engineering of UML sequence diagrams," in WCRE '03: Proceedings of the 10th Working Conference on Reverse Engineering, 2003, pp. 57.
[22] A. Zeller and D. Lütkehaus, "DDD: A free graphical front-end for UNIX debuggers," ACM SIGPLAN Notices, vol. 31, pp. 22-27, 1996.
[23] B. Demsky and M. Rinard, "Data structure repair using goal-directed reasoning," in Proceedings of the 27th International Conference on Software Engineering, 2005, pp. 176-185.
[24] J. Cordy, "TXL: A language for programming language tools and applications," in Proc. 4th Int. Workshop on Language Descriptions, Tools and Applications, Electronic Notes in Theoretical Computer Science, vol. 110, 2004, pp. 3-31.
[25] G. C. Necula, S. McPeak, S. P. Rahul and W. Weimer, "CIL: Intermediate language and tools for analysis and transformation of C programs," in International Conference on Compiler Construction, 2002.
[26] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[27] A. Michail, "Data mining library reuse patterns using generalized association rules," in International Conference on Software Engineering, 2000, pp. 167-176.
[28] A. Michail, "Data mining library reuse patterns in user-selected applications," in 14th IEEE International Conference on Automated Software Engineering, 1999, pp. 24-33.
[29] H. Mannila, H. Toivonen and A. I. Verkamo, "Discovering frequent episodes in sequences," KDD, pp. 210-215, 1995.
[30] "CVS - Open Source Version Control," http://www.nongnu.org/cvs/
[31] J. Polstra, "CVSup," http://www.cvsup.org/
[32] C.-L. Kao, "SVN-Mirror," http://search.cpan.org/~clkao/SVN-Mirror-0.68/
[33] P. Godefroid, N. Klarlund and K. Sen, "DART: Directed automated random testing," in PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2005, pp. 213-223.