Conclusions and Future Work - Detecting and Classifying Known and Unknown Malware Using Three-Phase Behavior Analysis

To balance detection accuracy against time cost in malware analysis, we propose a three-phase behavior-based approach in which the first two phases perform detection and the third performs classification. We observe a program's behaviors in two quite different ways: sandbox-based and system-call-based.

Although observing a program in a sandbox is faster, observing its system calls dissects the program at a much finer granularity.

In the first phase, we employ the GFI sandbox to obtain 12 representative behaviors, and then adopt an artificial neural network to calculate the MD value of each program under analysis. In the second phase, we record the system calls each program issues during its execution, and discover behaviors common to different malware by recursively extracting the longest common substring of their system call sequences. We then apply a Bayes probabilistic model to keep the likely malicious behaviors that benign programs rarely perform, and judge a program by comparing it against these common system call sequences. In the third phase, we define type vectors according to the malicious behaviors each malware exhibits, and use cosine similarity against these type vectors to recognize whether a malware sample belongs to a known type or an unknown one. Since intrusive behaviors are carried out only by intrusive malware, such as bots, intrusive and non-intrusive malware can also be identified separately.
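The second-phase extraction and the third-phase matching can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text above: the recursion is taken to mean "extract the longest common substring, then repeat on the remainders to its left and right"; the system call names, the type vectors, the minimum length `min_len`, and the similarity `threshold` are all hypothetical values chosen for illustration only.

```python
from math import sqrt


def longest_common_substring(a, b):
    """Return the longest contiguous run of calls shared by lists a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def find(seq, sub):
    """Index of the first occurrence of list sub inside list seq, or -1."""
    n = len(sub)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == sub:
            return i
    return -1


def extract_common(a, b, min_len=2):
    """Assumed recursion: take the longest common substring, then recurse
    on what lies to its left and to its right in both sequences."""
    lcs = longest_common_substring(a, b)
    if len(lcs) < min_len:
        return []
    ia, ib = find(a, lcs), find(b, lcs)
    left = extract_common(a[:ia], b[:ib], min_len)
    right = extract_common(a[ia + len(lcs):], b[ib + len(lcs):], min_len)
    return left + [lcs] + right


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(sample, type_vectors, threshold=0.8):
    """Return the most similar known type, or "unknown" when even the best
    match falls below the (hypothetical) similarity threshold."""
    best_type, best_sim = max(
        ((name, cosine(sample, vec)) for name, vec in type_vectors.items()),
        key=lambda pair: pair[1],
    )
    return best_type if best_sim >= threshold else "unknown"


# Hypothetical traces and type vectors, for illustration only.
trace_a = ["open", "read", "write", "close", "connect", "send"]
trace_b = ["stat", "read", "write", "close", "fork", "connect", "send"]
known_types = {"bot": [1, 1, 0, 1], "worm": [0, 1, 1, 0]}

print(extract_common(trace_a, trace_b))
print(classify([1, 1, 0, 1], known_types))
print(classify([0, 0, 1, 0], known_types))
```

In this sketch each component of a type vector marks whether a given malicious behavior is exhibited. A sample whose best similarity stays below the threshold is reported as an unknown type rather than being forced into the nearest known one.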

We conducted experiments to validate the effectiveness and efficiency of the proposed scheme, and summarize the main findings as follows. First, the first phase takes about 180 seconds to analyze a program, while the second phase takes approximately 900 seconds. However, the first phase yields a false negative rate (FNR) of 7.6% and a false positive rate (FPR) of 44.9%, compared with an FNR of 7.4% and an FPR of 7.5% for the second phase. The second phase thus spends more time to achieve better accuracy. Next, the integrated two-phase detection approach performs better than either single-phase approach alone in both detection accuracy and time cost: it yields an FNR of 3.6% and an FPR of 6.8%, and spends 731 seconds on analyzing a sample. Finally, our classification method recognizes malware of known types with an accuracy of 85.8% and malware of unknown types with an accuracy of 80.0%. It also recognizes intrusive malware with an accuracy of 82.7% and non-intrusive malware with an accuracy of 88.9%.
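For readers less familiar with the two error rates quoted above, the following minimal sketch shows how FNR and FPR are computed from ground-truth labels and detector verdicts, using the standard definitions FNR = FN / (FN + TP) and FPR = FP / (FP + TN). The sample labels and verdicts are made-up values and are not the experimental data.

```python
def error_rates(labels, predictions):
    """Compute (FNR, FPR); True means malware, False means benign."""
    pairs = list(zip(labels, predictions))
    fn = sum(1 for y, p in pairs if y and not p)      # malware missed
    tp = sum(1 for y, p in pairs if y and p)          # malware caught
    fp = sum(1 for y, p in pairs if not y and p)      # benign flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # benign passed
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


# Hypothetical ground truth and verdicts: 4 malware, 4 benign programs.
labels = [True, True, True, True, False, False, False, False]
verdicts = [True, True, True, False, False, False, False, True]

fnr, fpr = error_rates(labels, verdicts)
print(f"FNR = {fnr:.1%}, FPR = {fpr:.1%}")
```

A low FNR means few malware samples slip through undetected, while a low FPR means few benign programs are wrongly flagged.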

Both detection and classification in the proposed approach leave room for improvement. For the first phase, we can employ multiple sandboxes to take more behaviors into consideration. For the second and third phases, although all invoked system calls are recorded, we ignored their parameters.

We would also like to investigate the malicious behaviors that the system call sequences represent. By studying these behaviors, we can become more familiar with malware and understand which triggering events malware requires. In this way, we might better recognize what kind of malware a given sample is.

