Table 3: Interval behavior cluster vectors by decimal encoding method
Mi =
Table 4: Result of malware detection experiment
Epoch   Batch size   Testing accuracy
600     10           0.625
accuracy occurs when the batch size is 100, where the accuracy is 0.98. When we increase the batch size further to 200, the accuracy decreases slightly to 0.961, which may indicate overfitting. We therefore take 100 as the best batch size.
4.6 Malware Family Classification Experiment
We use the 18 categories generated by the AVClass tool as the reference labels for judging the accuracy of the trained model. In this experiment, we build the interval behavior vectors as 1000-grams and 2000-grams and compare the results of the two n-gram lengths. When calculating the testing accuracy, we also report the top 2 accuracy, which counts a prediction as correct if the true family is among the model's two most probable guesses, because the same behavior may be traced to different malware families.
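The top 2 accuracy described above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the experiment: the function name `top_k_accuracy`, the NumPy implementation, and the toy scores over three families (the experiment uses 18) are all assumptions made for the example.

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Column indices of the k largest scores in each row (order inside the top k is irrelevant).
    top_k = np.argpartition(probs, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical model outputs: 4 samples, 3 malware families.
probs = np.array([
    [0.6, 0.3, 0.1],  # true family 0: correct at top 1
    [0.4, 0.5, 0.1],  # true family 0: correct only at top 2
    [0.2, 0.3, 0.5],  # true family 0: wrong at both
    [0.1, 0.2, 0.7],  # true family 2: correct at top 1
])
labels = np.array([0, 0, 0, 2])

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
print(f"top 1 accuracy = {top1}, top 2 accuracy = {top2}")
```

Top 2 accuracy can never be lower than top 1 accuracy on the same predictions, which is consistent with the two result rows reported in each table.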
Table 5: Training & testing accuracy of 1000-gram vectors by decimal encoding method
Epoch \ Portion          100%    50%     25%     10%     5%      Iteration
100                      0.57    0.58    0.54    0.55    0.48    0.41
200                      0.61    0.64    0.5     0.44    0.59    0.37
300                      0.65    0.63    0.53    0.55    0.68    0.42
400                      0.71    0.68    0.47    0.6     0.76    0.40
500                      0.81    0.69    0.56    0.6     0.72    0.36
600                      0.81    0.74    0.74    0.71    0.74    0.39
700                      0.87    0.69    0.77    0.63    0.73    0.17
800                      0.81    0.84    0.71    0.75    0.68    0.36
900                      0.86    0.77    0.70    0.76    0.72    0.31
1000                     0.9     0.81    0.79    0.66    0.72    0.36
Testing Accuracy         0.665   0.693   0.664   0.677   0.719   0.342
Top 2 Testing Accuracy   0.771   0.820   0.821   0.807   0.867   0.5
Table 6: Training & testing accuracy of 1000-gram vectors by weighted one-hot encoding method
Epoch \ Portion          100%    50%     25%     10%     5%      Iteration
100                      0.25    0.32    0.45    0.41    0.33    0.24
200                      0.47    0.59    0.60    0.57    0.53    0.21
300                      0.52    0.64    0.75    0.75    0.72    0.2
400                      0.58    0.64    0.82    0.85    0.85    0.19
500                      0.65    0.67    0.91    0.88    0.87    0.22
600                      0.74    0.77    0.83    0.93    0.93    0.26
700                      0.75    0.75    0.91    0.90    0.96    0.28
800                      0.81    0.82    0.88    0.94    0.87    0.25
900                      0.88    0.84    0.83    0.90    0.91    0.32
1000                     0.9     0.79    0.91    0.93    0.82    0.30
Testing Accuracy         0.152   0.141   0.172   0.09    0.113   0.144
Top 2 Testing Accuracy   0.252   0.250   0.2781  0.226   0.222   0.264
In Table 5 and Table 6, 1000-gram vectors are transformed by the decimal encoding method and the weighted one-hot encoding method, respectively, and then fed into the RNN model. Each configuration is trained for 1000 epochs, and the RNN input is divided into several portions (100%, 50%, 25%, 10%, 5%, and iteration) to find the setting with the best accuracy. In Table 5, the highest testing accuracy (0.719) and the highest top 2 testing accuracy (0.867) both occur at the 5% portion of the RNN input. The 100% portion reaches the highest training accuracy (0.9) but a lower testing accuracy (0.665), which suggests that it may suffer from overfitting. In Table 6, the highest testing accuracy (0.172) and the highest top 2 testing accuracy (0.2781) both occur at the 25% portion of the RNN input. At epoch 1000, the training accuracy in Table 6 matches or exceeds that in Table 5 for four of the six portions, but the testing accuracy is much lower, which may indicate overfitting caused by the weighted one-hot encoding method.
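To make the difference in input shape between the two kinds of encoding concrete, the sketch below cuts a sequence of behavior-cluster IDs into n-gram windows and encodes each window in two ways. It rests on several assumptions that this section does not confirm: the windows are taken as consecutive and non-overlapping, the decimal encoding is approximated by feeding the cluster ID as a single number per time step, and the one-hot variant is plain (unweighted), since the weighting scheme is not restated here. All function names are hypothetical.

```python
import numpy as np

def split_into_ngrams(cluster_ids, n):
    """Cut a sequence of cluster IDs into consecutive windows of length n (incomplete tail dropped)."""
    ids = np.asarray(cluster_ids, dtype=np.int64)
    usable = len(ids) - len(ids) % n
    return ids[:usable].reshape(-1, n)

def integer_encode(windows):
    """One number per time step: the cluster ID itself (stand-in for decimal encoding)."""
    return windows.astype(np.float32)[..., None]           # shape (windows, n, 1)

def one_hot_encode(windows, num_clusters):
    """One indicator vector per time step (unweighted; the weighting is not modeled here)."""
    return np.eye(num_clusters, dtype=np.float32)[windows]  # shape (windows, n, num_clusters)

# Toy sequence of 10 cluster IDs drawn from 5 clusters, with n = 4 instead of 1000 or 2000.
sequence = [0, 3, 3, 1, 4, 2, 0, 0, 1, 3]
windows = split_into_ngrams(sequence, n=4)
as_integers = integer_encode(windows)
as_one_hot = one_hot_encode(windows, num_clusters=5)
print(windows.shape, as_integers.shape, as_one_hot.shape)
```

Under these assumptions, the one-hot input has `num_clusters` times as many values per time step as the integer input. A larger input dimension is one plausible reason for a model to fit the training data well while generalizing poorly, although this section does not establish that as the cause.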
In Table 7 and Table 8, 2000-gram vectors are transformed by the decimal encoding method and the weighted one-hot encoding method, respectively, and then fed into the RNN model. Each configuration is trained for 500 epochs, and the RNN input is divided into several portions (100%, 50%, 25%, 10%, 5%, and iteration) to find the setting with the best accuracy. In Table 7, the highest testing accuracy (0.751) and the highest top 2 testing accuracy (0.890) both occur at the 25% portion of the RNN input. In Table 8, the testing accuracy and the top 2 testing accuracy vary little across portions (0.109 to 0.19 and 0.258 to 0.33, respectively). These results show that the weighted one-hot encoding method performs worse than the decimal encoding method. Figures 9, 10, 11, and 12 show line charts of the testing accuracy.
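The comparison between the two encoding methods can be checked directly from the reported numbers. The sketch below copies the last-epoch training accuracy and the testing accuracy from Tables 5 to 8, then reports the portion with the highest testing accuracy and the mean gap between training and testing accuracy. Using this gap as a rough overfitting indicator is an assumption made for illustration, not a metric defined in this work.

```python
from statistics import mean

PORTIONS = ["100%", "50%", "25%", "10%", "5%", "Iteration"]

# Values copied from Tables 5-8: training accuracy at the last epoch
# (epoch 1000 for Tables 5 and 6, epoch 500 for Tables 7 and 8) and testing accuracy.
RESULTS = {
    "Table 5": {"train": [0.90, 0.81, 0.79, 0.66, 0.72, 0.36],
                "test": [0.665, 0.693, 0.664, 0.677, 0.719, 0.342]},
    "Table 6": {"train": [0.90, 0.79, 0.91, 0.93, 0.82, 0.30],
                "test": [0.152, 0.141, 0.172, 0.09, 0.113, 0.144]},
    "Table 7": {"train": [0.83, 0.87, 0.81, 0.63, 0.68, 0.43],
                "test": [0.731, 0.710, 0.751, 0.690, 0.657, 0.401]},
    "Table 8": {"train": [0.62, 0.68, 0.67, 0.71, 0.65, 0.20],
                "test": [0.144, 0.137, 0.166, 0.161, 0.109, 0.19]},
}

def summarize(train, test):
    """Return (portion with the highest testing accuracy, mean train-test gap)."""
    best = max(range(len(test)), key=lambda i: test[i])
    gap = mean([tr - te for tr, te in zip(train, test)])
    return PORTIONS[best], round(gap, 3)

summary = {name: summarize(r["train"], r["test"]) for name, r in RESULTS.items()}
for name, (best, gap) in summary.items():
    print(f"{name}: best portion = {best}, mean train-test gap = {gap}")
```

The last-epoch training accuracy is a single snapshot of a noisy curve, so the gap should be read as a coarse signal only.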
Table 7: Training & testing accuracy of 2000-gram vectors by decimal encoding method
Epoch \ Portion          100%    50%     25%     10%     5%      Iteration
100                      0.73    0.67    0.67    0.5     0.45    0.39
200                      0.76    0.78    0.7     0.56    0.66    0.42
300                      0.74    0.77    0.65    0.66    0.69    0.27
400                      0.74    0.75    0.8     0.68    0.72    0.32
500                      0.83    0.87    0.81    0.63    0.68    0.43
Testing Accuracy         0.731   0.710   0.751   0.690   0.657   0.401
Top 2 Testing Accuracy   0.876   0.840   0.890   0.845   0.844   0.496
Table 8: Training & testing accuracy of 2000-gram vectors by weighted one-hot encoding method
Epoch \ Portion          100%    50%     25%     10%     5%      Iteration
100                      0.34    0.26    0.28    0.28    0.21    0.17
200                      0.45    0.46    0.51    0.42    0.32    0.15
300                      0.48    0.58    0.62    0.56    0.47    0.22
400                      0.45    0.6     0.71    0.66    0.57    0.21
500                      0.62    0.68    0.67    0.71    0.65    0.2
Testing Accuracy         0.144   0.137   0.166   0.161   0.109   0.19
Top 2 Testing Accuracy   0.310   0.289   0.268   0.267   0.258   0.33
5 Conclusion
This paper proposes a new malware detection and classification framework. We use the growing hierarchical self-organizing map to group program behaviors, obtain a record of all kinds of program operations on virtual machines, and use a recurrent neural network to perform the behavior sequence analysis. In the malware detection experiment, a small batch size of 10 gives a testing accuracy of only 0.625, whereas adjusting the batch size to 100 raises the accuracy to 0.98, so we conclude that the batch size has a significant impact on the prediction model. In the malware family classification experiment, the classification reaches a testing accuracy of 0.719 and a top 2 testing accuracy of 0.867 when 1000-gram vectors are encoded by the decimal encoding method. With 2000-gram vectors and the same encoding, the corresponding values are 0.751 and 0.890 (Table 7). In future work, we will investigate how to deal with noisy data and how to make the classification labels more accurate, and we will extend the flexibility of the model so that it can cope with larger and more complex malware behavior analysis.
National Chengchi University
Figure 9: Testing accuracy of 1000-gram decimal encoding vector
Figure 10: Testing accuracy of 1000-gram weighted one-hot encoding vector
Figure 11: Testing accuracy of 2000-gram decimal encoding vector
Figure 12: Testing accuracy of 2000-gram weighted one-hot encoding vector