As mentioned in Section 3.4, SVMs offer four prevalent kernels for data classification: linear, polynomial, RBF, and sigmoid. In general, the appropriate kernel should be chosen based on the characteristics of the data distribution. We adopted all four kernel functions for majority learning; however, the sigmoid kernel classified the system call data poorly in our experiments, so we do not list results for SVM majority learning with the sigmoid kernel.
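To make the kernel comparison concrete, the following Python sketch (illustrative, not our thesis implementation) trains scikit-learn SVMs with each of the four kernels; the synthetic two-class data and its dimensions are placeholder assumptions standing in for the system call feature vectors.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic two-class data standing in for the system call features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Compare the four prevalent kernels under identical conditions.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel:8s} mean accuracy: {scores.mean():.3f}")
```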

In Experiments 1.2 and 2.2, we showed that SVM majority learning with the linear, polynomial, and RBF kernels can select a proper majority, just as the other majority learning methods do.

In Experiments 1.1 and 2.1, the SVM majority learning methods with the linear and polynomial kernels show similar classification accuracy and time efficiency, and on average they perform best among the majority learning methods. We speculate that this is because the distribution of the system call data set lends itself to separating the two classes with a linear or polynomial kernel. The SVM majority learning method with the RBF kernel, by contrast, has on average the worst classification accuracy among the majority learning methods.

We conclude that choosing a proper kernel for the data set at hand is important.
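One common way to make that choice empirically is a cross-validated grid search over kernels and hyperparameters, as in the sketch below; the candidate grids are illustrative assumptions, not the settings used in our experiments.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Illustrative candidate grids for each kernel family.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```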

In Experiment 3.1, the classification accuracy of the SVMs is, on average, worse than that of BML. We suspect that when the training data has more variety, it is harder for the SVMs to find a hyperplane that separates the two classes of data. Relatively speaking, BML is the more stable majority learning method.

6 Conclusion

Noisy labels are almost inevitable in real-world cases. In this paper, we introduce a novel nominal resistant learning procedure, BML, to prevent anomalies from affecting the effectiveness of learning. By picking the observations with the maximum distance between the two classes, we are able to spot anomalies from a global view even when the features of the anomalies are unknown.

Further, we apply the resistant learning mechanism to reduce the impact of outliers on neural networks.

We also applied the majority learning concept to other prevalent classification models, SVMs and ANNs. By selecting the data that is familiar to the models, these majority learning methods can learn a proper majority and avoid interference from outliers.
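These methods share a common structure, sketched schematically in Python below: iteratively refit the model on a trusted subset and admit only the observations the current model explains well. The `fit` and `predict` callables, the seed size, and the majority ratio are illustrative placeholders, and the admission rule shown (smallest residual) is a simplification of the distance-based selection used by BML.

```python
import numpy as np

def majority_learning(X, y, fit, predict, majority_ratio=0.9, seed_size=10):
    """Schematic majority learning loop (not the exact BML procedure):
    grow a trusted subset by admitting, one observation at a time,
    the point the current model fits best."""
    n = len(y)
    rng = np.random.default_rng(0)
    subset = list(rng.permutation(n)[:seed_size])     # small initial subset
    target = int(majority_ratio * n)                  # desired majority size
    while len(subset) < target:
        model = fit(X[subset], y[subset])             # refit on current majority
        residuals = np.abs(predict(model, X) - y)     # fit quality for all points
        outside = np.setdiff1d(np.arange(n), subset)  # not-yet-admitted indices
        subset.append(int(outside[np.argmin(residuals[outside])]))
    return subset                                     # indices of the learned majority
```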

Besides the algorithmic optimization, we implemented the majority learning algorithms with TensorFlow and executed them in a GPU environment to accelerate model training.
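A minimal sketch of the kind of GPU offloading involved, in TensorFlow 2.x eager style, is shown below; the matrix shapes and the least-squares solve are illustrative, not our exact implementation.

```python
import tensorflow as tf

# Fall back to CPU when no GPU is visible.
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"
with tf.device(device):
    A = tf.random.normal([4096, 256])  # illustrative design matrix
    b = tf.random.normal([4096, 1])    # illustrative targets
    w = tf.linalg.lstsq(A, b)          # batched linear algebra benefits from the GPU
print(w.shape)                         # (256, 1)
```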

Experiments on real-world data sets show that our approach achieves classification accuracy similar to the envelope mechanism while being more time efficient. We also use a popular multi-class classification model, the softmax neural network, to learn the majority selected by BML; it achieves higher classification accuracy on the training data while maintaining the same level of classification accuracy on the testing data.
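For illustration, the following sketch trains a small softmax network (Keras) only on the indices of a BML-selected majority; the random data, layer sizes, and the `subset` variable are placeholder assumptions, not our experimental setup.

```python
import numpy as np
import tensorflow as tf

num_classes = 3                                        # illustrative
X_train = np.random.randn(1000, 20).astype("float32")  # placeholder features
y_train = np.random.randint(0, num_classes, 1000)      # placeholder labels
subset = np.arange(900)                                # stands in for the BML-selected majority

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Train on the majority only, leaving suspected anomalies out.
model.fit(X_train[subset], y_train[subset], epochs=20, batch_size=32, verbose=0)
```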
