National Chengchi University
Fig. 11. The average number of filtered-out inputs for each malware family.
marks in the next column specify whether the two aligned APIs are matched (O), mismatched (X), or a gap (-). Most of the filtered APIs can be aligned (marked as O), which indicates that our filterNN can identify the patterns of the family while performing classification. Such a human-readable filter makes it more convenient for security experts to analyze sequential data without manually filtering out unimportant data in the sequence.
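The text does not say which alignment algorithm produced the O/X/- marks. As a hedged illustration only, the sketch below uses standard Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -1); it may differ from the tool used in this thesis. The function name `align` and the API names in the usage example are hypothetical.

```python
def align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment (Needleman-Wunsch) of two API-call sequences.

    Returns (api_a, api_b, mark) triples, where mark is 'O' for a match,
    'X' for a mismatch and '-' for a gap. The scores are assumptions.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    rows = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = seq_a[i - 1] == seq_b[j - 1]
            if score[i][j] == score[i - 1][j - 1] + (match if same else mismatch):
                rows.append((seq_a[i - 1], seq_b[j - 1], "O" if same else "X"))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            rows.append((seq_a[i - 1], "-", "-"))  # gap in seq_b
            i -= 1
        else:
            rows.append(("-", seq_b[j - 1], "-"))  # gap in seq_a
            j -= 1
    rows.reverse()
    return rows


if __name__ == "__main__":
    a = ["NtOpenFile", "NtReadFile", "NtClose"]   # example filtered sequence
    b = ["NtOpenFile", "NtWriteFile", "NtClose"]  # example filtered sequence
    for api_a, api_b, mark in align(a, b):
        print(f"{api_a:12} {api_b:12} {mark}")
```

Global alignment is used here because filtered sequences from the same family are compared end to end; a local alignment (Smith-Waterman) would be the alternative if only shared sub-patterns mattered.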
F. Case Study of Different Learning Goals of the filterNN
As mentioned in Section III, the three cost functions have different learning goals: 1) Z cost: filter out as many inputs as possible; 2) C cost: keep the retained inputs (API calls) as consecutive as possible, which helps security experts perform further analysis; 3) A cost: minimize the cost between predictions and labels, i.e., maximize accuracy. These three learning goals conflict with each other. If we want to remove most of the noise in the original data, which helps security experts explore the malicious purpose of a malware variant, the classifier may receive insufficient input to learn and classify. Conversely, if we want higher classification accuracy, the filter will output a relatively complete sequence that may contain much noise, making it hard for security experts to infer the malicious purpose of the malware variant. The idea is to adjust the training proportions of the three cost functions so as to train a filter model whose filtered input contains less data, still allows the subsequent classifier to achieve high accuracy, and keeps the remaining data as consecutive (2 calls in a row) as possible.
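The exact formulas of the three costs are defined in Section III and are not repeated here, so the sketch below uses assumed stand-ins on a binary filter mask (1 = keep the API call, 0 = filter it out). Z cost is taken as the kept fraction, which matches the later reading of a final Z cost of 0.2 as 80% of the input filtered out; C cost is taken as the fraction of kept calls with no kept neighbour (not part of a run of at least 2); A cost is taken as cross-entropy. All three function names and definitions are hypothetical. They show why the goals conflict: pushing Z cost down tends to leave isolated calls, which raises C cost.

```python
import math


def z_cost(mask):
    """Assumed Z cost: fraction of positions the filter keeps (mask value 1)."""
    return sum(mask) / len(mask)


def c_cost(mask):
    """Assumed C cost: fraction of kept positions that have no kept neighbour,
    i.e. kept API calls that are not part of a run of at least 2."""
    kept = [i for i, v in enumerate(mask) if v]
    if not kept:
        return 0.0
    isolated = 0
    for i in kept:
        left = i > 0 and mask[i - 1]
        right = i + 1 < len(mask) and mask[i + 1]
        if not (left or right):
            isolated += 1
    return isolated / len(kept)


def a_cost(probs, label):
    """Assumed A cost: cross-entropy of the predicted class probabilities
    against the index of the true label."""
    return -math.log(probs[label])


if __name__ == "__main__":
    mask = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]  # hypothetical binary filter output
    print(z_cost(mask), c_cost(mask), a_cost([0.1, 0.9], 1))
```

In this example half of the calls are kept (Z cost 0.5), and one of the five kept calls (position 4) is isolated, so C cost is 0.2.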
Figures 12, 13, and 14 show some of the filterNN training results with the SLFN filter. In the following examples, the learning rates of A cost and C cost are 0.0001, and that of Z cost is 0.00001. The hyper-parameter of each cost function represents its training proportion: a value of 1 means the cost function is trained in every epoch, and a value of 4 means it is trained every 4 epochs. The full parameter set is shown in Table VII. In this experiment, I tested LeNet and AlexNet separately as the CNN classifier, and the two models performed similarly. Considering training efficiency (in terms of time consumed), I adopted LeNet as the CNN classifier. For the RNN model, I used 128 units and 128 inputs to perform the classification. We expected that the higher the training proportion of Z cost, the more inputs would be filtered out. However, the results show that the factor that influences the number of filtered inputs is C cost: the lower the training proportion of C cost, the more inputs are filtered out. This is because when filterNN tries to keep the filtered input as consecutive as possible, fewer and fewer inputs are filtered out; conversely, when filterNN does not try to keep the filtered input consecutive, more and more inputs are filtered out. In Fig. 12, the training proportion of C cost is high (every 4 epochs), which causes the filter to eventually filter out no input, so the accuracy is very high (95%), as expected. Figures 13 and 14 show two different situations. In Fig. 13, the C cost hyper-parameter is set to 8. Z cost rises slowly, and the filter eventually retains about 90% of the original input; the accuracy is still very high (94%). In Fig. 14, the C cost hyper-parameter is set to 12. Interestingly, Z cost starts dropping and reaches a final value of 0.2. This means the filter has filtered out 80% of the original input, while the accuracy still reaches 87%.
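The training-proportion rule above (a value of k means the cost is trained every k epochs) can be sketched as a simple schedule. This is a hedged illustration: the function name `costs_to_train`, counting epochs from 1, and giving each selected cost one optimization step per epoch are assumptions, not details stated in the text. A and Z are set to 1 for Figs. 13 and 14, following the parameter table.

```python
def costs_to_train(epoch, proportions):
    """Return the cost functions that get an optimization step at `epoch`.

    `proportions` maps a cost name to its training proportion k; the cost is
    trained every k epochs (k = 1 means every epoch). Epochs count from 1.
    """
    return [name for name, k in proportions.items() if epoch % k == 0]


if __name__ == "__main__":
    fig12 = {"A": 1, "Z": 1, "C": 4}  # hyper-parameter set of Fig. 12
    for epoch in range(1, 9):
        print(epoch, costs_to_train(epoch, fig12))

    # Number of C-cost updates over 5000 epochs for the C settings of Figs. 12-14.
    for k in (4, 8, 12):
        settings = {"A": 1, "Z": 1, "C": k}
        updates = sum("C" in costs_to_train(e, settings) for e in range(1, 5001))
        print(f"C cost every {k} epochs -> {updates} updates")
```

Over 5000 epochs, C cost settings of 4, 8, and 12 give 1250, 625, and 416 C-cost updates, which is one way to see why a larger C value lets Z cost dominate and more inputs get filtered out.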
This result indicates that the classifier can still learn the key characteristics of the input even when only 20% of it remains. The filtered input can therefore be viewed as the characteristic of the malware variant. Fig. 15 shows the alignment of the filtered input produced by the model trained with the hyper-parameter set of Fig. 13. Although the filtered input still retains about 90% of the original input, malware from the same group is well aligned. Fig. 16 shows the alignment of the filtered input produced by the model trained with the hyper-parameter set of Fig. 14. Roughly 80% of the original input was filtered out, leaving 23 inputs, and most of the remaining inputs are still aligned.
Figures 17, 18, and 19 display some of the filterNN training results with the convolution filter. Table VIII shows the hyper-parameters used in this part. The hyper-parameter configuration of the classifiers is the same as in Table VII, while the hyper-parameters of the three cost functions are different. Unlike with the SLFN filter, C cost cannot significantly affect the number of filtered inputs. Whether the training proportion of C cost is high or low, neither the number of filtered inputs nor the accuracy differs much across these three conditions. We infer that this is due to the characteristics of the convolution operation: because convolution is not fully connected to the input, it is hard to train the Z cost function with a small convolutional kernel. By contrast, the SLFN filter is fully connected to the input, so the training curve of C cost looks smooth and as expected.
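The connectivity argument above can be checked numerically: perturb one input position at a time and record which inputs change a given output. The sketch below is a hedged toy, not the thesis's filter; the layer functions, the all-ones dense weights, and the kernel values are hypothetical, while n = 128 and kernel length 4 are taken from the parameter table. The convolution is written as cross-correlation, as deep-learning libraries implement it.

```python
def dense_layer(x, w):
    """Fully connected layer: output i is a weighted sum of *all* inputs."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def conv1d_same(x, kernel):
    """1-D convolution (cross-correlation) with zero padding so the output has
    the same length as the input; output i only sees len(kernel) inputs."""
    f = len(kernel)
    pad = f // 2
    padded = [0.0] * pad + list(x) + [0.0] * (f - 1 - pad)
    return [sum(kernel[k] * padded[i + k] for k in range(f)) for i in range(len(x))]


def inputs_affecting(layer, n, out_index):
    """Perturb each input position in turn and return those that change
    output `out_index`, i.e. that output's receptive field."""
    base = layer([0.0] * n)[out_index]
    affected = []
    for j in range(n):
        x = [0.0] * n
        x[j] = 1.0
        if layer(x)[out_index] != base:
            affected.append(j)
    return affected


if __name__ == "__main__":
    n, f = 128, 4  # sequence length n and kernel length F1 from the table
    dense_w = [[1.0] * n for _ in range(n)]  # hypothetical all-nonzero weights
    dense = lambda x: dense_layer(x, dense_w)
    conv = lambda x: conv1d_same(x, [0.25] * f)  # hypothetical kernel
    print("dense:", len(inputs_affecting(dense, n, 10)), "inputs")
    print("conv: ", inputs_affecting(conv, n, 10))
```

Each dense output depends on all 128 positions, whereas each convolution output depends on only 4 neighbouring positions, so a global target such as the overall fraction of kept inputs has to be reached through many local, weakly coupled decisions.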
In sum, with the SLFN filter, filterNN can train a better model that filters out as many inputs as desired while still maintaining high accuracy. Fig. 20 shows the alignment of the filtered input produced by the model trained with the hyper-parameter set of Fig. 19; on average, 80% of the original input was filtered out. With the convolution filter, the number of 0s is difficult to fine-tune. We infer that this is due to the characteristics of the convolution operation: convolution is not fully connected to the input and is more complex than the SLFN filter, so it needs more training to find a local minimum. However, the training cost would then increase exponentially. The experiment shows that we can adjust the training proportions of the three cost functions to obtain different kinds of filtered input: filterNN can either 1) output a filter result that retains 50% of the input with 80% accuracy, or 2) output a filter result that retains 20% of the input with 90% accuracy. Security experts can therefore choose either 1) to analyze the malware's purpose with more filtered input and lower classification accuracy, or 2) to analyze it with less filtered input and higher classification accuracy.
Fig. 12. filterNN (SLFN filter) training results with hyper-parameter set A cost: 1, Z cost: 1, C cost: 4.
PARAMETERS USED IN FILTERNN (SLFN FILTER).
Parameter        Value        Explanation
SLFN filter
  SLFN neuron    128          Number of neurons in the SLFN filter
CNN classifier
  K1             64           Number of filters in the first convolutional layer
  F1, F2         4, 4         Length of the convolution kernels
  P1, P2         4, 4         Pooling length
  D1, D2         4, 4         Pooling strides
  K2             128          Number of filters in the second convolutional layer
  FC             1024         Number of nodes in the fully connected layer
RNN classifier
  num units      128          Number of units
  n input        128          Number of inputs
SLFN classifier
  num neuron     1024         Number of single-layer neurons in logistic regression
  num weight     [128, 130]   Number of weights
  num bias       130          Number of biases
Shared parameters
  test rate      0.2          Ratio of test data to all data (training: 6165, testing: 1487)
  n              128          Length of the API call sequence
  DR             0.5          Dropout rate
  T cost         1            T cost training proportion
  Z cost         1            Z cost training proportion
  C cost         1 ∼ 15       C cost training proportion
  T cost rate    0.0001       T cost learning rate
  Z cost rate    0.00001      Z cost learning rate
  C cost rate    0.0001       C cost learning rate
  batch size     40           Number of samples in each training epoch
  epochs         5000         Total number of training epochs
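As a small consistency check on the shared parameters, the sketch below transcribes part of the table into a Python dataclass; the class name `SharedParams` and its field names are hypothetical. It shows that the stated split of 6165 training and 1487 testing samples corresponds to a test ratio of about 0.194, which the table rounds to 0.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedParams:
    """Shared filterNN parameters transcribed from the table (hypothetical names)."""
    n_train: int = 6165        # training samples
    n_test: int = 1487         # testing samples
    seq_len: int = 128         # n: length of the API call sequence
    dropout_rate: float = 0.5  # DR
    batch_size: int = 40
    epochs: int = 5000
    t_cost_rate: float = 1e-4  # T cost learning rate
    z_cost_rate: float = 1e-5  # Z cost learning rate
    c_cost_rate: float = 1e-4  # C cost learning rate

    @property
    def test_ratio(self) -> float:
        """Actual fraction of test data; the table reports it as 0.2."""
        return self.n_test / (self.n_train + self.n_test)


if __name__ == "__main__":
    p = SharedParams()
    print(f"test ratio: {p.test_ratio:.4f}")
```

A frozen dataclass keeps one experiment's settings immutable, so a run cannot silently change a learning rate after it has been logged.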
G.