4.2 Evaluation
4.2.1 Exp. 1.1: Majority Learning on Small-Size Sampling Data
RQ 1: How does BML perform in terms of efficiency and accuracy compared to the state-of-the-art approaches on small-size data sets?
To answer RQ1, in this experiment we compared the learning results of BML, ENV, SVM, and softmax on the sampled data to see whether BML improves performance in the majority learning setting. The performance evaluation reports the classification accuracy and the time efficiency of each method.
Figure 6: Experimental design of experiment 1.1.
Figure 6 illustrates the experimental design of experiment 1.1. We randomly selected an equal number of malicious samples from the 9 malware clusters and benign samples from the benign data set. We chose 100 samples from each detection rule cluster because 100 is relatively small compared to the total number of samples in each cluster (less than 10%). These randomly selected samples are used as the training set; the remaining samples are used as the testing set.
For example, we randomly selected 100 malware samples from the rule 1 cluster and 100 benign samples from the benign cluster. We then labeled these 200 samples according to their class and used them as the training set. The rest of the rule 1 cluster samples and the rest of the benign cluster samples formed the testing set. After sampling the data from each cluster, we conducted the different majority learning experiments on the same sampled data.
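The per-cluster sampling described above can be sketched as follows. The cluster sizes below are taken from the rule 1 row of Table 6 (100 training samples plus the listed testing counts); the actual feature vectors are replaced by placeholders.

```python
import numpy as np

def split_cluster(samples, n_train=100, seed=0):
    """Randomly pick n_train samples from a cluster for training;
    the remainder become the testing set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:]]
    return train, test

# Placeholder samples; sizes are consistent with the rule 1 row of Table 6
# (100 + 3075 malware, 100 + 19291 benign).
rule1_cluster = [("rule1", i) for i in range(3175)]
benign_cluster = [("benign", i) for i in range(19391)]

train_m, test_m = split_cluster(rule1_cluster)
train_b, test_b = split_cluster(benign_cluster)

# 100 malicious + 100 benign labeled samples form the training set.
train_set = [(x, 1) for x in train_m] + [(x, 0) for x in train_b]
test_set = [(x, 1) for x in test_m] + [(x, 0) for x in test_b]
```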
Table 6: Training results using 100*2 samples
Rule  Majority Method  #HN  Outliers (#FB/#B, #FM/#M)  Execute Time(s)  Train B (#FB/#B)  Train M (#FM/#M)  Test B (#FB/#B)  Test M (#FM/#M)
Rule 1:
SVM linear - (0/1, 0/9) 0.1 0/99 0/91 85/19291 3/3075
SVM poly - (0/0, 0/10) 0.1 0/100 0/90 127/19291 30/3075
SVM rbf - (0/0, 3/10) 0.3 0/100 0/90 58/19291 350/3075
Softmax - (0/7, 0/3) 0.6 0/93 0/97 345/19291 1/3075
ENV 7 (7/9, 0/1) 106.1 0/91 0/99 1078/19291 0/3075
BML 1 (4/6, 3/4) 0.4 0/94 0/96 759/19291 30/3075
Rule 2:
SVM linear - (0/6, 0/4) 0.1 0/94 0/96 12/19291 0/1231
SVM poly - (0/7, 0/3) 0.1 0/93 0/97 18/19291 0/1231
SVM rbf - (0/7, 3/3) 0.2 0/93 0/97 0/19291 47/1231
Softmax - (0/0, 0/10) 0.5 0/100 0/90 6/19291 7/1231
ENV 7 (0/0, 0/10) 82.5 0/100 0/90 2/19291 17/1231
BML 1 (0/4, 0/6) 0.4 0/96 0/94 2/19291 6/1231
Rule 3:
SVM linear - (1/6, 0/4) 0.1 0/94 0/96 14/19291 0/1925
SVM poly - (1/7, 0/3) 0.1 0/93 0/97 18/19291 0/1925
SVM rbf - (0/3, 2/7) 0.2 0/97 0/93 0/19291 48/1925
Softmax - (0/0, 0/10) 0.5 0/100 0/90 3/19291 0/1925
ENV 5 (0/4, 0/6) 34.5 0/96 0/94 8/19291 0/1925
BML 1 (0/7, 0/3) 0.4 0/93 0/97 8/19291 0/1925
Rule 4:
SVM linear - (0/10, 0/0) 0.1 0/100 0/100 15/19291 0/1738
SVM poly - (1/10, 0/0) 0.1 0/90 0/100 18/19291 0/1738
SVM rbf - (0/1, 1/9) 0.2 0/99 0/91 0/19291 28/1738
Softmax - (4/10, 0/0) 0.4 0/90 0/100 548/19291 0/1738
ENV 11 (0/2, 4/8) 104.7 0/98 0/92 0/19291 98/1738
BML 1 (0/6, 0/4) 0.5 0/94 0/96 0/19291 0/1738
Rule 5:
SVM linear - (0/4, 0/6) 0.1 0/96 0/94 12/19291 0/2351
SVM poly - (0/4, 0/6) 0.1 0/96 0/94 18/19291 0/2351
SVM rbf - (0/0, 4/10) 0.2 0/100 0/90 0/19291 74/2351
Softmax - (7/8, 0/2) 0.5 0/92 0/98 1784/19291 0/2351
ENV 55 (1/2, 0/8) 1265.0 0/98 0/92 500/19291 0/2351
BML 1 (0/3, 0/7) 0.4 0/97 0/93 3/19291 0/2351
Rule 6:
SVM linear - (0/8, 0/2) 0.1 0/92 0/98 16/19291 0/4256
SVM poly - (0/8, 0/2) 0.1 0/92 0/98 19/19291 0/4256
SVM rbf - (0/0, 7/10) 0.2 0/100 0/90 0/19291 125/4256
Softmax - (4/10, 0/0) 0.3 0/90 0/100 814/19291 0/4256
ENV 39 (0/3, 0/7) 461.0 0/97 0/93 0/19291 0/4256
BML 1 (0/3, 0/7) 0.4 0/97 0/93 0/19291 0/4256
Rule 7:
SVM linear - (0/4, 0/6) 0.1 0/96 0/94 17/19291 0/1108
SVM poly - (0/4, 0/6) 0.1 0/96 0/94 34/19291 0/1108
SVM rbf - (0/0, 9/10) 0.3 0/100 0/90 0/19291 171/1108
Softmax - (0/0, 5/10) 0.5 0/100 0/90 1/19291 79/1108
ENV 25 (0/0, 4/10) 378.3 0/100 0/90 0/19291 38/1108
BML 1 (0/0, 4/10) 0.4 0/100 0/90 0/19291 36/1108
Rule 8:
SVM linear - (0/0, 0/10) 0.1 0/100 0/90 10/19291 0/1120
SVM poly - (0/0, 0/10) 0.1 0/100 0/90 16/19291 0/1120
SVM rbf - (0/1, 3/9) 0.2 0/99 0/91 0/19291 58/1120
Softmax - (0/4, 0/6) 0.4 0/96 0/94 35/19291 0/1120
ENV 37 (0/1, 0/9) 1472.5 0/99 0/91 44/19291 32/1120
BML 1 (0/6, 0/4) 0.4 0/94 0/96 11/19291 0/1120
Rule 9:
SVM linear - (0/2, 0/8) 0.1 0/98 0/92 11/19291 0/1687
SVM poly - (0/1, 0/9) 0.1 0/99 0/91 16/19291 0/1687
SVM rbf - (0/0, 9/10) 0.2 0/100 0/90 0/19291 60/1687
Softmax - (2/10, 0/0) 0.4 0/90 0/100 664/19291 0/1687
ENV 5 (0/1, 0/9) 92.1 0/99 0/91 118/19291 0/1687
BML 1 (0/1, 0/9) 0.4 0/99 0/91 7/19291 0/1687
Table 6 shows the training results of SVM, softmax, ENV, and BML. For each cluster, we trained the different models using the randomly selected samples as training data (100 benign samples and 100 malicious samples). The column "#HN" specifies the number of hidden nodes in the SLFN after training with ENV and BML.
Note that the softmax neural network has no hidden layer, so its number of hidden nodes is always zero.
For ENV, every rule needs more than one hidden node to find the fitting function. For BML, every rule needs only one hidden node to classify the majority data. This indicates that we did not even need to apply the hidden-node-addition procedure to deal with the outliers in the training data.
In the column "Outliers (#FB/#B, #FM/#M)", #B is the number of benign samples regarded as outliers in the training data and #M is the number of malware samples regarded as outliers. The sum of #B and #M equals 5% of the training data because the majority rate is set to 95%. #FB and #FM are the numbers of falsely classified samples among the benign and malware outliers, respectively. Although outliers incur a greater loss than the majority data, not all outliers are misclassified: because we applied condition L for classification, outliers whose losses are not large enough are not misclassified by the model.
On average, BML has higher classification accuracy on training data than ENV, and most of the misclassified samples are benign. As for training time, BML outperforms ENV and is comparable to SVM and softmax, since BML does not need to retrain the model as many times as ENV does.
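As a rough sketch of how the SVM and softmax baselines can be trained and timed, the snippet below uses scikit-learn (an assumption; the paper does not name its implementation) on synthetic stand-in features. With only two classes, a softmax classifier reduces to logistic regression.

```python
import time
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-in for 100 benign + 100 malicious feature vectors.
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(2, 1, (100, 16))])
y = np.array([0] * 100 + [1] * 100)

models = {
    "SVM linear": SVC(kernel="linear"),
    "SVM poly": SVC(kernel="poly"),
    "SVM rbf": SVC(kernel="rbf"),
    # Two-class softmax is equivalent to logistic regression.
    "Softmax": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    start = time.time()
    model.fit(X, y)
    print(f"{name}: trained in {time.time() - start:.2f}s, "
          f"train acc = {model.score(X, y):.2f}")
```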
In this study, we evaluate the accuracy of a model by its "false rate", defined as follows: False Rate = (number of falsely classified samples) / (total number of samples).
For example, if a rule 1 sample is classified as benign by a model, that sample is a falsely classified sample. We sum the number of falsely classified rule 1 samples and divide by the total number of rule 1 samples to obtain the false rate of the rule 1 samples. This calculation applies to all rule clusters and to the benign cluster.
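The definition above is a single division; as a minimal illustration, using BML's rule 1 testing counts from Table 6:

```python
def false_rate(n_false, n_total):
    """False rate = falsely classified samples / total samples."""
    return n_false / n_total

# BML on rule 1 testing data (values from Table 6).
print(false_rate(759, 19291))  # benign testing false rate, about 0.039
print(false_rate(30, 3075))    # malware testing false rate, about 0.010
```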
The column "Train B (#FB/#B)" gives the false rate on benign training data, where #B is the number of benign samples in the training data. The column "Train M (#FM/#M)" gives the false rate on malware training data, where #M is the number of malware samples in the training data. The columns "Test B (#FB/#B)" and "Test M (#FM/#M)" give the corresponding false rates on the benign and malware testing data.
Figure 7: False rate of different majority learning methods on training data (100*2 samples).
Figure 8: False rate of different majority learning methods on testing data (100*2 samples).
Figure 9: False rate of different majority learning methods on outlier data (100*2 samples).
Figure 10: Execution time of different majority learning methods (100*2 samples).
Figures 7 to 9 show the false rates of SVM, softmax, ENV, and BML. Averaged over the 9 rules, BML achieves higher classification accuracy than softmax and ENV on the testing data. On the training data, BML has higher classification accuracy than softmax on benign data but lower classification accuracy on malware data. Figure 10 shows the execution time of SVM, softmax, ENV, and BML: BML, SVM, and softmax finish the model training process much faster than ENV.
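Taking BML's malware-testing columns from Table 6, the mean false rate over the 9 rules can be computed as:

```python
# BML (false, total) counts on malware testing data per rule (Table 6).
bml_test_m = [(30, 3075), (6, 1231), (0, 1925), (0, 1738), (0, 2351),
              (0, 4256), (36, 1108), (0, 1120), (0, 1687)]

rates = [f / n for f, n in bml_test_m]
mean_rate = sum(rates) / len(rates)
print(f"mean malware-test false rate of BML: {mean_rate:.4f}")
```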
To answer RQ1: on average, BML achieves higher time efficiency and higher classification accuracy than the state-of-the-art methods on small-size data sets.