
4.2.5 Exp. 3: Binary Classification Performance

RQ5: How does BML perform on high-variation data sets?

The above experiments have demonstrated that BML is effective at classifying rule samples and benign samples. However, the rule samples are clustered by GHSOM, which means that rule samples in the same cluster have similar features.

Figure 24: Experimental design of experiment 3.1.

Figure 24 illustrates the experimental design of experiment 3.1. In this experiment, we test whether BML can learn good binary classification criteria when the malicious behavior patterns are more complicated. We treat all malicious clusters as a single malicious category and conduct bipartite learning on the two resulting, far more spread-out classes, i.e., benign and malicious. We use three methods to build the training data: "200", "10%", and "80%". "200" means that we randomly select 200 malware samples from each of the 9 rules, and then randomly select an equal number of benign samples (i.e., 1,800 benign samples). "10%" and "80%" mean that we randomly select 10% and 80%, respectively, of the malware samples from each of the 9 rules. In every case, the same number of benign samples is randomly selected and used for training as well.
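To make the three sampling schemes concrete, the following is a minimal sketch of how the training sets could be assembled; the names `rule_samples` and `benign_pool` are illustrative placeholders, not identifiers from our implementation.

```python
import random

def build_training_set(rule_samples, benign_pool, size):
    """Assemble one training set for experiment 3.1.

    rule_samples: dict mapping each of the 9 rules to its malware samples.
    size: 200 for a fixed count per rule, or a fraction (0.1 or 0.8).
    """
    malware = []
    for samples in rule_samples.values():
        n = size if isinstance(size, int) else round(len(samples) * size)
        malware.extend(random.sample(samples, n))
    # An equal number of benign samples is drawn at random.
    benign = random.sample(benign_pool, len(malware))
    return malware, benign

# The three schemes used in Table 12:
# build_training_set(rule_samples, benign_pool, 200)   # 1,800 malware + 1,800 benign
# build_training_set(rule_samples, benign_pool, 0.10)
# build_training_set(rule_samples, benign_pool, 0.80)
```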

We used ENV and a softmax neural network as the experimental control group. The terminal condition for ENV and the softmax neural network is that the network correctly classifies 95% of the training data. We compare the performance of BML with these two majority learning methods.
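As a sketch, this terminal condition can be expressed as the training loop below; `gradient_descent_step` and `predict` are assumed stand-ins for the actual update and inference routines, not our exact implementation.

```python
import numpy as np

def train_until_terminal(model, X, y, target_acc=0.95, max_epochs=100_000):
    """Tune weights until the network classifies 95% of the training
    data correctly, the terminal condition used for ENV and softmax."""
    for epoch in range(max_epochs):
        model.gradient_descent_step(X, y)       # one round of weight tuning
        acc = np.mean(model.predict(X) == y)    # training accuracy
        if acc >= target_acc:
            return epoch + 1
    return max_epochs
```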

Table 12: Training result of bipartite classification.

Sampling Method | Majority Method | #HN  | Outliers (#FB/#B, #FM/#M) | Execute Time (s) | Train B (#FB/#B) | Train M (#FM/#M) | Test B (#FB/#B) | Test M (#FM/#M)
200 | SVM linear | -    | (7/7, 161/173)     | 31   | 0/1793  | 0/1627  | 57/17591   | 889/17591
200 | SVM poly   | -    | (7/8, 153/172)     | 33   | 0/1792  | 0/1628  | 53/17591   | 792/17591
200 | SVM rbf    | -    | (152/180, 0/0)     | 133  | 0/1620  | 0/1800  | 1359/17591 | 0/17591
200 | Softmax    | -    | (33/77, 0/103)     | 37   | 0/1723  | 0/1697  | 278/17591  | 1/17591
200 | ENV        | 3429 | (5/5, 170/175)     | 9010 | 0/1795  | 0/1625  | 1295/17591 | 5526/17591
200 | BML        | 107  | (54/162, 12/18)    | 1205 | 0/1638  | 0/1782  | 589/17591  | 95/17591
10% | SVM linear | -    | (3/3, 169/191)     | 32   | 0/1932  | 0/1744  | 53/17456   | 1469/17456
10% | SVM poly   | -    | (3/3, 189/191)     | 38   | 0/1932  | 0/1744  | 57/17456   | 1763/17456
10% | SVM rbf    | -    | (1/1, 72/193)      | 690  | 0/1934  | 0/1742  | 47/17456   | 757/17456
10% | Softmax    | -    | (7/12, 137/182)    | 230  | 0/1923  | 0/1753  | 87/17456   | 1214/17456
10% | ENV        | 401  | (51/51, 139/143)   | 6100 | 0/1884  | 0/1792  | 38/17456   | 5869/17456
10% | BML        | 1    | (32/35, 159/159)   | 33   | 0/1900  | 0/1776  | 347/17456  | 1446/17456
80% | SVM linear | -    | (555/590, 771/961) | 3108 | 0/14919 | 0/14548 | 194/3882   | 889/3882
80% | SVM poly   | -    | (573/846, 11/705)  | 3139 | 0/14663 | 0/14804 | 150/3882   | 4/3882
80% | SVM rbf    | -    | (1439/1551, 0/0)   | 5247 | 0/13958 | 0/15509 | 361/3882   | 0/3882
80% | Softmax    | -    | (416/750, 12/801)  | 2031 | 0/14759 | 0/14708 | 100/3882   | 4/3882
80% | ENV        | N/A  | N/A                | N/A  | N/A     | N/A     | N/A        | N/A
80% | BML        | 1    | (551/1321, 0/230)  | 2316 | 0/14188 | 0/15279 | 133/3882   | 0/3882

Table 12 shows the training results for the three sampling methods. The execution time of ENV under the 80% sampling method exceeded 20,000 seconds, so we did not finish that experiment. Since the malicious patterns are more complex, ENV needs a great deal of time for training. BML has higher time efficiency and lower model complexity compared to ENV. BML is also much more time-efficient than the SVMs, but softmax is much more time-efficient than BML.

Figure 25: False rate of different majority learning methods on training data (variety samples).

Figure 26: False rate of different majority learning methods on testing data (variety samples).

Figure 27: False rate of different majority learning methods on outlier data (variety samples).

Figure 25 to Figure 27 show the mean false rates of the three different sampling methods.
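For reference, the false rates plotted in Figures 25 to 27 follow directly from the #F/# counts in Table 12; a small sketch:

```python
def false_rate(num_false, num_total):
    """False rate as reported in Table 12, e.g. Test M 889/17591."""
    return num_false / num_total

# Mean testing false rate of BML on malicious data across the three
# sampling schemes, with values taken from Table 12:
bml_test_m = [false_rate(95, 17591), false_rate(1446, 17456), false_rate(0, 3882)]
mean_rate = sum(bml_test_m) / len(bml_test_m)   # about 0.029
```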

To answer RQ5: on high-variation data, BML achieves both higher time efficiency and higher classification accuracy than ENV, but softmax is more time-efficient than BML.

5 Discussion

In chapter 4, we presented the experiment results. Generally speaking, BML achieves high time efficiency while maintaining the same level of classification accuracy as the state-of-the-art methods. In this chapter, we explain the experiment results in more detail.

5.1 Exp. 1.1: Majority Learning on Small-Size Sampling Data

In Experiment 1.1, we randomly obtained 100 rule samples and 100 benign samples as training data. Compared to the total number of rule samples, 100 samples are less than 10% of the population, and such a small portion of data might not represent its distribution. However, the rule samples have been clustered by the GHSOM algorithm, so rule samples within a cluster share similar patterns; that is to say, the 100 sampled rule samples are similar to one another. Therefore, we should not worry too much about the representativeness of the obtained rule samples. As for the benign samples, GHSOM produces 133 benign clusters.

Figure 28: False rate of different majority learning methods (100*2 samples).

The variety of benign sample patterns is larger than that of the malware sample patterns. Therefore, we sampled as randomly as possible so that the obtained 100 benign samples are not overly concentrated in a few specific clusters. In figure 28, the false rates on the training data and the testing data are correlated, so to a certain extent the 100 sampled rule samples and 100 benign samples represent the characteristics of the overall data.

Table 6 shows that the execution time of ENV is much longer than that of softmax and BML. The main reason is that ENV needs to retrain the model many more times than softmax and BML do. ENV is a relatively strict majority learning method, so it requires more model training procedures to satisfy its strict condition. The model training procedure of ENV includes weight tuning and adding hidden nodes. Since adding a hidden node only requires calculating the new node's weights, its time complexity is O(1), and it does not take much time. The most time-consuming step is tuning the weights with the gradient descent method, which repeats forward and backward passes many times and therefore demands a lot of computation. As the number of hidden nodes grows, the model needs even more computation to handle the extra calculations. Table 6 shows that ENV indeed retrained the model and added hidden nodes in all 9 experiments. Therefore, ENV spent the most time on model training compared to softmax and BML.
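The ENV procedure described above can be sketched as the following loop; `satisfies_strict_condition`, `tune_weights`, and `add_hidden_node` are placeholder names for the operations in the text, not ENV's actual API.

```python
def env_train(model, X, y):
    """Sketch of ENV's procedure: costly gradient-descent weight
    tuning, with an O(1) hidden-node addition whenever tuning stalls."""
    while not model.satisfies_strict_condition(X, y):
        improved = model.tune_weights(X, y)   # repeated forward/backward passes
        if not improved:
            # Only the new node's weights are calculated, directly; no
            # retraining of existing weights happens in this step.
            model.add_hidden_node(X, y)
```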

The softmax majority learning method and BML train the model much faster because they do not need to retrain the model after it is initialized. Both methods obtain m + 1 samples for model initialization. The softmax majority learning method applies the gradient descent method a few times to learn the m + 1 samples, whereas BML solves simultaneous equations to calculate proper initial weights for them. Once the m + 1 samples are learned by the model, the majority can be correctly classified by softmax and BML. Thus, both methods save much of the time otherwise spent retraining the model.
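A minimal sketch of BML's initialization step, assuming a linear output over m features so that the m + 1 samples yield a square system once a bias column is appended; the actual construction in BML may differ.

```python
import numpy as np

def bml_initial_weights(X_init, y_init):
    """Solve simultaneous equations so the initial weights fit the
    m + 1 selected samples exactly, with no gradient descent.

    X_init: (m + 1) x m feature matrix; y_init: m + 1 target values.
    """
    A = np.hstack([X_init, np.ones((X_init.shape[0], 1))])  # append bias column
    w = np.linalg.solve(A, y_init)   # exact fit of the m + 1 samples
    return w[:-1], w[-1]             # weights, bias
```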

We believe that the reason why BML and softmax do not need retraining is that the rule samples and the benign samples were clustered by the GHSOM algorithm, so their features differ to a certain degree, and the malware and benign samples are therefore not difficult for the neural network to distinguish.

Figure 28 shows the average false rates of the 9 experiments. The classification results indicate that BML has the highest classification accuracy. However, since BML is a less strict majority learning method than ENV, its classification accuracy should not exceed that of ENV. We hypothesize that the amount of training data is not large enough: the number of samples taken is insufficient to represent all the features of the population, so the BML classification results could turn out better than ENV's.

5.2 Exp. 1.2: Use ANN to Learn the Majority

Figure 29: False rate of different softmax neural networks (100*2 samples).

In experiment 1.2, we tested the majority data selected by the different majority learning methods. Figure 29 shows that all of the majority learning methods can find a proper majority. Compared to using all data for training, the differently selected majority data lose no accuracy on the testing data. In other words, the majority learning methods let us pick less data for model training while maintaining the same level of classification accuracy.

As for the training data, both ENV and BML can select a proper majority that slightly increases classification accuracy. This demonstrates that the majority is selected properly by BML on the small-size data set.

5.3 Exp. 2.1: Majority Learning on Large Scale Data

Table 9 shows the training results of the different majority learning methods. Obviously, ENV needs a lot of time for model training, and the number of hidden nodes is very large. The SLFNs grew too big, which extended the model training time. We observed that once ENV applies the hidden-node-adding process, the later stages almost inevitably add hidden nodes as well. A pure back-propagation process cannot escape the local optimum problem: because the weights of the newly added hidden nodes were precisely calculated, the gradient descent method was unable to make appropriate adjustments to the neuron weights.

As for softmax and BML, the situation is the same as in experiment 1.1: after the model is initialized, the majority can be correctly classified by softmax and BML. Both softmax and BML need to check the condition L (γN − m − 1) times. The training set in experiment 2.1 is larger, so the execution time in experiment 2.1 is longer than in experiment 1.1.
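The majority check itself can be sketched as a single pass that evaluates condition L once per remaining sample, with no weight tuning; `condition_L` is a placeholder for the predicate defined in section 3.3.

```python
def check_majority(model, remaining):
    """Evaluate condition L once for each of the gamma*N - m - 1 samples
    left after initialization; samples failing it are treated as outliers."""
    return [(x, y) for x, y in remaining if not model.condition_L(x, y)]
```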

Figure 30: False rate of different majority learning methods (80% samples).

Figure 30 shows the false rates of the different majority learning methods. Compared to experiment 1.1, the classification accuracy is much higher. We surmise that the more data are used for training, the more information the models learn. In addition, when the training set is large enough, ENV achieves better classification accuracy than BML; however, the trade-off is the long model training time. BML can learn the majority very fast while maintaining competitive classification accuracy.

5.4 Exp. 2.2: Use ANN to Learn the Larger Amount of Majority

Figure 31: False rate of different softmax neural networks (80% samples).

Figure 31 shows the false rates of the ANN. The classification accuracy is lower than in experiment 1.2. There are two possible reasons for this result. First, we adopt a simple ANN in this experiment; that is, the ANN has no hidden layer. Such a simple model might not be able to learn the features of the training data precisely when the training set is large. Another possible reason is that we apply gradient descent for the same 10,000 iterations: when the data size grows, the ANN model might need more weight-tuning iterations to achieve the same level of classification accuracy.
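For clarity, the following is a self-contained sketch of the simple ANN used here: a single softmax layer with no hidden layer, tuned by a fixed 10,000 gradient descent steps (the learning rate is an assumed value).

```python
import numpy as np

def train_softmax(X, y_onehot, lr=0.01, steps=10_000):
    """Single softmax layer, no hidden layer, fixed number of
    gradient descent steps regardless of training set size."""
    n, d = X.shape
    k = y_onehot.shape[1]
    W, b = np.zeros((d, k)), np.zeros(k)
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = (p - y_onehot) / n                     # cross-entropy gradient
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b
```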

The majority learning methods do select a proper majority. Compared to using all data for training, the different majority learning methods achieve the same level of classification accuracy on the testing data. Also, the outliers are more easily misclassified by the ANN. As for training accuracy, all of the majority learning methods increase classification accuracy on the training data compared to using all data. This demonstrates that BML can find a proper majority on a larger training set.

5.5 Exp. 3: Binary Classification Performance

In experiment 3, we designed 3 different sampling methods. Sampling methods "10%" and "80%" test small and large training sets, respectively. Sampling method "200" serves as an experimental comparison for "10%", owing to their similar training set sizes.

Table 12 shows the training results. ENV undoubtedly has the longest model training time: it grows many hidden nodes to fit the majority of the training data and thus consumed a lot of time training a perfect model. However, figure 25 and figure 26 indicate that ENV has the worst classification accuracy. We suppose that the ENV models overfit the training data, so the trained models classify the malware testing data poorly.

We mentioned in section 3.3 that the softmax classification is actually a variant of the condition L, so softmax and BML unsurprisingly have similar classification accuracy.

As for the model training time, we will discuss the three different sampling methods separately.

With sampling method "200", softmax tuned the weights only 3,600 times, while BML crammed 53 times and tuned the weights 132,802 times. The execution times are thus 37 and 1,205 seconds, respectively; softmax is more time-efficient than BML.

With sampling method "10%", softmax tuned the weights 34,971 times, while BML needed no training at all; BML only needs time for checking the majority. The execution times are thus 230 and 33 seconds, respectively; BML is more time-efficient than softmax.

With sampling method "80%", softmax tuned the weights 2,023 times, while BML again needed no training. Although BML only needs time for checking the majority, softmax has a relatively simple model, so forward-pass calculation is easier for it. The execution times are thus 2,031 and 2,316 seconds, respectively; softmax is more time-efficient than BML.
