
Chapter 3 Background

3.3 Artificial neural network

An artificial neural network (ANN) is a kind of machine learning algorithm: a computing system that mimics biological neural networks to solve complex problems. An ANN is composed of several interconnected artificial neurons.

Each neuron has an I/O characteristic and implements a local computation. The output of a neuron is determined by its I/O characteristic, its interconnection structure with other neurons, and external inputs [19]. By using simple mathematical techniques and training on plenty of data, an ANN acquires the ability of inference and judgment needed to solve problems [20]. Since an ANN is fault tolerant and well suited to optimization, it can solve extremely complex problems that other algorithms cannot. Thus, ANNs are widely used in computer science fields such as data mining, clustering, classification, prediction, and pattern matching. Several malware detection schemes have also used ANNs to match the binary code patterns of malware [21] [22]. In this thesis, we use an ANN to classify unknown samples into malware and benign software.


Figure 2. The architecture of the proposed ANN-MD.

Figure 2 shows the architecture of the proposed ANN-MD. There are two phases in the ANN-MD: the training phase and the testing phase. The training phase is responsible for training on abundant samples, adjusting the weight of each behavior using the ANN, and then constructing an MD expression. First, we collect common suspicious behaviors identified by three sandboxes [9] [10] [11] and store them in the suspicious behavior database. Next, we submit the training samples, both malicious and benign, to the sandbox web sites to collect their runtime behaviors. By comparing each sample's runtime behaviors with the behaviors in the suspicious behavior database, we train and adjust the weight of each behavior using the ANN. At the end of the training phase, we construct an MD expression. According to the MD values of the training samples, we can set an optimum MD threshold, as shown in Figure 3. The quantities of samples at the two ends of the double-headed arrow line are relatively large, and there are a few ambiguous samples in the middle of the line. The optimum MD threshold discriminates the malicious samples from the benign samples located in the ambiguous area. The testing phase is responsible for testing and judging whether an unknown sample is malicious. We first submit the unknown sample to the sandboxes to collect its runtime behaviors. Using the MD expression constructed at the end of the training phase, we calculate the MD value of the unknown sample. If the unknown sample's MD value is larger than the MD threshold, the sample is identified as malware; otherwise, it is identified as benign software.
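The testing-phase judgment can be sketched as follows (a minimal Matlab sketch; net denotes the trained network, threshold the MD threshold chosen in the training phase, and sandboxBehaviors a hypothetical helper that maps a sample's sandbox report to its 13-element behavior vector):

% Minimal sketch of the testing-phase judgment (hypothetical helper names).
b  = sandboxBehaviors(sample);   % 13x1 binary vector of suspicious behaviors
md = sim(net, b);                % MD value computed by the trained ANN
if md > threshold                % threshold selected in the training phase
    label = 'malware';
else
    label = 'benign software';
end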

Figure 3. Distribution of malicious and benign samples.



4.1 Suspicious behaviors

As mentioned above, we used three sandboxes [9] [10] [11] to collect 13 common suspicious behaviors. We submitted samples to these three sandboxes to calculate the appearance frequency of each behavior. We first chose the behaviors in the intersection of the suspicious behaviors identified by these sandboxes and stored them in the suspicious behavior database, eliminating the behaviors that appear rarely or not at all (appearance frequency < 15%). Next, we also stored in the database the behaviors that are not in the intersection but have a comparatively high appearance frequency (appearance frequency ≥ 15%). The names and descriptions of these 13 suspicious behaviors are listed in the following (a sketch of this selection rule is given after the list):

1. Creates Mutex

- Obtains exclusive access to system resources [17].

2. Creates Hidden File

- Creates a file without notifying the user.

3. Starts EXE in System

- Executes an EXE without the user's permission.

4. Checks for Debugger

- Checks whether there is any anti-virus system in the environment.

5. Starts EXE in Documents

- A document executes an EXE automatically without the user's permission.

6. Windows/Run Registry Key Set

- Creation, modification, or deletion of a Windows registry key.

7. Hooks Keyboard

- Intercepts keyboard input.

8. Modifies File in System

- Modifies files in the system permanently.

9. Deletes Original Sample

- Deletes the original sample.

10. More than 5 Processes

- Creates more than 5 processes.

11. Opens Physical Memory

- Accesses physical memory.

12. Deletes File in System

- Deletes a file in the system without permission of the user.

13. Auto Start

- Starts automatically when the system reboots.
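As referenced above, the selection rule can be sketched as follows (a hypothetical Matlab sketch; inter and others are assumed index vectors of behaviors inside and outside the intersection of the three sandboxes' reports, and freq their measured appearance frequencies):

% Hypothetical sketch of the behavior selection rule.
% freq(k): appearance frequency of behavior k across the submitted samples.
keepInter  = inter(freq(inter) >= 0.15);    % intersection behaviors, minus rare ones
keepOthers = others(freq(others) >= 0.15);  % frequent behaviors outside the intersection
selected   = [keepInter, keepOthers];       % together, the 13 suspicious behaviors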

Table 2 shows the appearance frequencies of these behaviors in malicious and benign samples. Note that all of the chosen suspicious behaviors have much higher appearance frequencies in malicious samples than in benign samples.


Table 2. The appearance frequencies of malicious and benign samples.

No.  Behavior  Appearance frequency (malicious)  Appearance frequency (benign)

1 Creates Mutex 53.8% 2.4%

2 Creates Hidden File 65.4% 8.0%

3 Starts EXE in System 54.4% 11.0%

4 Checks for Debugger 37.1% 9.0%

5 Starts EXE in Documents 34.0% 1.4%

6 Windows/Run Registry Key Set 72.0% 3.2%

7 Hooks Keyboard 25.4% 2.0%

8 Modifies File in System 28.6% 3.4%

9 Deletes Original Sample 16.0% 0.6%

10 More than 5 Processes 16.7% 2.4%

11 Opens Physical Memory 34.8% 6.0%

12 Deletes File in System 15.4% 3.0%

13 Auto Start 35.6% 0.0%

4.2 ANN topology

Figure 4. Topology of our artificial neural network.

Figure 4 shows the topology of our ANN. It is a feedforward neural network model consisting of three layers: an input layer, a hidden layer, and an output layer. The input layer consists of the suspicious behaviors of an input sample. The hidden layer and the output layer contain several neurons, marked by the dotted-line areas. If there is more than one hidden layer between the input layer and the output layer, the model is called a multi-layer neural network.

The major functionality of the hidden layer is to increase the complexity of the neural network. Thus, a multi-layer neural network can resolve more complicated non-linear problems than a single-layer one. The more hidden layers a neural network has, the more complex it becomes. However, too many hidden layers make the model overly complex and may result in overfitting [23]. Thus, it is important to choose the optimum numbers of hidden layers and of neurons in them. However, obtaining these optimum numbers is regarded as difficult; at least for now, there is no established mathematical approach to achieve this goal [24]. In the proposed ANN-MD, we build a two-layer ANN with one hidden layer. We set the numbers of neurons in the hidden layer and the output layer to 10 and 1, respectively. The operational details of each neuron are described in the following.

Figure 5. A neuron in the hidden layer.

Figure 5 illustrates the operational details of a neuron (the first one) in the hidden layer. The inputs are the 13 suspicious behaviors of a sample, i.e. Behavior 1 – Behavior 13. An input value is set to 1 if the sample has the corresponding suspicious behavior and 0 otherwise. For example, if a sample has suspicious behaviors No. 1, No. 2, No. 6, and No. 8, the input data of this sample will be [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0].

These inputs are multiplied by their corresponding weights in the neuron and summed. The neuron's bias is then added to the sum, and the result is substituted into the transfer function $f^{(1)}(\cdot)$ to obtain the output value of this neuron.
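A minimal Matlab sketch of this computation for one hidden neuron (the weight and bias values below are placeholders; the real values are obtained by training):

% Sketch of one hidden neuron: weighted sum, bias, transfer function.
b    = [1 1 0 0 0 1 0 1 0 0 0 0 0]';  % sample with behaviors No. 1, 2, 6, 8
w    = 0.1 * ones(13, 1);             % placeholder weights (learned in practice)
beta = -0.2;                          % placeholder bias
n    = w' * b + beta;                 % weighted sum plus bias
out  = 2 / (1 + exp(-2 * n)) - 1;     % tangent-sigmoid transfer function f^(1)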

Figure 6. A neuron in the output layer [25].

Figure 6 illustrates the details of the neuron in the output layer. The input values of this neuron are $a_1$ – $a_{10}$, the output values of the ten neurons in the hidden layer. These inputs are multiplied by the corresponding weights $\omega'_1$ – $\omega'_{10}$ and summed, the neuron's bias is added, and the result is substituted into the transfer function $f^{(2)}(\cdot)$ to obtain the final output value. We chose the tangent-sigmoid function:

$$f(n) = \frac{2}{1 + e^{-2n}} - 1$$

as the transfer function of our ANN, i.e. both $f^{(1)}(\cdot)$ and $f^{(2)}(\cdot)$, since it is often used to resolve classification problems.
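In Matlab this function is provided as the built-in tansig; a quick sketch confirming it matches the formula above:

n  = -2:0.5:2;                   % sample net-input values
y1 = tansig(n);                  % built-in tangent-sigmoid
y2 = 2 ./ (1 + exp(-2*n)) - 1;   % the formula above; y1 and y2 coincide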


4.3 MD expression

According to the neural network model mentioned above, we can construct an MD expression. Define the set $B = \{b_i \mid 1 \le i \le 13\}$ to be a sample's suspicious behaviors, where $b_i = 1$ if the sample has the $i$-th suspicious behavior and $b_i = 0$ otherwise. Let $\omega_{ij}$ and $\beta_j$ denote the weights and the bias value of the $j$-th neuron in the hidden layer, and let $\omega'_j$ and $\beta'$ denote the weights and the bias value of the neuron in the output layer. The MD expression can be represented as follows:

$$MD = f^{(2)}\left(\sum_{j=1}^{10} \omega'_j \, f^{(1)}\left(\sum_{i=1}^{13} \omega_{ij}\, b_i + \beta_j\right) + \beta'\right)$$

During training, the weights are adjusted according to the error $e = t - o$, where $t$ is the target value we gave previously: if a sample is malicious, the target value $t$ will be set to 1; on the contrary, $t$ will be set to 0. $o$ denotes the final output value of the ANN. The weight adjustment is $\Delta\omega = \eta\, e\, x$, where $\eta$ represents a learning factor and $x$ represents a set of input values. The value of $\eta$ is between 0 and 1. The larger $\eta$ is, the larger $\Delta\omega$ is; however, under these circumstances, the ANN will be more unstable. As a tradeoff, in our scheme, we set $\eta$ to 0.5. The new weights can be calculated according to the following formula: $\omega_{new} = \omega_{old} + \Delta\omega$. The closer the mean square error $\frac{1}{N}\sum_{k=1}^{N}(t_k - o_k)^2$ is to zero, the more convergent and stable the ANN is.
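A minimal Matlab sketch of the MD expression and of one weight-update step for the output layer (the weights and biases below are random placeholders; training produces the real values, and the update follows the simplified rule $\Delta\omega = \eta\, e\, x$ stated above):

% Sketch of the MD expression and one delta-rule update (placeholder weights).
f  = @(n) 2 ./ (1 + exp(-2*n)) - 1;        % tangent-sigmoid, f^(1) = f^(2)
W1 = randn(10, 13); beta1 = randn(10, 1);  % hidden-layer weights and biases
w2 = randn(10, 1);  beta2 = randn;         % output-layer weights and bias

b  = [1 1 0 0 0 1 0 1 0 0 0 0 0]';   % behavior vector of a training sample
a  = f(W1 * b + beta1);              % hidden-layer outputs a1..a10
MD = f(w2' * a + beta2);             % the sample's MD value

t   = 1;                             % target: 1 = malicious, 0 = benign
eta = 0.5;                           % learning factor
e   = t - MD;                        % output error
w2  = w2 + eta * e * a;              % new weights: w_old + delta_omega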

Chapter 5 Evaluation

5.1 Experimental settings

We utilized Matlab 7.11.0 to implement the ANN of our scheme. The architecture of the ANN from Matlab is shown in Figure 7, which corresponds to that in Figure 4. We use the tangent-sigmoid function as the transfer function in both the hidden layer and the output layer. The 13 possible suspicious behaviors of a sample are the input values of the ANN. Serial calculation through the ANN yields an output value, which is the sample's MD value. In order to distribute the weights of the neurons in each layer evenly, the initial values of the weights and biases are chosen by a built-in function, initnw, which implements the Nguyen-Widrow initialization algorithm [26].


Figure 7. Architecture of our ANN (from Matlab).
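A sketch of how this network can be built and trained with the Matlab Neural Network Toolbox (trainInputs and trainTargets are assumed placeholders for our training data):

% Hypothetical sketch using the Matlab Neural Network Toolbox (R2010b-era API).
P = trainInputs;    % 13 x N matrix of 0/1 behavior vectors
T = trainTargets;   % 1 x N targets: 1 = malicious, 0 = benign
net = newff(minmax(P), [10 1], {'tansig', 'tansig'});  % 13-10-1 topology
net.layers{1}.initFcn = 'initnw';   % Nguyen-Widrow initialization [26]
net.layers{2}.initFcn = 'initnw';   % (initnw is also newff's default)
net = init(net);                    % initialize weights and biases
net = train(net, P, T);             % train on the training samples
md  = sim(net, P(:, 1));            % MD value of the first sample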

The numbers of malicious and benign samples we used for the experiments are shown in Table 3. The sample space contains 2200 samples, divided into benign and malicious samples. We selected as benign samples 1000 portable executable (PE) files that exist under the Windows directories after a fresh installation of Windows XP SP2. The 1000 malware samples we used were downloaded from the Blast's Security [27] and VX Heaven [28] websites. Among the 1000 malicious (benign) samples, 500 (500) were used in the training phase and the other 500 (500) were used in the testing phase, as shown in Table 3.

Besides, to further verify the feasibility of the proposed ANN-MD, we chose another 200 samples (100 malicious and 100 benign), which are different from the training sample space. The 100 malicious samples were from the database of the National Communications Commission (NCC) of Taiwan (collected by five Internet Service Providers (ISPs) in Taiwan), and the 100 benign samples were downloaded from the CNET.com [29] website.

Table 3. Numbers of benign and malicious samples.

Phase                              Malicious  Benign  Total
Training                           500        500     1000
Testing                            500        500     1000
Testing (different sample space)   100        100     200

We use 9 metrics to evaluate the proposed ANN-MD and the related schemes: the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), and the following rates:

- True Positive Rate (TPR) = TP / (TP + FN)
- False Negative Rate (FNR) = FN / (TP + FN)
- False Positive Rate (FPR) = FP / (FP + TN)
- True Negative Rate (TNR) = TN / (FP + TN)
- Accuracy Rate = (TP + TN) / (TP + FN + FP + TN)
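As a small worked check in Matlab, the rates reported later in Table 4 follow directly from these definitions:

% Worked example using the counts reported in Table 4 (Section 5.3.1).
TP = 485; TN = 496; FP = 4; FN = 15;
FPR = FP / (FP + TN)                    % 4/500  = 0.008 -> 0.8%
FNR = FN / (TP + FN)                    % 15/500 = 0.030 -> 3.0%
ACC = (TP + TN) / (TP + TN + FP + FN)   % 981/1000 = 0.981 -> 98.1%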

5.2 MD threshold selection in the training phase

The distribution of the numbers of training samples is shown in Figure 8. According to this distribution, we can set a possible range for the MD threshold. For the benign samples, we choose the largest MD value at which the number of benign samples is larger than 10 as the lower bound of the possible range. For the malicious samples, we choose the smallest MD value at which the number of malicious samples is larger than 10 as the upper bound. In Figure 8, the possible range of the MD threshold is between 0.19 and 0.87.
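This bound computation can be sketched as follows (hypothetical Matlab variables: mdBins holds the MD values of the histogram bins in Figure 8, and benignCount/maliciousCount the per-bin sample counts):

% Hypothetical sketch: derive the possible threshold range from the histogram.
lower = max(mdBins(benignCount > 10));     % largest MD with >10 benign samples
upper = min(mdBins(maliciousCount > 10));  % smallest MD with >10 malicious samples
% For our training data this yields the range [0.19, 0.87].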

Figure 8. Distribution of the numbers of training samples under different MDs.

We calculate the accuracy rate, FPR, and FNR under different MD values from 0.19 to 0.87, as shown in Figure 9. First, we narrow the range down to MD values between 0.5 and 0.59, since the accuracy rates in this range are the highest, i.e. 98.3%. Then we narrow the range down further to the values with the lowest FPR and FNR. Finally, we set the MD threshold to the lowest MD value in this range, i.e. 0.5.
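The selection procedure can be sketched as follows (hypothetical Matlab variables: candidates holds the MD values between 0.19 and 0.87, and acc/fpr/fnr the rates measured at each candidate):

% Hypothetical sketch of the MD threshold selection.
keep = (acc == max(acc));               % keep candidates with the highest accuracy
err  = fpr + fnr;
keep = keep & (err == min(err(keep)));  % among them, keep the lowest FPR and FNR
threshold = min(candidates(keep));      % lowest MD value in this range -> 0.5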

Figure 9. The accuracy rate, FPR, and FNR under different MDs.

5.3 Performance of ANN-MD

5.3.1 Using the same testing and training sample space

In this experiment, we used the same testing sample space as the training sample space to evaluate the performance of the proposed ANN-MD. The experimental results with MD threshold = 0.5 using the proposed ANN-MD are shown in Table 4.

It shows that there are only 4 false positives among the 500 benign testing samples and 15 false negatives among the 500 malicious testing samples. The FPR and the FNR are 0.8% and 3.0%, respectively, which are relatively low compared to the two existing schemes [5] [7], as will be shown in Table 7. The accuracy rate of the ANN-MD is 98.1%. Figure 10 illustrates the distribution of the numbers of testing samples. It shows that ANN-MD can distinguish malicious samples from benign samples with high accuracy.

Table 4. Experimental results using the proposed ANN-MD under the same sample space.

TP TN FP FN FPR FNR Accuracy rate

485 496 4 15 0.8% 3.0% 98.1%

Figure 10. Distribution of the numbers of testing samples under different MDs.

We conducted an experiment to evaluate the effect of different initial weights on the proposed ANN-MD. The results are shown in Table 5. The FPR, FNR, and accuracy rate are best when the initial weights are chosen by the function initnw. They are second best when the initial weights of the hidden layer are set according to the appearance frequency of each behavior. The worst case is the one without using the ANN, where the appearance frequency of each behavior is used directly as its weight; since this case does not use the ANN to train and adjust the weights, its accuracy rate is only 93.7%.


Table 5. The FPR, FNR, and accuracy rate under different initial weights.


5.3.2 Using different testing sample space from the training sample space

In order to verify the feasibility of the proposed ANN-MD, we conducted another experiment using a testing sample space different from the training sample space.

The experimental results with MD threshold = 0.5 using the proposed ANN-MD are shown in Table 6. There are 5 false positives among the 100 benign samples, so the FPR is 5.0%, and 1 false negative among the 100 malicious samples, so the FNR is 1.0%. The accuracy rate of the ANN-MD is 97.0%, which means that the proposed ANN-MD still has a high accuracy rate even when the testing sample space differs from the training sample space. Figure 11 illustrates the distribution of the numbers of samples under different MDs. It shows that ANN-MD can distinguish malicious samples from benign samples with high accuracy.

Table 6. Experimental results using the proposed ANN-MD under a different sample space.

TP TN FP FN FPR FNR Accuracy rate

99 95 5 1 5.0% 1.0% 97.0%

Figure 11. Distribution of the numbers of samples under different MDs.

5.4 Comparison with existing schemes

5.4.1 Using the same testing and training sample space

Table 7 compares the proposed ANN-MD with two related schemes, MBF [5] and RADUX [7]. We implemented these two schemes and tested them with the same samples used in the experiment in Section 5.3.1. As Table 7 shows, the FPR of ANN-MD is 0.8%, whereas the FPR of MBF is 5.6% and that of RADUX is 14.2%. The accuracy rate of ANN-MD is 98.1%, whereas MBF achieves only 88.7% and RADUX 91.2%. Table 7 indicates that the proposed ANN-MD is better than MBF and RADUX at unknown malware detection.


Table 7. Comparison of the proposed ANN-MD with two related schemes by using the same testing and training sample space.

Approach            TPR    FNR    Accuracy rate  FPR    TNR
ANN-MD (proposed)   97.0%  3.0%   98.1%          0.8%   99.2%
MBF [5]             83.0%  17.0%  88.7%          5.6%   94.4%
RADUX [7]           96.6%  3.4%   91.2%          14.2%  85.8%

5.4.2 Using different testing sample space from the training sample space

Table 8 compares the proposed ANN-MD with the two related schemes, MBF [5] and RADUX [7], when the testing sample space differs from the training sample space. The FPR of ANN-MD is 5.0%, whereas the FPR of MBF is 44.0% and that of RADUX is 68.0%. The accuracy rate of ANN-MD is 97.0%, whereas MBF achieves only 77.5% and RADUX only 66.0%. Table 8 indicates that the proposed ANN-MD is much better than MBF and RADUX even when the testing sample space differs from the training sample space. This is because MBF and RADUX use static weights determined in the training phase.

Table 8. Comparison of the proposed ANN-MD with two related schemes by using different testing sample space from the training sample space.

Approach            TPR     FNR    Accuracy rate  FPR    TNR
ANN-MD (proposed)   99.0%   1.0%   97.0%          5.0%   95.0%
MBF [5]             99.0%   1.0%   77.5%          44.0%  56.0%
RADUX [7]           100.0%  0.0%   66.0%          68.0%  32.0%


Chapter 6 Conclusions and Future Work

6.1 Concluding remarks

In this thesis, we have proposed an artificial neural network-based behavioral malware detection scheme (ANN-MD). By observing and analyzing known malware behaviors obtained from sandboxes, we construct a malicious degree (MD) expression.

We have collected 13 common suspicious behaviors. We utilized an ANN to train and adjust the weight of each behavior to obtain an optimum MD expression. With the MD expression, we can calculate an unknown sample's MD value and judge whether the software is malicious according to that value. Experimental results have shown that the proposed ANN-MD has a high accuracy rate of 98.1% (using the same testing sample space as the training sample space), which is better than the accuracy rate of 88.7% of MBF [5] and 91.2% of RADUX [7]. In addition, the FPR (FNR) of the proposed ANN-MD is 0.8% (3.0%) under the same setting, which is much smaller than the FPR (FNR) of 5.6% (17.0%) of MBF and 14.2% (3.4%) of RADUX. To further verify the feasibility of the proposed ANN-MD, we conducted another experiment using a testing sample space different from the training sample space.

Experimental results show that ANN-MD still has a high accuracy rate of 97.0%, even though the testing sample space is different from the training sample space.

However, MBF and RADUX only achieve accuracy rates of 77.5% and 66.0%, respectively. In addition, the false positive rate of ANN-MD is 5.0%, which is much smaller than the false positive rates of 44.0% for MBF and 68.0% for RADUX. This is because MBF and RADUX use fixed weights determined in the training phase. The experimental results support that the proposed ANN-MD is a promising methodology for detecting unknown malware and variations of known malware.

6.2 Future work

In the proposed ANN-MD scheme, we only consider the host behaviors of malware. In addition, the malware detection system we have implemented is semi-automatic, which is time-consuming. Our future work will focus on adding suspicious network behaviors to our scheme and fully automating the malware detection system, to achieve a higher accuracy rate, lower FPR and FNR, and faster alarms.


References

[1] M. Christodorescu and S. Jha, “Static analysis of executables to detect malicious patterns,” in Proceedings of the 12th USENIX Security Symposium, pp. 169-186, Aug. 2003.

[2] J. Rabek, R. Khazan, S. Lewandowski, and R. Cunningham, “Detection of injected, dynamically generated, and obfuscated malicious code,” in Proceedings of the 2003 ACM Workshop on Rapid Malcode (WORM), pp. 76-82, Oct. 2003.

[3] U. Bayer, C. Kruegel, and E. Kirda, “TTAnalyze: a tool for analyzing malware,” in Proceedings of the 15th European Institute for Computer Antivirus Research (EICAR) Annual Conference, Apr. 2006.

[4] M. Egele, C. Kruegel, E. Kirda, H. Yin, and D. Song, “Dynamic spyware analysis,” in Proceedings of the USENIX Annual Technical Conference, pp. 233-246, Jun. 2007.

[5] W. Liu, P. Ren, K. Liu, and H. X. Duan, “Behavior-based malware analysis and detection,” in Proceedings of the International Workshop on Complexity and Data Mining (IWCDM), pp. 39-42, Sep. 2011.

[6] A. Moser, C. Kruegel, and E. Kirda, “Exploring multiple execution paths for malware analysis,” in Proceedings of the 2007 IEEE Symposium on Security and Privacy, pp. 231-245, May 2007.
