Experimental Studies - Automated Analysis and Classification of Obfuscated Botnets and Malware

We design three experiments to verify our proposed algorithm and system design.

First, we evaluate how accurately our algorithm groups instances of the same bot by applying it to one bot obfuscated with different obfuscation tools. Then, we measure how accurately it distinguishes different bots by applying it to different bots obfuscated with the same tool. Finally, we show the execution time with and without our recorder module.

To evaluate the correctness and efficiency of the proposed algorithm and the overall system design, we implement a testbed and conduct the following two experiments: (1) we choose 10 random bot agents, obfuscate them with ASProtect, Themida and UPX, feed the obfuscated bot agents to our classification system, and measure the classification accuracy by counting the correctly classified bot agents; (2) we estimate the execution overhead of PIN by comparing the running time of a bot agent with and without PIN.

5.1 Experiment Environment

Fig. 8 Experiment environment

In the experiment environment shown in Fig. 8, a database stores the bot agent binary executables together with their system call sequences, which are collected by the recorder module described in Sec. 4.2. The controller module loads a bot agent from the database and runs it in a virtual machine running Windows XP. The recorder module then collects the bot agent's system call sequence while it runs.

The virtual machine (VM) makes it easy to fall back to a clean system state before processing the next bot agent. We use VirtualBox, which is free and comes with a command line interface (CLI) for controlling virtual machines.

The CLI facilitates the experiment process because the instantiation of a clean VM can be automated with scripts. Inside the virtual machine, we use Windows XP prior to Service Pack 1 to ensure maximal compatibility with the bots being analyzed (e.g., Data Execution Prevention (DEP), introduced in a later Windows XP service pack, might prevent some of the bots from executing successfully). All machines are connected to the Internet through a NAT firewall, which prevents malicious traffic from the Internet from interfering with the testbed while still allowing bot agents to connect to their C&C servers.
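The reset-to-clean-state cycle described above can be scripted. The following is a minimal sketch, assuming the clean Windows XP state is stored as a VirtualBox snapshot; the VM name `winxp-sandbox` and snapshot name `clean` are hypothetical, and only the standard `VBoxManage controlvm ... poweroff`, `VBoxManage snapshot ... restore` and `VBoxManage startvm ... --type headless` subcommands are used. Launching the bot inside the guest and collecting its trace are left out.

```python
import subprocess

VBOXMANAGE = "VBoxManage"  # assumes the VirtualBox CLI is on PATH


def reset_commands(vm: str, snapshot: str) -> list[list[str]]:
    """VBoxManage invocations that roll `vm` back to `snapshot` and boot it."""
    return [
        [VBOXMANAGE, "controlvm", vm, "poweroff"],          # stop the infected VM
        [VBOXMANAGE, "snapshot", vm, "restore", snapshot],  # restore the clean state
        [VBOXMANAGE, "startvm", vm, "--type", "headless"],  # boot without a GUI
    ]


def reset_vm(vm: str, snapshot: str, dry_run: bool = False) -> list[list[str]]:
    """Run the reset sequence, or only return it when dry_run is True."""
    cmds = reset_commands(vm, snapshot)
    if not dry_run:
        for cmd in cmds:
            # 'poweroff' fails if the VM is already off, so only that step may fail.
            subprocess.run(cmd, check=cmd[1] != "controlvm")
    return cmds


if __name__ == "__main__":
    # Hypothetical names; a dry run just prints the commands.
    for cmd in reset_vm("winxp-sandbox", "clean", dry_run=True):
        print(" ".join(cmd))
```

The power-off step comes first because VirtualBox restores a snapshot only when the VM is not running.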

Table 2 lists 10 bot agents randomly selected from 463 bot samples. Each of these 10 bot agents is packed with three obfuscation tools: ASProtect, Themida and UPX. This yields 40 test targets in total (the 10 originals plus 10 from each obfuscation tool).

Table 2 List of Bots used in the experiment

Id  MD5 hash                          Kaspersky                   Sophos
1   ea46b4606531d28474e06cb4cd060c71  Backdoor.Win32.Anibot.b     Mal/IRCBot-B
2   c1ed6261902ebc178f55159ca1b061b1  Backdoor.Win32.Afbot.a      Mal/IRCBot-C
3   d7b32cc7056f37eb8ccf0d1f472d8e5b  Backdoor.Win32.Rbot.gen     W32/Rbot-Gen
4   fa29f9048e3b57705e97583d70f00ba1  Backdoor.Win32.Agobot.gen   W32/Agobot-Gen
5   f1f9f762f899a24a2d71a35c4b825db8  Backdoor.Win32.Rohbot.a     Mal/Generic-A
6   69fd63dade7cd4f8878c6e80084069fb  Backdoor.Win32.Rbot.gen     W32/Rbot-Fam
7   4aac3724863070dc422ad0dc0a39a5af  Backdoor.IRC.Botva.b        Troj/Bckdr-MPJ
8   8a87d88714f2017e2cdd74912449e7cf  Backdoor.Win32.DevBot.b     Troj/DevBot-B
9   c3207feb5160c71227dbd92cc3fe4e53  Backdoor.Win32.DaSBot.12    Mal/Generic-A
10  0ce8ccbd76e6126ed10350fd70c37d98  Backdoor.Win32.PoeBot.a     W32/Poebot-Gen

5.2 Static Analysis Experiment

Fig. 9 PEiD analysis result

Fig. 9 shows the result of analyzing the 463 bot samples with PEiD, a static analysis tool that detects whether an executable has been processed by an obfuscation tool. The PEiD scan finds that 41.2% of the samples are obfuscated. This highlights how widespread obfuscated bot samples are and justifies the need for better analysis and classification techniques for dealing with bots.
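PEiD itself matches byte signatures, mostly around the program entry point. As a much simpler stand-in that only illustrates the idea of static packer detection, the sketch below parses the PE section table with the standard library and matches section names against a small hint table. Only `UPX0`/`UPX1` are well-known defaults (written by UPX); the `.themida` entry and the function names are illustrative assumptions, and packers can rename sections, which is why signature-based tools such as PEiD go further.

```python
import struct
from typing import Optional

# Section-name hints. UPX0/UPX1 are UPX's usual names; '.themida' is an
# assumption for illustration and is not reliable in practice.
PACKER_SECTION_HINTS = {
    b"UPX0": "UPX",
    b"UPX1": "UPX",
    b".themida": "Themida",
}


def section_names(data: bytes) -> list[bytes]:
    """Return the raw PE section names (NUL padding stripped)."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4                                   # 20-byte COFF file header
    num_sections = struct.unpack_from("<H", data, coff + 2)[0]
    opt_size = struct.unpack_from("<H", data, coff + 16)[0]
    table = coff + 20 + opt_size                        # section table follows optional header
    # Each section header is 40 bytes; the first 8 bytes are the name.
    return [data[table + 40 * i: table + 40 * i + 8].rstrip(b"\0")
            for i in range(num_sections)]


def guess_packer(data: bytes) -> Optional[str]:
    """Name a packer if any section name matches a hint, else None."""
    for name in section_names(data):
        if name in PACKER_SECTION_HINTS:
            return PACKER_SECTION_HINTS[name]
    return None
```

Typical use would be `guess_packer(open(path, "rb").read())` on each sample; a `None` result means only that no hint matched, not that the file is unpacked.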

Table 3 Static analysis result (AV engines vs. obfuscation tools)

ID  Anti-virus  Non-obfuscated           ASProtect      Themida        UPX
1   Antivir     TR/Dldr.Small.caf.3      N/D¹           N/D            Backdoor.Win32.Anibot.b
    NOD32       Win32/Genetik            Win32/Genetik  N/D            Win32/Genetik
    Sophos      Mal/IRCBot-B             Sus/Behav-325  Mal/Behav-285  Mal/IRCBot-B
2   Antivir     BDS/Backdoor.Gen
    Sophos      Mal/IRCBot-C             Sus/Behav-325  Mal/Behav-285  Mal/IRCBot-C
3   Antivir     EXP/DameWare.ggg
    NOD32       Win32/Rbot               Win32/Rbot     Win32/Rbot     Win32/Rbot
    Sophos      W32/Rbot-Gen             Sus/Behav-32   Mal/Behav-285  W32/Rbot-Gen
4   Antivir     BDS/Agobot.3.200704      N/D            N/D            Backdoor.Win32.Agobot.gen
    NOD32       Win32/Agobot             Win32/Agobot   N/D            Win32/Agobot
    Sophos      W32/Agobot-Gen
5   Antivir     TR/Crypt.FKM.Gen         N/D            N/D            Backdoor.Win32.Rohbot.a
    NOD32       unknown NewHeur_PE       N/D            N/D            unknown NewHeur_PE
    Sophos      Mal/Generic-A            Sus/Behav-10

To show how easily obfuscation can fool state-of-the-art static analysis, we use Jotti's malware scan, which integrates multiple scan engines, to scan samples #1 to #5 in Table 2. All of these files were scanned on May 5, 2010, in both their original unpacked form and their obfuscated versions. As Table 3 shows, all four anti-virus scanners correctly identify the five bots in their original form. With obfuscation, however, both false positives and false negatives are observed; these are shown in bold text in Table 3. All four anti-virus tools effectively unpack UPX-packed bots, because UPX simply compresses an executable without any further obfuscation. ASProtect, in contrast, changes the program structure substantially and fools these anti-virus scanners quite effectively. Samples #6 to #10 show the same tendency: the results of the AV tools are affected by obfuscation.

5.3 LCS Similarity of Bot Variants Created by Obfuscation

This experiment shows the LCS similarities between bot variants created by obfuscating a bot sample with different packers. The test targets include the 10 bot samples without any obfuscation listed in Table 2 (denoted as group A). We obfuscate each of those 10 bot samples with ASProtect to create ASProtect-obfuscated test targets (denoted as group B). Similarly, we have Themida-obfuscated test targets (group C) and UPX-obfuscated test targets (group D).
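The bookkeeping behind the experiments can be checked with a few lines of code. This sketch only enumerates test targets and pairs; sample IDs follow Table 2 and group letters follow the text above, and it assumes Table 4 pairs every two variants of the same sample while Table 5 pairs every two distinct samples within one group.

```python
from itertools import combinations

SAMPLES = range(1, 11)  # bot IDs 1..10 from Table 2
GROUPS = "ABCD"         # A: original, B: ASProtect, C: Themida, D: UPX

# Every (sample, group) combination is one test target.
targets = [(s, g) for s in SAMPLES for g in GROUPS]

# Same-sample pairs (Table 4): C(4,2) = 6 variant pairs per sample.
same_sample_pairs = [(s, g1, g2) for s in SAMPLES for g1, g2 in combinations(GROUPS, 2)]

# Cross-sample pairs (Table 5): C(10,2) = 45 pairs per group, four groups.
cross_sample_pairs = [(g, s1, s2) for g in GROUPS for s1, s2 in combinations(SAMPLES, 2)]

print(len(targets), len(same_sample_pairs), len(cross_sample_pairs))  # 40 60 180
```

The 40 targets match the count given for Table 2, and the 60 same-sample pairs match the 60 pairs used when choosing the LCS threshold.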

Ideally, the similarity between two test targets derived from the same bot sample should be 1, meaning that the two targets belong to the same class. Detailed results are shown in Table 4. In each cell, the value on the first line is the LCS similarity S(X,Y) (Eq. 1), and the value on the second line is the Gap Shift value R (Eq. 2).

Table 4 Similarities between test targets derived from the same bot sample

and R values are consistently low. Segment identification greatly improves the accuracy of the LCS similarity. For instance, if we turn off segment identification, the LCS similarity between B and C for Sample #5 drops from 0.91 to 0.79.
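Eq. 1 and Eq. 2 are defined earlier in the thesis and are not reproduced here. As a rough illustration only, the sketch below computes the length of the longest common subsequence of two system call sequences by dynamic programming and normalizes it by the longer sequence. That normalization, the function names and the toy call traces are assumptions and may differ from Eq. 1; segment identification and the Gap Shift R are not modeled.

```python
def lcs_length(x: list[str], y: list[str]) -> int:
    """Length of the longest common subsequence, O(len(x) * len(y)) time."""
    prev = [0] * (len(y) + 1)  # row i-1 of the DP table
    for a in x:
        cur = [0]              # row i, built left to right
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(x: list[str], y: list[str]) -> float:
    """ASSUMED normalization: |LCS(x, y)| / max(|x|, |y|)."""
    if not x and not y:
        return 1.0
    return lcs_length(x, y) / max(len(x), len(y))


# Made-up traces: a packer stub adds one call before the original behavior.
original = ["NtOpenKey", "NtQueryValueKey", "NtCreateFile", "NtWriteFile", "NtClose"]
packed = ["NtAllocateVirtualMemory"] + original
print(round(lcs_similarity(original, packed), 3))  # 0.833
```

Because an LCS tolerates inserted calls, the extra unpacking call only lowers the similarity from 1 to 5/6 instead of breaking the match.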

5.4 LCS Similarity across Different Bot Samples

This experiment evaluates the LCS similarities across 10 different bot samples and shows that the ranges of S and R values differ from those in Sec. 5.3. It consists of four batches. First, we calculate the pair-wise LCS similarities among the 10 original bot samples; the result is presented in Table 5A. We then calculate the pair-wise LCS similarities among the ASProtect-obfuscated bot samples, with the result shown in Table 5B.

The results with Themida-obfuscated bot samples and UPX-obfuscated bot samples are presented in Table 5C and Table 5D respectively.

Table 5 Numeric results for samples obfuscated with the same tool (A. Non-obfuscated bots; B. ASProtect-obfuscated bots; C. Themida-obfuscated bots; D. UPX-obfuscated bots)

lower than the LCS similarities between variants of the same sample (Sec. 5.3). Some of the pairs have high LCS similarity values; however, their Gap Shift values (R) are also high (see Sample 2 vs. 4 in Table 5A). Therefore, formula (3) is a good criterion for deciding whether two samples are similar.

5.5 Choosing the threshold values for S and R

To determine the threshold values T_S and T_R, we plot the classification correctness against T_S and against T_R.

For the classification correctness, we consider both the correctness of identifying classes of bot variants, (number of pairs in Table 4 with S above T_S) / (number of pairs in Table 4), and the correctness of distinguishing different samples, (number of pairs in Table 5 with S below T_S) / (number of pairs in Table 5).
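The two correctness rates above can be computed for a sweep of candidate thresholds. In this hypothetical sketch, `same_pairs` stands for the S values of Table 4 and `diff_pairs` for those of Table 5; the numbers in the test-style example are made up, not taken from the experiments, and counting a pair exactly at the threshold as "above" is an assumption.

```python
def correctness_rates(same_pairs: list[float], diff_pairs: list[float],
                      t_s: float) -> tuple[float, float]:
    """Return (identify_rate, distinguish_rate) for LCS threshold t_s.

    identify_rate:    share of same-sample pairs with S at or above t_s
    distinguish_rate: share of cross-sample pairs with S below t_s
    """
    identify = sum(s >= t_s for s in same_pairs) / len(same_pairs)
    distinguish = sum(s < t_s for s in diff_pairs) / len(diff_pairs)
    return identify, distinguish


def sweep(same_pairs: list[float], diff_pairs: list[float],
          thresholds: list[float]) -> dict[float, tuple[float, float]]:
    """Correctness rates for each candidate threshold (the curves of Fig. 10)."""
    return {t: correctness_rates(same_pairs, diff_pairs, t) for t in thresholds}
```

As an arithmetic check, if 6 of the 60 same-sample pairs fall below the threshold, the identify rate is 54/60 = 0.9, which is the 90% correctness reported for T_S = 60%.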

Fig. 10 LCS correctness threshold

We consider 60% an appropriate threshold for LCS similarity because, for identifying the class of bot variants created by obfuscating a source bot sample with different packers, it achieves 90% correctness (i.e., for about 6 of the 60 pairs of bot variants used in Table 4, the pair-wise LCS similarity is incorrectly reported to be below 60%).

For distinguishing different bot samples, the classification correctness (recognizing them as unrelated bots) rises linearly as the threshold for S increases. With a threshold of 60%, the correctness of distinguishing different bot samples is only about 40%.

Fig. 11 Gap Shift correctness threshold

To improve the correctness of distinguishing different bot samples, we also need to consider the Gap Shift value R (Eq. 2). As shown in Fig. 11, if we choose 6% as the threshold T_R for R, we can correctly distinguish different samples with 94% probability while keeping the correctness of identifying classes of bot variants at 90%.
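Assuming formula (3) declares two samples related when their LCS similarity reaches T_S and their Gap Shift stays within T_R (our reading of Secs. 5.4 and 5.5; the exact form of Eq. 3 is defined earlier and may differ), the chosen thresholds give the following rule. The function and parameter names are illustrative.

```python
T_S = 0.60  # LCS similarity threshold chosen from Fig. 10
T_R = 0.06  # Gap Shift threshold chosen from Fig. 11


def same_class(s: float, r: float, t_s: float = T_S, t_r: float = T_R) -> bool:
    """ASSUMED form of formula (3): related iff S >= T_S and R <= T_R."""
    return s >= t_s and r <= t_r
```

With made-up inputs, `same_class(0.91, 0.02)` is `True`, while `same_class(0.91, 0.30)` is `False`: a high S alone is not enough when the Gap Shift is also high, which is the situation of Sample 2 vs. 4 in Table 5A.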

5.6 Classification Result Comparison

Based on the threshold experiments in the previous section, our algorithms classify the samples in Table 2 into groups. Table 6 compares the rate of correctly classified samples for our algorithms and for four anti-virus engines. Our algorithms classify 90% of the samples correctly, more than any of the anti-virus engines (48% to 80%).

Table 6 Rate of correctly classified samples

Our algorithms   Antivir   Kaspersky   NOD32   Sophos
90%              68%       70%         80%     48%
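The section does not spell out how pairwise decisions become groups. One plausible approach, shown purely as a hypothetical sketch, is to treat each "related" verdict as an edge and take connected components with a union-find structure; `related` stands in for whatever pairwise test (such as formula (3)) is applied.

```python
from itertools import combinations
from typing import Callable, Hashable


def group_samples(samples: list[Hashable],
                  related: Callable[[Hashable, Hashable], bool]) -> list[set]:
    """Group samples into connected components of the 'related' relation."""
    parent = {s: s for s in samples}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if related(a, b):
            parent[find(a)] = find(b)  # merge the two components

    groups: dict = {}
    for s in samples:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())
```

For example, with targets named by sample number and group letter ("1A", "1B", ...) and a toy `related` that compares sample numbers, all variants of one sample end up in one group. A design caveat: connected components are transitive, so a single false "related" verdict can merge two otherwise separate groups.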

5.7 Efficiency Experiment

As mentioned before, the PIN tool uses a JIT compiler to dynamically instrument the target program with check code. In this experiment, we observe the overhead introduced by the instrumentation and the recorder module.

Fig. 12 Execution time with and without PIN

Fig. 12 shows the execution times for running bot samples directly (without PIN) and with our recorder module (together with PIN). We use two samples (Agobot and Rbot) for this experiment, and we also measure the execution time of their obfuscated versions produced by three obfuscation tools (B: ASProtect, C: Themida, D: UPX; A denotes no obfuscation). Overall, the overhead of our system is around 55%.
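The exact timing procedure and overhead formula are not given here. A minimal sketch, assuming overhead is the relative increase in wall-clock time, (t_with - t_without) / t_without: under that assumption, the reported ~55% means a run that takes 10 s natively takes about 15.5 s under the recorder. The PIN command line in the comment follows Pin's usual `pin -t <tool> -- <app>` form; the tool and binary names are hypothetical.

```python
import subprocess
import sys
import time


def wall_time(cmd: list[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=False)
    return time.perf_counter() - start


def overhead(t_without: float, t_with: float) -> float:
    """ASSUMED metric: relative slowdown of the instrumented run."""
    return (t_with - t_without) / t_without


if __name__ == "__main__":
    # Harmless demonstration command. In the testbed the two runs would be the
    # native bot and, e.g., ["pin", "-t", "recorder.dll", "--", "bot.exe"]
    # (hypothetical tool and file names).
    t = wall_time([sys.executable, "-c", "pass"])
    print(f"{t:.3f} s")
```

Wall-clock time is used because it includes both the JIT compilation done by PIN and the cost of the recorder's logging.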
