Background - Automated Analysis and Classification of Obfuscated Botnets and Malware

A botnet pools bots together to launch large-scale attacks. In this section, we give a brief overview of the botnet life cycle, including the injection of bot agents into victim computers, the establishment of connections to the command-and-control (C&C) server, and the attack stage. We also discuss related work on the analysis and classification of bot binaries.

2.1 Taxonomy of Botnet

Fig. 1 Botnet Lifecycle

The life cycle of a botnet typically consists of three stages, as shown in Fig. 1: (1) injection, (2) establishing connections to the C&C server and waiting for commands, and (3) launching attacks. During injection, bots are injected into computers on the Internet.

A bot can be injected into a target computer in several ways: by exploiting a remote vulnerability of the computer, by disguising the bot as a harmless e-mail attachment, or by bundling the bot with a software package that is likely to be downloaded to the target computer via P2P file sharing, among others. Once a bot is injected into a computer, it can immediately start looking for other vulnerable computers and infect them as well. In this case, the botnet grows exponentially.
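The exponential growth mentioned above can be illustrated with a toy model. The sketch below is not taken from any measured botnet; it assumes that every bot infects a fixed number of new hosts per round and that the pool of vulnerable hosts is finite. The function name `botnet_size` and all of its parameters are hypothetical.

```python
from typing import List


def botnet_size(initial_bots: int, infections_per_bot: int,
                rounds: int, population: int) -> List[int]:
    """Return the botnet size after each round of propagation.

    Assumption (illustrative only): each existing bot infects
    `infections_per_bot` new hosts per round, and growth stops once
    all `population` vulnerable hosts are infected.
    """
    sizes = [initial_bots]
    for _ in range(rounds):
        current = sizes[-1]
        # Every bot recruits new bots, so the size is multiplied by
        # (1 + infections_per_bot) until the population cap is hit.
        grown = current + current * infections_per_bot
        sizes.append(min(population, grown))
    return sizes


if __name__ == "__main__":
    # One initial bot, two new infections per bot per round,
    # 100 vulnerable hosts in total.
    print(botnet_size(1, 2, 5, 100))  # [1, 3, 9, 27, 81, 100]
```

Under these assumptions the size after t rounds is the initial size times (1 + r)^t, where r is the number of infections per bot, until the vulnerable population is exhausted.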

After a bot injects itself into a computer, it attempts to establish a channel with the C&C server, which is often an IRC server. The attacker (bot herder) can then issue commands to, and receive messages from, the bots via the C&C server. Fig. 1 shows how a botnet is formed from the C&C servers and the bots. To hide the location of the bot herder, proxies (or stepping stones) can be placed between the bot herder and the C&C server. Sometimes a botnet has more than one C&C server, which prevents the botnet from failing if a single C&C server crashes or is powered off.

Once a botnet is formed, the bots can collaboratively carry out different types of attacks, such as e-mail spam, DDoS attacks, and click fraud, depending on the types of bots in use. Unlike a traditional network attack, in which attack traffic originates from only a few hosts, a botnet-induced attack can involve traffic from thousands of sources, which makes tracking and blocking the attack traffic much more difficult. Furthermore, the attack traffic from a botnet can be huge when the participating bots all launch attacks at around the same time. For instance, the botnet formed by MyDoom[18] employed 160,000 computers to generate DDoS traffic targeting the web site of SCO.

2.2 Overview of Bot Analysis

Since a botnet is formed by bots, one way to study a botnet is to analyze its constituent bots. By understanding the internals of a bot, we can get a clear picture not only of how the botnet is formed but also of the attack vectors associated with it.

There are two approaches to the analysis of bots: static analysis and dynamic analysis. Static analysis examines a bot binary without executing it. It typically involves disassembling the binary code and then constructing its function call graph statically. E. Carrera and G. Erdelyi[5] use graph isomorphism techniques to compare the call graphs of collected malware samples, and then use the comparison results to determine the similarity of the samples. Z. Liang, T. Wei, Y. Chen et al.[6] merge function calls into modules, each of which performs a specific type of job, such as file modification, registry modification, or command handling. Q. Zhang and D. Reeves[7] look for specific patterns of assembly code and use these patterns to measure the similarity between malware samples. The above works cannot handle obfuscated samples correctly, so S. Cesare and Y. Xiang[8] design an unpacker that dumps the original program from memory after the obfuscation tool has restored it there for execution. They then analyze the dumped program, thereby sidestepping the obfuscation.

Static analysis is typically very efficient, and it can explore all execution paths in a malware sample. However, because it considers every path, binary obfuscation[19] can easily fool static analysis by introducing additional execution paths and by shuffling and twisting the existing execution paths in an obfuscated sample. Obfuscation tools can also restructure the data variables and tables in a binary to confuse static analysis further. The experiments in Section 5.2 show that some bots use obfuscation to hide themselves, and that static analysis cannot identify them well.

Dynamic analysis is the more effective approach for analyzing obfuscated malware, because it observes the behavior of the code that actually executes. U. Bayer, C. Kruegel, and E. Kirda[12] proposed a system named "TTAnalyze", which executes a malware sample inside a virtual machine and observes behaviors of the sample such as file modification, registry modification, and network access. A key challenge in dynamic analysis is that only those control paths that are actually executed are analyzed. A. Moser, C. Kruegel, and E. Kirda[14] use speculative branch prediction to overcome this challenge. M. Bailey, J. Oberheide, J. Andersen et al.[13] measure the similarity between bot samples based on the results of dynamic analysis, but they only look at high-level information such as file names, connected hosts, and registry entries, which can easily be randomized in bot samples. C. Willems, T. Holz, and F. Freiling[15] also use a virtual machine to conduct dynamic analysis on bot samples. In addition, they provide a public web interface to their analysis environment, where people can upload suspicious malware samples for analysis. P. Trinius, J. Gobel, T. Holz et al.[16] use a block diagram to present the system calls of a program; this diagram makes it easier to distinguish malware. J. Li, M. Xu, N. Zheng et al.[17] collect system call sequences by means of hardware virtualization, and then try to identify function blocks based on the most frequently occurring patterns. Their comparison is based on these blocks, whereas ours is based on the system calls and their arguments.

2.3 Research Goal

We propose a framework that unifies the analysis and the classification of obfuscated malware. We rely on dynamic analysis techniques to extract system call details of an obfuscated bot binary. We consider not only the system call IDs, as seen in previous work, but also the call arguments used in each system call to improve the resolution of our analysis. For the classification, we devise a similarity metric based on the longest common subsequences from the extracted system call information.
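To make the idea of a similarity metric based on longest common subsequences (LCS) concrete, the following is a minimal sketch and not the exact metric used in this work. It assumes that each trace is a list of (system call name, arguments) pairs and that the LCS length is normalized by the combined trace length; this normalization, the helper names `lcs_length`, `similarity`, and `call_ids`, and the sample traces are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical trace record: (system call name, tuple of arguments).
SysCall = Tuple[str, Tuple[str, ...]]


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Assumed normalization: 2 * |LCS| / (|a| + |b|), in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def call_ids(trace: List[SysCall]) -> List[str]:
    """Drop the arguments and keep only the system call names."""
    return [name for name, _args in trace]


# Illustrative traces; the file names and key names are made up.
trace_a: List[SysCall] = [
    ("NtCreateFile", ("a.tmp",)),
    ("NtWriteFile", ("a.tmp",)),
    ("NtSetValueKey", ("Run",)),
    ("NtClose", ()),
]
trace_b: List[SysCall] = [
    ("NtCreateFile", ("a.tmp",)),
    ("NtWriteFile", ("b.tmp",)),
    ("NtSetValueKey", ("Run",)),
    ("NtClose", ()),
]

if __name__ == "__main__":
    # Call IDs alone cannot tell the two traces apart.
    print(similarity(call_ids(trace_a), call_ids(trace_b)))  # 1.0
    # Including arguments exposes the differing write target.
    print(similarity(trace_a, trace_b))  # 0.75
```

In this toy example the two traces are indistinguishable when only system call IDs are compared, while comparing the calls together with their arguments lowers the score, which is the sense in which arguments improve the resolution of the analysis.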

Finally, we notice that obfuscation often relocates code segments in a binary. In evaluating the similarity metric, we adopt some heuristics to compensate for these relocations, which would otherwise decrease the classification accuracy dramatically.
