In this work, we focus on achieving a good balance between an open network (the most transparent) and a closed network (the most secure) for a dynamic malware analysis environment. Our system can be applied to any kind of malware.
However, the benefit is most evident when it is used on malware that exhibits network activity (e.g. propagation over the Internet). Among the different types of malware, the bot, which takes commands from a controller on the Internet to achieve specific attack goals (e.g. a denial-of-service attack against a specific target on the Internet), is the one that involves the most network activity. In this chapter, we give a brief overview of the network activities involved in a bot’s operation, and then describe how existing dynamic analysis systems deal with malware’s network activities and where they fall short.
2.1 Network Traffic of Botnet
Figure 1: Botnet operations
Typically, bots are designed to function collectively, as shown in Figure 1, where a controller run by an attacker commands a herd of bots to carry out an attack on a victim (i.e. Victim #2 in Figure 1). The whole system is often referred to as a ‘botnet’ [13-14], meaning a network of bots. A bot may attempt to propagate itself onto other machines over the Internet (Bot #1→Victim #1 in Figure 1). This increases the number of bots in the botnet and can make a subsequent attack more powerful. Note that the machines infected by bots can themselves be targets of attack.
For instance, part of the purpose of a botnet may be to steal private personal information, such as passwords and credit card numbers, from the infected machines.
As shown in Figure 1, a botnet involves a large amount of network activity, which can be roughly categorized into three types: propagation, C&C (command-and-control) communication, and attack [13-14, 17]. Propagation increases the population of bots. C&C communication lets the controller command the bots and lets the bots send information (e.g. credit card numbers) back to the controller. Finally, attack corresponds to the network traffic generated by the bots for the purpose of attacking the target victim.
2.2 Related Works
Many kinds of malware, especially bots, involve network activities in their operation. Dynamic analysis must therefore provide a compatible network environment for the malware to run properly. At the same time, the environment has to be secure so that the malware does not actually cause damage on the Internet. There are two possible approaches to setting up the network environment for dynamic analysis. The first is to allow the malware full Internet access (i.e. an open network) [11, 18]. This is obviously very dangerous. The second, mainstream approach is to use a closed network environment, in which the dynamic analysis environment is completely disconnected from the Internet [12, 18-19]. While this approach is very secure, it is not very effective at capturing the full runtime behavior of malware.
The work most closely related to ours is the Honeypot project [20-21]. Although its goals are fundamentally different from ours, its system also involves traffic redirection. However, it does not address the state synchronization issue between the would-be victim and the honeypot that receives the redirected traffic. As a result, if the attack traffic is carried over a stateful connection (e.g. TCP, or a stateful upper-layer protocol), the abrupt redirection breaks the connection, and the full behavior of the malware remains unknown.
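The effect of redirecting a stateful connection without synchronizing state can be illustrated with a toy model. The sketch below is not taken from any of the cited systems: the class `ToyTcpEndpoint`, its fields, and the addresses are hypothetical, and the model tracks only the next expected sequence number, ignoring windows, acknowledgements, and timers. It shows that an endpoint with no state for a connection rejects a mid-stream segment, whereas an endpoint that has been given the same state accepts it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ConnKey = Tuple[str, int]  # (peer address, peer port); hypothetical key format


@dataclass
class ToyTcpEndpoint:
    """Toy receiver that tracks only the next expected sequence number."""
    name: str
    expected_seq: Dict[ConnKey, int] = field(default_factory=dict)

    def receive(self, peer: ConnKey, flags: str, seq: int, payload_len: int = 0) -> str:
        if flags == "SYN":
            # Start of handshake: remember the peer's initial sequence number.
            self.expected_seq[peer] = seq + 1
            return "SYN-ACK"
        if peer not in self.expected_seq:
            # No state for this connection: a real TCP stack answers with a reset.
            return "RST"
        if seq != self.expected_seq[peer]:
            # Out-of-sequence segment: not accepted, state unchanged.
            return "DUP-ACK"
        self.expected_seq[peer] += payload_len
        return "ACK"


def demo() -> List[str]:
    bot: ConnKey = ("10.0.0.5", 40000)  # illustrative address only
    victim = ToyTcpEndpoint("would-be victim")
    honeypot = ToyTcpEndpoint("honeypot")
    replies = [
        victim.receive(bot, "SYN", seq=1000),
        victim.receive(bot, "ACK", seq=1001, payload_len=100),
        # Abrupt redirection: the honeypot has never seen this connection.
        honeypot.receive(bot, "ACK", seq=1101, payload_len=100),
    ]
    # Toy stand-in for state synchronization (an assumption, not a real mechanism):
    honeypot.expected_seq[bot] = victim.expected_seq[bot]
    replies.append(honeypot.receive(bot, "ACK", seq=1101, payload_len=100))
    return replies


if __name__ == "__main__":
    print(demo())
```

In this model the third segment is reset because the honeypot holds no connection state, which corresponds to the broken connection described above; only after the state is copied does the same segment get accepted.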
To prevent malware that has been implanted in a honeynet from leaking attack traffic to the Internet, the Honeynet project proposed a payload rewriting technique that nullifies the attack effect of Internet-bound attack traffic [22]. Again, payload rewriting causes the attack to fail, and the full behavior of the malware is left unknown.
The work by E. Alata et al. [23-24] assumes that C&C communication can be recognized and transparently relayed to the Internet. They also relay DNS traffic to the Internet, and relay traffic destined for the well-known ports of vulnerable services (e.g. TCP ports 139 and 445) directly to honeypots. All other traffic from the analysis environment is filtered. Because traffic for these specific ports is relayed directly, there is no protocol state synchronization issue between the would-be victim and the honeypot that receives the redirected traffic. Their design relies on the fact that bot C&C communication was based on the well-known IRC protocol for a long time.
However, bots have more recently been observed running C&C communication over customized protocols, some of which even use encryption that is almost unbreakable [25-27].
In our system, we follow a different design that makes no assumption about C&C communication being recognizable.
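For reference, the port-based relay policy attributed to [23-24] in the paragraph above can be sketched as a simple decision function. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the names `Flow`, `Action`, and `decide` are hypothetical, the `is_recognized_cnc` flag stands in for whatever C&C recognizer is used, and DNS is identified here simply by destination port 53.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RELAY_TO_INTERNET = "relay_to_internet"
    RELAY_TO_HONEYPOT = "relay_to_honeypot"
    DROP = "drop"


@dataclass(frozen=True)
class Flow:
    protocol: str                    # "tcp" or "udp"
    dst_port: int
    is_recognized_cnc: bool = False  # hypothetical output of a C&C recognizer


DNS_PORT = 53                              # assumption: DNS identified by port only
HONEYPOT_TCP_PORTS = frozenset({139, 445})  # the example ports named in the text


def decide(flow: Flow) -> Action:
    """Return the handling of one outbound flow under the sketched policy."""
    if flow.is_recognized_cnc:
        # Recognized C&C traffic is relayed transparently to the Internet.
        return Action.RELAY_TO_INTERNET
    if flow.dst_port == DNS_PORT:
        # DNS traffic is also relayed to the Internet.
        return Action.RELAY_TO_INTERNET
    if flow.protocol == "tcp" and flow.dst_port in HONEYPOT_TCP_PORTS:
        # Well-known vulnerable service ports go to a honeypot from the first packet.
        return Action.RELAY_TO_HONEYPOT
    # Everything else is filtered.
    return Action.DROP
```

Because the decision depends only on information available in the first packet of a flow, a connection is never moved after it has started. The weakness is equally visible in the sketch: once C&C traffic cannot be recognized, for example an encrypted custom protocol on an arbitrary port, it falls through to `DROP`.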
The work by G. Berger-Sabbatel et al. [28] also assumes that C&C communication can be recognized. They monitor plain-text C&C communication for a few weeks and then simulate the C&C server. Cipher-text C&C communication is relayed to the real C&C servers on the Internet. They do not address the state synchronization issues that arise when packets are redirected in the middle of a connection. In contrast, our system handles these synchronization issues even when the redirection occurs in the middle of a connection.