Malware (malicious software) remains a serious problem despite the wide use of anti-virus applications. Thousands of new malware samples are generated every day. According to reports [1] [2], 1,017,208 new malware instances were detected in the first half of 2010, approximately 10% more than in the previous half year. Malware writers continuously develop new polymorphic and metamorphic techniques, such as obfuscation, encryption, and packing, to evade signature-based detection. Furthermore, metamorphism enables malware to change its appearance every time it propagates. To deal efficiently with such large numbers of malware instances, it is necessary to automatically derive representative behavior patterns that can be used to recognize a whole malware family. Fortunately, the observation that numerous malware samples share common behaviors enables us to derive a generalized signature for each group. With such signatures, we can efficiently test whether a malicious program belongs to an existing, well-known malware group. In this chapter, we briefly introduce existing related schemes, the proposed method, and our contributions.
1.1. Background
To recognize malware families by behavior patterns, this section explains why we use behavior patterns rather than signatures, describes the existing monitoring mechanisms and their drawbacks, and outlines the malicious behaviors that are generally monitored.
1.1.1. Behavior Patterns
Signature-based recognition is the most widely used approach, but a signature derived from one malware instance generally cannot identify other malware. Because malware programs are continuously being developed, this approach can no longer cope with the large number of mutated malware programs.
Using behavior patterns to recognize a malware family is efficient. As indicated by recent studies [3], malware instances in the same family show similar behavior patterns.
Most original malware programs in a family are created by the same authors, so several versions exist as a result of repeated upgrades. In addition, other authors often rewrite existing malware programs. For these reasons, a single behavior pattern can recognize many malware instances in the same family.
Checking API arguments is effective. When malware executes, it must invoke APIs with several arguments. Hence, API invocation sequences are adopted to represent malware behavior. Moreover, each API name and argument is a meaningful word, so analysts can easily use them to understand the functionality of malware programs. By profiling the arguments of known malware, the frequently used arguments of each family can be applied as a behavior pattern for future recognition.
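The idea of profiling frequently used arguments can be illustrated with a minimal sketch. It is not the pattern generation method of this paper. It assumes a hypothetical trace format in which each sample is a list of (API name, argument) pairs, and it assumes that an argument is "frequently used" when it appears in at least a chosen fraction of the family's samples. The API names, arguments, and the `min_fraction` threshold are illustrative only.

```python
from collections import Counter

def profile_family(samples, min_fraction=0.5):
    """Return the (api, argument) pairs seen in at least min_fraction of samples.

    samples: list of traces; each trace is a list of (api_name, argument) pairs.
    This trace format is an assumption made for illustration.
    """
    counts = Counter()
    for trace in samples:
        # Count each pair once per sample so that long traces do not dominate.
        counts.update(set(trace))
    threshold = min_fraction * len(samples)
    return {pair for pair, n in counts.items() if n >= threshold}

def match_score(pattern, trace):
    """Fraction of the family pattern that also appears in a new trace."""
    if not pattern:
        return 0.0
    return len(pattern & set(trace)) / len(pattern)

# Hypothetical traces of three samples from one family.
family_traces = [
    [("CreateFile", "C:\\WINDOWS\\system32\\evil.dll"), ("RegSetValue", "Run")],
    [("CreateFile", "C:\\WINDOWS\\system32\\evil.dll"), ("RegSetValue", "Run"),
     ("Connect", "10.0.0.1")],
    [("CreateFile", "C:\\temp\\a.tmp"), ("RegSetValue", "Run")],
]

pattern = profile_family(family_traces, min_fraction=0.6)
unknown = [("RegSetValue", "Run"), ("CreateFile", "C:\\other.txt")]
score = match_score(pattern, unknown)
```

Counting each pair once per sample is a design choice in this sketch: it measures how many family members share an argument rather than how often a single sample repeats it.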
1.1.2. Monitoring Mechanism
To observe malware behavior, we consider the system call workflow from user level to kernel level and discuss the monitoring mechanisms from two perspectives. The first concerns what kind of object is monitored; the second concerns where the monitor resides. In addition, we define an "in-kernel function" as a low-level kernel function that a system call must invoke. The workflow referred to in this section is depicted in Figure 1.
Objects of monitoring: When user-level library APIs are monitored, an attacker could invoke system calls directly without using the user-level library APIs, so the monitoring mechanism is bypassed. When system calls are monitored instead, a rootkit could enter kernel level directly without invoking system calls, so the monitor is ineffective.
In-system monitoring: As long as the monitor and the malware reside in the same environment, the monitoring mechanism could be overridden by a kernel-level rootkit. Whether the monitor hooks kernel-level APIs or user-level library APIs, as in the approaches mentioned above, the result is the same.
1.1.3. Malware Behavior
In this paper, we monitor process, registry, file, and network behaviors, as do recent studies and well-known malware analysis websites [4] [5] [6], because every malware exhibits behaviors on some subset of these four object types. Our coverage is therefore complete and sufficient, and is no narrower than that of other related work. Running malware under our monitoring system produces a human-readable report that profiles the malware's behaviors. The report contains sufficient information, including cross-process malware interaction, the contents of malware communication over the network, registry modifications, and dynamic API loading.
Figure 1. System call workflow
1.2. Requirement
We believe that an ideal system for recognizing malware families should provide the following features:
Automatic behavior pattern generation: patterns are generated automatically in order to cope with malware efficiently.
Accuracy: behavior patterns recognize malware families with low false positives and low false negatives.
Non-circumventability: no malware can bypass the monitor, which ensures that the system captures malware behavior completely.
1.3. Concept
To achieve a non-circumventable monitor, we analyze malware behavior by collecting in-kernel function calls and their arguments from the underlying emulator. The monitoring mechanisms described above are all too easy to bypass and cannot capture malware behavior completely. No matter how a malware program avoids user-level APIs, it must eventually invoke in-kernel functions. For this reason, we monitor in-kernel functions together with their arguments. However, when in-kernel functions are monitored from inside the system, a kernel-level rootkit could override the monitoring mechanism. To overcome this problem, we use an out-of-box hooking technique and build our monitor on the underlying emulator, so that the monitor and the malware do not share the same space and the malware cannot tamper with the monitoring mechanism.
To recognize malware families with high accuracy, we use tainting to refine the monitoring results. Taint tracking identifies which arguments are related to the malware.
A monitoring system without taint tracking can only distinguish the process of the tested malware from other programs with the help of the CR3 processor register; it cannot tell which arguments are closely related to malicious behavior. In particular, taint tracking can follow relations between data across multiple processes, even in the kernel. Using taint helps us obtain more complete malware information and thus improves the accuracy of malware family recognition.
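The difference between the two filtering criteria can be illustrated with a minimal sketch. It is not the implementation of our monitor. The `Call` record, the function names, the CR3 values, and the per-argument `tainted` flag are all hypothetical. A real emulator tracks taint at a much finer granularity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    cr3: int        # page-table base (CR3) of the process that made the call
    function: str   # hypothetical in-kernel function name
    argument: str
    tainted: bool   # True if the argument derives from the malware sample's bytes

def filter_by_cr3(trace, malware_cr3):
    """Keep calls made by the process whose CR3 matches the tested malware."""
    return [c for c in trace if c.cr3 == malware_cr3]

def filter_by_taint(trace):
    """Keep calls whose argument carries taint, whichever process made them."""
    return [c for c in trace if c.tainted]

MALWARE_CR3 = 0x1000
VICTIM_CR3 = 0x2000   # a benign process into which the malware injected data

# Hypothetical trace with calls from both processes.
trace = [
    Call(MALWARE_CR3, "open_file", "C:\\WINDOWS\\system32\\kernel32.dll", False),
    Call(MALWARE_CR3, "write_file", "C:\\dropped.exe", True),
    Call(VICTIM_CR3, "set_registry_value", "Run\\dropped.exe", True),
    Call(VICTIM_CR3, "open_file", "C:\\report.doc", False),
]

by_cr3 = filter_by_cr3(trace, MALWARE_CR3)
by_taint = filter_by_taint(trace)
```

In this toy trace, the CR3 filter keeps an unrelated library load made by the malware process and misses the registry write performed through the injected process, whereas the taint filter keeps exactly the two calls whose arguments originate from the sample.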
We build an automatic pattern generation system, which is a basic function of malware analysis. Before generating a pattern, we extract the invocation sequence and dilute unrepresentative information in order to refine the behavior traces. For example, when in-kernel function arguments contain meaningless strings such as hashed filenames, this information must be diluted. Finally, the system describes in-kernel function transitions with a Hidden Markov Model (HMM). An HMM is easy to use for recognizing malware families and is therefore suitable for our system.
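The two steps in this paragraph can be sketched as follows, under stated assumptions. The sketch is not the system described in this paper. It assumes that "diluting" a hashed filename means replacing any run of eight or more hexadecimal digits with a placeholder, and it scores a function sequence against a discrete HMM with the standard scaled forward algorithm. The function names and all model parameters are invented toy values, not learned from data.

```python
import math
import re

# Assumption: a "hashed" name is a run of 8 or more hexadecimal digits.
_HASH_RE = re.compile(r"[0-9a-fA-F]{8,}")

def dilute(argument):
    """Replace hash-like substrings with a placeholder token."""
    return _HASH_RE.sub("<HASH>", argument)

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | HMM).

    start[s]    : initial probability of state s
    trans[p][s] : probability of moving from state p to state s
    emit[s]     : dict mapping an observed symbol to its probability in state s
    """
    if not obs:
        return 0.0
    states = range(len(start))
    log_p = 0.0
    alpha = [start[s] * emit[s].get(obs[0], 0.0) for s in states]
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in states)
                * emit[s].get(obs[t], 0.0)
                for s in states
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")   # the model cannot produce this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_p

# Toy two-state models for two hypothetical families.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "family_a": ([1.0, 0.0], TRANS,
                 [{"open_file": 0.9, "send": 0.1},
                  {"open_file": 0.1, "send": 0.9}]),
    "family_b": ([1.0, 0.0], TRANS,
                 [{"set_key": 0.9, "open_file": 0.1},
                  {"set_key": 0.1, "open_file": 0.9}]),
}

def classify(obs):
    """Return the family whose HMM assigns the highest log-likelihood."""
    return max(MODELS, key=lambda name: log_likelihood(obs, *MODELS[name]))

diluted = dilute("C:\\tmp\\3fa9c01b77de.exe")
sequence = ["open_file", "open_file", "send"]
best = classify(sequence)
```

Recognition in this sketch reduces to picking the family model with the highest log-likelihood. A practical system would also need a rejection threshold so that samples belonging to no known family are not forced into one.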
1.4. Contribution
In this paper, we propose a novel approach to generating a behavior pattern for a family of malware. Three important features distinguish our work from existing research.
First, unlike previous approaches, which can be circumvented by lower-level hooking or overwriting, our out-of-box in-kernel function hooking cannot be escaped by the malware being tested. Second, taint-based argument checking gives more accurate behavior profiling because the taint status helps us differentiate between arguments supplied by malware and those supplied by benign programs running in the background. Third, taint propagation is performed system-wide and can hence deal with cross-process malware, which is not covered by previous work.
As a result, our system produces more complete malware behavior patterns than other approaches. We performed an experiment on 511 malicious samples from 15 different families. The evaluation shows that our behavior patterns yield zero false positives and a low false negative rate (less than 5.8%) in the recognition phase.
1.5. Synopsis
The paper is organized as follows. Chapter 2 introduces related work. Chapter 3 describes our system in detail. Chapters 4 and 5 present the implementation and the evaluation. Finally, Chapter 6 concludes the paper.