Based on the design above, we now describe our implementation. The environment parameters are listed in Table 1 and the architecture is shown in Figure 7. We modify the Xen 4.2.1 hypervisor and use a CPUID-based approach to intercept system calls. Dmm_Tool mainly coordinates communication: it 1) tells Xen to start intercepting system calls, 2) retrieves system call information from Xen, and 3) transmits it to the behavior engine through JNI[12]. The behavior engine is deployed on Dom0 and identifies malicious behavior.
Host OS         Fedora Linux Core 3.9.10 (x86_64)
Guest OS        Windows 7 (x86_64)
Virtual Layer   Xen hypervisor 4.2.1
Language        Java SE 1.7
Table 1. Environment parameters
Definition 6: Encode Function (EF), $EF : \text{Information Flow Path} \rightarrow \text{String}$
For an information flow path $ifp = (\langle n_1, n_2, \ldots, n_k \rangle, \langle e_{(1,2)}, e_{(2,3)}, \ldots, e_{(k-1,k)} \rangle)$ in an IFMG,
$EF(ifp) = \text{"<} n_1.toString \text{><} n_2.toString \text{>} \cdots \text{<} n_k.toString \text{>"}$, where
$n_i.toString = \text{"|labelname1?} IFMG.L_V(n_i).label1 \text{|labelname2?} IFMG.L_V(n_i).label2 \text{|} \ldots \text{|"}$
Definition 7: A pattern is a regular expression with backreferences, and $L(pattern)$ denotes the language that the pattern defines.
Definition 8: $Match(pattern, ifp) = \begin{cases} true & \text{if } EF(ifp) \in L(pattern) \\ false & \text{else} \end{cases}$
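Definitions 6 to 8 can be illustrated with a minimal Java sketch. It assumes that each node is represented by an ordered map from label name to label value; the class and method names (`EncodeSketch`, `nodeToString`, `encode`, `match`, `node`) are hypothetical and are not taken from our implementation. As in Definition 6, only node labels are encoded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class EncodeSketch {
    // n_i.toString: "|labelname1?value1|labelname2?value2|...|"
    static String nodeToString(Map<String, String> labels) {
        StringBuilder sb = new StringBuilder("|");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            sb.append(e.getKey()).append('?').append(e.getValue()).append('|');
        }
        return sb.toString();
    }

    // EF: wrap each node string in angle brackets and concatenate in path order.
    static String encode(List<Map<String, String>> path) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> node : path) {
            sb.append('<').append(nodeToString(node)).append('>');
        }
        return sb.toString();
    }

    // Match: true iff the whole encoded path belongs to L(pattern).
    static boolean match(String pattern, List<Map<String, String>> path) {
        return Pattern.matches(pattern, encode(path));
    }

    // Hypothetical helper: a node with only "type" and "path" labels.
    static Map<String, String> node(String type, String path) {
        Map<String, String> labels = new LinkedHashMap<String, String>();
        labels.put("type", type);
        labels.put("path", path);
        return labels;
    }

    public static void main(String[] args) {
        List<Map<String, String>> ifp = new ArrayList<Map<String, String>>();
        ifp.add(node("file", "c:\\a.exe"));
        ifp.add(node("process", "c:\\a.exe"));
        // prints <|type?file|path?c:\a.exe|><|type?process|path?c:\a.exe|>
        System.out.println(encode(ifp));
    }
}
```

Because `Pattern.matches` requires the entire string to match, the sketch follows the membership test $EF(ifp) \in L(pattern)$ rather than a substring search.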
Figure 7. Architecture of environment
Figure 8 is the flow chart of our behavior engine. After the hypervisor intercepts a system call, it decodes some of the parameters and sends the system call to the behavior engine. The behavior engine first analyzes the semantics of the system call and ignores unimportant system calls. Next, it looks up the corresponding rule in the policy that applies to the caller (process) and updates the process's security flag. Finally, it generates the corresponding information flow operation to update the information flow multigraph. The pattern matcher periodically checks whether any user-defined pattern matches an information flow path, which indicates malicious behavior. The following parts introduce some features of the behavior engine in detail.
Figure 8. Behavior engine flow chart
5.4.1 Policy for processes
If we recorded and monitored every information flow in a system, the computational cost would be prohibitive. In practice, malware accounts for only a very small fraction of the processes in a system. It is therefore reasonable to use multi-level monitoring to optimize the behavior engine's performance. In short, we assign each process a security flag that distinguishes high-risk from low-risk processes, so that monitoring resources can be distributed accordingly.
In our system, users define policies beforehand. Figure 9 shows an example policy. The policy path and the policy match method determine which policy applies to a process. A policy contains many rules. Each rule specifies that, if the process accesses a file in a given directory or a registry key under a given path, the behavior engine takes a given action, such as raising the process's security flag. There are presently two security flags in our system, SAFE and WARNING. If a process is in WARNING, the behavior engine not only records its system calls but also constructs the corresponding IFOs to update the IFMG. If a process is in SAFE, the behavior engine only logs its system calls, in case the process's information flows are needed in the future.
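The rule lookup described above can be sketched in Java as follows. This is only an illustration under several assumptions that are not stated in the text: the first matching rule wins, paths are compared case-insensitively, and the "ignore" action means that the access is dropped without building an IFO. All names (`PolicySketch`, `Rule`, `MonitoredProcess`, `shouldBuildIfo`) are hypothetical.

```java
import java.util.List;
import java.util.Locale;

final class PolicySketch {
    enum SecurityFlag { SAFE, WARNING }
    enum MatchMethod { PREFIX, FULL }
    enum Action { RAISE_TO_WARNING, IGNORE }

    // One rule in the format: type | path | match_method | action
    static final class Rule {
        final String type;
        final String path;
        final MatchMethod method;
        final Action action;

        Rule(String type, String path, MatchMethod method, Action action) {
            this.type = type;
            this.path = path.toLowerCase(Locale.ROOT); // assumption: case-insensitive paths
            this.method = method;
            this.action = action;
        }

        boolean appliesTo(String objectType, String objectPath) {
            if (!type.equals(objectType)) {
                return false;
            }
            String p = objectPath.toLowerCase(Locale.ROOT);
            return method == MatchMethod.PREFIX ? p.startsWith(path) : p.equals(path);
        }
    }

    static final class MonitoredProcess {
        SecurityFlag flag = SecurityFlag.SAFE; // every process starts as low-risk
    }

    // Applies the first matching rule (assumption), then reports whether the
    // engine should build an IFO for this access, i.e. the process is in WARNING.
    static boolean shouldBuildIfo(MonitoredProcess process, List<Rule> rules,
                                  String objectType, String objectPath) {
        for (Rule rule : rules) {
            if (rule.appliesTo(objectType, objectPath)) {
                if (rule.action == Action.IGNORE) {
                    return false; // assumption: an ignored access produces no IFO
                }
                process.flag = SecurityFlag.WARNING;
                break;
            }
        }
        return process.flag == SecurityFlag.WARNING;
    }
}
```

With Rule 1 and Rule 2 of Figure 9 loaded, a process that touches a file under C:\windows would move from SAFE to WARNING, and from then on its accesses would feed the IFMG, while accesses to the Windows Error Reporting key would be skipped.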
Figure 9. Policy example
5.4.2 Pattern for user-defined behavior
Users can define patterns in a database beforehand to forbid certain information flows in the system. In practice, the pattern matcher first translates each pattern into a string that conforms to the output format of the Encode Function, and then uses these strings to match information flow paths.
Figure 10 shows an example pattern. The pattern describes an information flow from a file to a Windows startup registry key through a process whose path is the same as the file's. Put simply, the process copies itself and modifies the Windows startup registry key. This behavior is common in viruses and other malware.
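To make the role of the backreference concrete, the following Java sketch hand-writes a regular expression for this behavior over encoded paths. It is a simplified illustration and not the string produced by our pattern matcher: the node layout (`type`, `path`, and `PID` labels in that order), the constant names, and the class `PatternSketch` are assumptions. Java supports named groups and `\k<name>` backreferences since Java SE 7.

```java
import java.util.regex.Pattern;

final class PatternSketch {
    // Any run of encoded nodes: plays the role of ".*" in the expression "A.*B.*C".
    static final String ANY_NODES = "(?:<[^<>]*>)*";

    // A: a file node whose path is captured in the named group A0path.
    static final String A = "<\\|type\\?file\\|path\\?(?<A0path>[^<>?|]*)\\|>";

    // B: a process node whose path must equal A's path (named backreference to A0path).
    static final String B =
        "<\\|type\\?process\\|path\\?\\k<A0path>\\|PID\\?[^<>?|]*\\|>";

    // C: the Windows startup registry key, quoted so that its backslashes are literal.
    static final String RUN_KEY =
        "\\registry\\machine\\software\\wow6432node\\microsoft\\windows\\currentversion\\run";
    static final String C =
        "<\\|type\\?registrykey\\|path\\?" + Pattern.quote(RUN_KEY) + "\\|>";

    static final Pattern SELF_COPY_TO_RUN =
        Pattern.compile(A + ANY_NODES + B + ANY_NODES + C);

    // True iff the whole encoded path is in the language of the pattern.
    static boolean matches(String encodedPath) {
        return SELF_COPY_TO_RUN.matcher(encodedPath).matches();
    }
}
```

A path in which the process node carries the same path as the file node is accepted, whereas a path whose process was started from a different executable is rejected, because the backreference can only match the text captured for A.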
Policy name: Default
Policy path: c:\
Policy match_method: prefix match
Rules:
//Rule format
Rule #: type | path | match_method | action
Rule 1: file | C:\windows | prefix match | raise security_flag to WARNING
Rule 2: registrykey | \registry\machine\software\microsoft\windows\windows error reporting | fully match | ignore
……
Figure 10. Pattern example
5.4.3 Handling of Information Flow Path
Matching patterns against information flow paths is the most important part of behavior matching. Because the IFMG is constructed incrementally and behavior matching runs periodically, we maintain a set of information flow paths that is also constructed incrementally.
Assuming that the collected system calls are in increasing time order, the information flow operations, as well as the information dependencies they generate with only one handle, are also in increasing time order. Therefore, every new IFP that results from a new information dependency (ID) is formed by appending the destination node of the ID to a path that ends with the source node of the ID. The algorithm is given in Figure 11.
{"expression":"A.*B.*C",
"A":{"type":"file","path":".*"},
"B":{"type":"process","path":"\\k<A0path>","PID":".*"},
"C":{"type":"registrykey","path":"\\\\registry\\\\machine\\\\software\\\\wow6432node\\\\microsoft\\\\windows\\\\currentversion\\\\run"}}
#\\k<A0path> in the path of B is a backreference: in a match, the path of B must be the same as A's.
The translated string of the regular expression
(<\|type\?(?<A0type>file)\|path\?(?<A0path>([^<>?|])*)\|>)(?:<\|(([^<>?|])*
Figure 11. Algorithm of incremental construction of the set of information flow paths
# pathsEndWith is a hash table whose keys are information nodes and whose values are sets of strings encoded by the Encode Function from information flow paths
# concatenate(a, b) is a function that concatenates two strings
# EF is the Encode Function
When an ID id = (s, d, labels) updates an IFMG, then
    pathsEndWithSrc = pathsEndWith.get(s)
    pathsEndWithDest = pathsEndWith.get(d)
    For each path p in pathsEndWithSrc
        newpath = concatenate(p, EF(d))
        pathsEndWithDest = pathsEndWithDest ∪ {newpath}
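The algorithm of Figure 11 can be sketched in Java as below. The sketch makes two assumptions that the figure does not state: a node seen for the first time is seeded with the single-node path that consists of the node itself, and nodes are identified by plain strings whose encoding is a stand-in for EF. The names `PathIndexSketch`, `encodeNode`, `pathsEndingAt`, and `addDependency` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PathIndexSketch {
    // pathsEndWith: node -> encoded information flow paths that end at that node.
    private final Map<String, Set<String>> pathsEndWith =
        new HashMap<String, Set<String>>();

    // Stand-in for EF applied to a single node (assumption).
    static String encodeNode(String node) {
        return "<" + node + ">";
    }

    // Returns the path set of a node, seeding it with the one-node path
    // on first use (assumption, not shown in Figure 11).
    Set<String> pathsEndingAt(String node) {
        Set<String> paths = pathsEndWith.get(node);
        if (paths == null) {
            paths = new LinkedHashSet<String>();
            paths.add(encodeNode(node));
            pathsEndWith.put(node, paths);
        }
        return paths;
    }

    // Called when an ID (src, dest, labels) updates the IFMG; labels are omitted here.
    void addDependency(String src, String dest) {
        // Copy the source set first so that src == dest cannot modify it mid-iteration.
        List<String> srcPaths = new ArrayList<String>(pathsEndingAt(src));
        Set<String> destPaths = pathsEndingAt(dest);
        for (String p : srcPaths) {
            destPaths.add(p + encodeNode(dest)); // concatenate(p, EF(dest))
        }
    }
}
```

Because paths are only extended at the moment a dependency arrives, a path that reaches the source later is not propagated to earlier destinations, which is consistent with the increasing time order assumed above.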
Evaluation
In this chapter, we first evaluate the effectiveness of our behavior engine with two viruses. Next, we evaluate its performance and overhead under heavy load. Finally, we discuss the issues of subpath checking and partial information. Table 2 lists our testbed environment.
Host CPU Intel(R) Xeon(R) CPU E5520 @ 2.27GHz x16