
Three-phase Behavioral Detection and Classification

In this chapter, we describe the three-phase behavioral detection and classification (TPBDC) process, which is based on permissions and system call sequences. Section 4.1 gives an overview of TPBDC. Sections 4.2, 4.3, and 4.4 present the two detection phases and the classification phase, respectively, and Section 4.5 discusses implementation issues.

4.1 Overview of Three-phase Behavioral Detection and Classification

To achieve both high detection performance and high accuracy, we propose a three-phase approach. Permissions are checked in the first phase to reduce the number of applications passed to the second phase. To improve accuracy, the second phase checks system call sequences, which reduces the false positive rate.

Figure 1 shows an overview of the three phases: the permission-based detection (PBD) phase, the system call-based detection (SBD) phase, and the behavior-based classification (BBC) phase. In the PBD phase, we extract permissions from BP, MP, and IP. The permission probability set PP is computed from BP and MP and is then used to judge whether ipk is suspicious; only suspicious applications are passed to the next phase.

In the SBD phase, we record the system calls of BP and MP for training. We derive a set of system call sequences from these training applications and then use the trained sequences to obtain the malicious behavior set MBS. For detection, we record the system calls of ipk and match them against MBS to decide whether ipk is malicious. Note that this phase processes only the applications that were not filtered out in the PBD phase. In the BBC phase, we use the behaviors of malware to build TV, the set of bit vectors for MP, and then use TV to classify ipk as a known type or a new type, depending on whether its behaviors match those in TV.


Figure 1. Overview of the proposed solution

4.2 Permission-based Detection (PBD) Phase

In this section, we introduce the PBD phase, which is composed of a permission extractor, a Bayes analyzer, and a permission comparator.

Permission Extractor

The permission extractor retrieves the built-in permissions requested by each application. Android has 139 built-in permissions. We extract permissions from BP and MP for training and from IP for detection.

Bayes Analyzer

For each built-in permission, we evaluate the probability that an application requesting it is malicious.

Requested permissions are retrieved from both BP and MP, and the probabilities are then computed using Bayes' theorem. To simplify the evaluation, we count only Android's built-in permissions. The probability is evaluated as


$$P(M \mid p_l) = \frac{P(p_l \mid M)\,P(M)}{P(p_l \mid M)\,P(M) + P(p_l \mid B)\,P(B)}, \tag{1}$$

where pl is one of the 139 built-in permissions. P(B) and P(M) denote the proportions of BP and MP, respectively, among the training applications. P(pl|B) and P(pl|M) denote the probabilities that pl is requested by an application in BP and in MP, respectively. The result P(M|pl) is the probability that an application is malicious given that it requests pl. Applying formula (1) to all 139 built-in permissions yields the permission probability set PP.
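To make formula (1) concrete, the following Python sketch computes PP from small training sets. It assumes that each application is represented as a set of requested permission names and that P(B) and P(M) are the proportions of BP and MP in the training data. The function name `permission_probabilities` is hypothetical, and so is the choice to omit permissions that no training application requests (their denominator in formula (1) would be zero); the text does not say how such permissions are handled.

```python
from typing import Dict, List, Set


def permission_probabilities(
    benign: List[Set[str]],     # BP: requested permissions of each benign app
    malicious: List[Set[str]],  # MP: requested permissions of each malicious app
    builtin: Set[str],          # the built-in permissions to evaluate
) -> Dict[str, float]:
    """Return PP, mapping each built-in permission pl to P(M|pl) per formula (1)."""
    n_b, n_m = len(benign), len(malicious)
    p_b = n_b / (n_b + n_m)  # prior P(B)
    p_m = n_m / (n_b + n_m)  # prior P(M)
    pp: Dict[str, float] = {}
    for perm in builtin:
        # Likelihoods P(pl|M) and P(pl|B): fraction of apps in each set requesting perm.
        p_perm_m = sum(perm in app for app in malicious) / n_m
        p_perm_b = sum(perm in app for app in benign) / n_b
        denom = p_perm_m * p_m + p_perm_b * p_b
        # Assumption: skip permissions never requested in training (denominator 0).
        if denom > 0:
            pp[perm] = p_perm_m * p_m / denom
    return pp
```

For example, if two benign apps request {INTERNET} and {INTERNET, CAMERA} and two malicious apps request {INTERNET, SEND_SMS} and {SEND_SMS}, this yields P(M|INTERNET) = 1/3, P(M|SEND_SMS) = 1.0, and P(M|CAMERA) = 0.0.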

Permission Comparator

We also extract the permissions requested by ipk and compute the product of their probabilities, looked up in PP. If the product is larger than a predefined upper bound, ipk is judged as malware. If the product is smaller than a predefined lower bound, ipk is judged as benign. If the product lies between the lower and upper bounds, ipk is marked as a suspicious application and passed to the SBD phase.
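The three-way decision of the permission comparator can be sketched in Python as follows. Here `pp` maps permission names to P(M|pl) values, and `lower` and `upper` are the two bounds, whose actual values are not given in this section. The function name is hypothetical. It is also our own assumption that permissions absent from PP are ignored and that an application with no permissions found in PP is treated as suspicious rather than scored with an empty product of 1.

```python
from math import prod
from typing import Dict, Set


def judge_by_permissions(
    requested: Set[str], pp: Dict[str, float], lower: float, upper: float
) -> str:
    """Classify ipk as 'malicious', 'benign', or 'suspicious' from its permissions."""
    # Assumption: permissions without an entry in PP contribute nothing.
    known = [pp[p] for p in requested if p in pp]
    if not known:
        return "suspicious"  # assumption: no evidence either way, defer to SBD
    score = prod(known)  # product of P(M|pl) over the requested permissions
    if score > upper:
        return "malicious"
    if score < lower:
        return "benign"
    return "suspicious"  # between the bounds: pass to the SBD phase
```

Values exactly equal to a bound fall through to "suspicious", matching the rule that anything between the bounds is passed to the next phase.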

4.3 System Call-based Detection (SBD) Phase

As shown in Figure 2, the SBD phase consists of four components: a system call recorder, a system call sequence trainer, a system call sequence analyzer, and a system call sequence comparator.

System Call Recorder

The system call recorder records the system calls triggered by applications. First, we install bpi, mpj, or ipk on the Android 2.1 emulator and launch the application.

After the application has been launched, we emulate several system events, such as rebooting, receiving short messages, and answering phone calls, and record the application's system calls for a period of time.


Figure 2. Procedure flowchart of SBD

System Call Sequence Trainer

The system call sequence trainer generates BBS, SBS, and IBS using either N-grams or the longest common subsequence (LCS) algorithm. Because a system call may be issued repeatedly in a loop, we first consolidate successive identical system calls before computing system call sequences. For instance, the raw system call sequence “open, read, read, read, close” becomes “open, read, close”.
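The consolidation step and N-gram extraction can be sketched in Python as below. Each trace is assumed to be a list of system call names, and the N-gram length `n` is a free parameter because its value is not stated here. The helper names `consolidate` and `ngrams` are hypothetical.

```python
from itertools import groupby
from typing import List, Set, Tuple


def consolidate(calls: List[str]) -> List[str]:
    """Collapse each run of identical consecutive system calls into one call."""
    return [name for name, _run in groupby(calls)]


def ngrams(calls: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous length-n subsequences (N-grams) of calls."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


raw = ["open", "read", "read", "read", "close"]
print(consolidate(raw))                      # ['open', 'read', 'close']
print(sorted(ngrams(consolidate(raw), 2)))   # [('open', 'read'), ('read', 'close')]
```

Running `itertools.groupby` on an unsorted list groups only adjacent equal items, which is exactly the "successive calls" consolidation described above.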

After consolidation, BBS, SBS, and IBS are generated by either the N-gram or the LCS algorithm. The trainer aims to find common subsequences: because malware samples tend to share malicious behaviors, the system call sequences recorded from malware should have many subsequences in common. The system call sequences of BP are stored in BBS, the common system call subsequences of MP are stored in SBS, and the system call sequences of IP are stored in IBS.
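For the LCS variant, the core step is finding the longest common subsequence of two system call sequences. The sketch below is the classic dynamic-programming LCS applied to two traces. How the thesis combines pairwise results across all of MP is not described in this section, so only the pairwise step is shown, and the function name `lcs` is hypothetical.

```python
from typing import List


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two system call sequences."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out
```

For example, `lcs(["open", "read", "write", "close"], ["open", "mmap", "read", "close"])` returns `["open", "read", "close"]`. Unlike N-grams, an LCS need not be contiguous, so it tolerates unrelated calls interleaved between the shared ones.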

System Call Sequence Analyzer

The system call sequence analyzer identifies malicious system call sequences by deriving MBS from SBS and BBS. Any system call sequence that appears in both SBS and BBS is filtered out. The remaining sequences form the malicious behavior set (MBS), which contains only system call sequences observed in MP but not in BP.
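If each system call sequence is represented as a tuple of call names (for example, an N-gram), the filtering step reduces to a set difference. The following is a minimal sketch under that assumption; the names `Seq` and `build_mbs` are hypothetical.

```python
from typing import Set, Tuple

Seq = Tuple[str, ...]  # one system call sequence, e.g. an N-gram


def build_mbs(sbs: Set[Seq], bbs: Set[Seq]) -> Set[Seq]:
    """Keep the malware sequences (SBS) that never occur in benign apps (BBS)."""
    return sbs - bbs


sbs = {("open", "read"), ("socket", "sendto")}
bbs = {("open", "read"), ("open", "close")}
print(build_mbs(sbs, bbs))  # {('socket', 'sendto')}
```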

System Call Sequence Comparator

To inspect ipk, we use the system call recorder and the system call sequence trainer to record its system calls and generate its system call subsequences. Figure 3 shows how the system call sequence comparator compares the system call sequences ibr of ipk against all malicious behaviors listed in MBS. The application ipk is classified as malicious if any malicious behavior in MBS is matched.
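Assuming again that behaviors are represented as exact tuples of system call names, matching ipk against MBS is a set intersection, as in the sketch below. The function names are hypothetical, and exact tuple equality is an assumption: the section does not specify whether partial matches count. Returning the matched behaviors, not just a boolean, is convenient because they describe which malicious behaviors ipk exhibits.

```python
from typing import Set, Tuple

Seq = Tuple[str, ...]  # one system call sequence, e.g. an N-gram


def match_behaviors(ibs_k: Set[Seq], mbs: Set[Seq]) -> Set[Seq]:
    """Return the malicious behaviors in MBS that also occur in ipk's sequences."""
    return ibs_k & mbs


def is_malicious(ibs_k: Set[Seq], mbs: Set[Seq]) -> bool:
    # ipk is classified as malicious if at least one behavior in MBS is matched.
    return bool(match_behaviors(ibs_k, mbs))
```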

Figure 3. Procedure of system call sequence comparator

4.4 Behavior-based Classification (BBC) Phase

When an application is detected as malicious, we further classify it as a known type or a new type of malware. In this section, we explain the detailed design of the BBC phase, which consists of a type vector extractor and a type classifier.

Type Vector Extractor

We use a bit vector to represent which behaviors a malware sample exhibits. Suppose that all identified behaviors are indexed from 1 to 500 and that a malware sample exhibits the first, the third, and the 499th behaviors. Its bit vector tv1 is then {1, 0, 1, 0, …, 1, 0}, in which only positions 1, 3, and 499 are set. We build bit vectors for MP and use them to determine whether ipk is a known type or a new type. All the bit vectors obtained for MP are stored in a set TV.
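The bit-vector construction can be sketched as follows, using the 1-based behavior indices of the example above. How behaviors are assigned indices is not specified in this section, so the sketch takes the indices as given; the function name `type_vector` is hypothetical.

```python
from typing import List, Set


def type_vector(behavior_ids: Set[int], num_behaviors: int) -> List[int]:
    """Bit vector whose i-th entry (1-based) is 1 iff behavior i is present."""
    return [1 if i in behavior_ids else 0 for i in range(1, num_behaviors + 1)]


tv1 = type_vector({1, 3, 499}, 500)
print(tv1[:4], tv1[-2:])  # [1, 0, 1, 0] [1, 0]
```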


Type Classifier

Figure 4 shows the procedure of the type classifier. We construct a bit vector ipv for ipk and compute the cosine similarity [16] between ipv and every bit vector in TV. The cosine similarity is computed as

$$\cos(tv, ipv) = \frac{\sum_{i=1}^{M} tv_i \, ipv_i}{\sqrt{\sum_{i=1}^{M} tv_i^{2}}\;\sqrt{\sum_{i=1}^{M} ipv_i^{2}}}, \tag{2}$$

where tv is one of the MP bit vectors in TV, ipv is the bit vector of ipk, and M is the number of identified behaviors (the length of each bit vector). We define a threshold as the lower bound on cosine similarity for classifying ipv as a known type. If the maximum similarity is greater than the threshold, ipv is classified as the same type as the bit vector with the maximum cosine similarity; otherwise, ipv is classified as a new type.
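The following Python sketch implements formula (2) and the threshold rule. It assumes that each bit vector in TV is stored together with a type label (the labels "FamilyA" and "FamilyB" below are hypothetical), and the threshold value is a free parameter. Returning `None` stands for "new type". The convention that similarity is 0 when either vector is all zeros is our own assumption, because formula (2) is undefined in that case.

```python
from math import sqrt
from typing import List, Optional, Tuple


def cosine(tv: List[int], ipv: List[int]) -> float:
    """Cosine similarity of two equal-length bit vectors, per formula (2)."""
    dot = sum(a * b for a, b in zip(tv, ipv))
    norm = sqrt(sum(a * a for a in tv)) * sqrt(sum(b * b for b in ipv))
    return dot / norm if norm else 0.0  # assumption: zero vector -> similarity 0


def classify(
    ipv: List[int], tv_set: List[Tuple[str, List[int]]], threshold: float
) -> Optional[str]:
    """Return the type of the most similar vector in TV, or None for a new type."""
    best_type: Optional[str] = None
    best_sim = -1.0
    for malware_type, tv in tv_set:
        sim = cosine(tv, ipv)
        if sim > best_sim:
            best_type, best_sim = malware_type, sim
    return best_type if best_sim > threshold else None


tv_set = [("FamilyA", [1, 0, 1, 0]), ("FamilyB", [0, 1, 0, 1])]
print(classify([1, 0, 1, 1], tv_set, threshold=0.7))  # FamilyA
```

In this example the similarity to FamilyA is 2/(√2·√3) ≈ 0.816, so raising the threshold to 0.9 would classify the same vector as a new type.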

Figure 4. Procedure of type classifier

4.5 Implementation

We have developed tools to automatically retrieve permissions and system call sequences of Android applications.

Permission Analyzer

Figure 5 shows the procedure of the permission analyzer. Because an APK file is essentially a ZIP archive with an .apk file extension, we decode each application with apktool [17] to retrieve its permissions. After decoding, we obtain the assets, the resources, the application's disassembled code, and the manifest file. To retrieve permissions, we parse only the manifest file (AndroidManifest.xml), because developers declare requested permissions in this file.
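Once apktool has decoded the binary manifest into plain-text XML, the requested permissions are the `android:name` attributes of the `<uses-permission>` elements. The sketch below extracts them with Python's standard `xml.etree.ElementTree`, which exposes namespaced attributes as `{namespace-URI}name`. The function name `requested_permissions` is hypothetical, and the sketch takes the manifest text as a string rather than a file path.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI bound to the "android:" prefix in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def requested_permissions(manifest_xml: str) -> List[str]:
    """Return the permission names declared in <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"  # i.e. "{http://...}name"
    return [
        elem.get(name_attr)
        for elem in root.iter("uses-permission")
        if elem.get(name_attr) is not None
    ]
```

To process a decoded application on disk, the file contents can be passed in directly, e.g. `requested_permissions(open("out/AndroidManifest.xml").read())`, where `out/` is a hypothetical apktool output directory.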

Figure 5. Procedure of permissions analyzer

System Call Recorder

Figure 6 shows the procedure of the system call recorder. To record the system calls of an application, we modify the system image ramdisk.img and install strace [18] in the emulator. First, we decompress ramdisk.img, install the strace tool into the system, and modify the init.rc file so that it launches strace. The strace binary is placed in the /data directory. In the init.rc file, we insert the strace command “/data/strace -F -ff -tt -o /data/tracefile/zygote”, as shown in Figure 7.

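With these modifications, strace starts recording system calls right after the emulator boots. Its output is written under the /data/tracefile/zygote prefix; because of the -ff option, strace writes the trace of each process to a separate file named zygote.<pid>, where <pid> is the process ID.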
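Before consolidation and sequence training, each per-process trace file must be reduced to a list of system call names. The sketch below assumes the line format produced by `-tt` together with `-ff -o`: a microsecond timestamp followed by `name(args) = result`, with no PID prefix. Lines that strace prints for resumed calls (`<... read resumed>`), signals (`---`), and exits (`+++`) are skipped so that each call is counted once. The regular expression and the function name `syscall_names` are our own assumptions and not part of the described tool.

```python
import re
from typing import Iterable, List

# Matches e.g. '10:15:32.123456 open("/dev/binder", O_RDWR) = 3'
_CALL_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+\s+([A-Za-z_][A-Za-z0-9_]*)\(")


def syscall_names(lines: Iterable[str]) -> List[str]:
    """Extract system call names, in order, from one strace -tt output file."""
    names: List[str] = []
    for line in lines:
        match = _CALL_LINE.match(line)
        if match:  # resumed, signal, and exit lines do not match
            names.append(match.group(1))
    return names
```

A file object can be passed directly, e.g. `syscall_names(open("/data/tracefile/zygote.1234"))` after pulling the file from the emulator, where the PID 1234 is hypothetical.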

Figure 6. Procedure of system call recorder

Figure 7. Detailed modification of the init.rc file
