Background - Detecting Malicious Applications on the Android Platform Using Behavioral Similarity

In this chapter, we describe the problem of detecting repackaged malicious applications in the Android environment. Malware has been a problem on desktops for many years, and many methods have been proposed to detect it. In recent years, malware has spread to handhelds. We give a brief overview of how malware behavior differs between desktops and handhelds, and we discuss related work, such as system call based malware detection on desktops and on the Android platform.

2.1 Malware Differences between Desktops and Handhelds

Compared with desktops, malware detection on handhelds faces several limitations. One of the biggest is power consumption.

Unlike desktops, handhelds run on the finite energy of a battery, so a detector cannot afford to exhaust that energy in order to detect malware.

The other problem is the restriction imposed by the operating system kernel. Android is built on the Linux kernel. Android applications execute on the Dalvik virtual machine and are isolated from one another because each application runs as its own user. Hence, without root privilege it is hard for our programs to observe or block a malicious application. Because of these drawbacks, traditional desktop-based malware detection mechanisms cannot be applied directly in the Android environment.

Changes in the purpose and behavior of malware also require detection methods to adapt to the new circumstances. Attack methods can be categorized into two types: server-side attacks and client-side attacks. In a server-side attack, the attacker directly targets potential vulnerabilities in a server that exposes its services to clients. Most active malware, such as worms and some kinds of bots, uses this avenue for spreading. Fortunately, handhelds do not always access Internet services, so a server-side attack cannot be mounted effectively against them. In contrast, client-side attacks target vulnerabilities in client applications that interact with a malicious server. An attacker can set up a phishing site or embed malicious code in an ordinary server and infect a client's applications when the client accesses the forged information. Spyware and trojan horses are two typical kinds of malware that infect systems in this way. In addition, an attacker can easily obtain private information through applications infected by spyware or trojan horses; for example, secret information can be obtained from SMS messages or phone calls. Compared with detecting worms and bots, detecting spyware and trojan horses on handhelds is a thorny problem.

2.2 Related Works on Desktop Environment

In the traditional desktop environment, signature-based detection is the most popular technique in anti-virus and intrusion detection systems. Unfortunately, signature-based methods cannot detect zero-day malware. To overcome this weakness, many behavior-based methods have been proposed. Among them, system call monitoring is one of the most popular techniques for observing program behavior in depth.

Forrest et al. [11] record the normal system call behavior of a specific program in a database. When a compromised program executes malicious code paths, the resulting system call sequences are flagged as anomalous because they do not exist in the database. Subsequent research has extended this approach by applying system call sequences to different models, such as hidden Markov models [12], finite state automata [13], and Bayes models [14], to improve detection efficiency. Unfortunately, these methods need to collect all the behaviors of the specified programs, and collecting all behaviors in a large system such as the Android environment is computationally infeasible.
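The database idea attributed to Forrest et al. [11] can be illustrated with a minimal sketch. It assumes that a trace is a list of system call names and that "normal behavior" is stored as the set of all contiguous windows of a fixed length k. The function names, the window length, and the sample traces are hypothetical choices for illustration and are not taken from [11].

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: List[str], k: int) -> Iterable[Window]:
    """Yield every contiguous length-k window of a system call trace."""
    for i in range(len(trace) - k + 1):
        yield tuple(trace[i:i + k])


def build_normal_db(traces: List[List[str]], k: int = 3) -> Set[Window]:
    """Collect all windows observed while the program behaves normally."""
    db: Set[Window] = set()
    for trace in traces:
        db.update(windows(trace, k))
    return db


def anomaly_rate(trace: List[str], db: Set[Window], k: int = 3) -> float:
    """Return the fraction of windows in a trace that are missing from the database."""
    seen = list(windows(trace, k))
    if not seen:
        return 0.0
    misses = sum(1 for w in seen if w not in db)
    return misses / len(seen)


# Hypothetical traces of normal runs.
normal_traces = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "write", "close"],
]
db = build_normal_db(normal_traces)

normal_rate = anomaly_rate(["open", "read", "write", "close"], db)
attack_rate = anomaly_rate(["open", "read", "socket", "sendto", "close"], db)
```

The sketch also shows why coverage matters: any legitimate window that was never recorded is reported as anomalous, so the database must capture essentially all normal behavior.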

Christodorescu et al. [15] represent malware behavior sequences as dependence graphs and extract the subgraphs that do not appear in benign programs in order to detect malware variants. Rozenberg et al. [16] collect fixed-length short system call sequences from malware to detect unknown malware. Our work is similar but focuses more on multi-thread processing and discovers more precise system call sequences to achieve high accuracy in detecting repackaged applications.

2.3 Related Works on Android Environment

Table 1. Related Works of Malicious Android Applications Detection

Analysis  Method                              Feature                         Detection Target
Static    ScanDroid [6]                       Program Code                    Application Verification
          Kirin [7]                           Permission                      Application Verification
          DroidMOSS [17]                      Code Instructions               Repackaged Applications
Dynamic   TaintDroid [5]                      Data flow                       Data Leaking
          Kernel-based behavior analysis [9]  System Call Name and Parameter  Anomaly Behaviors
          AASandbox [8]                       Amount of System Call           Anomaly Behaviors
          CrowDroid [10]                      Amount of System Call           Repackaged Applications
          This Work                           System Call Sequences           Repackaged Applications

As Table 1 shows, many researchers in the Android environment focus on how to verify Android applications. For instance, Enck et al. [5] developed a system called “TaintDroid”. Its key idea is to track sensitive data by labeling the data in memory and to detect privacy-leaking behavior of applications at runtime. Although TaintDroid is simple and efficient, it only detects malware that attempts to obtain sensitive data; it cannot detect other types of repackaged applications well.
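The labeling idea can be pictured with a toy model. The sketch below is not TaintDroid's implementation, which works inside the virtual machine; it only assumes that each value carries a set of labels, that labels propagate when values are combined, and that a sink inspects the labels before data leaves the device. The class Tainted, the function leaked_labels, and the label name IMEI are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Tainted:
    """A string value together with the labels of the sensitive sources it came from."""
    value: str
    labels: FrozenSet[str] = frozenset()

    def __add__(self, other: "Tainted") -> "Tainted":
        # Combining two values yields the union of their labels.
        return Tainted(self.value + other.value, self.labels | other.labels)


def leaked_labels(outgoing: Tainted) -> FrozenSet[str]:
    """Hypothetical network sink check: report the labels carried by outgoing data."""
    return outgoing.labels


# A sensitive source labels its data; ordinary data carries no label.
imei = Tainted("356938035643809", frozenset({"IMEI"}))
message = Tainted("id=") + imei
greeting = Tainted("hello")

leak = leaked_labels(message)
no_leak = leaked_labels(greeting)
```

In this model, an application that never touches a labeled source is never flagged, which matches the limitation noted above.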

Kirin [6] uses a set of permission rules to block, before installation, applications that request an unsafe combination of permissions. ScanDroid [7] extracts security specifications from the manifest and automatically checks data flows in the application code against them. Such checks can identify malicious applications whose code violates their permissions.
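A permission-rule check of the kind described for Kirin can be sketched as follows, assuming that a rule is a set of permissions that must not be requested together and that an application is rejected when its requested permissions contain every permission of some rule. The two rules listed are illustrative assumptions, not Kirin's actual rule set, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List

# Illustrative unsafe combinations (assumed for this sketch).
UNSAFE_COMBINATIONS: List[FrozenSet[str]] = [
    frozenset({"RECEIVE_SMS", "WRITE_SMS"}),
    frozenset({"RECORD_AUDIO", "INTERNET", "PROCESS_OUTGOING_CALLS"}),
]


def violated_rules(requested: Iterable[str]) -> List[FrozenSet[str]]:
    """Return every rule whose permissions are all present in the request."""
    req = set(requested)
    return [rule for rule in UNSAFE_COMBINATIONS if rule <= req]


def allow_install(requested: Iterable[str]) -> bool:
    """Allow installation only when no unsafe combination is requested."""
    return not violated_rules(requested)


ok = allow_install(["INTERNET", "ACCESS_FINE_LOCATION"])
blocked = allow_install(["RECEIVE_SMS", "WRITE_SMS", "INTERNET"])
```

Because the check looks only at the requested permissions, it runs before installation and needs no runtime monitoring.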

AASandbox [8] applies both static and dynamic approaches to analyze applications. Statically, it decompiles the installation file and searches the decompiled code for suspicious patterns. Dynamically, it counts all system calls made in the Android system while the application runs. However, AASandbox only works with specific malware; it cannot detect multiple types of malware, because its dynamic approach only detects abnormal system call usage.

The method proposed by Isohara et al. [9] collects runtime information about applications through system calls. They search for keywords in the names and parameters of system calls to generate a set of regular expression rules, and then use the rules to find suspicious system calls and detect Android malware. However, a malicious application can still evade detection by encapsulating or obfuscating the keywords in the parameters.
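Rule matching over system call names and parameters can be sketched as below. The record layout, the two rules, and the sample log entries are assumptions made for illustration and are not the rules generated in [9]. The last record shows the evasion described above: the same identifier sent in Base64 form no longer matches the keyword.

```python
import re
from typing import List, NamedTuple, Pattern, Tuple


class SyscallRecord(NamedTuple):
    name: str  # system call name, for example "open"
    args: str  # parameters rendered as text


# Each hypothetical rule pairs a pattern for the name with a pattern for the parameters.
RULES: List[Tuple[Pattern[str], Pattern[str]]] = [
    (re.compile(r"^(open|read)$"), re.compile(r"contacts2\.db")),
    (re.compile(r"^(connect|sendto)$"), re.compile(r"imei=\d{15}")),
]


def suspicious(records: List[SyscallRecord]) -> List[SyscallRecord]:
    """Return the records that match at least one rule."""
    return [
        rec for rec in records
        if any(n.search(rec.name) and a.search(rec.args) for n, a in RULES)
    ]


log = [
    SyscallRecord("open", "/data/data/com.android.providers.contacts/databases/contacts2.db"),
    SyscallRecord("sendto", "GET /c?imei=356938035643809"),
    SyscallRecord("write", "/sdcard/notes.txt"),
    # Same payload, Base64-encoded by the application before sending.
    SyscallRecord("sendto", "GET /c?d=aW1laT0zNTY5MzgwMzU2NDM4MDk="),
]
hits = suspicious(log)
```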

CrowDroid [10] is another typical method that evaluates applications by using system calls. After the system collects the system calls issued on a set of user devices while the application runs, it applies the K-means clustering algorithm to partition the collected data into two groups, a benign group and a malicious group. The malicious group can then be used to identify the users who are running the repackaged application. CrowDroid needs a set of users to execute the same original application or the repackaged application; however, finding the original application for every repackaged malicious application is inefficient and often impossible in the Android environment.
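The clustering step can be sketched with a small two-cluster K-means over system call count vectors. The sketch assumes that each user's run is summarized as a vector of counts for a fixed list of system calls; the deterministic initialization, the function names, and the sample counts are choices made for this illustration and do not come from [10].

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two count vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points: List[Vector], max_iter: int = 20) -> List[int]:
    """Partition the points into two clusters and return a label (0 or 1) per point."""
    # Deterministic start: the first point and the point farthest from it.
    centroids = [list(points[0]), list(max(points, key=lambda p: sq_dist(p, points[0])))]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels if labels is not None else []


# Hypothetical counts of (open, read, sendto) reported by five users of one application.
runs = [
    [10, 20, 0],
    [11, 19, 1],
    [9, 21, 0],
    [10, 20, 15],  # repackaged copy sending extra data
    [12, 22, 14],  # repackaged copy sending extra data
]
labels = two_means(runs)
```

K-means only separates the runs into two groups; deciding which group is the malicious one still requires a reference, which is why the approach depends on users who run the original application.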

DroidMOSS [17] is a system that aims to discover repackaged applications in third-party marketplaces. It calculates similarity scores by comparing the author information and code instruction hashes of the original applications with those of the applications in different marketplaces. It then identifies the repackaged applications in the third-party marketplaces, because a repackaged application differs in appearance from the original application provided by the official market. However, DroidMOSS has two weaknesses. First, once malware is distributed both in a third-party marketplace and in the official market, DroidMOSS cannot distinguish the malware from the repackaged applications. Second, not all original applications can be obtained from the official market.

Because of the limitations of these methods, our proposed method extracts the longest system call sequence patterns from repackaged applications of the same type and uses the patterns to detect repackaged malware without requiring the original applications.

Our contribution is a new approach that extracts common system call sequences across multiple threads and applies these sequences to detect repackaged applications with Bayes probability.
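The two ingredients named above can be pictured with a rough sketch. It is not the algorithm developed in this work: it assumes that a "common sequence" is the longest contiguous run of system calls shared by two traces, and that Bayes' rule is applied to a single pattern with known likelihoods. The function names, traces, and probabilities are hypothetical.

```python
from typing import List


def longest_common_run(a: List[str], b: List[str]) -> List[str]:
    """Return the longest contiguous system call sequence that appears in both traces."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def posterior_malicious(p_pat_mal: float, p_pat_benign: float, prior_mal: float) -> float:
    """Bayes' rule: probability that an application is malicious given that the pattern occurs."""
    numerator = p_pat_mal * prior_mal
    evidence = numerator + p_pat_benign * (1.0 - prior_mal)
    return numerator / evidence if evidence else 0.0


# Hypothetical traces from two repackaged applications of the same type.
trace_a = ["open", "read", "socket", "connect", "sendto", "close"]
trace_b = ["stat", "socket", "connect", "sendto", "write"]
pattern = longest_common_run(trace_a, trace_b)

# Assumed likelihoods: the pattern occurs in 90% of malicious and 10% of benign runs.
posterior = posterior_malicious(0.9, 0.1, 0.5)
```

Because the pattern is mined from the repackaged applications themselves, no trace of the original application is needed in this picture.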
