Implementation - Function Behavior Analysis of Mobile Applications

In the following section we cover the detailed architecture of our analysis system, AppBeach, including the distributed algorithm in the Hadoop environment and the malicious behavior detector.

We implement our distributed analysis on the Hadoop MapReduce framework. The Hadoop environment in this work consists of one Namenode and five Datanodes. The Namenode is the instance responsible for controlling the distributed computing jobs in the Hadoop environment; in our build it takes part in neither the distributed file system nor the MapReduce computation. The Datanodes are the instances that actually run the distributed computing jobs on the Hadoop Distributed File System (HDFS) composed of these instances. All instances, Namenode and Datanodes alike, are virtual machines on the VMware ESXi hypervisor, each with a 4-core CPU, 4 GB of RAM, and a 20 GB HDD.

After obtaining the resolved assembly files of the apps, we put all of them on HDFS and feed them as input to the distributed syntax analysis. Every function call in an app indicates its class name and method name; in our analysis we collect the two separately to detect more behaviors within the apps.

Furthermore, we consider the correlation between method sets or class sets. Therefore, for both class names and method names, we record the pairs formed by every invocation and the invocation right after it, and collect triples with the same logic. We define these records as different sampling results.
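The sampling described above can be sketched as a sliding window over the invocation sequence. This is a minimal illustration, not the AppBeach implementation: it assumes the invocations of one app are already available as an ordered list of (class, method) pairs, and the function names and sample calls are hypothetical.

```python
from collections import Counter

def sample(names, n):
    """Count every run of n consecutive names (a sliding window of width n)."""
    return Counter(tuple(names[i:i + n]) for i in range(len(names) - n + 1))

def all_samplings(invocations):
    """Build six collections, C1..C3 and M1..M3, from ordered (class, method) pairs."""
    classes = [c for c, _ in invocations]
    methods = [m for _, m in invocations]
    result = {}
    for n in (1, 2, 3):
        result[f"C{n}"] = sample(classes, n)  # class-name sequences of length n
        result[f"M{n}"] = sample(methods, n)  # method-name sequences of length n
    return result

# Hypothetical invocation order of one app, for illustration only.
calls = [
    ("CLLocationManager", "startUpdatingLocation"),
    ("NSURL", "URLWithString:"),
    ("NSURLRequest", "requestWithURL:"),
    ("NSURL", "URLWithString:"),
]
samplings = all_samplings(calls)
print(samplings["C1"][("NSURL",)])  # 2
print(len(samplings["M2"]))         # 3 distinct method pairs
```

Keeping each window as a tuple key lets the single, pair, and triple samplings share one counting routine.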


After finishing the sampling steps, we have six sampling results, as shown in the table below.

Target    Combination
          1-sequence    2-sequence    3-sequence
Class     C1            C2            C3
Method    M1            M2            M3

Table 4. The table of sampling.

Defining the sampling types helps us recognize which sampling type works better for a specific behavior; for instance, sampling by method is more effective than sampling by class for the location-access behavior.


Figure 14. Different sampling results of the app “Twitter”: results of C1, C2, and C3.


In our system, we develop 18 apps for 9 specific target behaviors, and use these self-developed apps in pairs to generate the pattern for each behavior. Since the patterns in our system are processed by the binary analysis to generate the behavior collection, we need to generate a different collection for each sampling type. For every behavior in our library, we build the corresponding pattern for each sampling type we take. For instance, our system uses six sampling types and focuses on nine behaviors, so we generate 6 patterns for every behavior, for a total of 54 (6 x 9) patterns.

As for the target apps we want to analyze, we follow the same approach as for generating patterns: we generate the behavior collections of every sampling type for each app. We will generate more than 8400 behavior collections for about 1400 apps with 6 sampling types. In the implementation, we store each app's behavior collections as key-value pairs consisting of class or method names and their counts, then compare these behavior collections with the pattern collections prepared from the pattern library to determine whether the app exhibits suspicious behavior.
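The key-value storage of behavior collections can be illustrated with a local simulation of the map and reduce phases. This is a sketch in plain Python rather than the Hadoop API; the function names, app identifiers, and sample data are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(app_id, names):
    """Emit one ((app_id, name), 1) pair per sampled name, as a mapper would."""
    for name in names:
        yield (app_id, name), 1

def reduce_phase(pairs):
    """Sum the emitted values per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals

def behavior_collections(apps):
    """Return {app_id: {name: count}} for an input of {app_id: [names]}."""
    emitted = (pair for app_id, names in apps.items()
               for pair in map_phase(app_id, names))
    collections = defaultdict(dict)
    for (app_id, name), count in reduce_phase(emitted).items():
        collections[app_id][name] = count
    return dict(collections)

# Hypothetical single-method samples for two apps.
apps = {
    "app_a": ["startUpdatingLocation", "URLWithString:", "URLWithString:"],
    "app_b": ["startAsynchronous"],
}
collections = behavior_collections(apps)
print(collections["app_a"]["URLWithString:"])  # 2
```

Keying each record by both the app and the name means one job can count names for many apps at once while keeping each app's collection separate.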

Declaring that an app matches a sensitive behavior pattern if and only if all the methods in the pattern are found in the app's behavior collection, with counts larger than those in the pattern collection, is overly specific. We therefore take another approach: we evaluate how likely an app is to perform a sensitive behavior of interest by calculating the ratio of the sensitive behavior pattern that the app's behavior collection covers.
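The contrast between strict matching and coverage can be sketched as follows. The exact coverage formula is not given here, so this sketch assumes that a pattern entry counts as covered when the app's count for that name is at least the pattern's count, and that coverage is the fraction of covered entries. All names and sample values are hypothetical.

```python
def strict_match(pattern, collection):
    """True only if every pattern entry is met by the app's collection."""
    return all(collection.get(name, 0) >= count
               for name, count in pattern.items())

def coverage(pattern, collection):
    """Fraction of pattern entries met by the app's collection (assumed definition)."""
    if not pattern:
        return 0.0
    covered = sum(1 for name, count in pattern.items()
                  if collection.get(name, 0) >= count)
    return covered / len(pattern)

# Hypothetical pattern for a location behavior and one app's collection.
pattern = {
    "startUpdatingLocation": 1,
    "setDelegate:": 2,
    "setDesiredAccuracy:": 1,
    "stopUpdatingLocation": 1,
}
app = {"startUpdatingLocation": 3, "setDelegate:": 5, "setDesiredAccuracy:": 1}

print(strict_match(pattern, app))  # False: one entry is missing
print(coverage(pattern, app))      # 0.75
```

Under these assumptions, an app that misses one entry is rejected by the strict rule but still scores 0.75, which leaves room for a threshold chosen per behavior.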


We evaluate our system against over 1,500 popular apps downloaded from the Apple App Store.

At the current phase, we have examined and analyzed iOS applications with respect to 9 sensitive behaviors. For each behavior, we implement a pair of normal and abnormal apps that are identical except that the abnormal app includes the routine needed to perform the malicious behavior.

The patterns are learned from the differences in method call counts between the two apps of each pair. FTP indicates building a connection with an external machine through FTP. Loc indicates accessing the current location, and Loc2 indicates updating the GPS location continuously. Screen indicates taking a screenshot of the app. Internet indicates that the app accesses the Internet, and HTTP indicates use of the ASIHTTPRequest package; both build Internet connections. REST indicates that the app may transmit data through a REST API, TCP indicates that the app may open TCP connections, and FB indicates that the app may connect to resources on Facebook.
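Learning a pattern from the count differences of an app pair can be sketched with a multiset subtraction. This sketch assumes that a pattern keeps only the names whose count is higher in the abnormal app, together with the size of the difference; the function name and sample counts are hypothetical.

```python
from collections import Counter

def learn_pattern(normal, abnormal):
    """Keep the names whose count is higher in the abnormal app, with the surplus.

    Counter subtraction drops every entry whose result is zero or negative,
    so calls shared equally by both apps do not enter the pattern.
    """
    return dict(Counter(abnormal) - Counter(normal))

# Hypothetical method counts for a normal/abnormal app pair.
normal = {"viewDidLoad": 1, "setDelegate:": 1}
abnormal = {"viewDidLoad": 1, "setDelegate:": 2, "startUpdatingLocation": 1}

pattern = learn_pattern(normal, abnormal)
print(pattern)  # {'setDelegate:': 1, 'startUpdatingLocation': 1}
```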

These behaviors are commonly implemented in apps for their own purposes, with or without user awareness. Our goal is to reveal whether apps include these behaviors in their executables, and to leave it to users to judge whether the apps are malicious.

These methods may be wrapped in various (third-party or user) functions in the source code.

For example, when using the ASIHTTPRequest framework to handle network interaction events, developers simply use "startAsynchronous"
