useful features in our map function thus reducing functions in the MapReduce Design Patterns[14].
1.0.4 Hadoop
Hadoop[1] is an Apache project that provides a way to build a reliable distributed environment from commodity hardware. By linking these machines together, it provides more computing power, memory, and hard-disk storage. Moreover, these resources offer high fault tolerance and good scalability.
1.0.5 MapReduce
MapReduce[12] was proposed by Dean and Ghemawat of Google Inc. The basic idea of this distributed computing model is divide and conquer. We construct one or more map functions to process the repeated tasks in a job, and reduce functions to gather the results from the map functions. After processing iteratively, the job converges to a final result. Through this divide-and-conquer approach, we can solve large-scale data processing problems.
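The map and reduce roles described above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory grouping step only imitates the shuffle that a real framework performs between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (token, 1) for every whitespace-separated token."""
    for token in record.split():
        yield token, 1


def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(records: Iterable[str]) -> Dict[str, int]:
    """Run map, group the pairs by key (a stand-in for the shuffle), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

Because each call to `map_fn` looks at one record only, the records can be processed on different machines; only the grouping by key requires data to be exchanged.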
2 Literature review
In recent years, more and more studies have been concerned with the behavior of mobile applications and the related privacy and security issues. An early study, "A survey of mobile malware in the wild"[17], discusses many issues about the leaking of sensitive user information. Later, in 2013, a survey on security for mobile devices[23] described many different types of mobile malware and the potential issues that can be anticipated in many different mobile operating systems. The behavior of apps has not been discussed on a single platform only; it is a general issue on both the iOS and Android operating systems. Because Google Inc. and Apple Inc. have different official policies on apps, Android apps are more open than iOS apps. Therefore, many claim that iOS apps are safer than Android apps. However, the study "Android or iOS for better privacy protection?"[20] also clearly analyzes the privacy and security issues on both platforms, and further points out that additional SS-APIs (security-sensitive APIs) are always invoked on iOS, which may cause higher risks of privacy leaks. These studies have developed many analysis tools and methods. However, the closed policy on iOS apps has led to different research progress between the Android and iOS systems.
In contrast to the iOS platform, most of these studies focus on the Android platform, and many analysis tools were built for Android apps only, such as TaintDroid[16], AsDroid[21], FlowDroid[8], DroidRA[25], and the study "Static analysis of implicit control flow: resolving Java reflection and Android intents"[31].
In general, these studies use two different approaches to detect the behavior of apps: static analysis and dynamic analysis.
Static analysis methods are commonly used for application behavior detection. FlowDroid[8] adopts static analysis by building a control flow graph of the API method calls within an Android app in order to detect privacy leakage. However, dynamically loaded classes are ignored when only the control flow graph is constructed. DroidRA[25] further uses a constant propagation solver to address this problem. The study "Static analysis of implicit control flow: resolving Java reflection and Android intents"[31] faces the same problem; it focuses on control flow related to Java reflection and Android intents in applications, and it also uses static analysis to construct the control flow graph. AppIntent[35] and IccTA[24] both adopt static taint analysis to preprocess apps and extract information from them. AppIntent focuses on all possible data transmission paths and the possible events related to each path.
However, static analysis is limited in checking behavior that is triggered by external input at application runtime, and dynamic checking methods can be applied in this situation. Dynamic analysis observes an application by actually executing it. TaintDroid[16] is one of the studies that adopt dynamic analysis. It identifies privacy-sensitive data and traces whether that data is passed to an external party as a risk evaluation, providing a dynamic taint analysis tool that detects sensitive data leaving the device. Another study, IccTA, detects specific ICC (Inter-Component Communication, a basic communication model provided by the Android OS) links and leaks in Android apps.
Studies on iOS are relatively rare compared with those on Android. PiOS[15] is an important study on detecting privacy leaks in iOS apps; it introduces a way to reconstruct an app's CFG (control flow graph) from its binary and checks for privacy leaks with data flow analysis.
PSiOS[33] and Jekyll on iOS[32] not only detect leaks of private data but also provide tools to prevent them, and AppBeach[36] uses a static approach to check for possible malicious behavior patterns in an application. iRiS[13] then reveals the illegal abuse of private APIs in iOS apps using both static and dynamic analysis.
Besides, these studies also analyze mobile apps from many other angles in an effort to solve the malware problem. PMP[7] gathers feedback from real users and builds a system that receives and analyzes application behavior through crowdsourcing. In "Checking app behavior against app description"[19], apps are clustered by the topics of their descriptions, and the sensitive API usage in each cluster is identified. Apposcopy[18] adopts static analysis along with a semantics-based approach to detect malware apps. AsDroid[21] focuses on conflicts between the UI/UX logic and the program behavior, and regards such abnormal situations as suspicious behavior that needs to be removed. The study "Hey, you, get off of my market: detecting malicious apps in official and alternative Android markets"[38] detects apps according to the behaviors that require requesting permission from the user in the app.
In this research, we check iOS applications by adopting static analysis and a sequential checking method with the distributed computing framework Hadoop, and we study the usage of deprecated iOS SDK APIs and specific API usage patterns within these applications. Deprecated APIs are a very common problem in software, and many studies address them from different angles, such as API retrospectives[37] and empirical studies of design or programming issues[27] [10]. In this paper, we identify deprecated API usage and a number of defined patterns in iOS applications. This can help to correct bad or wrong API usage in iOS applications.
3 Application properties and checking
Before we start to process the applications, it is necessary to define the behavior properties that may be found in an app. In our research, we define a sequence of method calls as an application behavior property. We prepare a list of behavior property filters for the behaviors we are interested in, and then use it for checking in our distributed sequential analysis program. These property filters consist of several aspects. The first focuses on the methods that are deprecated by the iOS SDK. These methods create a risk of crashes when Apple Inc. updates the operating system, and the problems inconvenience users who have downloaded the apps. The second relates to specific method usages that are explicitly mentioned in the iOS API developer reference[2], such as asking for user permissions or asking for the device location. Apple Inc. gives samples that demonstrate how to use these methods to implement the feature, so we call these properties "Golden rules".
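To make the notion of a property filter concrete, the following is a minimal sketch that treats a behavior property as an ordered list of method names and checks it against the call sequence of one code region. The matching rule (in order, gaps allowed) is an assumption made for illustration, since the exact meaning of subsequence is defined later in the paper. The function names and the filter entries are hypothetical examples and are not taken from the filter list used in this work.

```python
from typing import Dict, List, Sequence


def contains_in_order(calls: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if every entry of `pattern` occurs in `calls` in the same
    order. Other calls may appear in between (assumed matching rule)."""
    remaining = iter(calls)
    # Each `any` consumes the iterator up to the next match, so order is enforced.
    return all(any(call == wanted for call in remaining) for wanted in pattern)


def matched_properties(calls: Sequence[str],
                       filters: Dict[str, List[str]]) -> List[str]:
    """Return the names of all property filters found in the call sequence."""
    return [name for name, pattern in filters.items()
            if contains_in_order(calls, pattern)]


# Illustrative filters: one single-call "deprecated" entry and one ordered
# "golden rule" style pattern. The names are placeholders.
EXAMPLE_FILTERS = {
    "deprecated_call": ["stringWithCString:"],
    "location_rule": ["requestWhenInUseAuthorization", "startUpdatingLocation"],
}
```

A deprecated method is then simply a pattern of length one, while a golden rule is a longer pattern whose order matters.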
On the other hand, in order to check the correct context in the binary, we have to decrypt the apps downloaded from the App Store in advance and convert them into readable assembly files with annotations. Using the disassembler IDA Pro[29], we can turn a decrypted binary into an assembly file. Every application assembly file contains many subroutine blocks, and within each subroutine there are smaller logic areas called locations, which help us analyze the code and identify the classes or methods invoked in the context. We check context subsequences in the assembly files against the behavior properties, and use distributed computing to do the sequence analysis. Before analyzing, we have to do some data processing and define the meaning of subsequence in our experiment. The details are described below.
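The subroutine and location structure described above can be sketched as a small grouping step. The listing format assumed here is a simplification chosen for illustration: a subroutine starts at a line beginning with `sub_<hex>`, and a location starts at a line of the form `loc_<hex>:`. Real disassembler output is richer, and the label patterns, the `entry` key, and the function name `split_listing` are all hypothetical.

```python
import re
from typing import Dict, List

SUB_RE = re.compile(r"^(sub_[0-9A-Fa-f]+)\b")  # assumed subroutine label
LOC_RE = re.compile(r"^(loc_[0-9A-Fa-f]+):")   # assumed location label


def split_listing(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group listing lines as {subroutine: {location: [instruction lines]}}.

    Lines before the first location of a subroutine go under the key "entry".
    Lines before the first subroutine are ignored.
    """
    subroutines: Dict[str, Dict[str, List[str]]] = {}
    current_sub = None
    current_loc = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        sub_match = SUB_RE.match(line)
        if sub_match:
            current_sub, current_loc = sub_match.group(1), "entry"
            subroutines[current_sub] = {current_loc: []}
            continue
        loc_match = LOC_RE.match(line)
        if loc_match and current_sub is not None:
            current_loc = loc_match.group(1)
            subroutines[current_sub][current_loc] = []
            continue
        if current_sub is not None:
            subroutines[current_sub][current_loc].append(line)
    return subroutines
```

Each subroutine produced this way is an independent unit of work, which is what makes the later sequence check suitable for a map step.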
3.1 iOS developer API reference web crawler
In order to get the frameworks, classes, and methods in the iOS SDK, we have to fetch the data from the iOS developer API reference site[2]. However, the web pages consist of HTML and CSS content, so we need to extract the information we need, then format it and store it in our database. Therefore, we developed a web crawler to process this data.
Our crawler is designed in three parts, as shown in Figure 1: services, parsers, and stores. The services are reusable; they are mostly used to handle file output and to initiate HTTP requests. The parsers parse the HTML content and select the data we need, so a parser basically maps to one or more target web pages. Stores can be regarded as the data model: we define the model to shape our data objects and implement the serialization methods.
We therefore dispatch HTTP requests to the target pages of the iOS developer API reference site[2], fetch the data, parse it with our parsers, and collect and format the data into the objects defined in the stores. It is a pipeline workflow: send request, process, format, collect, and write out. Finally, once we have the output files (we use JSON as the output format), we write the JSON into our database.
Through this process, we can fetch the API data from the official Apple site and generate the deprecated API list and the method sequence patterns we need.
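The division into services, parsers, and stores can be sketched with the Python standard library as follows. This is only an illustration of the pipeline shape under stated assumptions: the HTML markup (`<li class="method deprecated">`), the class names `MethodRecord` and `MethodParser`, and the injected `fetch` service are hypothetical and do not reflect the actual page structure of the reference site or the crawler built in this work.

```python
import json
from dataclasses import asdict, dataclass
from html.parser import HTMLParser
from typing import Callable, List


@dataclass
class MethodRecord:
    """Store: shapes one record and knows how to serialize itself."""
    name: str
    deprecated: bool

    def serialize(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class MethodParser(HTMLParser):
    """Parser: collects <li class="method ..."> items (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[MethodRecord] = []
        self._classes = None  # CSS classes of the <li> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            classes = (dict(attrs).get("class") or "").split()
            if "method" in classes:
                self._classes = classes

    def handle_data(self, data):
        if self._classes is not None and data.strip():
            flagged = "deprecated" in self._classes
            self.records.append(MethodRecord(data.strip(), flagged))
            self._classes = None

    def handle_endtag(self, tag):
        if tag == "li":
            self._classes = None


def crawl(url: str, fetch: Callable[[str], str]) -> List[str]:
    """Pipeline: request (via the `fetch` service), parse, format, serialize."""
    parser = MethodParser()
    parser.feed(fetch(url))
    parser.close()
    return [record.serialize() for record in parser.records]
```

Passing the HTTP service in as `fetch` keeps the parser and store testable without network access; in a real run it could be any function that returns the page body for a URL.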