4.2 Research and system process
4.2.1 Data collection and processing
Sandbox of iOS and binaries on App Store
When an iOS app is finished, its developers have to compile it and submit it to the App Store for publishing. We can easily download these binaries with an iDevice, or with the iTunes application on a computer and then synchronize them to the iDevice. However, Apple Inc. enforces a sandbox policy on iOS: each application is confined to a container and granted limited resources under its own directory, and many important features of an operating system are forbidden on iOS. This is a good policy for building a stable and robust system environment for users, but it also restricts the users' right to access system files in the usual way. To make our experiment workable, we have to obtain the highest permission on the iOS system, which is generally called jailbreaking.
Auto download apps from App Store
At the beginning of the pipeline, the app binaries must be prepared. We therefore developed an automatic download process using Selenium [28], a browser driver, and Sikuli [9], a GUI control library that uses image recognition to identify components on the screen. With Selenium, we drive the web browser to the application's download page on the Apple App Store and automatically click the download button, which redirects to and activates iTunes on our iMac. Then, outside the browser environment, we use Sikuli to control the iTunes application on macOS, download the app, and synchronize it to our connected iPhone.
Decrypt and extract application from iPhone
Once the binaries are on the iPhone, we have to dump them to our computer for decryption and analysis. Because app binaries are machine code (raw bits), they are hard for people to comprehend, so we further have to disassemble them into assembly files.
Here is a quick overview of the whole process applied to application binaries; the details follow below. All apps available on the App Store are signed and encrypted by Apple Inc. and stay encrypted until they execute on an iDevice. Once we have these binaries, we have to decrypt them before we can analyze them. For this step, we jailbroke our iPhone to obtain the highest permission. The currently available jailbreak tool provides this permission on version 9.0.2 of the iOS operating system. With a jailbroken device, we have sufficient rights to access the device's file system and use SFTP (SSH File Transfer Protocol) and the scp command to dump the apps to our computer. To decrypt a binary, we have to find its entry point, execute it under a debugger, dump the decrypted part, and patch the original encrypted binary with it. After these steps, we have a decrypted binary. Next, we use the disassembler IDA Pro [29] to convert these binaries to assembly files. Once we have the assembly files, we can start our analysis of their content.
Decrypting Apps
An installed mobile application on an iOS device is accompanied by its executable, databases, and media data files, which are placed within a specific file directory (the so-called sandbox). Normally, users do not have permission to access these directories. To extract the application binaries for our research, we break out of the sandbox and act as the root administrator with the highest privileges, which lets us access the application directories and retrieve data. Nowadays, many jailbreak tools for different iOS versions are available on the Internet. The tool we use is PanGu9, which can jailbreak iOS versions 9.0 to 9.0.2 [4]. After the jailbreak process finishes, an icon for Cydia, a third-party iOS application repository, appears on the iDevice's desktop. From Cydia we download the OpenSSH and MobileTerminal packages, which enable us to connect to the iDevice as the root administrator through SSH. All applications installed from the App Store are stored under the directory /var/mobile/Containers/Bundle/Application, which we access with the ssh command after jailbreaking.
The apps downloaded from the Apple App Store are partially encrypted to protect official apps from third-party disassembly and reverse engineering. To overcome this obstacle, we leverage a debugging technique: because the code is always decrypted before execution, we dump the decrypted machine instructions at runtime and swap the encrypted part of the binary with the decrypted one, which yields a decrypted version.
We use a third-party library provided by Stefan Esser and adjusted it to work on iOS 9 [5]. With this library, we can dump decrypted files from encrypted applications. We wrote a Python script around the library so that the Mach-O application files on an iDevice can be decrypted and transferred to external storage.
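The swap step described above can be illustrated with a minimal sketch. It is not the library from [5] nor our actual script; it only assumes a thin, little-endian Mach-O image held in memory and a decrypted region that has already been dumped. The function names find_encryption_info and patch_decrypted are hypothetical; the load command constants and field layout follow the public Mach-O format.

```python
import struct

MH_MAGIC = 0xFEEDFACE           # 32-bit little-endian Mach-O
MH_MAGIC_64 = 0xFEEDFACF        # 64-bit little-endian Mach-O
LC_ENCRYPTION_INFO = 0x21       # load command describing the encrypted range
LC_ENCRYPTION_INFO_64 = 0x2C


def find_encryption_info(data):
    """Return (cmd_offset, cryptoff, cryptsize, cryptid), or None if absent.

    Hypothetical helper: handles thin images only, not fat binaries.
    """
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic == MH_MAGIC:
        header_size = 28
    elif magic == MH_MAGIC_64:
        header_size = 32
    else:
        raise ValueError("not a thin little-endian Mach-O image")
    ncmds = struct.unpack_from("<I", data, 16)[0]
    offset = header_size
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd in (LC_ENCRYPTION_INFO, LC_ENCRYPTION_INFO_64):
            cryptoff, cryptsize, cryptid = struct.unpack_from("<III", data, offset + 8)
            return offset, cryptoff, cryptsize, cryptid
        offset += cmdsize
    return None


def patch_decrypted(data, decrypted):
    """Swap the encrypted range with the dumped bytes and clear cryptid."""
    info = find_encryption_info(data)
    if info is None:
        raise ValueError("no encryption info load command found")
    cmd_offset, cryptoff, cryptsize, _ = info
    if len(decrypted) != cryptsize:
        raise ValueError("dump size does not match cryptsize")
    out = bytearray(data)
    out[cryptoff:cryptoff + cryptsize] = decrypted
    struct.pack_into("<I", out, cmd_offset + 16, 0)  # cryptid = 0 marks it as decrypted
    return bytes(out)
```

A cryptid of 0 after patching is what a disassembler needs to treat the range as plain code; obtaining the decrypted bytes themselves still requires running the app on the jailbroken device.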
Disassembling App
After retrieving the application files, we disassemble them to generate assembly-language source code and related information from the machine-executable code. The following sections perform static binary analysis on this assembly code. For this procedure, we rely on the Interactive Disassembler (IDA), a powerful multi-processor disassembler and debugger, to extract the related information from the compiled executables. IDA comes in two versions, Starter and Pro. The former only supports 32-bit executables, while the latter can process both 32-bit and 64-bit executables.
We also install IDAPython, an extension plugin for IDA that allows Python scripts to run in the IDA environment. We use IDA Pro and wrote a Python script to reveal all hidden procedures and then generate an ASM file written in assembly language, the control flow graphs of all subroutines, and the function call dependency graph.
4.2.2 Application methods property list and iOS API reference web crawler
In our system, we want to check whether a given property appears in an application, so we have to prepare a list of all methods in the iOS SDK that can compose such a property. Most developers write iOS applications in Objective-C, and many SDKs (Software Development Kits) for Objective-C are available, whether provided officially by Apple Inc. or not. To collect the method data, we developed a web crawler for the iOS API Reference site hosted by Apple Inc. After fetching all the information we need, we can generate the complete list of frameworks, classes, and methods. We then index these methods, classes, and frameworks for easier processing and store them in our database.
At the same time, we can generate property patterns for specific application behaviors from the fetched methods. For example, we may be interested in whether an application asks for authorization to access private data; we can then use the corresponding methods to generate the property. As mentioned earlier, we use these methods to generate the deprecated API list and the golden rule properties.
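As an illustration of the indexing step, the following minimal sketch groups crawled records by framework and class and derives a deprecated API list. The record layout (framework, class, method, deprecated) and the sample entries are assumptions made for this example, not the crawler's actual output format.

```python
from collections import defaultdict

# Hypothetical crawler output: one record per method on the API reference site.
SAMPLE_RECORDS = [
    {"framework": "CoreLocation", "class": "CLLocationManager",
     "method": "requestWhenInUseAuthorization", "deprecated": False},
    {"framework": "CoreLocation", "class": "CLLocationManager",
     "method": "startUpdatingLocation", "deprecated": False},
    {"framework": "UIKit", "class": "UIAlertView",
     "method": "show", "deprecated": True},
]


def build_index(records):
    """Index methods as framework -> class -> [method names]."""
    index = defaultdict(lambda: defaultdict(list))
    for record in records:
        index[record["framework"]][record["class"]].append(record["method"])
    return index


def deprecated_api_list(records):
    """Collect (class, method) pairs that the reference marks as deprecated."""
    return sorted(
        (record["class"], record["method"])
        for record in records
        if record["deprecated"]
    )
```

A property pattern for a behavior such as location access could then be written as an ordered list of method names drawn from this index.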
4.2.3 Property syntax sequence processing
Now that all the data we need are prepared, we use Hadoop to run the analyses for existence checking and single-subroutine sequential checking, and use the CFGs and the FCG with the two-stage AllCS method for sequential checking across subroutines. First, we compile several jar files for the different purposes and analyses mentioned before; essentially, each analysis purpose has a corresponding jar. For example, we have a jar for checking deprecated APIs, a jar for checking the LCS sequence with the Location property, and so on. We can therefore chain all our analysis processes together and log each step in our database. Every task in our system takes a target application, an analysis property pattern, and a corresponding jar file as inputs.
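The two simplest checks can be sketched on a single machine as follows. This is a plain-Python illustration, not the Hadoop jars themselves, and it does not cover the two-stage AllCS method. It assumes that the calls of one subroutine have already been extracted as an ordered list of method names and that a property pattern is likewise an ordered list; the function names are hypothetical.

```python
def exists(pattern, calls):
    """Existence check: every method of the pattern occurs somewhere in calls."""
    return set(pattern) <= set(calls)


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def appears_in_order(pattern, calls):
    """Sequential check: the whole pattern is a subsequence of calls."""
    return lcs_length(pattern, calls) == len(pattern)
```

For example, a Location pattern of requestWhenInUseAuthorization followed by startUpdatingLocation passes both checks on a subroutine that calls them in that order, while the reversed pattern passes only the existence check.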
For the different analysis tasks and purposes, we maintain different job queues to monitor all processes. An executor is initialized to dispatch new tasks into the job queues, monitor the pending and completed jobs in the database, and update the analysis status of each job target.
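A minimal in-memory sketch of such an executor is given below. The class name, the in-memory status dictionary standing in for the database table, and the extra "running" and "failed" states are assumptions for illustration; only the pending and completed states come from the description above.

```python
from collections import deque


class Executor:
    """Dispatches (app, pattern, jar) tasks and tracks their status."""

    def __init__(self):
        self.queue = deque()
        self.status = {}  # stand-in for the job status table in the database

    def submit(self, app, pattern, jar):
        """Enqueue a task and mark it as pending."""
        job = (app, pattern, jar)
        self.status[job] = "pending"
        self.queue.append(job)
        return job

    def run_next(self, runner):
        """Run the oldest pending task with the given callable."""
        if not self.queue:
            return None
        job = self.queue.popleft()
        self.status[job] = "running"
        try:
            runner(*job)
            self.status[job] = "completed"
        except Exception:
            self.status[job] = "failed"
        return job
```

In a real deployment the runner would submit the jar to Hadoop and the status updates would be written to the database rather than kept in a dictionary.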
4.2.4 Web service
Back-end API Service - Restful architecture
To automate the system process whenever new input data arrive, our system records the whole experiment workflow and its status. It is built with the Java Spring framework, PostgreSQL, and HDFS. However, directly manipulating files in HDFS or data in the database is tedious and error-prone, so we map the database schema to program objects; by manipulating these objects, we can use the data relatively easily. Next, we wrap our results into data endpoints and export RESTful APIs so that they are clear, comprehensive, and easy for the front-end application to consume.
ORM, ODM and HDFS
Our database is designed for many sequential jobs on different input data. We take advantage of the features of different storage types and build a hybrid storage system from a relational database and HDFS (Hadoop Distributed File System). We use the RDB (relational database) with an ORM tool to record job status and workflow and to store results as JSON blobs. Other raw data, such as .asm files and image files, are kept as blobs in HDFS with their URIs recorded in the RDB, which makes them easy to find and access.
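The split between the relational record and the raw blob can be sketched as follows. To stay self-contained, the sketch uses an in-memory SQLite database as a stand-in for PostgreSQL and plain SQL instead of an ORM; the table layout, column names, and the HDFS URI in the usage note are hypothetical.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory job table (stand-in for the PostgreSQL schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, app TEXT, status TEXT, "
        "result_json TEXT, raw_uri TEXT)"
    )
    return conn


def record_job(conn, app, status, result, raw_uri):
    """Store the result as a JSON blob and keep only the URI of the raw data."""
    cur = conn.execute(
        "INSERT INTO job (app, status, result_json, raw_uri) VALUES (?, ?, ?, ?)",
        (app, status, json.dumps(result), raw_uri),
    )
    conn.commit()
    return cur.lastrowid


def load_job(conn, job_id):
    """Load one job and decode its JSON result, or return None if missing."""
    row = conn.execute(
        "SELECT app, status, result_json, raw_uri FROM job WHERE id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    app, status, result_json, raw_uri = row
    return {"app": app, "status": status,
            "result": json.loads(result_json), "raw_uri": raw_uri}
```

A call such as record_job(conn, "example.app", "completed", {"deprecated_calls": 3}, "hdfs://namenode/asm/example.asm") keeps the small, queryable result in the database while the large .asm file stays in HDFS.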
Front-end presentation and data visualization
Because most of our experiment results are not easy to grasp in a short time, we present them with charts and tables built with visualization tools to make them easier to read. In addition, to handle the presentation of complicated data from different experiment results, we use the ReactJS library with Redux, a data flow architecture, to make the whole web application more maintainable.
ReactJS, Redux
ReactJS is an open-source project for building web applications provided by Facebook Inc. In our implementation, we use this library to present our whole system. ReactJS provides many convenient features that make a front-end application more flexible. For example, we turn each type of experiment result into a basic presentation unit and wrap it in a React component; by reusing and assembling these components, we can build our web pages quickly. At the same time, we adopt Redux, an application architecture, to make our front-end web application more organized and maintainable. Redux provides a one-way data flow with a central state store: all data are kept in one store with well-defined states, and the state is changed by dispatching actions. When the state changes, it triggers the rendering components to update the view if necessary. Keeping the data in a one-way flow cycle makes the state easier to handle and logic bugs easier to trace. In addition, to make our experiment results easier to understand, we use several visualization tools for data presentation to make them more readable and meaningful.
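The one-way data flow can be sketched independently of JavaScript. The following Python model is only an illustration of the idea (a store, a reducer, dispatched actions, and subscribed views); it is not Redux itself, and the action names RESULTS_REQUESTED and RESULTS_LOADED are hypothetical.

```python
def reducer(state, action):
    """Pure function: compute the next state from the current state and an action."""
    if action["type"] == "RESULTS_REQUESTED":
        return {**state, "loading": True}
    if action["type"] == "RESULTS_LOADED":
        return {**state, "results": action["payload"], "loading": False}
    return state


class Store:
    """Single state container; state changes only through dispatch."""

    def __init__(self, reducer, initial_state):
        self._reducer = reducer
        self._state = initial_state
        self._listeners = []

    def get_state(self):
        return self._state

    def subscribe(self, listener):
        """Register a view callback that receives the state after each dispatch."""
        self._listeners.append(listener)

    def dispatch(self, action):
        self._state = self._reducer(self._state, action)
        for listener in self._listeners:
            listener(self._state)
```

Because every change passes through dispatch and the reducer, the sequence of actions is enough to reproduce any state, which is what makes logic bugs easier to trace.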
5 Experiment Analysis
5.1 Intro
In this research, we have downloaded around 13000 apps available on the App Store and successfully disassembled around 2200 of them, as shown in Figure 8; downloading is still in progress. The following statistics and findings are based on the apps that we processed successfully. Our major analyses are the deprecated API usage analysis and the golden rules pattern checking, and the data are presented and downloadable on AppScan, the website we host, shown in Figure 7.