Static analysis on iOS application - A Study of Static Ad Fraud Detection Techniques: The Case of iOS

team presents a systematized implementation of binary code analysis [50], which other researchers can compose to develop new approaches. Indirect control flows and useful toolkits built on binary code analysis techniques have also been presented [51, 52, 53]. These works are helpful when binary analysis needs to be automated.

Bitblaze [54] is useful for malware detection because it combines dynamic and static analysis components for binary code analysis and integrates verification tools.

2.4 Static analysis on iOS application

Lee’s team detects Chameleon apps in their work [55]. In a Chameleon app, the same user action can lead to different views. They decrypt the app file and disassemble its UI layout files and resource files to find ViewController class names. They also create a labeled view controller graph (LVCG), which consists of view controllers (VCs), the UI transitions defined between the VCs, and the texts of the VCs. If a VC contains a URL, they collect not only the text of the VC but also the text report that VirusTotal returns for the URL. Finally, they use a Semantic Analyzer (an NLP model that extracts keywords and classifies them with an SVM) to determine from these texts whether the app contains a PHI-UI.

The illicit app (Chameleon app) “Happy Daily English” on iOS has also been reported by Claud Xiao [56]. It shows different content depending on whether the user is located in China, so Apple’s reviewers cannot see the actual content of the app. The app can be installed from pirate iOS app stores, such as TutuApp, TweakBox, and App Valley. According to Nick Statt’s report [57], users can easily download pirate apps from these stores because of enterprise certificates, which are designed for developers in large companies to distribute apps internally. However, malicious companies abuse these certificates to obtain illegal profit or private user information.

Lin’s team [58] has bridged the gap between binary analysis on iOS and string analysis. They construct the control flow graph and dependency graph from an iOS executable, and then adopt string analysis for property checking to detect dynamically loaded classes of


Static analysis of iOS applications has also been adopted to detect crowdturfing UIs, through which users earn profit by performing app downloading tasks [55].

The feasibility of binary analysis for iOS applications was demonstrated in PiOS [59]. Egele’s team developed a tool to detect possible leaks of sensitive information from iOS applications and third-party libraries. They take the binary code compiled from Objective-C, disassemble it, construct the control flow graph, and perform reachability analysis to detect possible leaks. They evaluated their approach on more than 1,400 iPhone applications.
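The reachability step can be pictured with a small sketch. The toy graph, the node names, and the `reachable` helper are hypothetical and are not taken from PiOS; the code only illustrates a breadth-first search from a sensitive source to a network sink over a control flow graph stored as adjacency lists.

```python
from collections import deque


def reachable(cfg, source, sink):
    """Return True if `sink` can be reached from `source` in the directed graph `cfg`."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for succ in cfg.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False


# Toy CFG with made-up node names: one path leaks, the other does not.
toy_cfg = {
    "read_address_book": ["format_payload"],
    "format_payload": ["send_to_server"],
    "show_settings": ["save_preferences"],
}

leak = reachable(toy_cfg, "read_address_book", "send_to_server")
no_leak = reachable(toy_cfg, "show_settings", "send_to_server")
```

A real analysis would run this query for every pair of sensitive source and sink, and would have to resolve Objective-C message sends before the graph is complete.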

Werthmann’s team developed their own Objective-C static analyzer, PSiOS [60]. They resolve all direct calls to the system call wrapper. They parse the Mach-O header, Objective-C classes, selectors, and API calls to extract all relevant Objective-C structures. They also leverage MoCFI [61] to enforce control flow integrity dynamically on iOS devices running on ARM processors.

Deng’s team adopts hybrid analysis [62], integrating static and dynamic analysis in their detection tool iRis. The first step of iRis is similar to the potential privacy leakage detection in PiOS; they then perform dynamic analysis of iOS applications to resolve the calls that static analysis cannot resolve. AppBeach [63] and AppReco [64] also apply static analysis to iOS applications, scanning all classes and methods in the assembly to reveal potentially loaded classes and invoked methods.

String analysis can determine all reachable states of string variables in string manipulation programs [65]. The technique has been adopted in many fields, such as web application security [66]. Although many string analysis tools have been developed [67], none of them has been applied to analyze classes loaded through dynamic invocation in mobile applications. Reverse engineering of iOS apps has been adopted in the works of Wang’s team [43] and Lin’s team [58], which address obfuscated iOS apps and sensitive API calls in iOS apps, respectively. Ad-related issues on the web and on mobile have been presented in the works of Wang’s team


The unknown results generated by string analysis are also important. They help us verify vulnerabilities more precisely, because an unknown string that flows to a sink is also a potential vulnerability [68].

Julian’s team has implemented a new way to generate a supergraph that combines lifted information from the Lifter Frontend, the Class Hierarchy Frontend, and the Disassembly Frontend [69]. They reconstruct the control flow graph as a supergraph, a single graph-based representation that combines the information from multiple frontends. IDA Pro [70] does not add any object-oriented concepts when it parses a Mach-O binary. liOS [69], in contrast, parses the Objective-C sections to reconstruct object-oriented constructs such as classes, meta-classes, and methods. The information in liOS comes from three frontends. The Lifter Frontend reconstructs the control flow graph and cross-references to produce a McSema-compatible CFG. The Class Hierarchy Frontend provides a complete representation of the Objective-C class hierarchy. The Disassembly Frontend produces a call graph and cross-references that resemble the actual high-level program in Swift/Objective-C, rather than the immediate references at the assembly level. They implement a built-in pass to link and extend the isolated sub-graphs from the three frontends. They also offer the extensible graph query language “Cypher” for querying the information stored in the Neo4J graph database [71].

Infer [72] can check for the Unavailable API in supported iOS SDK issue. It maintains lists of supported and unsupported APIs for different versions and checks them against the version declared by the developers. A warning is triggered if the developers use an API that is defined only in a version higher than the version they support. Facebook also uses this tool to maintain software quality and integrates it into the development cycle of its mobile applications on both iOS and Android [73].

Separation logic [74] and bi-abduction [75] are the foundations of Infer [73]. Separation logic is a mathematical logic for the formal verification of software. It breaks reasoning into chunks that correspond to operations on memory and then composes these chunks, which facilitates scalability. Smallfoot [76] was a successful academic tool based on separation logic before Infer. Bi-abduction is a form of logical inference that enables local reasoning in separation logic. The entailment A ⊢ B says that A implies B. Infer represents program statements in an internal theorem prover. The bi-abduction question that Infer asks is A ∗ ?antiframe ⊢ B ∗ ?frame: Infer needs to discover a pair of frame and antiframe that makes the entailment valid.
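A small worked instance may help. It is a standard textbook-style illustration and not an example taken from the cited works. Assume the current symbolic state is A = x ↦ 0 and the next operation requires B = y ↦ 0. The antiframe is the part of memory that A is missing, and the frame is the part of A that the operation leaves untouched:

```latex
\[
  \underbrace{x \mapsto 0}_{A} \;*\; \underbrace{y \mapsto 0}_{?\mathit{antiframe}}
  \;\vdash\;
  \underbrace{y \mapsto 0}_{B} \;*\; \underbrace{x \mapsto 0}_{?\mathit{frame}}
\]
```

Both sides describe the same two memory cells, so the entailment holds with antiframe y ↦ 0 and frame x ↦ 0.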

Infer.AI, provided by the Infer team, is a framework for developers to write their own abstract interpretation-based checkers, also described as intraprocedural analysis [77].

They also provide a list of predefined abstract domains and transfer functions written with Infer.AL, a declarative language for writing linters in Infer. Take the linter Unavailable API in supported iOS SDK as an example. It uses Define-checker to define the Unavailable_class_in_supported_ios_sdk checker [78]. This checker first calls Class_unavailable_in_supported_ios_sdk to check the unsupported version, and then checks whether Call_class_method is alloc or new to make sure the method is actually called. Class_unavailable_in_supported_ios_sdk is defined in cPredicates.ml [79]. It compares the version in which each API becomes available with the supported version declared by the developers. If the API’s version is greater than the supported version, it raises a warning.
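The version comparison at the heart of this check can be sketched in a few lines. This is not Infer’s implementation (which lives in OCaml predicates); the function names and the small availability table are assumptions made for illustration.

```python
def parse_version(text, width=3):
    """Turn a dotted version such as '11.0' into a fixed-width tuple of ints."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (width - len(parts))  # pad so that '9' and '9.0' compare equal
    return tuple(parts)


def unavailable_apis(used_classes, introduced_in, supported_version):
    """Return the used classes that were introduced after the supported version."""
    target = parse_version(supported_version)
    flagged = []
    for name in used_classes:
        introduced = introduced_in.get(name)
        if introduced is not None and parse_version(introduced) > target:
            flagged.append(name)
    return flagged


# Illustrative availability table: class name -> iOS version that introduced it.
introduced_in = {"ARSession": "11.0", "UIStackView": "9.0", "UIView": "2.0"}

flagged = unavailable_apis(["UIView", "UIStackView", "ARSession"], introduced_in, "9.0")
```

With a supported version of 9.0, only the class introduced in 11.0 is reported; classes missing from the table are silently skipped, which a real checker would probably treat differently.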

Previous investigators have mentioned Infer in their research [80, 81, 82]. DeepBugs [80] is a learning-based, name-based bug detector: it learns rules from incorrect code examples and uses them to check for bugs in new code. Rijnard’s team [81] creates an automated program repair (APR) tool based on Infer. It automatically infers separation logic assertions over program statements and repairs bugs in the code. It can handle various types of bugs with their dynamic APR techniques.

Mark’s team [82] studies the scale-up problems of static and dynamic analysis. Based on the tools Infer and Sapienz [83], they discuss six aspects of the problem: Irrelevant, Unconvincing, Misdirected, Unsupportive, Closed System, and Closed Mind.

Alshahwan’s team describes the deployment of the Sapienz Search-Based Software Engineering (SBSE) testing system in their work [83]. Sapienz has been deployed in production at Facebook to design test cases, localize and triage crashes to developers, and monitor their fixes.

From the example AL checkers that Infer provides [84], we can see that they can detect class inheritance, instance method calls, class method calls, parameter types, and so on. In our work, apart from checking constant classes, methods, and parameters, we also examine string operations. Developers may combine APIs in varied ways; for instance, a parameter may not be a constant but the result of a string operation such as union or concat. Such cases are hard for Infer to detect because it needs to know the flow into the parameter. In contrast, our string dependency graph represents the flow into the parameter, so we can discover such cases. In addition, the major input of Infer is source code. Since most applications uploaded to the App Store are not open source, their source code cannot be acquired directly, so we analyze binary code instead of source code.

3 A Motivating Example

To illustrate our approach concisely, we use an Ad fraud example. The app displays multiple advertisements in one ViewController (the same behavior mentioned in the work of Lee’s team [55]), as shown in Figure 1. Displaying an inappropriate number of Ads is an Ad fraud defined in previous work [8, 9]. It also violates App Store Review Guideline 3.1.7 [85].

A ViewController denotes a view in an app. Figure 1 shows multiple Ad views in the same ViewController. A view that adds multiple Ad views commits a Multi-view violation of Ad fraud. We count the number of Ad views that one ViewController adds by calling the addSubView method. If the number is greater than one, we record it as a Multi-view violation Ad fraud.

National Chengchi University (國立政治大學)

Figure 1: View of Multi-view violation app

In this example, we first find the ViewController nodes in the app, which represent the views that users see and touch in the application. Then we perform Algorithm 6 to detect whether a Multi-view violation occurs. Before that, we need to construct the control flow graph for the application.

Control Flow Graph Construction To perform our analysis, we extract the assembly code into segment information data in JSON format, as shown below. Using the ARM Information Center [86], we determine the relations between control flow graph nodes. Since the ARM Information Center defines each command, we represent each command as a control flow graph node. A control flow graph node contains three things: the command, the first parameter, and the second parameter. From the definition of each command, we learn whether the first and second parameters are registers or addresses, and how the two parameters relate to each other.

{
  "subroutineData": {
    "RBDatabaseManager favoriteRecipes": {
      "label": "RBDatabaseManager favoriteRecipes",
      "locations": [{
        "label": "ENTRY LOC RBDatabaseManager favoriteRecipes",


We build the control flow graph with Binflow [58]. Binflow uses the extracted segment information to resolve the register values of indirect jumps and link the routines. During CFG construction, it also marks the dependency relations of registers for each assembly statement: we parse each assembly statement into a control flow graph node (CFG node) and construct the dependency relation for that node at the same time. We can then trace a value through the dependency relations. See Section 4.1.2 for a more detailed explanation.
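As a minimal sketch of what parsing an assembly statement into a CFG node could look like, the code below splits a statement into the command and its first and second parameters. The `CfgNode` class, the `parse_statement` and `is_register` helpers, and the simplified register pattern are assumptions for illustration; they are not Binflow’s actual data structures.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern for ARM register names (an assumption, not a full grammar).
REGISTER = re.compile(r"^(R\d+|SP|LR|PC)$")


@dataclass
class CfgNode:
    command: str
    first: Optional[str] = None
    second: Optional[str] = None


def parse_statement(line):
    """Split one non-empty assembly statement into command, first and second parameter."""
    parts = line.strip().split(None, 1)
    command = parts[0].upper()
    rest = parts[1] if len(parts) > 1 else ""
    # Split only on the first comma so that '[R1, #4]' stays in one piece.
    operands = [op.strip() for op in rest.split(",", 1)] if rest else []
    first = operands[0] if len(operands) > 0 else None
    second = operands[1] if len(operands) > 1 else None
    return CfgNode(command, first, second)


def is_register(operand):
    """Tell registers apart from addresses and other operands."""
    return operand is not None and REGISTER.match(operand.upper()) is not None


move = parse_statement("MOV R6, R2")
branch = parse_statement("B R6")
load = parse_statement("LDR R0, [R1, #4]")
```

Knowing whether each parameter is a register or an address is what later allows a dependency relation to be attached to the node.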

Figure 2 shows the CFG of the app. It is only one part of the CFG; the real control flow graph contains more than 100 thousand nodes.

Before we go on to the checking steps, we need to construct the dependency graph and explain what property checking is.

Dependency Graph Construction The dependency graph is useful for detecting dynamic invocation, where the exact calls and their arguments depend on the runtime values of nested parameters.

We can perform Algorithm 1 with the dependency relations of a control flow graph node, as mentioned in Section 4.1.2. Each control flow graph node has its own dependency relations, represented as a set of key-value pairs. The dependency key is the operand that depends on the dependency value.


Figure 2: Part of CFG Graph

We adopt static flow analysis techniques on the iOS executable to build dependency graphs on the parameters of target functions. Since we can traverse the dependency relations of the registers that hold the parameters, we can determine whether a parameter is a constant or an external input.

Checking the depKey of the input node is the first step in constructing a dependency graph. We then loop in createDPG until we have all the information we need, such as the dependency relations and the predecessors of the node. The detailed steps for constructing a dependency graph are given in Section 4.1.3. The dependency graphs we generate look like Figures 3, 4, and 5.
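The loop can be pictured as a worklist traversal over key-value dependency relations. The sketch below is a simplification under stated assumptions: `relations` is a hypothetical dictionary that maps each dependency key to the list of values it may depend on, and predecessor handling is omitted. It is not the createDPG procedure itself.

```python
def build_dependency_graph(relations, start):
    """Collect every dependency edge reachable from `start`.

    `relations` maps a dependency key to the list of values it depends on.
    """
    edges = {}
    worklist = [start]
    while worklist:
        key = worklist.pop()
        if key in edges:
            continue  # already expanded; this also guards against cycles
        values = relations.get(key, [])
        edges[key] = list(values)
        worklist.extend(values)
    return edges


def leaves(edges):
    """Nodes without outgoing dependencies: constants or external inputs."""
    return {node for node, values in edges.items() if not values}


# Toy relations: R0 depends on R4, and R4 may hold one of two constant strings.
toy_relations = {"R0": ["R4"], "R4": ["FBAdView", "FBAdBannerView"]}

graph = build_dependency_graph(toy_relations, "R0")
constants = leaves(graph)
```

A node with several values, such as R4 here, corresponds to a union in the dependency graph, while the leaves are the constant strings or external inputs that the parameter may finally resolve to.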

In each step of Ad fraud detection, we may construct a dependency graph for the control flow graph node that we are interested in analyzing. After building the dependency graph, we can also build its automaton and perform property checking.

Figure 3: DPG 1    Figure 4: DPG 2    Figure 5: DPG 3

Property Checking When we need to determine which strings flow to a sink, we perform string property checking. Take the Interstitial violation Ad fraud as an example: we perform Algorithm 3 to check whether the app uses the interstitial API through dynamic invocation. First, we get the function nodes that represent NSClassFromString. For each such node, we build dependency graphs on the parameter (R0) of the function. After generating the dependency graph, we build its string automaton so that we can determine the possible values of the input string for the dependency graph. We also build a string automaton for the Interstitial API strings, which differ among Ad network providers. Then we check the intersection of these two automata. If the intersection is not empty, we record the result as an Interstitial violation Ad fraud. We use this property checking whenever we want to know which strings can flow to a dependency graph we are interested in.
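To make the intersection check concrete, the sketch below replaces automata with finite sets of strings, which is only adequate when the dependency graph has no loops or unknown inputs. The tuple encoding of dependency graph nodes, the `language` helper, and the example class names are assumptions for illustration.

```python
from itertools import product


def language(node):
    """Return the finite set of strings a dependency-graph node can evaluate to."""
    kind = node[0]
    if kind == "const":
        return {node[1]}
    if kind == "union":
        result = set()
        for child in node[1:]:
            result |= language(child)
        return result
    if kind == "concat":
        parts = [language(child) for child in node[1:]]
        return {"".join(combo) for combo in product(*parts)}
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical argument of NSClassFromString, assembled from string pieces.
argument = (
    "concat",
    ("const", "GAD"),
    ("union", ("const", "Interstitial"), ("const", "BannerView")),
)

# Illustrative list of interstitial class names from different Ad networks.
interstitial_apis = {"GADInterstitial", "FBInterstitialAd"}

hits = language(argument) & interstitial_apis
violation = bool(hits)
```

The class name never appears as one constant in this toy argument, yet the intersection is not empty, which is the kind of case that checking constants alone would miss. A full automaton-based implementation would also cover unbounded languages.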

Check the Ad fraud We first get the addSubView functions of the ViewController nodes. Then we build dependency graphs on the parameters of these functions. We trace the dependency relations of each parameter to find whether it reaches the same node as the top node of Figure 4.

We construct the automaton for this union-operation dependency graph (a union operation means that the node depends on one of the nodes it connects to). This automaton represents the potential API invocations with their argument values. In this example, the potential APIs are FBAdView, FBAdBannerView, FBAdLink, or FBAdNativeAd. We also build an automaton to represent the list of Ad view API strings, which contains FBAdView and FBAdBannerView. The intersection of these two automata is not empty, so we record that this ViewController node adds an Ad view one time. From Table 7, we can see that this ViewController has added multiple Ad views.
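The counting step can be summarized as follows. This sketch assumes that each addSubView call has already been resolved to the set of class names its argument may take; the function name, the input format, and the ViewController names are hypothetical.

```python
from collections import Counter

# Ad view API strings named in the text.
AD_VIEW_APIS = {"FBAdView", "FBAdBannerView"}


def multi_view_violations(add_subview_calls, ad_view_apis=AD_VIEW_APIS):
    """Return the ViewControllers that add an Ad view more than once.

    `add_subview_calls` is a list of (controller, candidate class names) pairs,
    one pair per addSubView call found in that controller.
    """
    counts = Counter()
    for controller, candidates in add_subview_calls:
        if set(candidates) & ad_view_apis:  # non-empty intersection: an Ad view may be added
            counts[controller] += 1
    return {vc: n for vc, n in counts.items() if n > 1}


calls = [
    ("RecipeViewController", {"FBAdView", "FBAdBannerView", "FBAdLink", "FBAdNativeAd"}),
    ("RecipeViewController", {"FBAdView"}),
    ("SettingsViewController", {"UILabel"}),
]

violations = multi_view_violations(calls)
```

Only the controller whose calls intersect the Ad view list more than once is reported, which matches the rule that more than one Ad view per ViewController counts as a Multi-view violation.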

4 Ad fraud detection analysis

4.1 Static Analysis on iOS application

The process of static analysis for systematic API vulnerability checking in iOS executables was described in previous work [58]. Figure 6 shows the detection flow chart. We use this approach to detect Ad fraud.

Figure 6: Detection flow chart


The first part is to download online apps from Apple’s App Store and install them on a jailbroken iOS device, where we can access the file system directly to fetch the target binary. We use a third-party binary decryption tool, Dynist, to generate the decrypted binary. We then use a third-party disassembler, IDA Pro, to generate the assembly code in plain text format. Finally, we use Binflow [58] to extract segment information from the assembly code.

The output of segment information extraction consists of 14 segment files, each corresponding to a segment extracted from the assembly. The formal definition of each segment is given in the documentation [70].

4.1.2 Control Flow Graph Construction

We build the control flow graph with Binflow [3]. Binflow needs the segment information extracted in the previous step to resolve the register values of indirect jumps and link the routines. During CFG construction, it also marks the dependency relations of registers for each assembly statement.

The dependency relations of registers are used when we need to know the steps of an Ad-related API call. Take Figure 7 and Figure 8 as an example. Figure 7 shows assembly code that triggers a function call with a B instruction. We need to resolve the value of R6 to get the target function, so we adopt static backward slicing to resolve R6. When we scan forward, we first see a MOV instruction. We record R2 as the dependency key and the string malloc as the dependency value, because R2 depends on the string malloc.

Then we see the next MOV instruction and do the same thing. We record R6 as the dependency key and R2 as the dependency value, because MOV copies the value of R2 into R6; that is, R6 depends on R2.

We record the dependency relations of other commands as well, such as LDR and STR, which load and store values. After recording the dependency relations, we can track any value we are interested in. For example, when we want to know the value of R6 in the B instruction, we can trace it through the dependency relations shown in Figure 8.
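The recording and tracing steps can be sketched as below. The statement tuples are reconstructed from the prose and are an assumption, since the exact assembly of Figure 7 is not reproduced here; the helper names are hypothetical. The sketch is also flow-insensitive: a later definition of a register overwrites an earlier one.

```python
def record_dependencies(statements):
    """Scan statements forward and record key -> value dependency relations."""
    deps = {}
    for command, first, second in statements:
        if command in ("MOV", "LDR"):
            deps[first] = second  # the destination depends on the source
        elif command == "STR":
            deps[second] = first  # the memory operand depends on the stored register
    return deps


def resolve(deps, operand):
    """Follow dependency relations backward until a value with no further dependency."""
    seen = set()
    while operand in deps and operand not in seen:
        seen.add(operand)
        operand = deps[operand]
    return operand


# Assumed statements: (command, first parameter, second parameter).
statements = [
    ("MOV", "R2", '"malloc"'),
    ("MOV", "R6", "R2"),
    ("B", "R6", None),
]

deps = record_dependencies(statements)
target = resolve(deps, "R6")
```

Following R6 to R2 and then to the string gives the branch target, which corresponds to the chain of relations that Figure 8 depicts.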

Figure 7: Location Figure 8: Relation

We use GetCallerSbrt to find the callSbrtNode list of the input node in Algorithms 5, 6, and 7. We explain this in detail as follows. When we construct the control flow graph,
