Static Ad Fraud Detection on iOS Applications (靜態廣告欺詐行為偵測技術研究-以 iOS 為例)


National Chengchi University, Department of Management Information Systems
Master's Thesis
Static Ad Fraud Detection on iOS Applications (靜態廣告欺詐行為偵測技術研究-以 iOS 為例)
Advisor: Dr. 郁方
Student: 黃存宇
November 2019 (ROC year 108)
DOI:10.6814/NCCU202000021

Abstract

While mobile applications (apps) have become one of the most popular and dominant kinds of software, app developers (particularly those who deliver free apps) gain a considerable part of their profit from in-app advertisements. Displaying ads in apps in a suitable way benefits both customers and advertisers. Various ad frauds have been identified with which developers may gain extra benefits but damage user experience or advertisement effects. In this work, we present a static analysis technique to check ad frauds in iOS apps. In particular, we detect apps whose ads exhibit interstitial violation, size violation, multi-view violation, and overlay-view violation. Detecting these violations requires identifying advertisement API invocations with specific arguments in apps. This becomes hard with dynamic invocation, where the exact calls and their arguments depend on runtime values of nested parameters. We adopt static flow analysis techniques on iOS executables, with which we build dependency graphs on the parameters of target functions. We then conduct string analysis on the dependency graphs to reveal potential API invocations, together with their argument values, that indicate ad fraud violations. We have analyzed more than one thousand apps whose control flow graphs were constructed by our previous app static analysis tool Binflow, and found 208 apps using dynamic invocations of Ad related API calls. We further identified 70 apps having interstitial violation ads, 48 apps having size violation ads, 31 apps having multi-view violation ads, and 19 apps having overlay violation ads.

Contents

1 Introduction
1.1 Ad fraud
1.2 Discoveries
1.3 Contributions
2 Related Works
2.1 Mobile Security
2.2 Detecting Ad fraud
2.3 Static analysis
2.4 Static analysis on iOS application
3 A Motivating Example
4 Ad fraud detection analysis
4.1 Static Analysis on iOS application
4.1.1 Preprocessing
4.1.2 Control Flow Graph Construction
4.1.3 Dependency Graph Construction
4.2 Overview of Ad fraud detection analysis
4.3 Find the Ad Related API
4.4 Check Ad fraud
4.4.1 Interstitial violation Ad fraud
4.4.2 Size violation Ad fraud
4.4.3 Multi-view violation Ad fraud
4.4.4 Overlay-view violation Ad fraud
5 Evaluation
5.1 Environment
5.2 Result of detecting Ad related API
5.3 Result of Ad fraud detection
5.3.1 Result of Interstitial violation Ad fraud
5.3.2 Result of Size violation Ad fraud
5.3.3 Result of Multi-view violation Ad fraud
5.3.4 Result of Overlay-view violation Ad fraud
5.4 Result of Pirate App Store
6 Conclusion
References

List of Figures

1 View of Multi-view violation app
2 Part of CFG Graph
3 DPG 1
4 DPG 2
5 DPG 3
6 Detection flow chart
7 Location
8 Relation
9 Ad fraud detection analysis
10 Apps use NSClassFromString to call class
11 The number of calling Ad related API in the NSClassFromString apps
12 Number of apps in each Ad fraud
13 Count Detail in Interstitial violation Ad fraud
14 Count in Interstitial violation Ad fraud (NotUnknown) by Operation type
15 Count in Interstitial violation Ad fraud (Unknown) by Operation type
16 Example of call Ad view
17 Count in Ad view node
18 Count in Size violation Ad fraud
19 Count in Size violation Ad fraud (NotUnknown) by Operation type
20 Count in Size violation Ad fraud (Unknown) by Operation type
21 Size violation Dependency Graph 1
22 View with Ad of app 335445524
23 View without Ad of app 335445524
24 Count in Multi-view violation Ad fraud
25 App number in each times of Multi-view violation Ad fraud
26 Multi-view violation Dependency Graph 2
27 App number in each times of Overlay-view violation Ad fraud
28 Overlay-view violation Dependency Graph 1
29 Overlay-view violation Dependency Graph 2
30 Overlay View of app 335445524
31 Interstitial violation Dependency Graph 1 in Pirate App Store
32 Interstitial violation Dependency Graph 2 in Pirate App Store
33 Multi-view violation Dependency Graph in Pirate App Store

List of Tables

1 Ad related API Table
2 ADInterstitial API Table
3 Top 20 Interstitial violation apps Table
4 Related information of Top 20 Interstitial violation apps table
5 Top 20 Size violation apps Table
6 Related information of Top 20 Size violation apps table
7 Multi-view violation apps Table (callTimes above 2)
8 Related information of Multi-view violation apps Table
9 Overlay-view violation Table
10 Related information of Overlay-view violation apps Table
11 Interstitial violation apps Table

1. Introduction

1.1. Ad fraud

Most of the applications (apps) we use in daily life are free to users. Half of the apps in Google Play, the Apple App Store, and Microsoft Store display advertisements [1, 2]. These applications show advertisements through Ad networks, such as AdMob [3], Apple's Search Ads [4], Facebook Ad [5], and so on. Developers of these apps may earn profit through in-app advertisements. They follow the documents provided by these Ad networks to place Ads in their apps. The developers get paid by these Ad networks based on the number of Ad views shown in the app or on how many times Ads are triggered by the users.

However, unscrupulous developers often attempt to cheat not only advertisers but also users to earn more profit. A famous example is Chamois [6]. It generates ad fraud and executes a malicious process on Android devices. Stone's team (Google's Project Zero) found that it had been preinstalled on 7.4 million Android devices. Ad fraud behavior varies. For example, an app may show advertisements so frequently that users click them inadvertently. Both Ad networks and users are plagued by it.

To prevent malicious developers from earning profit by fooling users, these Ad networks provide a policy [7] on how developers may present an Ad in an app. However, there are still plenty of tricks that evade the policy checking of Ad network providers to earn more profit. These tricks have been introduced in previous works [8, 9, 10]. The goal of these frauds is to earn more benefits from the Ad network. For example, Interstitial violation Ad fraud may cause users to click advertisements accidentally at a high rate. Multi-view violation Ad fraud earns more benefits through more impressions of advertisements.
One thing needs to be mentioned: if developers use an abnormal way to call the Ad API provided by an Ad network, we assume they want to evade the checking process of the App Store or the Ad network. We check these behaviors in applications that have already been submitted to the App Store. Our objective is to measure, through static analysis techniques, which iOS apps display Ads in a way that suits both customers and advertisers. Through our tool, we can make the entire advertising network industry more sustainable. In this paper, we propose a system which can detect the following Ad frauds:

• Interstitial violation Ad fraud
• Size violation Ad fraud
• Multi-view violation Ad fraud
• Overlay-view violation Ad fraud

Interstitial violation Ad fraud. Interstitial violation Ad fraud, also called Ad fraud in [8], has been used on the web for a while. Developers use an interstitial way to display Ads between web pages. Similarly, when an interstitial Ad shows between the pages of an app, users are tricked into accidentally clicking the Ad at a high rate. This behavior is harmful to user experience. We need to check whether an app is involved in Interstitial violation Ad fraud so that it will not annoy users. Our system can detect the specific interstitial API provided by the Ad network, even under dynamic invocation, by using static analysis techniques to prevent such Interstitial violation Ad fraud.

Size violation Ad fraud. When a user sees an advertisement on a mobile device, it should have a suitable size [11]. If the size of the Ad is too small or too large, it is a Size violation Ad fraud. Most Ad networks define specific sizes for different types of Ads and for each device, such as Google Banner Ads [12], while smaller Ad networks may not define the size, so we check both cases in our research. When developers can manipulate the Ad size, they may make the Ad as small as possible. Users will think it is an Ad-free app, but it violates the policy of Ad network providers.
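The size rule described above can be sketched in Python. This is a minimal illustration, not the thesis implementation: the function name `classify_ad_size`, the labels it returns, and the allow-list `ALLOWED_SIZES` are assumptions made for this example, and the listed sizes only stand in for whatever sizes a given Ad network actually defines.

```python
# Illustrative allow-list of (width, height) ad sizes in points.
# Assumption for this sketch: the network publishes a fixed set of sizes.
ALLOWED_SIZES = {(320, 50), (320, 100), (300, 250)}


def classify_ad_size(width, height, allowed_sizes=ALLOWED_SIZES):
    """Label an ad size as ok, too_small, too_large, nonstandard or unchecked."""
    # A view with no visible area can never be seen by the user.
    if width <= 0 or height <= 0:
        return "too_small"
    # Some networks define no sizes at all; nothing further can be compared.
    if not allowed_sizes:
        return "unchecked"
    if (width, height) in allowed_sizes:
        return "ok"
    area = width * height
    areas = [w * h for (w, h) in allowed_sizes]
    if area < min(areas):
        return "too_small"
    if area > max(areas):
        return "too_large"
    # Inside the permitted range but not one of the defined sizes.
    return "nonstandard"


if __name__ == "__main__":
    for size in [(320, 50), (0, 0), (1, 1), (1000, 1000), (320, 60)]:
        print(size, classify_ad_size(*size))
```

In a static setting the width and height would come from the arguments recovered for the size-setting call; when those arguments cannot be resolved, a check like this cannot be applied and the flow has to be treated as unknown.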

In other words, if an advertisement is so small that the user cannot view it, but it is still attached to a view, it is a Size violation Ad fraud. Developers make the Ad view smaller than the common size so that users will not be interrupted by the advertisement. This increases the benefit of developers while cheating advertisers. Our system can detect the Ad view function called with a dynamic invocation (which is used for setting the Ad size) and check whether the Ad size is zero with the static analysis technique.

Multi-view violation Ad fraud. In general, one page should contain only one advertisement view. The number of advertisements shown in one view should be restricted [13]. However, greedy developers put multiple Ads in one view so that they can earn more profit. The more advertisements are put in one view, the more likely the user is to click one accidentally. Our system detects Multi-view violation by checking whether multiple AddSubView functions (an API to attach a view) add ad views in one View Controller (a component representing a view of an app).

Overlay-view violation Ad fraud. If there is an ad view and a full-screen view in one View Controller, it may be an Overlay-view violation Ad fraud, which means that the full-screen view may overlay the ad view. This behavior violates the ad network policy. Developers seldom add these two views to the same View Controller, so we flag this behavior as abnormal. Our system detects Overlay-view violation Ad fraud by checking whether an Ad view and a full view are attached in the same View Controller.

1.2. Discoveries

First, we download iOS apps from Apple's App Store via an iPhone 7 using a Sikuli-based approach [14]. We collected 30 thousand apps in our experiment, covering 33 genres.
The release dates of these apps range from 2008 to 2017, and all of them have been updated after 2016. Our analysis is based on the binary code on iOS 9, which targets the 32-bit ARM architecture. We then install the app, fetch its binary, and decrypt the binary. Next, we construct its ARMv7 assembly with IDA Pro, and then we generate its control flow graph. The number of generated control flow graph nodes can exceed 10 thousand, which makes this a memory-intensive task. Hence, we deploy a high-end server with 24 GB RAM to generate the control flow graph of an app.

After the preprocessing steps, we obtain 1391 apps (control flow graphs) on which to perform our Ad fraud detection. We first check whether an app uses any Ad Related API. If an app does not include any Ad Related API, Ad fraud behaviors are not possible in this app. Among the 1391 apps, we find 208 apps that include Ad Related API.

We present the result of our Ad fraud detection in Section 5.3. The count for each Ad fraud violation is shown in Figure 12. Among the 208 apps, we further identified 70 apps having Interstitial violation ads, 48 apps having size violation ads, 31 apps having multi-view violation ads, and 19 apps having overlay violation ads.

1.3. Contributions

The previous works on checking ad frauds detect violations with dynamic analysis [8, 9, 10]. They need plenty of time to build the simulation for an app and perform ad fraud detection. In contrast, performing ad fraud detection with static analysis techniques on an app's binary code takes little time. Besides, Apple Inc. does not provide a View Simulation API for developers to follow a simulation approach [15].

The simulation approach will also miss some violations, for example, when an interstitial ad pops up so quickly that the simulation recorder cannot record it. Even in that situation, binary analysis and string analysis still reveal the Interstitial violation, because they check the interstitial ad creation behavior in the code the developers wrote.
Our approach, static analysis for Ad fraud detection, detects the Ad frauds introduced by previous work and can detect more Ad fraud violations that other works cannot find. Besides, our work can be used on iOS, while the previous works cannot. We conduct binary analysis and string analysis to detect Interstitial violation Ad fraud, Size violation Ad fraud, Multi-view violation Ad fraud, and Overlay-view violation Ad fraud in our work.

2. Related Works

2.1. Mobile Security

Mobile security has become more and more important, and a lot of work on mobile security issues has been presented. Ian Beer has presented the security problems of iOS exploit chains in the blog of Google Project Zero [16]. Attackers use the security vulnerabilities to install a monitoring implant on vulnerable iPhones. According to Beer's report, these vulnerable mobile devices are invaded by hackers if the users visit the hacked sites on their phones.

Wang's team addresses the third-party online Ads issue between Ad network publishers and developers [17]. They publish a tool, AdJust, which allows publishers to specify constraints on events associated with third-party ads, such as URL requests, HTML element creations, and timers. AdJust is able to monitor and regulate resource-abusing ads by transparently intercepting key JavaScript APIs.

Yang's team [18] presents an entity-based characterization and analysis of mobile malware. They first characterize mobile malware in order to detect its interaction patterns. They perform a static analysis technique on the bytecode of the apps to identify their entities and entity references. We also perform static analysis techniques on binary code to detect Ad fraud in our Ad fraud detection analysis (Section 4.2).

Take iOS exploit chain 1 [19] as an example. The hackers call task_threads() and then thread_terminate() to initiate remote code execution, which can cause a heap overflow in the function AGXAllocationList2::initWithSharedResourceList. This method is a C++ virtual method that takes two arguments. These two arguments point to memory which is shared with userspace, which means that the two parameters are attacker-controlled.
The hackers inject these two parameters externally; that is to say, the value in the iOS exploit chain flows to a parameter that is external input. There is a 0x18-byte header structure, the last dword of which, n_entries, is a count of the number of following sub-descriptor structures. The iOS kernel developers assume that each sub-descriptor has at most 6 sub-entries; however, there is actually space for 7 sub-entries of controlled resource_id and flag pairs. Attackers use this vulnerability to perform their attack. Since n_entries comes entirely from external input, the hackers pass the size to IOMalloc, which represents the location of n_entries. The n_entries value points to the sub-descriptors where the code will execute, so attackers can set the number of sub-descriptor entries to 7 rather than 6. As a result, the end of the target IOMalloc allocation is controlled by these attackers.

Ian Beer [16] finds that the kernel reads a structure from shared memory as described above. The next step in the exploit is to use the AGX driver's external method interface, such as AGXAllocationList2::initWithSharedResourceList, to allocate two shared memory regions. Attackers call the external method create_shmem of the AGXSharedUserClient through the AGX driver's external method interface. IOConnectCallMethod is the main method to call external methods on user clients. More exactly, attackers can create a new IOAccelResource via the AGXSharedUserClient external method IOAccelSharedUserClient::new_resource because they control the two shared memory regions.

We have a similar goal, which is discovering the risk of external flow. First, we identify ad-fraud-related functions and find the parameters of the functions. We then trace the flow to the parameters to confirm whether the flow comes from constant, operation, or unknown nodes.
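The classification of a parameter's flow into constant, operation, or unknown nodes can be sketched as a backward walk over a dependency graph. The following Python sketch is only an illustration under stated assumptions: the graph encoding (`preds` and `kinds` dictionaries), the node names, and the function `flow_verdict` are hypothetical and are not taken from Binflow or from the thesis tool.

```python
def flow_verdict(sink, preds, kinds):
    """Summarize what flows into `sink` by walking predecessors backwards.

    preds maps a node to the nodes it depends on.
    kinds maps a node to "constant", "operation" or "unknown".
    Nodes without a recorded kind are conservatively treated as unknown.
    """
    seen = set()
    found = set()
    stack = list(preds.get(sink, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # also guards against cycles in the graph
        seen.add(node)
        found.add(kinds.get(node, "unknown"))
        stack.extend(preds.get(node, []))
    # One unresolved source is enough to make the whole parameter unknown.
    for kind in ("unknown", "operation", "constant"):
        if kind in found:
            return kind
    return "unknown"  # nothing flows into the sink, so nothing is known


# Hypothetical example: an ad width computed from a constant and a value
# that the analysis could not resolve (for instance, one read at runtime).
preds = {
    "setSize.width": ["multiply"],
    "multiply": ["const_320", "runtime_value"],
}
kinds = {
    "multiply": "operation",
    "const_320": "constant",
    "runtime_value": "unknown",
}

if __name__ == "__main__":
    print(flow_verdict("setSize.width", preds, kinds))
```

The priority order is a design choice of this sketch: it reports the least trustworthy kind found, so a parameter is only called constant when every node reaching it is a constant.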
We create a dependency graph for the flow to the parameters, and we have found that an unknown node might flow to a parameter. The source of an unknown node may be an external input, so we consider unknown nodes to be possible sinks in our work. Take Size violation Ad fraud (Section 4.4.2) as an example: if the flow to the parameters of the setSize functions is an unknown node, the size of an advertisement may be controlled by external input. As a result, Ads of unsuitable size will be generated through this behavior. Ian Beer's team focused on buffer overflow attacks resulting from external control. Unlike that research, we analyze the process of calling API functions with externally input parameters.

The third-party issues of apps have been researched by Wang's team [20]. They argue that there are many security issues in third-party libraries. They give a brief overview of existing approaches for checking third-party libraries and of their limitations. In our work, we also scan the code of third-party libraries in applications to find vulnerabilities.

More research on security issues has been presented in recent years. Fake news can have potential impacts on individuals and businesses, and it causes security problems [21]. The data on VirusTotal is also important for security research; both web and mobile security research has used it to obtain ground truth [22, 23, 24, 25, 26].

Detecting privacy leaks [27, 28] has also been researched for Android apps. Evasive malware analysis is also an important part of security [29, 30]. NLP analysis has also been widely adopted in security analysis [31, 32, 33, 34].

2.2. Detecting Ad fraud

An estimated loss of 19 billion US dollars has been generated due to Ad frauds, surpassing tax-refund fraud [35]. Different forms of fraud on mobile devices have not yet been substantially investigated, while research results on web frauds are abundant. The third-party online Ads issue between Ad network publishers and developers has been discussed in AdJust [17]. Previous efforts aimed to detect web frauds, for example, by probing click frauds based on network traffic [36, 37] or search engine query logs [38], and by characterizing click frauds [39, 40].
We believe that useful clues can be drawn from these previous works. Existing approaches against mobile frauds aim to pinpoint fraudulent apps whose fraudulent conduct can be identified statically (the so-called static placement frauds). For instance, Liu's team [8] has investigated static Ad placement frauds on Windows Phone by analyzing the layouts of apps. In DECAF [8], they design and implement a scalable system for automatically detecting Ad fraud in Windows Phone apps.

Crussell's team [41] has built an approach for automatically pinpointing click frauds in Android apps. They run the app and record HTTP request trees to perform dynamic analysis. Their strategy is to build HTTP request trees and extract features from the query parameters or HTTP headers. Then they predict the fraud behavior through machine learning. Wang's team [10] also uses a data-analysis-based approach to detect placement fraud.

Dong's team [9] performs hybrid analysis to detect Ad fraud on Android. They first detect static Ad placement fraud as DECAF [8] does, and then run apps to record the view state to detect interactive Ad fraud. They mention that the detection methods above fail to detect fraudulent interactive conduct. Considering that interactive frauds cannot be probed effectively, FraudDroid [9] explores a few kinds of frauds that are not yet familiar to the community. They first construct a UI State Transition Graph to simulate the view transitions in the Android app. Then they use an Ad view detector and an Ad fraud checker to find the Ad fraud in apps.

UI state methods are used not only for detecting Ad fraud in FraudDroid [9], but also for checking security problems in protecting sensitive data. Xiao's team [42] uses UI widgets to detect sensitive input in Android apps. They detect sensitive input with UI widgets and icons. They define a series of sensitive UI icons and calculate the icon association with UI widgets. They perform a static analysis technique on the Android app code to identify sensitive object icons and text icons.

2.3. Static analysis

Most of the research on Ad fraud on Android uses View simulation and View checking [8, 10, 9].
They spend a lot of time running applications and recording the UI state. Instead, we use a static analysis technique to find Ad fraud behavior. We only scan the binary code obtained from the executable file of an application to identify Ad fraud. Additionally, Apple Inc. does not provide a View API for developers to do View simulation and checking.

Most iOS applications written by unscrupulous developers are not open-source, so we need to analyze the binary code of the iOS executable to check whether its behavior constitutes Ad fraud. Other analysis techniques can only analyze the strings that already exist in the binary code [43]; in contrast, we can also detect strings that are produced by string operations.

Wang's team [43] aims to protect iOS apps by applying obfuscation to the iOS app code. They implement their obfuscation tool based on LLVM IR transformation passes. They improve resilience and overhead. Resilience indicates how well the obfuscation can withstand automated reverse engineering. For overhead measurement, they focus on binary size expansion and execution slowdown.

In the past few years, most research and analysis tools have targeted the detection of malicious mobile behavior [44]. Static analysis and dynamic analysis are both suitable for Ad fraud detection; however, both have been adopted more widely on the Android system than on the iOS system.

On the Android system, static analysis of the binary code of Android apps has been widely adopted to detect malicious mobile behavior. Arzt's team [45] has developed a tool that performs forward taint analysis and on-demand backward alias analysis. Li's team [46] uses a static analysis technique to detect reflective calls in Android applications. Also, P. Barros's team constructs a static analysis technique that generates a control flow graph covering Java reflection to detect the data flow of applications. Dynamic analysis on Android has been developed in TaintDroid [47]. They instrument the Dalvik VM to monitor the behavior of an Android app.
Unlike Android, there is no such well-established execution platform for iOS applications. This limits the feasibility of dynamic analysis on iOS applications.

Binary code analysis techniques have been studied in previous work. Meng's team and Bao's team focus on handling function entries and boundaries [48, 49]. Shoshitaishvili's team presents a systematized implementation of binary code analysis [50]. Other researchers can compose it to develop new approaches. Handling of indirect control flows and useful tool kits for binary code analysis have also been presented [51, 52, 53]. These works help us when we need to apply binary analysis automatically. BitBlaze [54] is useful for malware detection with integrated verification tools, because it combines dynamic analysis and static analysis components for binary code analysis.

2.4. Static analysis on iOS application

Lee's team detects Chameleon apps in their work [55]. A Chameleon app may go to different views even when users perform the same action in the app. They decrypt and disassemble UI layout files and resource files from the app file to find ViewController class names. They also create a labeled view controller graph (LVCG), which consists of view controllers (VCs), the UI transitions defined between the VCs, and the texts of the VCs. If a VC contains a URL ViewController, they obtain not only the text of the VC but also the text report from VirusTotal for the URL. Finally, they use a Semantic Analyzer (an NLP model that extracts keywords and classifies them with an SVM) to determine whether the apps contain PHI-UI based on these texts.

The illicit app (Chameleon app) "Happy Daily English" on the iOS system has also been reported by Claud Xiao [56]. It shows different content depending on whether the user is located in China, so Apple's reviewers cannot see the actual content of the app. Such apps can be installed from pirate iOS app stores, such as TutuApp, TweakBox, and AppValley. According to the report of Nick Statt [57], users can easily download pirate apps from these stores because of enterprise certificates, which are designed for developers in large companies to distribute apps internally.
However, these certificates are used by malicious companies to obtain illegal profit or private user information.

Lin's team [58] has bridged the gap between binary analysis on iOS and string analysis. Their work constructs the control flow graph and dependency graph from an iOS executable. Then they adopt string analysis for property checking to detect dynamically loaded classes of iOS applications. The static analysis technique on iOS applications has also been adopted to detect crowdturfing UIs, through which users can earn profit by performing app downloading tasks [55].

The feasibility of binary analysis for iOS applications has been introduced in PiOS [59]. Egele's team developed a tool to detect possible leaks of sensitive information from iOS applications and third-party libraries. They compiled Objective-C code to get the binary code and disassembled the binary. Finally, they construct the control flow graph and perform reachability analysis to detect possible leaks. They scanned more than 1,400 iPhone applications to evaluate their approach.

Werthmann's team develops its own Objective-C static analyzer, PSiOS [60]. They resolve all direct calls to the system call wrapper. They parse the Mach-O header, Objective-C classes, selectors, and API calls to extract all relevant Objective-C structures. They also leverage MoCFI [61] to enforce control flow integrity dynamically on iOS devices running on ARM processors.

Hybrid analysis has been adopted by Deng's team [62]. They integrate static analysis and dynamic analysis in their detection tool iRiS. The first step of iRiS is similar to the potential privacy leakage detection in PiOS, and then they perform dynamic analysis of iOS applications to resolve unsolvable calls. AppBeach [63] and AppReco [64] also perform static analysis on iOS applications, scanning all the classes and methods in the assembly to reveal potentially loaded classes and invoked methods.

String analysis can determine all reachable states of string variables in string manipulation programs [65]. The technology has been adopted in many fields, such as the security of web applications [66].
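To make the idea of computing the reachable values of a string variable concrete, the following Python sketch evaluates a tiny expression graph into a finite set of possible strings. It is a deliberately simplified illustration, not one of the cited string analysis tools: the node encoding, the function `possible_strings`, and the example graph are assumptions, the graph is assumed to be acyclic, and real string analyses typically use automata so that loops and unbounded values can be represented.

```python
from itertools import product


def possible_strings(node, graph):
    """Return the set of strings a node may evaluate to, or None if unknown.

    graph[node] is one of:
      ("const", text)        a literal found in the binary
      ("concat", [children]) concatenation of the children, in order
      ("choice", [children]) values merged from different program paths
      ("unknown",)           a value the analysis cannot resolve
    The graph is assumed to be acyclic.
    """
    entry = graph[node]
    kind = entry[0]
    if kind == "const":
        return {entry[1]}
    if kind == "unknown":
        return None
    parts = [possible_strings(child, graph) for child in entry[1]]
    if any(part is None for part in parts):
        return None  # one unresolved operand makes the result unknown
    if kind == "choice":
        return set().union(*parts)
    if kind == "concat":
        return {"".join(combo) for combo in product(*parts)}
    raise ValueError("unsupported node kind: " + kind)


# Hypothetical example: a class name assembled from a prefix and one of two
# suffixes before being passed to a dynamic lookup such as NSClassFromString.
graph = {
    "prefix": ("const", "GAD"),
    "banner": ("const", "BannerView"),
    "inter": ("const", "Interstitial"),
    "suffix": ("choice", ["banner", "inter"]),
    "class_name": ("concat", ["prefix", "suffix"]),
}

if __name__ == "__main__":
    print(sorted(possible_strings("class_name", graph)))
```

Returning `None` for unresolved values keeps the sketch honest about its limits: a set of strings is reported only when every operand is known.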
While many string analysis tools have been developed [67], none of them has been applied to analyze dynamically loaded classes in mobile applications. Reverse engineering of iOS apps has been adopted in the works of Wang's team [43] and Lin's team [58], which address obfuscated iOS apps and sensitive API calls in iOS apps, respectively. Ad-related issues on the web and on mobile platforms have been presented in the works of Wang's team

[17] and Wang's team [10]. The unknown results generated by string analysis are also important: they make vulnerability detection more precise, because an unknown string flowing to a sink is itself a potential vulnerability [68].

Julian's team has implemented a new way to generate a supergraph that combines lifted information from the Lifter Frontend, Class Hierarchy Frontend, and Disassembly Frontend [69]. They reconstruct the control flow graph as a supergraph, a single graph-based representation that combines information from multiple frontends. IDA Pro [70] does not add any object-oriented concepts when it parses a Mach-O binary. LiOS [69] parses the Objective-C sections to reconstruct object-oriented constructs such as classes, meta-classes and methods. The information in LiOS comes from three frontends. The Lifter Frontend reconstructs the control flow graph and cross-references to make a McSema-compatible CFG. The Class Hierarchy Frontend provides a complete representation of the Objective-C class hierarchy. The Disassembly Frontend comprises a call graph and cross-references that resemble the actual high-level program in Swift/Objective-C, rather than the immediate references at the assembly level. They implement a built-in pass to link and extend the isolated sub-graphs from the three frontends. They also provide the extensible graph query language "Cypher" to query the information they store in the Neo4j graph database [71].

We can check for the issue Unavailable API in supported iOS SDK with Infer [72]. Infer maintains lists of supported and unsupported APIs for different versions and checks the version declared by developers. A warning is triggered if developers use an API that is only defined in a version higher than the version they support.
This tool is also used at Facebook to maintain software quality, where it is integrated into the development cycle of mobile applications on both iOS and Android [73].

Separation logic [74] and bi-abduction [75] are the foundations of Infer [73]. Separation logic is a mathematical logic for the formal verification of software. It can break the reasoning into the

chunks corresponding to operations on memory first, and then compose the reasoning chunks together to facilitate scalability. Smallfoot [76] was a successful academic tool based on separation logic before Infer. Bi-abduction can be used to form local reasoning for logical inference in separation logic. An entailment A ⊢ B says that A implies B. Infer represents the program statements in an internal theorem prover. The question that Infer asks in bi-abduction is A ∗ ?antiframe ⊢ B ∗ ?frame: Infer needs to discover a pair of frame and antiframe that makes the statement valid.

Infer.AI is a framework, provided by the Infer team, for developers to write their own abstract-interpretation-based checkers, also known as intraprocedural analysis [77]. They also provide a list of pre-defined abstract domains and transfer functions written with Infer.AL, a declarative language for writing linters in Infer. Take the linter Unavailable API in supported iOS SDK as an example. They use Define-checker to define the Unavailable_class_in_supported_ios_sdk checker [78]. This checker first calls Class_unavailable_in_supported_ios_sdk to check the unsupported version, and then checks whether Call_class_method is alloc or new to make sure that the method is actually called. Class_unavailable_in_supported_ios_sdk is defined in cPredicates.ml [79]. It compares the allowed version of each API with the supported version defined by developers. If the allowed version is greater than the supported version, it raises a warning message.

Previous investigators mentioned Infer in their research [80, 81, 82]. DeepBugs [80] is a learning-based and name-based bug detection approach. It is a checker that learns rules from incorrect code examples to find bugs in new incoming code. Rijnard's team [81] creates an Automated Program Repair program based on Infer.
It automatically infers separation logic assertions over the statements of the program and repairs bugs in the code. It can handle various types of bugs with their dynamic APR techniques. Mark's team [82] studies the problems of scaling up static and dynamic analysis. They discuss six aspects (Irrelevant, Unconvincing, Misdirected, Unsupportive, Closed System and Closed Mind) of the problem based on the tools Infer and Sapienz [83]. Alshahwan's team describes the deployment of the Sapienz Search-Based Software

Engineering (SBSE) testing system in their work [83]. Sapienz has been deployed in production at Facebook to design test cases, localize and triage crashes to developers, and monitor their fixes.

From the example AL checkers that Infer provides [84], we can see that they can detect class inheritance, instance method calls, class method calls, parameter types and so on. Apart from checking constant classes, methods, and parameters, our work also examines string operations. Developers may combine APIs in varied ways; for instance, a parameter may not be a constant but the result of a string operation such as union or concat. It is hard for Infer to detect such cases because it needs to know the flow to the parameter. In contrast, our string dependency graph represents the flow to the parameter, so our analysis can discover these cases. In addition, the major input of Infer is source code. Since most applications uploaded to the App Store are not open source, their source code cannot be acquired directly, so we analyze binary code instead of source code.

3 A Motivating Example

To illustrate our approach concisely, we use an Ad fraud example. The app contains multiple advertisements in one ViewController (the same issue mentioned in the work of Lee's team [55]), as shown in Figure 1. Displaying an inappropriate number of Ads is an Ad fraud defined in previous work [8, 9]. It also violates the App Store Review Guidelines 3.1.7 [85].

A ViewController denotes a view in the app. Figure 1 shows multiple Ad views in the same ViewController. A view that adds multiple Ad views commits a Multi-view violation Ad fraud. We count the number of Ad views added through the addSubView method called by one ViewController. If the number is higher than one, we record it as Multi-view violation Ad fraud.
In this example, we first find the ViewController nodes in the app, which represent the views that users touch and see in an application. Then we perform algorithm

6 to detect whether a Multi-view violation occurs. We need to construct the control flow graph for the application first.

Figure 1: View of Multi-view violation app

Control Flow Graph Construction. To perform our analysis, we extract the assembly code into segment information data in JSON format, as shown below. Using the ARM Information Center [86], we can determine the relation between control flow graph nodes. Since the ARM Information Center gives the definition of each command, we represent each command as a control flow graph node in our work. A control flow graph node contains three things: the command, the first parameter, and the second parameter. From the definition of each command, we learn whether the first and second parameters are registers or addresses, and the relation between these two parameters.

    {
      "subroutineData": {
        "_RBDatabaseManager_favoriteRecipes": {
          "label": "_RBDatabaseManager_favoriteRecipes",
          "locations": [
            {
              "label": "ENTRY_LOC_RBDatabaseManager_favoriteRecipes",

              "rawAsm": "PUSH\t\..."
            }
          ],
          "constMap": {
            "var_10": -16,
            "var_14": -20
          }
          ...
        }

We build the control flow graph with Binflow [58]. Binflow uses the extracted segment information to resolve register values of indirect jumps and link the routines. During CFG construction, it also marks the dependency relations of registers for each assembly statement: we parse each assembly statement into a control flow graph node (CFG node) and construct the dependency relation for the CFG node at the same time. We can then trace a value through the dependency relations. See section 4.1.2 for a more detailed explanation.

Figure 2 shows the CFG of the app. It is only one part of the CFG; the real control flow graph contains more than 100 thousand nodes.

We need to construct the dependency graph and explain property checking before we go to the following checking steps.

Dependency Graph Construction. The dependency graph is useful when we need to detect dynamic invocations, where the exact calls and their arguments depend on runtime values of nested parameters. We can perform algorithm 1 with the dependency relation of a control flow graph node mentioned in section 4.1.2. Each control flow graph node has its dependency relation, represented as a set of key-value pairs: the dependency key depends on the dependency value.

Figure 2: Part of CFG Graph

We adopt static flow analysis techniques on the iOS executable, with which we build dependency graphs on parameters of target functions. By traversing the dependency relations of the registers that hold the parameters, we can determine whether a parameter is a constant or an external input.

Checking the depKey of the input node is the first step of constructing a dependency graph. We then enter a while loop to createDPG until we get all the information we need, such as the dependency relation and the predecessors of the node. The detailed steps of constructing a dependency graph are shown in section 4.1.3. The dependency graphs we generate look like Figures 3, 4 and 5. In each step of Ad fraud detection, we may construct a dependency graph for the control flow graph node we are interested in. After we build the dependency graph, we can also build its automaton and perform property checking.

Property Checking. When we need to determine which strings will flow to the sink, we perform string property checking. Take Interstitial violation Ad fraud as an example,

we need to perform algorithm 3 to check whether the app uses the interstitial API with dynamic invocation. First, we get the function nodes that represent NSClassFromString. If a node is a function node of NSClassFromString, we build dependency graphs on the parameter (R0) of this function. After generating the dependency graph, we build its string automaton so that we can determine the possible values of the input string. We also build string automata for the Interstitial API strings, which differ among Ad network providers. Then we check the intersection between these two automata. If the intersection is non-empty, we record the result as Interstitial violation Ad fraud. We use this property checking whenever we want to know which strings can flow to the dependency graph we are interested in.

Figure 3: DPG 1
Figure 4: DPG 2
Figure 5: DPG 3

Check the Ad fraud. We first get the addSubView functions of the ViewController nodes. Then we build dependency graphs on the parameters of these functions. We trace the dependency relation of the parameter to find whether it reaches the same node as the top node of Figure 4. We construct the automaton for this union-operation dependency graph (a union operation means that the value depends on one of the nodes it connects). This automaton represents the potential API invocations with their argument values. In this example, the potential API is FBAdView, FBAdBannerView, FBAdLink or FBAdNativeAd. We will also

build an automaton to represent the list of Ad view API strings, which contains FBAdView and FBAdBannerView. Checking the intersection between these two automata, we find that it is non-empty. Finally, we record that this ViewController node adds an Ad view one time. From Table 7, we can see that this ViewController has added multiple Ad views.

4 Ad fraud detection analysis

4.1 Static Analysis on iOS application

The process of static analysis for systematic API vulnerability checking in iOS executables has been described in the previous work [58]. The detection process is shown in Figure 6. We use this approach to detect Ad fraud.

Figure 6: Detection flow chart

4.1.1 Preprocessing

The first part is to download and install online apps from Apple's App Store onto a jailbroken iOS device, where we can access the file system directly to fetch the target binary. We use a third-party binary decryption tool, Dynist, to generate the decrypted binary. We then use a third-party disassembler, IDA Pro, to generate the assembly code in plain text format. Then we use the tool Binflow [58] to extract segment information from the assembly code.

The output of segment information extraction contains 14 segment files, each corresponding to an extracted segment in the assembly. The formal definition of each segment is given in the document [70].

4.1.2 Control Flow Graph Construction

We build the control flow graph with Binflow [3]. Binflow needs the segment information extracted in the previous step to resolve register values of indirect jumps and link the routines. During CFG construction, it also marks the dependency relations of registers for each assembly statement.

The dependency relations of registers can be used when we need to know the steps of an Ad related API call. Take Figure 7 and Figure 8 as an example. Figure 7 shows assembly code that triggers a function call with a B instruction. We need to resolve the value of R6 to get the target function, so we adopt static backward slicing to resolve R6. When we scan forward, we see a MOV instruction first. We record R2 as the dependency key and the string malloc as the dependency value, because R2 depends on the string malloc. Then we see the next MOV instruction and do the same thing: we record R6 as the dependency key and R2 as the dependency value, because MOV copies the value of R2 into R6; that is to say, R6 depends on R2.
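The recording and tracing steps in this example can be illustrated with a minimal Python sketch. It is not Binflow's implementation: the instruction tuples, the function names record_dependencies and resolve, and the restriction to straight-line code in which each register is assigned once are assumptions made only for illustration, and STR is omitted for brevity.

```python
def record_dependencies(instructions):
    """Scan instructions forward and record dependency key -> dependency value."""
    deps = {}
    for op, dst, src in instructions:
        if op in ("MOV", "LDR"):
            # The destination register depends on the source operand.
            deps[dst] = src
    return deps


def resolve(deps, key):
    """Follow the dependency relations from key until no further relation exists."""
    seen = set()
    while key in deps and key not in seen:
        seen.add(key)
        key = deps[key]
    return key


# Hypothetical encoding of the example: (command, first parameter, second parameter).
instructions = [
    ("MOV", "R2", "malloc"),  # R2 depends on the string malloc
    ("MOV", "R6", "R2"),      # R6 depends on R2
    ("B", "R6", None),        # the branch target is held in R6
]
deps = record_dependencies(instructions)
target = resolve(deps, "R6")
```

Under these assumptions, tracing R6 follows R6 to R2 and then R2 to the string malloc, which mirrors the relation shown in Figure 8.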
We record the dependency relations of other commands as well, such as LDR/STR, which load and store register values. After recording the dependency relations, we can track the value we are interested in. For example, when we want to know the

value of R6 at the B instruction, we can trace the value through the dependency relations shown in Figure 8.

Figure 7: Location
Figure 8: Relation

We use GetCallerSbrt to find the callSbrtNode list of the input node in Algorithms 5, 6 and 7, as explained in detail below. When we construct the control flow graph, we encounter nodes that denote objc_msgSend. From such a node, we learn the register and the message it receives; that is to say, we learn the node and the subroutine list it calls. The function is as follows.

    objc_msgSend(id regId, SEL cmd)

There are two parameters in the objc_msgSend function. The first parameter, regId, denotes the register id. We call it the receiver since it receives the message. The cmd is the method that we need to look up in the methods table of the assembly, and we call it the message. That is to say, we send the message cmd to the receiver regId through the objc_msgSend function.

Having explained the parameters, we now explain how we use them in our analysis. Because we know which methods are sent to the register id, we record them as the subroutine list of the register id. At this moment, we do not know what the register id represents. The cmd only denotes the method to be sent, so we cannot know from which instance it is called. For instance, a setView method may be called by a ViewController instance or an AdViewController instance. When we need the setView method of an AdViewController instance, we first need to get the instance id (register id) list of AdViewController. We use Algorithm 4 to get the id list of AdViewController. After we get the regId

of AdViewController, we know which subroutines or methods were called on the regId through the objc_msgSend function. We record these subroutines as its subroutine list.

4.1.3 Dependency Graph Construction

When we construct the dependency graph, we first determine the sensitive functions we are interested in. We build dependency graphs on the parameters of the sensitive functions to reveal potential API invocations with their argument values. This is done by traversing the dependency relations of the corresponding register (sink) backward up to constants or external inputs.

We then conduct string analysis on the dependency graphs to reveal potential API invocations with their argument values on ad fraud violations. In each step of Ad fraud detection analysis, we need to construct dependency graphs on parameters of target functions. Algorithm 1 shows how to create a dependency graph (createDPG).

Algorithm 1 explains how to createDPG with a node and a depKey. First, we build a global variable depGraph, which records each node we trace. To perform algorithm 1, we need the dependency relation of a control flow graph node mentioned in section 4.1.2. Each control flow graph node has its dependency relation, represented as a set of key-value pairs: the dependency key depends on the dependency value.

The first step of constructing a dependency graph is to execute HasDepKey, which checks whether the input node contains the depKey we want. If not, we enter a while loop to createDPG, whose logic is described below. At line 4, we check whether the node is a CallSbrtNode. If it is, we extend the dependency graph with R0, because the return value is stored in R0. Then we get the predecessors of the node.
If the node has no predecessor, we have reached the root node without finding any node that contains the depKey we want, so we record an Unknown Node. If the node has only one predecessor, we just update the depKey with the current

node. The depValue may store the depKey, so that we can trace with the depKey to get the previous depValue. If the node has two or more predecessors, it is a unionNode, so we add a union node to the graph and createDPG from it with the branch nodes.

When the while loop ends, execution reaches line 19 in algorithm 1, which means that the node contains the depKey we want, so we try to resolve the depValue. The depValue denotes a node. If the node is a literal, we get the literal value and add it to depGraph. If not, we continue to extend the graph with the same steps as in lines 7 to 16.

4.2 Overview of Ad fraud detection analysis

The overview of the checking steps is shown in Figure 9. After getting the preprocessing result and the segment information from the assembly code, we use control flow graph construction, dependency graph construction and property checking in each ad fraud analysis step.

Figure 9: Ad fraud detection analysis

In section 1, we introduced Size violation Ad fraud, Interstitial violation Ad fraud, Multi-view violation Ad fraud, and Overlay-view violation Ad fraud. We detect them with our algorithms in this section. Since developers may commit Ad fraud with dynamic invocation, we use a different algorithm to detect each of them.

Algorithm 1 CreateDPG
 1: procedure createDPG(node, depKey)
 2:     ▷ node is a control flow graph node
 3:     while !HasDepKey(node, depKey) do
 4:         if node ∈ CallSbrtNode then    ▷ current node is a function call; the return value is stored in R0
 5:             createDPG(node, R0)
 6:         end if
 7:         preNodeList ← getPredecessor(node)
 8:         if preNodeList.length = 0 then
 9:             depGraph.add(UnknownNode(node))
10:         else if preNodeList.length = 1 then
11:             createDPG(preNodeList[0], depKey)
12:         else
13:             depGraph.add(unionNode(node))
14:             for branchNode ∈ getBranchNodes(node) do
15:                 createDPG(branchNode, depKey)
16:             end for
17:         end if
18:     end while
19:     depVal ← getDepValFrom(node, depKey)
20:     if depVal ∈ Storable then
21:         cfgLiteralNode ← getNode(depVal)
22:         depGraph.add(cfgLiteralNode)
23:     else
24:         preNodeList ← getPredecessor(node)
25:         if preNodeList.length = 0 then
26:             depGraph.add(UnknownNode(node))
27:         else if preNodeList.length = 1 then
28:             createDPG(preNodeList[0], depVal)
29:         else
30:             depGraph.add(unionNode(node))
31:             for branchNode ∈ getBranchNodes(node) do
32:                 createDPG(branchNode, depVal)
33:             end for
34:         end if
35:     end if
36: end procedure
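To make the backward traversal of Algorithm 1 more concrete, the following Python sketch walks a toy control flow graph. It is a simplified illustration and not the thesis implementation: the Node class, the quoted-string convention for literals, and the flat list used in place of a graph are assumptions, and the CallSbrtNode case (extending with R0 for return values) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    deps: dict = field(default_factory=dict)   # depKey -> depValue recorded at this node
    preds: list = field(default_factory=list)  # predecessor nodes in the CFG


def is_literal(value):
    # Assumption of this toy model: literals are written as quoted strings.
    return value.startswith('"')


def create_dpg(node, dep_key, graph=None, visited=None):
    """Collect the literals, union points and unknowns that dep_key may depend on."""
    graph = [] if graph is None else graph
    visited = set() if visited is None else visited
    if (node.name, dep_key) in visited:  # guard against loops in the CFG
        return graph
    visited.add((node.name, dep_key))

    if dep_key in node.deps:
        value = node.deps[dep_key]
        if is_literal(value):
            graph.append(("literal", value.strip('"')))
            return graph
        dep_key = value  # keep tracing the value that this key depends on

    if not node.preds:
        graph.append(("unknown", dep_key))      # reached the top without a literal
    elif len(node.preds) == 1:
        create_dpg(node.preds[0], dep_key, graph, visited)
    else:
        graph.append(("union", node.name))      # value depends on one of the branches
        for pred in node.preds:
            create_dpg(pred, dep_key, graph, visited)
    return graph


# Two branches assign different class names before a hypothetical NSClassFromString call.
a = Node("a", {"R1": '"FBAdView"'})
b = Node("b", {"R1": '"FBAdBannerView"'})
join = Node("join", {"R0": "R1"}, preds=[a, b])
call = Node("call_NSClassFromString", preds=[join])
dpg = create_dpg(call, "R0")
```

In this toy graph the traversal records a union point at the join node followed by the two literals, which corresponds to the union-operation dependency graph described in the motivating example.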

We cannot get the source code of apps in the App Store, so we use the binary code (iOS executables) of the apps instead. We perform Ad fraud detection based on static analysis techniques on iOS, using the control flow graph generated from the assembly of iOS applications. The following sections show how to find different kinds of Ad fraud.

4.3 Find the Ad Related API

To discover various Ad frauds effectively, we first check whether the app calls Ad related APIs. We then perform our Ad fraud detection analysis only on the apps that include Ad related APIs. If an app does not include any Ad library, Ad fraud cannot happen in this app.

Before checking the behavior of calling Ad related APIs, we need to collect the Ad related APIs in Table 1, which are provided by the Ad network providers. The API strings in Table 1 are the parameter of algorithm 2.

Ad network          Ad Related API
Common              ASIdentifierManager
Google Admob        GADInterstitial, GADMediaView, GADNativeAd, GADMediationCredentials, GADVideoOptions, other GAD API
Facebook Ads        FBAdView, FBInterstitialAd, FBMediaView, FBNativeAdView, FBMediaView, other FBAd API
Other Ad network    ADInterstitialAd, FlurryAdInterstitial, MOBFAdViewController, SSiPadViewController, BaiduMobAdView, other AD API

Table 1: Ad related API Table

Developers may use NSClassFromString to call the API for convenience or maliciously, so we check the behavior of calling Ad related APIs through NSClassFromString. BuildStringAutomataFromDPG is the function that traverses the input dependency graph to find string operations and string expressions and then builds the automaton for it.
BuildStringAutomata is the function that builds the automaton for the input

string. We use algorithm 2 to check the Ad related API strings shown in Table 1.

Algorithm 2 Check Ad related API with dynamic invocation algorithm
procedure checkIAD(CFG, ApiString)
    ▷ CFG is the control flow graph of an app
    for ∀node ∈ CFG do
        if node ∈ NSClassFromString then
            DPG ← createDPG(node, "R0")
            S_DPG ← BuildStringAutomataFromDPG(DPG)
            S_API ← BuildStringAutomata(ApiString)
            checkResult ← S_DPG ∩ S_API
            Record(checkResult, S_DPG, ApiString)
        end if
    end for
end procedure

The details of algorithm 2 are as follows. The first step is building dependency graphs on the parameters of NSClassFromString functions. The second step is building a string automaton of the constructor or method of each Ad related API. We then check whether there is an intersection between the string automaton of the dependency graph and the Ad related API string automata, which reveals Ad related API invocations with their argument values on NSClassFromString functions. The final step is to record the checking result so that we can skip the apps without Ad related APIs when we perform the Ad fraud detection algorithms that follow.

4.4 Check Ad fraud

If an app includes Ad related APIs, we start performing the Ad fraud detection in Figure 9. We use different algorithms to detect different Ad frauds. The detailed steps of the algorithms are given in the following subsections.

When an algorithm needs to resolve the possible string values of parameters, we use the BuildStringAutomataFromDPG and BuildStringAutomata functions mentioned in section 4.3. We use BuildStringAutomataFromDPG to traverse the input DPG to find string operations and string expressions and build the automaton for it. Then we use BuildStringAutomata to build the automaton for the string we are concerned about. After building these two string automata, we check their intersection to reveal potential API invocations with their argument values.

4.4.1 Interstitial violation Ad fraud

Showing Interstitial Ads is harmful to user experience. Interstitial violation Ad fraud means that developers call the ADInterstitial API to generate an interstitial Ad. ADInterstitial API denotes the union of the Interstitial APIs provided by the Ad networks in Table 1. We determine the flow to the parameter of functions that load the class with dynamic invocation. Then we conduct string analysis on these dependency graphs to determine whether these parameters can take an ADInterstitial API string as input. That is to say, developers may use reflection to call the ADInterstitial API to generate an interstitial Ad, so we perform algorithm 3 to check the behavior of calling the ADInterstitial API with dynamic invocation. Developers need to use the alloc or init method to call it from other locations.

When we perform algorithm 3, we first get all the nodes in the control flow graph of the app that includes Ad related APIs. Then we check whether each node is a function node of NSClassFromString. If so, we build the dependency graph on the parameter (R0) of the function through algorithm 1. After getting the dependency graph, we build its string automaton through buildStringAutomata, which determines the possible values of the input string. We also build an automaton for the ADInterstitial API strings. The strings of the Ad APIs for building interstitial Ads vary among Ad network providers; we have collected them before this detection. Then we check the intersection between these two automata.
If the intersection is non-empty, the app has called the ADInterstitial API. Then we use the hasInitMethod function to check whether the node calls the alloc or init method in other locations of the control flow graph. If the node matches both conditions, we record

the result as Interstitial violation Ad fraud.

Algorithm 3 Check the Interstitial violation Ad fraud
procedure checkIVAF(CFG_AD)
    ▷ CFG_AD is the control flow graph of an app that includes Ad related API
    for ∀node ∈ CFG_AD do
        if node ∈ NSClassFromString then
            DPG ← createDPG(node, "R0")
            S_DPG ← buildStringAutomata(DPG)
            S_VS ← buildStringAutomata(adInterstitialAPIString)
            checkResult ← S_DPG ∩ S_VS
            if checkResult = true ∧ hasInitMethod(node) then
                Record(checkResult, S_DPG, S_VS)
            end if
        end if
    end for
end procedure

4.4.2 Size violation Ad fraud

To check Size violation Ad fraud, it is necessary to determine the flow to the parameters of the Ad view functions (which are used for setting the Ad size). For each parameter of the Ad view functions, we construct a dependency graph. We conduct string analysis on these dependency graphs to reveal potential API invocations with their argument values on Ad view functions. By checking the intersection of these values with patterns that characterize violating values for an Ad view, we can detect Size violation Ad fraud. Algorithm 5 shows how we check Size violation Ad fraud in the apps that include Ad related APIs.

When displaying Ads in an application, some Ad networks provide APIs that let developers decide the size of the Ad view. The size of the Ad should be reasonable, so we try to resolve the size of an Ad view and check whether the behavior is Size violation Ad fraud.

There are some steps we need to do first. Algorithm 4 is called by algorithm 5 to get the Ad view node list from the control flow graph. Because we record the callSbrt for each node when we construct the control flow graph, we can use the function GetCallerSbrt to find the callSbrtNode list of the input node.

The details of how we find the callSbrt for each node are as follows. We first scan all the control flow graph nodes. When we encounter a node that denotes objc_msgSend, it tells us which node receives the message and which method is called, so we know which subroutines or methods the node has called. We record them as the callSbrt of each node.

Algorithm 4 Check Ad-view algorithm
procedure checkAdView(CFG)
    ▷ CFG is the control flow graph of an app
    ▷ AdViewClassString is the API of Ad view provided by Ad Network
    AdViewList ← []
    for ∀node ∈ CFG do
        if node ∈ NSClassFromString then
            DPG ← createDPG(node, R0)
            S_DPG ← buildStringAutomata(DPG)
            S_API ← buildStringAutomata(AdViewClassString)
            checkResult ← S_DPG ∩ S_API
            for ∀callerNodes ∈ node.getCallerNodes do
                AdViewList.add(callerNodes)
            end for
        end if
    end for
    return AdViewList
end procedure

We get the control flow graph from an app that includes Ad related APIs. We get the Ad view node list from the control flow graph with algorithm 4. Then we get the subroutine node list called by each Ad view node. If a subroutine node, such as a function or method, is in the Ad view API provided by any Ad network, we build the dependency graph on the parameters of this subroutine node. ViolateSize denotes a violating size string, such as CGRectZero. Then we build the violating-size string automaton and check whether there is an intersection between these two automata. If there is, we report Size violation Ad fraud. Finally, we record the result and count the number of apps that exhibit Size violation Ad fraud.
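The intersection check between the automaton of the dependency graph and the automaton of the violating size strings can be sketched in Python as follows. This is a deliberately reduced model and not the string analysis tool used in this work: it only handles finite sets of literal strings (a union of literals), builds trie-shaped deterministic automata, and searches their product for a common accepting state. The function names and the CGRectMake value are assumptions chosen for illustration.

```python
from collections import deque


def build_automaton(strings):
    """Build a trie-shaped DFA that accepts exactly the given strings."""
    transitions = [{}]  # transitions[state][char] -> next state; state 0 is the start
    accepting = set()
    for s in strings:
        state = 0
        for ch in s:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def intersects(a, b):
    """Return True if some string is accepted by both automata (product search)."""
    (ta, fa), (tb, fb) = a, b
    queue = deque([(0, 0)])
    seen = {(0, 0)}
    while queue:
        sa, sb = queue.popleft()
        if sa in fa and sb in fb:
            return True
        for ch, na in ta[sa].items():
            nb = tb[sb].get(ch)
            if nb is not None and (na, nb) not in seen:
                seen.add((na, nb))
                queue.append((na, nb))
    return False


# Hypothetical values that could flow to the size parameter of an Ad view method.
s_dpg = build_automaton(["CGRectZero", "CGRectMake"])
s_vs = build_automaton(["CGRectZero"])  # ViolateSize
is_size_violation = intersects(s_dpg, s_vs)
```

A real string analysis must also model operations such as concatenation and loops, which produce infinite languages; the product construction stays the same, but the automata are no longer simple tries.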

Algorithm 5 Check Size-violation Ad fraud algorithm
procedure checkSVAF(CFG_AD)
    ▷ CFG_AD is the control flow graph of an app that includes Ad related API
    AdViewList ← checkAdView(CFG_AD)
    for ∀ adviewNode ∈ AdViewList do
        CallSbrtList ← GetCallerSbrt(adviewNode)
        for ∀ sbrt ∈ CallSbrtList do
            if sbrt.name ∈ ADViewMethodList then
                DPG ← createDPG(sbrt, "R0")
                S_DPG ← buildStringAutomata(DPG)
                S_VS ← buildStringAutomata(ViolateSize)
                checkResult ← S_DPG ∩ S_VS
                Record(checkResult, S_DPG, S_VS)
            end if
        end for
    end for
end procedure

4.4.3 Multi-view violation Ad fraud

Each Ad view is attached to a ViewController. The same ViewController should not attach more than one Ad view. If it does, it is a Multi-view violation Ad fraud. An instance of ViewController calls the addSubView method, and we check whether the parameter of addSubView is an instance of any kind of Ad view. We count the number of times that an instance of ViewController calls addSubView with an Ad view instance as the parameter. When the count is higher than one, we call it Multi-view violation Ad fraud.

Because developers may import the Ad view through dynamic invocation, we first use Algorithm 4 to find the Ad view instances that are created dynamically, and store them in AdViewList. Then we find the addSubView method called by each ViewController of the Ad network. We get the parameter of each addSubView call we find and check whether it is an Ad view node in AdViewList. If it is, we use the RecordAddSubAdView function to record that the app has called addSubView with an Ad view one time. If the count is higher than one, we call it Multi-view violation Ad fraud. These steps are shown in Algorithm 6.
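The counting rule for Multi-view violation can be illustrated with a short sketch. It assumes the addSubView calls have already been extracted from the control flow graph as (ViewController, subview class) pairs. That input format and all names in the example are hypothetical.

```python
from collections import Counter


def multi_view_violations(add_subview_calls, ad_view_classes):
    """Return the ViewControllers that add more than one Ad view.

    add_subview_calls: iterable of (view_controller, subview_class) pairs,
                       one per addSubView call found in the app.
    ad_view_classes:   set of class names identified as Ad views.
    """
    counts = Counter(
        controller
        for controller, subview in add_subview_calls
        if subview in ad_view_classes
    )
    # More than one Ad view on the same ViewController is a violation.
    return {controller: n for controller, n in counts.items() if n > 1}


calls = [
    ("MainViewController", "AdBannerView"),
    ("MainViewController", "UILabel"),
    ("MainViewController", "AdBannerView"),
    ("SettingsViewController", "AdBannerView"),
]
print(multi_view_violations(calls, {"AdBannerView"}))
```

Counting per ViewController, rather than per app, follows the statement that the same ViewController should not attach more than one Ad view.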

Algorithm 6 Check Multi-view violation Ad fraud algorithm
procedure checkFMAF(CFG_AD)
    ▷ CFG_AD is the control flow graph of an app that includes Ad related API
    for ∀ ViewNode ∈ CFG_AD do
        AdViewList ← checkAdView(CFG_AD)
        for ∀ adviewNode ∈ AdViewList do
            CallSbrtList ← GetCallerSbrt(ViewNode)
            for ∀ sbrt ∈ CallSbrtList do
                if sbrt.name ∈ addSubView ∧ sbrt.param ∈ adviewNode then
                    RecordAddSubAdView(CFG_AD)
                end if
            end for
        end for
    end for
end procedure

4.4.4 Overlay-view violation Ad fraud

A ViewController calls the addSubView function to add many types of view. If one ViewController adds an Ad view and a full-screen view at the same time, the chance that it is an Overlay-view violation Ad fraud is high. We check whether the parameter of addSubView of the ViewController is an instance of any kind of Ad view. Then we check whether the ViewController also calls addSubView to add an instance of a full-screen view. If a ViewController matches both behaviors, we call it Overlay-view violation Ad fraud. The algorithm for checking Overlay-view violation Ad fraud is shown in Algorithm 7. The checkFullView function is almost the same as checkAdView, except that its input string automaton is built from the full-screen view related API.

5 Evaluation

5.1 Environment

We perform Ad fraud detection analysis in a real environment, following the flow shown in Figure 9. First, we download iOS apps through a Sikuli based approach [14] from Apple's App Store via an iPhone 7. We then install the app, fetch its binary, and decrypt the binary. Next, we construct its ARMv7 assembly with IDA Pro, then we generate its

control flow graph by the Binflow script [58]. In the last step, we run the proposed Ad fraud algorithms 2, 3, 4, 5, and 6 in section 4 on the generated CFG to detect the Ad related API and Ad fraud.

Algorithm 7 Check Overlay-view violation Ad fraud algorithm
procedure checkFOAF(CFG_AD)
    ▷ CFG_AD is the control flow graph of an app that includes Ad related API
    for ∀ ViewNode ∈ CFG_AD do
        AdViewList ← checkAdView(CFG_AD)
        FullViewList ← checkFullView(CFG_AD)
        for ∀ adViewNode ∈ AdViewList ∧ ∀ fullViewNode ∈ FullViewList do
            CallSbrtList ← GetCallerSbrt(ViewNode)
            for ∀ sbrt ∈ CallSbrtList do
                if sbrt.name ∈ addSubView ∧ sbrt.param ∈ adViewNode then
                    RecordAddSubAdView(CFG_AD)
                    return
                end if
            end for
            for ∀ sbrt ∈ CallSbrtList do
                if sbrt.name ∈ addSubView ∧ sbrt.param ∈ fullViewNode then
                    RecordAddSubFullView(CFG_AD)
                    return
                end if
            end for
        end for
    end for
end procedure

We collected 30 thousand apps from the App Store, covering 33 genres. Their release dates range from 2008 to 2017, and all of them have been updated after 2016. The number of generated control flow graph nodes can exceed 10 thousand, which makes the analysis a memory-intensive task. Hence, we deploy a high-end server with 24 GB RAM to generate the control flow graph of an app. Our analysis is based on the binary code on iOS 9, which targets the 32-bit ARM architecture.

To confirm whether the apps we analyzed commit Ad fraud violations, we downloaded the latest version of each app and observed its Ad behavior. We also checked the revision history of the violating applications and found no description of changes to the way they present advertisements. Therefore, we think there are

violations in the code of the latest applications if we have detected Ad fraud violations in their binary code. We also provide a GitHub repository that presents the results of the Ad fraud detection analysis [87]. The results up to the control flow graph are reported in the Binflow GitHub repository [88].

5.2 Result of detecting Ad related API

Figure 10: Apps use NSClassFromString to call class

Figure 11: The number of calling Ad related API in the NSClassFromString apps

After the preprocessing step in section 4.1.1, we get 1391 apps. Among these apps, we detect that 514 apps use NSClassFromString to call a class (shown in Figure 10). We then check whether there is an intersection between two string automata, one representing the parameter of NSClassFromString and the other representing the Ad related API (shown in Figure 11). In summary, we find that 208 apps call Ad related API.
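The three counts above form a filtering funnel, and the share retained at each step follows directly from them. The short sketch below only restates the reported numbers. The helper name `share` and the rounding to one decimal place are choices made here, not part of the thesis.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


preprocessed_apps = 1391   # apps remaining after preprocessing (section 4.1.1)
dynamic_apps = 514         # apps that call a class through NSClassFromString
ad_api_apps = 208          # apps whose dynamic calls reach Ad related API

print("dynamic invocation among preprocessed apps:",
      share(dynamic_apps, preprocessed_apps), "%")
print("Ad related API among dynamic-invocation apps:",
      share(ad_api_apps, dynamic_apps), "%")
print("Ad related API among preprocessed apps:",
      share(ad_api_apps, preprocessed_apps), "%")
```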

5.3 Result of Ad fraud detection

The count for each Ad fraud violation is shown in Figure 12. Among the 208 apps, we further identified 70 apps having Interstitial violation ads, 48 apps having size violation ads, 31 apps having multi-view violation ads, and 19 apps having overlay violation ads.

Figure 12: Number of apps in each Ad fraud

5.3.1 Result of Interstitial violation Ad fraud

We use Algorithm 3 to check the Interstitial violation Ad fraud. The input is one of the ADInterstitial API class strings in Table 2.

GADInterstitial         DFPInterstitial               IMInterstitial
IMAdInterstitial        MMInterstitial                ADInterstitialAd
FBInterstitialAd        MPInterstitialAdController    ALInterstitialAd
FlurryAdInterstitial

Table 2: ADInterstitial API Table

We generate the dependency graph and check the intersection between the two automata through Algorithm 3, described in section 4.4.1. We scan the 208 apps and find 3386 calls to NSClassFromString. The intersection between the two automata yields 419 true results, of which 143 contain no unknown nodes. The results are shown in Figure 13.

That is to say, we find that 70 of the 208 apps commit Interstitial violation Ad fraud. In these 70 apps, Interstitial violation Ad fraud occurs 419 times, as the charts

in Figure 13 show.

Figure 13: Count Detail in Interstitial violation Ad fraud

Figure 14: Count in Interstitial violation Ad fraud (NotUnknown) by Operation type

We show the 143 not-unknown true results by operation type (Figure 14), as well as the 276 unknown true results (Figure 15). A direct operation means that the flow to the parameter is constant propagation. Figure 14 and Figure 15 show the number of violations (not unknown and unknown, respectively) and the number of apps for each string operation. For this Ad fraud, constant propagation accounts for 142 of the not-unknown true results and 92 of the unknown true results. The string operation Concat yields no results. The string operation Union accounts for 1 of the not-unknown true results and 184 of the unknown true results.
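The per-operation counts can be cross-checked against the totals reported earlier (143 not-unknown, 276 unknown, 419 true results overall). The sketch below is only a consistency check on the published numbers. The dictionary keys are labels chosen here for the three operation types.

```python
# True results per string operation, as reported for Interstitial violation.
not_unknown = {"constant_propagation": 142, "concat": 0, "union": 1}
unknown = {"constant_propagation": 92, "concat": 0, "union": 184}

not_unknown_total = sum(not_unknown.values())
unknown_total = sum(unknown.values())
all_true_results = not_unknown_total + unknown_total

# Fraction of true results that still contain an unresolved (unknown) node.
unknown_share = unknown_total / all_true_results

print(not_unknown_total, unknown_total, all_true_results)
print(f"{unknown_share:.1%} of true results contain unknown nodes")
```

The breakdown is consistent with the totals. Most not-unknown results come from constant propagation, while most unknown results come from Union.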