
National Chengchi University

After the generative process of LDA is repeated, the model converges to a steady result. They showed that the LDA model gave better predictions than both the mixture of unigrams model and the pLSA model.

2.3.4 Comparison of Topic Models

After investigating the topic modeling approaches, it is clear that each has its own strengths and unavoidable weaknesses. To find the most suitable method for our research, we filter out the less suitable methods step by step through comparison. Each point of comparison is discussed separately in the following paragraphs.

In terms of statistical foundation, LDA and pLSA both rest on probabilistic models, which can be drawn as the graphical models in Fig. 5 and Fig. 6. Because both are built on probability, the topics they extract are interpretable. LDA and pLSA therefore facilitate the analysis of model output, unlike the uninterpretable directions produced by LSA.

In terms of efficiency, LSA has the shortest execution time of the three because it does not need to estimate the probabilities of a model. LDA is the second most efficient method because its hyperparameters, the Dirichlet priors on the topic distribution, let it estimate probabilities quickly. In contrast, pLSA has to estimate every probability in its equations, so it is the least efficient of the three methods.
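One way to make the efficiency comparison between pLSA and LDA concrete is to count the parameters each model must estimate. The following is a minimal sketch, assuming the usual accounting in which pLSA keeps a separate topic mixture for every training document while LDA keeps a single Dirichlet parameter vector shared by all documents. The function names and the example sizes are illustrative and do not come from our experiments, and parameter count is only a proxy for execution time.

```python
def plsa_param_count(num_topics, vocab_size, num_docs):
    """Topic-word probabilities plus one topic mixture per training document."""
    return num_topics * vocab_size + num_topics * num_docs


def lda_param_count(num_topics, vocab_size):
    """One Dirichlet parameter per topic plus the topic-word probabilities."""
    return num_topics + num_topics * vocab_size


# Illustrative sizes only: 50 topics, 5,000 vocabulary terms.
for num_docs in (1_000, 10_000):
    print(num_docs,
          plsa_param_count(50, 5_000, num_docs),
          lda_param_count(50, 5_000))
```

Under this accounting the pLSA count grows with the number of documents, whereas the LDA count stays fixed once the number of topics and the vocabulary are chosen.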

Based on this comparison, we first filter out LSA because it lacks a statistical model. Second, we filter out pLSA because LDA executes more efficiently. We therefore use LDA as our topic modeling method.

2.4 Behavior Analysis

2.4.1 Sensitive Behaviors of Mobile Applications

Nowadays, the markets offer a large number of mobile applications to satisfy users' needs. An application often bears multiple functionalities, which we call the mobile behaviors of the application. Developers design these behaviors to cater to users and to gain profit. However, if a developer places a malicious behavior in an application or misuses the user information an application collects, the user's private information may be leaked or recorded in a database controlled by hackers or malicious developers.

Felt, Finifter, Chin, Hanna, and Wagner [34] classified mobile threats into three types: Malware, Personal Spyware, and Grayware. They evaluated malicious mobile applications on different platforms, such as iOS, Symbian, and Android, and classified their malicious behaviors as follows: Novelty and Amusement, Selling User Information, Stealing User Credentials, Premium-Rate Calls and SMS, SMS Spam, Search Engine Optimization, and Ransom.

Enck, Octeau, McDaniel, and Chaudhuri [35] classified the malicious behaviors of mobile applications into two categories: information misuse and phone misuse. Information misuse means that privacy-sensitive information on the device (e.g., geographic location, IMEI, IMSI, and ICC-ID) is leaked to the outside. Phone misuse means that the interfaces of the smart device are used improperly, including telephony services, background recording of audio and video, the socket API, and access to the list of installed applications.

2.4.2 Analyzing Behaviors of Mobile Applications

There are two main approaches to analyzing the behaviors of applications: the dynamic approach and the static approach. The dynamic approach performs the analysis by executing the application, whereas the static approach analyzes the source or binary code of the application without executing it.

Enck, Gilbert, Han, Tendulkar, Chun, Cox, Jung, McDaniel, and Sheth [36] proposed a dynamic approach, TaintDroid, for tracking and analyzing application behaviors in Android applications. TaintDroid automatically labels privacy-sensitive data and tracks the labels as the data propagate through program variables, files, and interprocess messages. When labeled data are transmitted via the Internet, or otherwise leave the system, TaintDroid logs the data's labels, the responsible application, and the destination to which the data are transmitted. It also gives users real-time feedback so that they can learn of the security status promptly. TaintDroid tracks only data flows, not control flows.
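The label-and-propagate idea described above can be illustrated with a small sketch. It is not TaintDroid's implementation, which instruments the Android runtime. It only assumes three ingredients: a source that attaches a label, an operation that merges the labels of its operands, and a sink that logs labeled data. All names, the application identifier, the destination, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value paired with the set of taint labels it carries."""
    value: str
    labels: frozenset = frozenset()


def source(value, label):
    """Attach a label to data read from a privacy-sensitive source."""
    return Tainted(value, frozenset({label}))


def concat(a, b):
    """Propagate taint: the result carries the labels of both operands."""
    return Tainted(a.value + b.value, a.labels | b.labels)


def network_send(app, destination, data, log):
    """Sink: if the data is labeled, log labels, application, and destination."""
    if data.labels:
        log.append({"app": app,
                    "destination": destination,
                    "labels": sorted(data.labels)})


log = []
imei = source("356938035643809", "IMEI")          # hypothetical sample value
location = source("25.03,121.57", "LOCATION")     # hypothetical sample value
plain = Tainted("id=")                            # unlabeled data
payload = concat(concat(plain, imei), concat(Tainted("&loc="), location))

network_send("com.example.app", "ads.example.com", payload, log)  # logged
network_send("com.example.app", "ads.example.com", plain, log)    # not logged
print(log)
```

As in the description above, the sketch follows explicit data flow only: a value that influences the output through a branch condition rather than through an assignment would not carry its label to the sink.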

Mann and Starostin [37] used static analysis to detect leakage of private data in Android applications. They sorted private information into five categories: "location data", "unique identifiers", "call state", "authentication data", and "contact and calendar data". Their framework put signatures on the methods, and on the parameters of those methods, that could be used to extract and transmit users' private information. The framework was restricted to tracking explicit information flow only, so leakage of users' private data via implicit flows still could not be tracked.

Egele, Kruegel, Kirda, and Vigna [38] presented PiOS, the first static binary analysis tool for iOS applications, which automatically determines whether an application would leak users' private data. They decrypted the binaries of iOS applications, disassembled them to build the control flow graph of method calls, and performed data flow analysis to find potential privacy leaks. They evaluated the approach on more than 1,400 iPhone applications. This work showed the feasibility of binary analysis for iOS applications.

Werthmann, Hund, Davi, Sadeghi, and Holz [39] proposed PSiOS, a framework for protecting private data. They constructed a protection layer between the iOS runtime environment and the applications using MoCFI [40], a tool for control flow integrity on the iOS platform.

Deng, Saltaformaggio, Zhang, and Xu [15] presented a hybrid approach that combines static and dynamic analyses. Their tool, iRiS, checks for abuse of private APIs in iOS applications. They used the IDA Pro disassembler to generate the control flow graph (CFG) of the target application and then performed further analysis on the CFG. When the static approach could not resolve a call target, they used the dynamic approach to analyze it. This was done by extending Valgrind [41].


We adopted the static approach presented in [42] to analyze the behaviors of iOS mobile apps.

The proposed risk evaluation techniques are orthogonal to behavior analysis techniques and can also be integrated with other analysis tools for systematic recommendation.

3 Methodology

3.1 System Architecture

The architecture of AppReco is shown in Fig. 7. The work done before rating and recommending applications to users can be divided into three parts. The first part, behavior analysis, spans from collecting applications to checking whether an application performs a certain behavior. It contains decrypting applications, disassembling applications, the sensitive behavior list, the private API list, and checking which behaviors applications perform. The second part, application clustering, spans from gathering descriptions to clustering applications with the GHSOM clustering algorithm. It contains description preprocessing, topic analysis with Latent Dirichlet Allocation, and GHSOM clustering. The last part, rating, aggregates the preceding results and produces the recommended apps. It contains our rating formula and its computation.

We elaborate on these three parts in the following sections.
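The final step of the behavior analysis part, checking whether an application performs a behavior, can be pictured as a lookup of the symbols recovered from the disassembled binary against a behavior list. The sketch below is only an illustration under that assumption: the behavior names, the mapping, and the sample symbol list are hypothetical placeholders and are not the sensitive behavior list or private API list used by AppReco.

```python
# Hypothetical mapping from a behavior to API names that indicate it.
SENSITIVE_BEHAVIORS = {
    "read_location": {"CLLocationManager", "startUpdatingLocation"},
    "read_contacts": {"ABAddressBookCopyArrayOfAllPeople"},
    "send_sms": {"MFMessageComposeViewController"},
}


def detect_behaviors(symbols, behavior_map=SENSITIVE_BEHAVIORS):
    """Return, in sorted order, the behaviors with at least one indicator API
    among the symbols extracted from an application."""
    found = set(symbols)
    return sorted(name for name, apis in behavior_map.items() if apis & found)


# Hypothetical symbols extracted from one disassembled application.
app_symbols = [
    "NSURLConnection",
    "startUpdatingLocation",
    "MFMessageComposeViewController",
]
print(detect_behaviors(app_symbols))
```

A set intersection per behavior keeps the check independent of the order in which symbols appear in the binary. The resulting behavior list would then serve as one input to the rating part.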
