3.4 Recommendation

Here, $df_{i,d}$ is the number of documents that contain a specific term $i$, and $N$ is the total number of documents. We then take the logarithm of the ratio to keep the value from growing too large. To avoid a division by zero when $df_{i,d}$ equals 0, we add 1 to the denominator.
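As a minimal sketch of the smoothed weighting described above, assume the form log(N / (1 + df)) with a natural logarithm. The base of the logarithm is not stated here, and the function and parameter names are illustrative.

```python
import math


def smoothed_idf(n_docs: int, doc_freq: int) -> float:
    """Inverse document frequency with 1 added to the denominator.

    Assumed form: log(n_docs / (1 + doc_freq)). The natural logarithm
    is an assumption of this sketch.
    """
    if n_docs <= 0 or doc_freq < 0:
        raise ValueError("n_docs must be positive and doc_freq non-negative")
    return math.log(n_docs / (1 + doc_freq))


# A term that appears in no document no longer causes a division by zero.
rare = smoothed_idf(1000, 0)
# A term that appears in most documents receives a much smaller weight.
common = smoothed_idf(1000, 900)
assert rare > common
```

Under this form, a term that occurs in every document gets a slightly negative weight, because N / (1 + N) is below 1. An implementation would have to decide how to treat that case.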

3.4.2 Recommending Based on Ranking Formula

AppReco calculates three types of risk from the corresponding categories of behaviors, and then sums the three risk scores to obtain the total risk. The higher an application's total risk, the more likely it is to perform sensitive behaviors.

AppReco also flags applications in which private API usage is detected. Once an application's private API calls are revealed, AppReco does not recommend it to users. To calculate the behavior score, AppReco adapts Eq. (8); the resulting behavior score is given by Eq. (9). First, the behavior score reflects how many function calls fall under each behavior pattern. AppReco uses the TF weighting function to identify the behaviors that play an important role in an application, replacing $tf_{i,d}$ with $bf_{b,a}$, the behavior frequency, i.e., the number of times that behavior $b$ appears in application $a$.

$\max bf_a$ is the frequency of the most frequent behavior in application $a$.
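To illustrate the TF-style term, here is a minimal sketch that assumes the weight is the plain ratio of a behavior's frequency to the frequency of the most frequent behavior in the same application. The exact normalization used in Eq. (9) is not reproduced in this text, so this form is an assumption, and the behavior names are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def normalized_behavior_frequency(observed: Iterable[str]) -> Dict[str, float]:
    """Map each behavior b to bf(b, a) / max bf(a) for one application a."""
    counts = Counter(observed)       # bf(b, a): occurrences of each behavior
    if not counts:
        return {}
    max_bf = max(counts.values())    # max bf(a): count of the most frequent behavior
    return {behavior: n / max_bf for behavior, n in counts.items()}


# Hypothetical behaviors detected in one application.
calls = ["location", "location", "location", "location",
         "contacts", "camera", "camera"]
weights = normalized_behavior_frequency(calls)
# weights == {"location": 1.0, "contacts": 0.25, "camera": 0.5}
```

The most frequent behavior always receives weight 1.0, so applications with very different numbers of function calls remain comparable.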

Second, AppReco weights an application's behavior against the other applications in the same cluster. It applies the IDF weighting function to measure whether a behavior is rare or common in the cluster. $N_C$ is the number of applications in the cluster, and $df_{b,C}$ is the number of applications in the cluster that exhibit behavior $b$.

Dividing $N_C$ by $df_{b,C}$ yields a number that represents the risk of the behavior within the cluster. If a behavior is seldom adopted in the cluster, the behavior is not really required there, so an application in the cluster that performs it receives a higher score. In contrast, if many applications in the cluster perform the same behavior, the behavior is required in the cluster, and an application that performs it receives a lower score.

Finally, AppReco applies the IDF weighting function a second time to check whether the behavior is rare across all applications. $N$ is the number of applications in AppReco, and $df_{b,A}$ is the number of applications in AppReco that perform the behavior. If the majority of applications in AppReco use the behavior, it is widely used by developers, so its potential risk is rated low. Conversely, if only a minority of applications use it, the behavior is used by a small number of developers, so its potential risk is rated comparatively high.
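One plausible reading of the three steps is that the behavior score is the product of the normalized behavior frequency, the cluster-level IDF, and the global IDF. The sketch below assumes that product form, a natural logarithm, and the same add-one smoothing of the denominator as in Eq. (8). It also clamps each IDF term at zero so that two negative logarithms cannot multiply into a positive score. None of these details are confirmed by the text, and all names are illustrative.

```python
import math


def behavior_score(bf: int, max_bf: int,
                   n_cluster: int, df_cluster: int,
                   n_total: int, df_total: int) -> float:
    """Assumed form: (bf / max_bf) * idf_cluster * idf_global."""
    if max_bf <= 0 or n_cluster <= 0 or n_total <= 0:
        raise ValueError("max_bf, n_cluster and n_total must be positive")
    tf = bf / max_bf
    # Add-one smoothing mirrors the adjustment described for Eq. (8).
    # Clamping at 0.0 is a choice of this sketch, not taken from the source.
    idf_cluster = max(0.0, math.log(n_cluster / (1 + df_cluster)))
    idf_global = max(0.0, math.log(n_total / (1 + df_total)))
    return tf * idf_cluster * idf_global


# Hypothetical counts: a behavior that is rare both in its cluster and overall...
rare = behavior_score(bf=2, max_bf=4, n_cluster=100, df_cluster=4,
                      n_total=1000, df_total=9)
# ...versus one that most applications perform.
common = behavior_score(bf=2, max_bf=4, n_cluster=100, df_cluster=79,
                        n_total=1000, df_total=799)
assert rare > common
```

With identical behavior frequencies, the rarely used behavior scores far higher than the common one, which matches the intent described above.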

4 Evaluation

We have implemented AppReco as a tool that integrates our previous work, AppBeach [42]. In this section, we report some analysis results for iOS 9 applications. The complete analysis results can be found online [46].

4.1 Data Set

We collected around 6840 iOS applications from the official App Store, covering over twenty genres such as Social Networking, Utilities, and Reference. The complete data set is available on our website [46].

We downloaded these applications from the iTunes top free and top grossing charts, which are organized by genre. This makes it easy to reach well-known and popular applications. For example, "SoundCloud" is at the top of the Music chart, "Google Translate" is at the top of the Reference chart, and "Pic Collage" is at the top of the Photo & Video chart. The extracted topics are listed in Table 3.

Table 3: Topic result in iOS 9

| ID | Assigned Topic Name | Most Representative Words |
|----|---------------------|---------------------------|
| 1 | Information | information, provide, service, user, market |
| 2 | Music | music, sound, play, song, feature |
| 3 | Design | color, use, create, design, screen |
| 4 | Sport | live, sport, event, app, team |
| 5 | Language | word, english, learn, language, chinese |
| 6 | Fashion | love, friend, fashion, face, beauty |
| 7 | Video | video, support, watch, tv, view |
| 8 | Children Book | book, story, child, first, read |
| 9 | Children Game | fun, game, child, learn, kid |
| 10 | Food | shop, food, recipe, find, store |
| 11 | Life | life, name, good, people, day |
| 12 | Exercise | body, weight, workout, exercise, time |
| 13 | Weather | weather, time, location, information, day |
| 14 | Recommend | app, get, make, best, like |
| 15 | Game | game, can, play, level, get |
| 16 | Subscription | subscription, issue, purchase, period, account |
| 17 | Apple | iphone, device, use, ipad, contact |
| 18 | Baby | application, baby, design, product, group |
| 19 | News | news, read, content, new, ipad |
| 20 | Travel | map, travel, car, offline, information |
| 21 | Health | medical, include, system, care, health |
| 22 | Time | record, use, time, data, list |
| 23 | Finance | service, card, provide, information, bank |
| 24 | Photo | photo, share, file, image, support |
| 25 | Free | app, com, free, please, http |

Then, we apply GHSOM to perform two-layer clustering, setting $\tau_1 = 0.5$ and $\tau_2 = 1$ in the GHSOM clustering algorithm. Each cluster is indexed as C-S, where C denotes the cluster in the first layer and S denotes the sub-cluster within cluster C. GHSOM generated 27 clusters and 1100 sub-clusters. We apply the static behavior analysis to apps within the same cluster and rank them to recommend less risky apps to users.
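The per-cluster ranking can be sketched as follows, assuming each app already carries its C-S cluster index, its three category risk scores, and a flag for detected private API use. The record layout, field names, and sample values are hypothetical. Only the rules come from the text: sum the three risks, rank within the same cluster, prefer lower risk, and do not recommend apps that use private APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AppRecord:
    name: str
    cluster: str                        # "C-S" index, e.g. "3-12"
    risks: Tuple[float, float, float]   # three category risk scores
    uses_private_api: bool = False

    @property
    def total_risk(self) -> float:
        # Total risk is the sum of the three category risks.
        return sum(self.risks)


def recommend_by_cluster(apps: Iterable[AppRecord]) -> Dict[str, List[str]]:
    """Return app names per cluster, ordered from least to most risky."""
    grouped: Dict[str, List[AppRecord]] = defaultdict(list)
    for app in apps:
        if app.uses_private_api:
            continue                    # never recommend private API users
        grouped[app.cluster].append(app)
    return {
        cluster: [a.name for a in sorted(members, key=lambda a: a.total_risk)]
        for cluster, members in grouped.items()
    }


# Hypothetical records for illustration only.
apps = [
    AppRecord("app_a", "3-12", (1.0, 0.5, 2.0)),
    AppRecord("app_b", "3-12", (0.5, 0.0, 0.25)),
    AppRecord("app_c", "3-12", (0.0, 0.0, 0.0), uses_private_api=True),
    AppRecord("app_d", "7-2", (2.0, 1.0, 0.0)),
]
ranking = recommend_by_cluster(apps)
# ranking == {"3-12": ["app_b", "app_a"], "7-2": ["app_d"]}
```

Note that app_c is excluded even though its risk score is the lowest, because the private API check takes precedence over the ranking.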

The following clusters, which include well-known applications, serve as examples to demonstrate our analysis results.
