More Experiments - Mining Smartphone Usage Patterns (智慧型手機使用模式之探勘)

CHAPTER 4 EXPERIMENT


National Chengchi University

Table XIV. Pattern set 2.

Camera → Facebook    30 Seconds     Support: 153/25880
Facebook → Camera    180 Seconds    Support: 199/25880

4.4 More Experiments

The method presented in Section 4.3 appears promising and is capable of mining stronger rules under shorter time constraints. However, we wish to distinguish our approach from simply applying the time constraint while generating sessions. As explained in Section 4.2.1, we divide the raw data into sessions based on the time gap between consecutive log records. One could therefore shorten the dividing threshold from 10 minutes to a smaller interval and claim that the resulting rules also imply a stronger association between the applications used.

To compare the two approaches, we present some statistics and discussion below.

We generated a new session dataset using a dividing threshold of 30 seconds between log records and obtained 83158 sessions. We then applied PrefixSpan to this dataset with a minimum support of 0.5%, which yielded 42 rules: 30 length-1 rules and 12 length-2 rules. In contrast, applying our modified PrefixSpan with a 30-second time constraint to the original session data, with the same minimum support, yielded 79 rules: 30 length-1 rules and 49 length-2 rules. The numbers of rules differ considerably. The difference results mainly from the extra sessions: a minimum support of 0.5% means something different in the two datasets, corresponding to about 416 of the 83158 new sessions but only about 129 of the 25880 original sessions. Moreover, sequences in the new session data are shorter, which also leads to fewer long rules.
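The effect of the dividing threshold can be illustrated with a minimal sketch. It assumes that a new session starts whenever the gap between two consecutive log records strictly exceeds the threshold; the function names `split_sessions` and `min_support_threshold` and the sample records are hypothetical and are not taken from the thesis implementation or data.

```python
from datetime import datetime, timedelta


def split_sessions(records, gap_seconds):
    """Split time-ordered (timestamp, app) records into sessions.

    Assumption: a new session starts when the gap between two
    consecutive records is strictly greater than gap_seconds.
    """
    sessions, current, prev_ts = [], [], None
    for ts, app in records:
        if prev_ts is not None and (ts - prev_ts).total_seconds() > gap_seconds:
            sessions.append(current)
            current = []
        current.append(app)
        prev_ts = ts
    if current:
        sessions.append(current)
    return sessions


def min_support_threshold(n_sessions, min_sup):
    """Number of sessions a rule must appear in to reach min_sup."""
    return n_sessions * min_sup


# Hypothetical log records for one user (not taken from the thesis data).
start = datetime(2010, 9, 1, 12, 0, 0)
records = [
    (start + timedelta(seconds=offset), app)
    for offset, app in [
        (0, "Camera"), (20, "Album"), (50, "Facebook"),
        (200, "Browser"), (215, "Gmail"), (1000, "Phone"),
    ]
]

ten_minute_sessions = split_sessions(records, gap_seconds=600)
thirty_second_sessions = split_sessions(records, gap_seconds=30)
print(len(ten_minute_sessions), len(thirty_second_sessions))

# The same relative minimum support maps to different absolute thresholds.
print(round(min_support_threshold(83158, 0.005), 2))
print(round(min_support_threshold(25880, 0.005), 2))
```

On the sample records, the shorter threshold produces more and shorter sessions, and the same 0.5% minimum support corresponds to roughly 416 sessions in one dataset and roughly 129 in the other, which is why the two rule sets are not directly comparable.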

Since our purpose in applying time constraints to the original session data is to gather stronger rules, we need a common ground for such a comparison. That is, before carrying out any analysis or discussion, in order to say that some rules are stronger than others, the time constraints must be applied to the same session data with the same minimum support. For these reasons, simply changing the dividing threshold when generating session data does not give us rules with the same meaning.
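The idea of applying different time constraints to the same session data can be sketched as follows. This is only an illustration under a stated assumption, not the modified PrefixSpan itself: a session is taken to support a length-2 rule A → B if B occurs after some occurrence of A within `max_gap` seconds. The helper names and the toy sessions are hypothetical.

```python
def supports(session, first, second, max_gap):
    """Return True if `second` follows `first` within max_gap seconds.

    `session` is a time-ordered list of (seconds, app) pairs.
    Assumption: the time constraint bounds the gap between the two
    items of a length-2 rule.
    """
    for i, (t1, app1) in enumerate(session):
        if app1 != first:
            continue
        for t2, app2 in session[i + 1:]:
            if t2 - t1 > max_gap:
                break  # later records are even further away in time
            if app2 == second:
                return True
    return False


def support_count(sessions, first, second, max_gap):
    """Count the sessions that support first -> second under max_gap."""
    return sum(supports(s, first, second, max_gap) for s in sessions)


# Hypothetical sessions (not taken from the thesis data).
sessions = [
    [(0, "Camera"), (4, "Album"), (40, "Facebook")],
    [(0, "Camera"), (25, "Facebook")],
    [(0, "Facebook"), (100, "Camera")],
    [(0, "Camera"), (90, "Album")],
]

for max_gap in (5, 30, 60, 180):
    count = support_count(sessions, "Camera", "Facebook", max_gap)
    print(max_gap, count, count / len(sessions))
```

Because the sessions and therefore the denominator stay fixed, counts obtained under different time constraints can be compared directly, which is the common ground described above.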


CHAPTER 5 RESULT ANALYSIS

5.1 Analysis of Rule Sets - PrefixSpan

Inspired by the idea in [25], we focus here on the relationships among Facebook, Camera, and Album. In [25], the authors provide statistics on photo-sharing behavior and discuss whether a user uses Facebook right after using Camera.

Table XV. Rules generated by PrefixSpan algorithm.

Rule set 1:
Facebook → Camera  SUP: 0.012
Camera → Facebook  SUP: 0.016

Rule set 2:
Album → Camera  SUP: 0.010
Camera → Album  SUP: 0.021

Rule set 3:
Camera → Facebook → Camera  SUP: 0.005
Camera → Album → Camera  SUP: 0.008
Camera → Album → Album  SUP: 0.007
Album → Camera → Album  SUP: 0.007
Camera → Album → Camera → Album  SUP: 0.005

Rule set 4:
Facebook → Camera → Facebook  SUP: 0.006
Camera → Facebook → Facebook  SUP: 0.007

The rules generated by the method described in Section 4.2 are reported in Table XV. Rule set 1 shows that the support for using Camera before Facebook (0.016) is about 33% higher than the support for using Facebook before Camera (0.012). Similarly, rule set 2 shows that the support for using Camera before Album (0.021) is more than twice the support for using Album before Camera (0.010). Rule sets 1 and 2 also imply that after using Camera, a smartphone user may use Album to check the photo or use Facebook to share it on the social website.
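The two comparisons drawn from rule sets 1 and 2 can be checked with a short calculation. The support values below are copied from Table XV; the helper name `relative_increase` is only illustrative.

```python
# Support values copied from Table XV (rule sets 1 and 2).
support = {
    ("Facebook", "Camera"): 0.012,
    ("Camera", "Facebook"): 0.016,
    ("Album", "Camera"): 0.010,
    ("Camera", "Album"): 0.021,
}


def relative_increase(higher, lower):
    """Relative increase of `higher` over `lower`, as a fraction."""
    return (higher - lower) / lower


facebook_increase = relative_increase(
    support[("Camera", "Facebook")], support[("Facebook", "Camera")]
)
album_ratio = support[("Camera", "Album")] / support[("Album", "Camera")]

print(f"Camera before Facebook is {facebook_increase:.0%} higher")
print(f"Camera before Album is {album_ratio:.1f} times as frequent")
```

Since all supports share the same denominator (the total number of sessions), the ratio of two supports equals the ratio of the underlying session counts.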

Rule sets 3 and 4 provide further results indicating that using Camera may lead a smartphone user to further actions, such as viewing or uploading the photo that he or she just took. Moreover, partial rules extracted from rule set 3 show that some users take more than one photo within a usage session (that is, within a short period of time). Here is a possible scenario:

A user takes a picture and checks it immediately. Not satisfied with the result, the user takes another picture right away. Another usage pattern is that some users are eager to get feedback or to interact with other people after uploading photos to Facebook. From rule set 4, we find that some users return to Facebook (and stay there) to browse and read comments from other people after they have used Camera.

The rule sets can be used to improve user interfaces and the user experience. For example, social applications such as Facebook should emphasize not only photo sharing but also photo taking and photo editing. Likewise, the designers of camera applications should seriously consider adding or enhancing functions for browsing, viewing, and editing photos, as well as for integrating with social websites. In summary, the rule sets could help designers bring users better applications, such as “one-stop” applications that let users enjoy more functions with fewer switches between applications.

Having discussed many correlations among Camera, Album, and Facebook, we also have some evidence that Facebook finds these relationships useful. Our raw data was collected between September 2010 and March 2011. Tracking subsequent updates to the Facebook app, we find evidence that supports our results: Facebook enhanced its photo viewing, uploading, and commenting functions in 2011. We can also see that Facebook puts emphasis on the user experience of bigger and better photos and friendlier photo operations.

More sample rules are shown in the following tables. Table XVI shows some of the rules describing which applications are used after Facebook. We can see that users turn to various communication applications, including WhatsApp Messenger, MMS, and Gtalk, after using Facebook. Facebook itself released its own Messenger for mobile, on iOS and Android, on August 9, 2011.

Table XVI. A portion of rules that starts with Facebook.

Facebook → Plurk  SUP: 0.01

[...] application, Plurk, in Table XVII. Plurk is a free social networking and micro-blogging service that allows users to send updates through short messages or links. According to some surveys, about 40 percent of Plurk's traffic came from Taiwan during 2012, which shows its importance there. However, we discover an interesting trend: many users switched from Plurk to WhatsApp Messenger. This reflects the fact that users have gradually turned to other applications.

Table XVII. A portion of rules that starts with Plurk.

Plurk → Plurk  SUP: 0.014
Plurk → Plurk → Plurk  SUP: 0.006
Plurk → Plurk → Browser  SUP: 0.006
Plurk → WhatsApp Messenger  SUP: 0.01
Plurk → WhatsApp Messenger → WhatsApp Messenger  SUP: 0.006
Plurk → Facebook  SUP: 0.01

5.2 Analysis of Rule Sets - PrefixSpan with Time Constraint

With the time constraint, we can find stronger time-relevant patterns. For example, Table XVIII shows applications that are used within five seconds of each other: users open Album right after Camera, and open Phone right after htcdialer or htccontacts. This is because these actions (taking a photo, dialing a number, opening contacts) can be done quickly, and because the actions that follow are highly correlated with the prior ones. Moreover, Table XVIII shows that when a user wants to make a phone call, he or she tends to key in the number directly rather than use the contacts list. This observation can also be passed on to smartphone designers, who can think about how to strengthen the connection between contacts and phone. Otherwise even [...] number of recipients to make a phone call.

Table XVIII. Sample rules with time constraint set to five seconds.⁵

Rule                    5 Sec  30 Sec  60 Sec  180 Sec
Camera → Album            153     306     427      292
htcdialer → Phone        1651    2381    2440     2489
htccontacts → Phone       193     389     441      510

When using communication applications such as WhatsApp Messenger and Gtalk, users usually spend 30 to 60 seconds before switching to another application. It is worth mentioning that, under the time constraint, we find a length-4 pattern consisting entirely of WhatsApp Messenger, which reflects users’ high retention in this application. On the other hand, social applications such as Facebook and Plurk take more time: users stay in this kind of application for several minutes before switching to the next one.

Table XIX. Sample rules related to communication and social applications.

Rule                                       5 Sec  30 Sec  60 Sec  180 Sec
WhatsApp Messenger → Album                     -       -     137      161
WhatsApp Messenger → WhatsApp Messenger        -     215     401      698
WhatsApp Messenger → Facebook                  -       -     165      264

⁵ In Tables XVIII through XXI, the values under 5, 30, 60, and 180 seconds are the numerators of the support, taken over the total number of sessions (25880).
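As a small worked example of this footnote, the counts can be converted to support values by dividing by the total number of sessions. The counts below are copied from Table XVIII and the minimum support of 0.005 from Appendix II; the variable and function names are illustrative only.

```python
TOTAL_SESSIONS = 25880          # total number of sessions, from the footnote
MIN_SUP = 0.005                 # minimum support stated for Appendix II
WINDOWS = [5, 30, 60, 180]      # time constraints in seconds

# Support counts copied from Table XVIII.
counts = {
    "Camera -> Album": [153, 306, 427, 292],
    "htcdialer -> Phone": [1651, 2381, 2440, 2489],
    "htccontacts -> Phone": [193, 389, 441, 510],
}


def to_support(count, total=TOTAL_SESSIONS):
    """Convert a support count (numerator) to a support fraction."""
    return count / total


supports = {
    rule: [to_support(c) for c in row] for rule, row in counts.items()
}

for rule, row in supports.items():
    cells = "  ".join(f"{w}s={s:.4f}" for w, s in zip(WINDOWS, row))
    print(f"{rule}: {cells}")
```

For instance, the count of 153 for Camera followed by Album within five seconds corresponds to a support of about 0.0059, which is above the 0.005 minimum support.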


Unexpectedly, mailing tools such as Mail and Gmail are not used for long: after about one minute, users are likely to move on to their next application. Apparently, users mostly use mailing tools on smartphones to receive and browse e-mails, or at best to reply briefly.

Table XX. Sample rules related to the Gmail application.

5 Sec 30 Sec 60 Sec 180 Sec


[...] Google Maps, they tend to keep using it for minutes. This may be explained by the scenario in which users keep looking at the map for directions while walking and finding their points of interest, which accounts for the long usage period of Google Maps. This finding contradicts the common perception that users switch to browser or communication applications for address information or directions while using Google Maps.

Table XXI. Sample rules related to the Google Maps application.

Rule                             5 Sec  30 Sec  60 Sec  180 Sec
Google Maps → Google Maps            -       -       -      182
Google Maps → Facebook               -       -       -      136
Google Maps → QCustomShortcut        -       -       -      161


CHAPTER 6 CONCLUSIONS AND FUTURE WORK

In this thesis, we presented our analysis of a data set containing log records generated by several smartphone users over a long period of time. We discussed the problem modeling in detail and gave practical information obtained from processing the raw data.

Above all, we used data mining methods to analyze user behavior on smartphones and supported the analysis with real examples.

The contribution of this thesis is to introduce the process of mining smartphone users’ logs and to discover hidden information from a large amount of log data that the platform collects automatically, without strong assumptions. We propose three implementations: association rule mining, sequential pattern mining, and sequential pattern mining with time constraints. We show the frequent patterns found by these data mining tasks and present useful and interesting analysis of users’ general navigation behavior; we can also reconstruct users’ usage scenarios on the smartphone.

We provided rules and patterns that are extracted from a real data set and could benefit the designers of smartphone applications and user interfaces. Based on our results, the designers of camera applications can consider improving their functions for browsing, viewing, and editing photos, as well as for integrating with social websites. Also, social applications should add more photo-taking and photo-editing features to enhance their photo-sharing experience. To sum up, the rule sets could help designers develop better applications, such as “one-stop” applications, with which users could enjoy more functions without having to switch between applications frequently.

However, this thesis has limitations due to the nature of the raw data. First, the data was collected during a six-month period, between September 2010 and March 2011. Although this is enough for research such as ours, which analyzes the applications that users used, it is not enough for topics such as analyzing a specific user’s particular usage pattern. In addition, the geographic information is neither fully accurate nor complete, since positioning errors exist in both 3G and Wi-Fi systems. To collect data in the most realistic way, we did not interfere with users when they turned on the network, nor did we specify which type of positioning system they should choose.

As for future work, we would like to perform more analyses on the log records to obtain findings that are even more beneficial to designers. We plan to investigate better data mining methods (extended from the existing ones) in order to extract more informative and human-readable rules. We can also analyze the data based on an individual user’s usage pattern, for example by following a user from being new to the smartphone to becoming skilled and observing how his or her usage shifts. Finally, we can adjust the granularity of the analysis: from a macroscopic viewpoint, we can aggregate applications into groups or group labels and analyze the relationships among groups; from a microscopic viewpoint, we can analyze the behavior of an individual application.


REFERENCES

[1] H. Verkasalo. “Analysis of Smartphone User Behavior,” in Mobile Business and 2010 Ninth Global Mobility Roundtable (ICMB-GMR), 2010 Ninth International Conference on. IEEE, June 2010, pp. 258-263.

[2] P. Lei, T. Shen, W. Peng, and I. Su. “Exploring Spatial-Temporal Trajectory Model for Location Prediction,” 12th IEEE International Conference on Mobile Data Management, 2011.

[3] C. Hung, C. Chang, and W. Peng. “Mining Trajectory Profiles for Discovering User Communities,” ACM LBSN’09, Seattle, WA, USA, November 2009.

[4] C. Hung, W. Peng, and W. Lee. “Clustering and Aggregating clues for trajectories for mining trajectory patterns and routes,” The VLDB Journal, November 2011.

[5] L. Wei, W. Peng, B. Chen, and T. Lin. “PATS: A Framework of Pattern-Aware Trajectory Search,” 11th IEEE International Conference on Mobile Data Management, 2010.

[6] G. Chittaranjan, J. Blom, and D. Gatica-Perez. “Mining large-scale smartphone data for personality studies,” IEEE International Symposium on Wearable Computers, San Francisco, CA, June 2011.

[7] L. Xie, X. Zhang, J. Seifert, and S. Zhu. “pBMDS: A Behavior-based Malware Detection System for Cellphone Devices,” ACM WiSec’10, Hoboken, NJ, March 2010.

[8] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani. “Crowdroid: Behavior-Based Malware Detection System for Android,” ACM SPSM’11, Chicago, IL, October 2011.

[9] O. Franko, and T. Tirrell. “Smartphone App Use Among Medical Providers in ACGME Training Programs,” Journal of Medical Systems, Volume 36 Issue 5, October 2012, pp. 3135-3139.


[10] T. Smura. “Access alternatives to mobile services and content: analysis of handset-based smartphone usage data,” ITS 17th Biennial Conference, Montreal, Canada, June 2008.

[11] H. Verkasalo. “Analysis of Smartphone User Behavior,” 2010 Ninth International Conference on Mobile Business / 2010 Ninth Global Mobility Roundtable, Athens, Greece, June, 2010.

[12] Q. Xu, J. Erman, A. Gerber, Z. Mao, J. Pang, and S. Venkataraman. “Identifying diverse usage behaviors of smartphone apps,” IMC’11 Proceedings of the 2011 ACM SIGCOMM Conference on Internet measurement conference, New York, USA, 2011, pp. 329-344.

[13] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin. “Diversity in Smartphone Usage,” ACM MobiSys’10, San Francisco, CA, June 2010.

[14] J. Kang, S. Seo, and J. Hong. “Usage Pattern Analysis of Smartphones,” Network Operations and Management Symposium (APNOMS), 2011 13th Asia-Pacific, Taipei, Taiwan, September 2011.

[15] M. Chen, J. Han, and P. S. Yu. “Data Mining: An Overview from a Database Perspective,” IEEE Transaction on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996.

[16] R. Agrawal, T. Imielinski, and A. Swami. “Mining Association Rules between Sets of Items in Large Databases,” ACM SIGMOD, Washington DC, USA, May 1993.

[17] R. Agrawal, and R. Srikant. “Fast Algorithms for Mining Association Rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94), San Francisco, CA, USA, 1994.


[18] R. Agrawal, and R. Srikant. “Mining Sequential Patterns,” in Proceedings of the 11th International Conference on Data Engineering (ICDE’95), 1995.

[19] R. Iváncsy, and I. Vajk. “Frequent Pattern Mining in Web Log Data,” Journal of Applied Sciences at Budapest Tech Hungary, Volume 3 Issue 1, 2006.

[20] N. Mabroukeh, and C. Ezeife. “A Taxonomy of Sequential Pattern Mining Algorithms,” Journal of ACM Computing Surveys (CSUR), Volume 43 Issue 1, November 2010.

[21] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” in Proceedings of the 2001 International Conference on Data Engineering (ICDE’01), Heidelberg, Germany, April 2001.

[22] T. Rincy. N, and Y. Pandey. “Performance Evaluation on State of the Art Sequential Pattern Mining Algorithms,” International Journal of Computer Applications, Volume 65 Number 14, 2013.

[23] J. Chen. “An UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining,” IEEE Transactions on Knowledge and Data Engineering, Volume 22 Issue 7, July 2010.

[24] P. Chen, C. Chen, W. Liao, and T. Li. “A Service Platform for Logging and Analyzing Mobile User Behaviors,” in Proceedings of Edutainment 2011, LNCS 6872, 2011.

[25] P. Chen, H. Wu, C. Hsu, W. Liao, and T. Li. “Logging and Analyzing Mobile User Behaviors,” International Symposium on Cyber Behavior, Taipei, Taiwan, February 2012.

[26] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten. “The WEKA Data Mining Software: An Update,” SIGKDD Explorations, Volume 11 Issue 1, 2009.


[27] P. Fournier-Viger. SPMF: A Sequential Pattern Mining Framework, http://www.philippe-fournier-viger.com/spmg/

[28] Q. Zhao, and S. S. Bhowmick. “Association Rule Mining: A Survey,” Technical Report, CAIS, Nanyang Technological University, Singapore, No. 2003116, 2003.

[29] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.

[30] Q. Zhao, and S. S. Bhowmick. “Sequential Pattern Mining: A Survey,” Technical Report, CAIS, Nanyang Technological University, Singapore, No. 2003118, 2003.


Appendix I. Rules generated from PrefixSpan. Min-Sup=0.01.

1 Facebook → Plurk  SUP: 0.010046
17 QCustomShortcut → Packageinstaller  SUP: 0.014219
18 QCustomShortcut → Privacy Blocker  SUP: 0.015997
19 QCustomShortcut → Facebook  SUP: 0.017504
20 QCustomShortcut → QCustomShortcut  SUP: 0.037674
21 QCustomShortcut → Browser  SUP: 0.024575
47 WhatsApp Messenger → WhatsApp Messenger  SUP: 0.037558
48 WhatsApp Messenger → Facebook  SUP: 0.014606
49 WhatsApp Messenger → Gmail  SUP: 0.015533
50 WhatsApp Messenger → android  SUP: 0.011669
51 Privacy Blocker → Privacy Blocker  SUP: 0.015881
52 Privacy Blocker → QCustomShortcut  SUP: 0.015417
53 Privacy Blocker → Browser  SUP: 0.010549
69 htcdialer → QCustomShortcut  SUP: 0.011476
70 htcdialer → Gmail  SUP: 0.01051
76 Packageinstaller → QCustomShortcut  SUP: 0.011592
77 Album → Album  SUP: 0.011978
78 Album → Camera  SUP: 0.010278
79 Mail → Gmail  SUP: 0.010317
80 LauncherPro → com.google.android.gsf  SUP: 0.010549
81 LauncherPro → LauncherPro  SUP: 0.012751
93 QCustomShortcut → QCustomShortcut → QCustomShortcut  SUP: 0.017465
94 QCustomShortcut → Browser → QCustomShortcut  SUP: 0.011283
95 QCustomShortcut → Browser → Browser  SUP: 0.010394
96 Phone → htcdialer → htcdialer  SUP: 0.013872
97 Phone → htcdialer → Phone  SUP: 0.033462
98 Phone → htccontacts → Phone  SUP: 0.013872
101 WhatsApp Messenger → WhatsApp Messenger → WhatsApp Messenger  SUP: 0.021484
102 Gtalk → Gtalk → Gtalk  SUP: 0.012094
113 QCustomShortcut → QCustomShortcut → QCustomShortcut → QCustomShortcut  SUP: 0.010317
114 Phone → htcdialer → htcdialer → Phone  SUP: 0.011901
[...] Messenger  SUP: 0.014297
119 htcdialer → htcdialer → Phone → Phone  SUP: 0.010355

Appendix II. Rules generated from PrefixSpan with time constraints. Min-Sup=0.005.

131 QCustomShortcut → Packageinstaller → QCustomShortcut  -  -  155  195
132 QCustomShortcut → QCustomShortcut → Packageinstaller  -  -  -  158
133 QCustomShortcut → QCustomShortcut → QCustomShortcut  -  -  150  285
134 QCustomShortcut → Browser → QCustomShortcut  -  -  -  145
143 WhatsApp Messenger → WhatsApp Messenger → WhatsApp Messenger  -  -  -  330
144 Gtalk → Gtalk → Gtalk  -  -  -  149
