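This study examines privacy after association rule mining by comparing direct privacy protection with indirect privacy protection. The proposed framework evaluates how utility loss and privacy gain change. In the experimental analysis, we first fix the minimum support and vary the minimum confidence, and then fix the minimum confidence and vary the minimum support, to evaluate the changes in utility loss and privacy gain under direct and indirect privacy protection. The results show that indirect privacy protection achieves a higher privacy gain at the cost of a larger utility loss, whereas direct privacy protection yields a lower privacy gain but incurs a smaller utility loss, that is, it preserves more data utility.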
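For k-anonymity, increasing the minimum confidence also changes the utility loss and privacy gain after association rule mining. When the minimum support is fixed at 0.15, k-anonymity changes most markedly for minimum confidence values between 0.8 and 0.85, where both utility loss and privacy gain increase. When the minimum support is fixed at 0.1, utility loss and privacy gain reach their highest values at a minimum confidence of 0.9. Varying the minimum confidence has a much less pronounced effect on DCDS and DCIS.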
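When the minimum confidence is fixed and the minimum support is varied, utility loss and privacy gain for k-anonymity, DCDS, and DCIS all drop significantly over the minimum support range of 0.05 to 0.09. A clear gap nevertheless remains between k-anonymity and DCDS/DCIS: k-anonymity shows higher utility loss and higher privacy gain, while DCDS and DCIS are lower than k-anonymity on both measures.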
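The parameter sweeps discussed above can be illustrated with a minimal sketch. It is not the code used in this study: the transactions are toy data, and the names `support`, `mine_rules`, and `sweep_confidence` are hypothetical. It only shows how the number of mined rules responds when the minimum support is held fixed and the minimum confidence is varied. It does not compute utility loss or privacy gain, whose definitions are given elsewhere in this work.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not taken from the study).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force association rule mining (fine for tiny inputs only).

    Returns a set of (antecedent, consequent) tuples whose itemset support
    is at least min_support and whose confidence is at least min_confidence.
    """
    items = sorted(set().union(*transactions))
    rules = set()
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue  # infrequent itemset: no rules from it
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    # confidence(A -> B) = support(A ∪ B) / support(A)
                    conf = s / support(antecedent, transactions)
                    if conf >= min_confidence:
                        consequent = tuple(
                            i for i in itemset if i not in antecedent
                        )
                        rules.add((antecedent, consequent))
    return rules


def sweep_confidence(transactions, min_support, confidences):
    """Fix min_support, vary min_confidence, and count the mined rules."""
    return {
        c: len(mine_rules(transactions, min_support, c)) for c in confidences
    }


if __name__ == "__main__":
    print(sweep_confidence(TRANSACTIONS, 0.3, [0.45, 0.6, 0.7, 0.8]))
```

Running the same sweep on the original data and on a protected version of it, and then comparing the two rule sets, is one way such an evaluation could be organized. Swapping the roles of the two thresholds gives the second experiment, with the minimum confidence fixed and the minimum support varied.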
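Future work could focus on the confidence and support values that produce the largest changes in the results. It could also evaluate more data protection methods and extend the proposed approach to more data types, so that users can trade off utility loss against privacy gain across a wider range of data. The goal is a method that keeps utility loss low while still protecting the privacy that needs protecting, striking a balance between utility loss and privacy gain.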