7.1 Conclusion
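This thesis proposes a search system for approximate queries over a tag database. The system uses the multi-level hierarchical tag-set index structure of [7] to index the data objects in the tag database. Given a query tag set and a constant k, it returns the top k data objects whose tag sets are closest to the query tag set. The thesis proposes two distance measures for tag sets, a modified Jaccard distance and a modified overlap distance, and, based on the features of the tag-set clusters in the index structure of [7], derives methods for estimating the upper and lower bounds of the distance to the query tag set under each distance measure.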
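As a point of reference for the distance measures and bounds summarized above, the following is a minimal sketch under stated assumptions. It uses the standard Jaccard and overlap distances, not the modified versions proposed in this thesis, and the bound shown is a generic one obtained from the union and intersection of the tag sets in a cluster, not the derivation based on [7]. All function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Standard Jaccard distance: 1 - |A ∩ B| / |A ∪ B| (not the modified form)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty tag sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def overlap_distance(a, b):
    """Standard overlap distance: 1 - |A ∩ B| / min(|A|, |B|) (not the modified form)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0 if len(a) == len(b) else 1.0
    return 1.0 - len(a & b) / smaller


def jaccard_distance_bounds(query, cluster_tag_sets):
    """Generic lower/upper bounds on the Jaccard distance between the query
    and any tag set S in a non-empty cluster, using only the cluster's
    intersection I and union U (I ⊆ S ⊆ U).

    Since |Q ∩ S| <= |Q ∩ U| and |Q ∪ S| >= |Q ∪ I|, the similarity is at
    most |Q ∩ U| / |Q ∪ I|; the symmetric argument gives the other bound.
    """
    q = set(query)
    sets = [set(s) for s in cluster_tag_sets]
    union_all = set().union(*sets)
    inter_all = set.intersection(*sets)

    q_or_inter = q | inter_all
    q_or_union = q | union_all
    max_sim = len(q & union_all) / len(q_or_inter) if q_or_inter else 1.0
    min_sim = len(q & inter_all) / len(q_or_union) if q_or_union else 1.0
    return 1.0 - max_sim, 1.0 - min_sim


query = {"a", "b"}
cluster = [{"a", "b", "c"}, {"a", "d"}]
lower, upper = jaccard_distance_bounds(query, cluster)
print(lower, upper)
```

A cluster whose lower bound already exceeds the current threshold can be skipped without examining its members, which is the role such bounds play in an index-based search.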
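With the Top-k approximate query method of this thesis, users do not need to enter a threshold to filter the query results; they enter a constant k instead. The system internally sets a threshold δd dynamically to filter the data objects in the index structure until the result contains at least k data objects, and then outputs the top k.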
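The dynamic-threshold idea can be illustrated with a small sketch. It is only an assumption-laden illustration: it scans a flat dictionary instead of the index structure of [7], uses the standard Jaccard distance instead of the modified measures, and the starting value and step size of the threshold (`start`, `step`) are hypothetical, since the update rule for δd is not restated in this chapter.

```python
def jaccard_distance(a, b):
    """Standard Jaccard distance between two tag sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def top_k_by_expanding_threshold(query, objects, k, start=0.2, step=0.2):
    """Return the k objects closest to the query, plus the final threshold.

    objects maps an object name to its tag set. The threshold delta_d is
    widened until at least k objects fall within it (or it reaches 1.0,
    the maximum possible Jaccard distance).
    """
    query = set(query)
    # A real system would consult the index; this sketch scores every object.
    scored = [(jaccard_distance(query, tags), name) for name, tags in objects.items()]

    delta_d = start
    while True:
        candidates = [pair for pair in scored if pair[0] <= delta_d]
        if len(candidates) >= k or delta_d >= 1.0:
            break
        delta_d = min(1.0, delta_d + step)

    candidates.sort()  # ascending distance, ties broken by name
    return candidates[:k], delta_d


objects = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"a", "d"},
    "o4": {"x"},
}
result, final_threshold = top_k_by_expanding_threshold({"a", "b"}, objects, k=2)
print(result, final_threshold)
```

The user supplies only k; the threshold stays internal to the search and grows only as far as needed to collect k candidates.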
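The experiments analyze and evaluate the original and the modified distance measures to identify the strengths and weaknesses of each. In addition, when the Top-k approximate query processing method of this thesis is used for top-k queries on a real tag database, it improves execution efficiency effectively compared with the baseline method.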
7.2 Future Work
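In the Top-k approximate query method of this thesis, the computation cost increases when too many data objects pass the filter. If the boundary estimation for the clusters (or batch sets) of this index structure were revised, or if other distance estimation methods suited to this index structure were derived, the estimated ranges for the clusters (or batch sets) could be made more precise, which would further improve search efficiency.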
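Because this thesis targets tag sets, the method could also be extended to query by example: the tags of several objects would together form the query tag set, so that the tags in the query tag set carry different weights. This would better reflect the user's query intent and improve search quality.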
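One possible reading of this extension is sketched below. It is purely hypothetical and not a design from this thesis: the weight of a tag is assumed to be the fraction of example objects that carry it, and the weighted query is compared with an unweighted tag set using the standard weighted Jaccard (min/max) similarity. The function names are illustrative only.

```python
from collections import Counter


def weighted_query_from_examples(example_tag_sets):
    """Build a weighted query: each tag's weight is the fraction of
    example objects that carry it (an assumed weighting scheme)."""
    counts = Counter(tag for tags in example_tag_sets for tag in set(tags))
    n = len(example_tag_sets)
    return {tag: count / n for tag, count in counts.items()}


def weighted_jaccard_distance(weighted_query, tags):
    """1 - sum(min) / sum(max), treating each tag of the data object
    as having weight 1 and each absent tag as having weight 0."""
    tags = set(tags)
    all_tags = set(weighted_query) | tags
    numerator = 0.0
    denominator = 0.0
    for tag in all_tags:
        q_weight = weighted_query.get(tag, 0.0)
        o_weight = 1.0 if tag in tags else 0.0
        numerator += min(q_weight, o_weight)
        denominator += max(q_weight, o_weight)
    if denominator == 0:
        return 0.0
    return 1.0 - numerator / denominator


examples = [{"a", "b"}, {"a", "c"}]
weighted_query = weighted_query_from_examples(examples)
print(weighted_query)
print(weighted_jaccard_distance(weighted_query, {"a"}))
print(weighted_jaccard_distance(weighted_query, {"b"}))
```

Under this scheme a tag shared by all examples dominates the query, so an object carrying that tag ranks closer than one carrying a tag that appears in only some of the examples.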
References
[1] J.-C. Chuang, C.-W. Cho, and A. L.-P. Chen. Similarity search in transaction databases with a two-level bounding mechanism. In M. Lee, K.-L. Tan, and V. Wuwongse, editors, Database Systems for Advanced Applications, volume 3882 of Lecture Notes in Computer Science, pages 572--586. Springer Berlin Heidelberg, 2006.
[2] B. Ding, H. Wang, R. Jin, J. Han, and Z. Wang. Optimizing index for taxonomy keyword search. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 493--504, New York, NY, USA, 2012. ACM.
[3] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. J. Inf. Sci., 32(2):198--208, April 2006.
[4] M. Gupta, R. Li, Z. Yin, and J. Han. Survey on social tagging techniques. SIGKDD Explor. Newsl., 12(1):58--72, November 2010.
[5] C.-C. Hsieh and J. Cho. Finding similar items by leveraging social tag clouds. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12, pages 644--651, New York, NY, USA, 2012. ACM.
[6] J.-L. Koh, G.-T. Chiang, and I.-C. Chiu. The strategies for supporting query specialization and query generalization in social tagging system. Proceedings of the 4th International Workshop on Social Networks and Social Web Mining (SNSM), 2013.
[7] J.-L. Koh, N. Shongwe, and C.-W. Cho. A multi-level hierarchical index structure for supporting efficient similarity search on tag sets. 2012 Sixth International Conference on Research Challenges in Information Science (RCIS), pages 1--12, May 2012.
[8] K.-P. Lee, H.-G. Kim, and H.-J. Kim. A social inverted index for social-tagging-based information retrieval. J. Inf. Sci., 38(4):313--332, August 2012.
[9] J. I. Lopez-Veyna, V. J. Sosa-Sosa, and I. Lopez-Arevalo. Kesosd: keyword search over structured data. In Proceedings of the Third International Workshop on Keyword Search on Structured Data, KEYS '12, pages 23--31, New York, NY, USA, 2012. ACM.
[10] G. S. Manku, A. Jain, and A. Das Sarma. Detecting near-duplicates for web crawling. In Proceedings of the 16th international conference on World Wide Web, WWW '07, pages 141--150, New York, NY, USA, 2007. ACM.
[11] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and G. Stumme. Evaluating similarity measures for emergent semantics of social tagging. In Proceedings of the 18th international conference on World wide web, WWW '09, pages 641--650, New York, NY, USA, 2009. ACM.
[12] A. Mathes. Folksonomies - cooperative classification and communication through shared metadata. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html, 2004.
[13] C. Ordonez, E. Omiecinski, and N. Ezquerra. A fast algorithm to cluster high dimensional basket data. Proceedings 2001 IEEE International Conference on Data Mining, pages 633--636, 2001.
[14] R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J. X. Parreira, and G. Weikum. Efficient top-k querying over social-tagging networks. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '08, pages 523--530, New York, NY, USA, 2008. ACM.
[15] B. Spell. Java api for wordnet searching. http://lyle.smu.edu/ tspell/jaws/, 2008.
[16] V. Zanardi and L. Capra. Social ranking: uncovering relevant content using tag-based recommender systems. In Proceedings of the 2008 ACM conference on Recommender systems, RecSys '08, pages 51--58, New York, NY, USA, 2008. ACM.
[17] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv.,