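To find the most suitable filtering rules, this study focused on rule reduction, rule expansion, and rule ordering. The reduced and expanded rule set, together with the priority assigned by rule ordering, determines how well the filter performs. For rule conflicts, we use the associated weights to analyze the subordinate relationship between rules, which strengthens the final classification of a message.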
5.1. Analysis of Results
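Current spam filters mostly run several filtering methods at once in the hope of improving filtering effectiveness, but too many filtering methods actually lower filtering efficiency. We use statistical analysis to first select the appropriate rules from the system's many filtering rules, so that a smaller rule set improves both efficiency and effectiveness.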
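As an illustration of statistical rule selection, the following minimal sketch keeps only the rules whose firing pattern is significantly associated with the spam label. It assumes a Pearson chi-square test on a 2x2 contingency table with the standard critical value 3.841 (one degree of freedom, 5% significance level). The choice of test, the function names, and the counts are assumptions made for illustration and are not taken from the study.

```python
from typing import Dict, List, Tuple

# (spam & rule fired, ham & rule fired, spam & not fired, ham & not fired)
Table = Tuple[int, int, int, int]


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so no association can be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_rules(tables: Dict[str, Table], critical: float = 3.841) -> List[str]:
    """Keep the rules whose statistic exceeds the critical value."""
    return [name for name, t in tables.items() if chi_square_2x2(*t) > critical]


# Hypothetical counts from 100 labelled messages per rule.
example_tables = {
    "keyword": (40, 5, 10, 45),     # fires mostly on spam
    "long_body": (25, 24, 25, 26),  # fires about equally on spam and ham
}
selected = select_rules(example_tables)
```

With these hypothetical counts only the "keyword" rule is retained, since the "long_body" rule carries almost no information about the label.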
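Front-end rule management selects the appropriate rules. After reduction, expansion, and ordering, the system chooses the one or several filtering modes that best match the user's needs, and these come closest to the correct result in both efficiency and effectiveness. The optimized filtering rules obtained through statistical analysis therefore give better results than judging with multiple rules.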
In terms of efficiency:
single rule ≥ optimized rules > multiple rules.
In terms of effectiveness:
optimized rules > multiple rules > single rule.
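The efficiency ranking can be illustrated by counting rule evaluations. The sketch below compares an ordered rule list that stops at the first rule that fires with a multi-rule pass that always evaluates every rule. The three rules and the sample message are hypothetical and serve only to show the counting; they are not the rules used in the study.

```python
from typing import Callable, List, Tuple

Rule = Callable[[str], bool]  # returns True when the message looks like spam


def has_spam_keyword(message: str) -> bool:
    return "free" in message.lower()


def has_link(message: str) -> bool:
    return "http://" in message or "https://" in message


def is_shouting(message: str) -> bool:
    letters = [ch for ch in message if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


def classify_ordered(message: str, rules: List[Rule]) -> Tuple[bool, int]:
    """Apply rules in priority order and stop at the first one that fires.

    Returns (is_spam, number of rules evaluated).
    """
    evaluated = 0
    for rule in rules:
        evaluated += 1
        if rule(message):
            return True, evaluated
    return False, evaluated


def classify_all(message: str, rules: List[Rule]) -> Tuple[bool, int]:
    """Evaluate every rule and flag the message if any rule fires."""
    hits = [rule(message) for rule in rules]
    return any(hits), len(hits)


rules = [has_spam_keyword, has_link, is_shouting]
spam = "FREE offer at http://example.com"
ordered_result = classify_ordered(spam, rules)
full_result = classify_all(spam, rules)
```

Both calls flag the sample message, but the ordered list does so after one evaluation while the full pass needs three. For a message that no rule flags, the two approaches cost the same.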
5.2. Future Work
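Because most current filtering software relies on multiple rules, rule conflicts are bound to occur. We attempt to analyze statistically the subordinate relationship between two rules in order to raise the credibility of the judgment, that is, to resolve the grey-list problem that arises when multiple rules are applied. We believe that filtering systems of all kinds will face the same problem.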
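In contrast to this study, earlier work concentrated mostly on advances in filter technology. Although those results all claim filtering effectiveness above 90%, users differ from one another, so that level of effectiveness cannot actually satisfy every user's needs.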
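Therefore, before the filtering rules are applied, we use a statistical testing model to analyze how users differ in their filtering preferences, carry out rule reduction and rule expansion in rule management, and select the appropriate rules according to those differences. This lets users find rules that suit them better and also improves filter efficiency, reducing needless filtering time and cost. After several rules have given their verdicts, we apply a statistical analysis model that weighs each variable and analyzes its benefit; the resulting decision should be better than one based on the rules alone.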
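One way to picture weight-based resolution of conflicting verdicts is a weighted vote. The sketch below is a minimal illustration under stated assumptions: the rule names, the integer weights, and the 0.5 threshold are hypothetical, and a weighted vote is a simple stand-in rather than the statistical model used in this study.

```python
from typing import Dict, Tuple


def resolve_conflict(
    verdicts: Dict[str, bool],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Combine conflicting rule verdicts with a weighted vote.

    verdicts maps a rule name to True (spam) or False (ham).
    weights maps a rule name to a non-negative weight.
    Returns (is_spam, fraction of total weight voting spam).
    """
    total = sum(weights[name] for name in verdicts)
    if total <= 0:
        raise ValueError("total weight must be positive")
    spam_weight = sum(weights[name] for name, is_spam in verdicts.items() if is_spam)
    score = spam_weight / total
    return score >= threshold, score


# Hypothetical grey-list case: two rules say spam, one says ham.
verdicts = {"keyword": True, "sender_whitelist": False, "url_statistics": True}
weights = {"keyword": 5, "sender_whitelist": 2, "url_statistics": 3}
decision, score = resolve_conflict(verdicts, weights)
```

In this example the rules voting spam hold 8 of the 10 weight units, so the message is classified as spam instead of being left on the grey list.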