Evaluating the Relative Importance of the Traffic Variables

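To understand how effective the temporal and spatial variables used in this study are for classifying malicious websites, and to give future work a basis for choosing variables, we evaluated the combination that produced the best classification results in the detection-module experiments: the spatio-temporal module data with the decision tree algorithm. The spatio-temporal module contains all sixteen temporal and spatial variables used in this study. The classification power of each variable was measured by its information gain ratio.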
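As a reminder of what the gain ratio measures, here is a minimal Python sketch of the C4.5-style information gain ratio for one discrete attribute: the information gain divided by the split information. It is not the thesis's own implementation (the text does not say which tool computed the values), the function names are hypothetical, and it assumes the attribute has already been discretized; continuous NetFlow variables would first need binning or a threshold search, which is omitted here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain ratio of one discrete attribute.

    values[i] is the (already discretized) attribute value of sample i and
    labels[i] is its class, e.g. "malicious" or "normal".
    """
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    # Expected class entropy after splitting on the attribute
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition itself, which penalizes
    # attributes that break the data into many small groups
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Made-up toy data: an attribute that separates the two classes perfectly
print(gain_ratio(["high", "high", "low", "low"],
                 ["malicious", "malicious", "normal", "normal"]))  # prints 1.0
```

Dividing by the split information is what distinguishes the gain ratio from plain information gain: an attribute with many distinct values is not rewarded merely for fragmenting the data.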

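We computed the information gain ratio of every variable on each of the ten training sets, summed the ten values for each variable, and ranked the variables by their mean. The resulting ranking is shown in Table 5-6.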
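The averaging and ranking step can be sketched in a few lines of Python. The input format (one dict per training set, mapping variable name to gain ratio) and the function name are assumptions for illustration, and the numbers in the usage example are made up rather than taken from the study.

```python
from statistics import mean


def rank_by_mean_gain_ratio(per_fold):
    """Rank variables by their mean gain ratio across training sets.

    per_fold: list of dicts, one per training set (ten in the study), each
    mapping variable name -> information gain ratio on that set.
    Returns a list of (rank, mean_gain_ratio, variable_name) tuples.
    """
    names = per_fold[0].keys()  # assumes every dict holds the same variables
    means = {name: mean([fold[name] for fold in per_fold]) for name in names}
    # Larger mean gain ratio -> smaller rank number
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [(rank, round(m, 3), name)
            for rank, (name, m) in enumerate(ordered, start=1)]


# Made-up gain ratios for two variables over three training sets
folds = [{"A": 0.2, "B": 0.4}, {"A": 0.4, "B": 0.2}, {"A": 0.3, "B": 0.0}]
print(rank_by_mean_gain_ratio(folds))  # prints [(1, 0.3, 'A'), (2, 0.2, 'B')]
```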

Table 5-6. Traffic variables ranked by classification power

Rank  Mean information gain ratio  Variable
1     0.273                        TS_OctetsPerFlows
2     0.247                        T_OctetsPerPkts
3     0.202                        TS_PktsPerFlows
4     0.169                        T_OctetsPerFlows
5     0.160                        T_OctetsPerFlows_std
6     0.156                        TS_OctetsPerPkts
7     0.138                        TS_Flows
8     0.128                        T_PktsPerFlows_std
9     0.126                        T_Octets
10    0.111                        TS_Pkts
11    0.104                        TS_Octets
12    0.102                        T_PktsPerFlows
13    0.098                        T_Session_length
14    0.084                        T_Flows
15    0.072                        T_Pkts
16    0.015                        TS_ActiveIP_ratio
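The variable names in Table 5-6 suggest how the variables are built, but their definitions are not repeated in this section. The Python sketch below shows one plausible reading, stated as an assumption: `Octets`, `Pkts`, and `Flows` are totals over a group of NetFlow records (for example, one site's flows within one aggregation window), each `XPerY` variable is the quotient of those totals, and each `_std` variable is the population standard deviation of a ratio across windows. The `FlowRecord` type, the grouping, and the function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class FlowRecord:
    """Hypothetical minimal NetFlow record: bytes and packets of one flow."""
    octets: int
    pkts: int


def ratio_features(flows):
    """Totals and ratio variables for one non-empty group of flows.

    Assumption: XPerY = (total X) / (total Y) within the group.
    """
    octets = sum(f.octets for f in flows)
    pkts = sum(f.pkts for f in flows)
    n_flows = len(flows)
    return {
        "Octets": octets,
        "Pkts": pkts,
        "Flows": n_flows,
        "OctetsPerFlows": octets / n_flows,
        "OctetsPerPkts": octets / pkts,
        "PktsPerFlows": pkts / n_flows,
    }


def feature_std(windows, name):
    """Assumed meaning of a *_std variable: population standard deviation
    of one ratio variable across several groups (e.g. time windows)."""
    return pstdev([ratio_features(w)[name] for w in windows])


window = [FlowRecord(octets=1500, pkts=3), FlowRecord(octets=500, pkts=1)]
print(ratio_features(window)["OctetsPerPkts"])  # prints 500.0
```

Under this reading the three ratios are linked, since OctetsPerFlows equals OctetsPerPkts multiplied by PktsPerFlows.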

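In Table 5-6, a larger mean information gain ratio corresponds to a smaller rank number and thus higher importance. Six variables have a mean information gain ratio of 0.15 or more, and all six are ratio variables built from packet size (octets), packet count, and flow count. We conjecture that this reflects how these malicious websites operate: although they replace their malware frequently, the malware on the update list changes little in number and size, and each time the update routine is triggered, the infected computer downloads and executes exactly what the list specifies, following a fixed, script-like pattern. Consequently, in malware-update sessions with comparable network conditions, the traffic between a malicious website and the infected computers shows similar ratios among packet size, packet count, and flow count, with similar variation.

Looking more closely, the six variables at or above 0.15 are split evenly, three temporal and three spatial, and the twelve variables at or above 0.10 are likewise split six temporal and six spatial. Because both categories are equally represented among the most informative variables, we conclude that both the temporal and the spatial variables proposed in this study contribute substantially to classifying this type of malicious website.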
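The threshold counts stated in this discussion can be re-derived from Table 5-6 with a short Python check. It treats the name prefixes `T_` and `TS_` as the labels of the two variable groups; that mapping is an assumption, but because each threshold yields an even split, the counts do not depend on which prefix stands for which group. The values are copied from the table, and the function name is hypothetical.

```python
from collections import Counter

# Mean information gain ratios copied from Table 5-6
TABLE_5_6 = {
    "TS_OctetsPerFlows": 0.273, "T_OctetsPerPkts": 0.247,
    "TS_PktsPerFlows": 0.202, "T_OctetsPerFlows": 0.169,
    "T_OctetsPerFlows_std": 0.160, "TS_OctetsPerPkts": 0.156,
    "TS_Flows": 0.138, "T_PktsPerFlows_std": 0.128,
    "T_Octets": 0.126, "TS_Pkts": 0.111, "TS_Octets": 0.104,
    "T_PktsPerFlows": 0.102, "T_Session_length": 0.098,
    "T_Flows": 0.084, "T_Pkts": 0.072, "TS_ActiveIP_ratio": 0.015,
}


def count_by_prefix(table, threshold):
    """Count variables whose mean gain ratio is >= threshold, grouped by
    the name prefix before the first underscore ("T" or "TS")."""
    return Counter(name.split("_")[0]
                   for name, ratio in table.items() if ratio >= threshold)


# Expected: 3 T and 3 TS at 0.15; 6 T and 6 TS at 0.10
for threshold in (0.15, 0.10):
    print(threshold, dict(count_by_prefix(TABLE_5_6, threshold)))
```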

Chapter 6 Conclusions and Future Research Directions

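As network technology advances, the attack techniques used by malware evolve just as quickly, so studying these techniques and devising countermeasures is necessary for an effective defense. In this study we identified and analyzed a new type of malware attack and, based on its characteristics, proposed the concepts of spatial and temporal locality of reference. Using data mining techniques, we built classification modules for detecting malicious websites. The empirical evaluation confirmed that these concepts improve detection performance, increasing the accuracy with which hidden malicious websites can be found in network NetFlow traffic data.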

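This study built malicious-website detection modules based on temporal locality and spatial locality, collected real NetFlow traffic data from both malicious and normal websites, and classified this traffic with three data mining classifiers: the Bayesian classifier, the decision tree, and the support vector machine (SVM). We compared the detection performance of the spatial module, the temporal module, and the spatio-temporal module. The results show that the relative ranking of the modules is the same under all three classifiers. The decision tree performed best: with the spatio-temporal module it reached an accuracy of 90%, nearly 17 percentage points higher than the control module. The SVM came second, reaching about 81%, nearly 25 percentage points higher than the control module. The Bayesian classifier performed worst, with no clear difference between modules. The decision tree thus gave the best detection performance, the SVM the second best, and the Bayesian classifier the worst. Judging from how the Bayesian classifier differs from the other two, we infer that its assumption that all variables are mutually independent and equally weighted does not hold for NetFlow traffic classification. The modules follow the same trend under every classifier: the spatio-temporal module performs best, followed by the temporal module, then the spatial module, with the control module performing worst. Adding suitable temporal and spatial variables to conventional NetFlow data mining therefore improves detection effectiveness.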
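The explanation offered for the Bayesian classifier's weak results can be illustrated with a toy calculation. Naive Bayes multiplies per-variable likelihoods as if the variables were conditionally independent given the class, so two variables that carry the same information are counted twice. The Python sketch below uses made-up probabilities, not values from the study, and the duplicated variable is an extreme stand-in for related variables such as OctetsPerFlows, OctetsPerPkts, and PktsPerFlows (algebraically linked if they are ratios of the same totals). It illustrates the mechanism only and does not show that this is the cause of the measured results.

```python
def naive_bayes_posterior(prior_malicious, likelihoods):
    """P(malicious | evidence) under the naive Bayes independence assumption.

    likelihoods: list of (P(x_i | malicious), P(x_i | normal)) pairs, one per
    observed variable; they are simply multiplied together.
    """
    score_mal = prior_malicious
    score_norm = 1.0 - prior_malicious
    for p_mal, p_norm in likelihoods:
        score_mal *= p_mal
        score_norm *= p_norm
    return score_mal / (score_mal + score_norm)


evidence = (0.6, 0.3)  # made-up likelihoods for one observed variable
once = naive_bayes_posterior(0.5, [evidence])
twice = naive_bayes_posterior(0.5, [evidence, evidence])  # same information counted twice
print(round(once, 3), round(twice, 3))  # prints 0.667 0.8
```

The second observation adds no new information, yet the posterior moves from about 0.67 to 0.8, which is the kind of overconfidence that correlated inputs can cause.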

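Although this study developed NetFlow-based classification modules built on temporal and spatial concepts and established a mechanism for detecting hidden malicious websites, the work still has several limitations. We suggest the following directions for future research:

(1) The empirical results show that the spatial module improved prediction less than expected. The main reason is probably that our assumption about how the network's spatial regions are delimited can still be refined. Future work should collect detailed subnet segmentation information, revise the definition accordingly, aggregate the spatial-locality variables according to the actual subnet distribution, and then re-evaluate how this factor affects the detection performance of the spatial module.

(2) This study detected only the traffic of malicious websites, but its main contribution is the temporal and spatial analysis of NetFlow traffic data. The method can therefore be applied to any traffic that exhibits temporal and spatial behavioral characteristics.

(3) The hidden-malicious-website detection mechanism built in this study can serve as a foundation for future automated network defense and countermeasure mechanisms, for example by using honeynet-based trapping to automatically capture intruders' behavioral characteristics and combining them with the proposed detection mechanism for defense and counteraction.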
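Future direction (1) amounts to making the definition of a spatial region configurable and re-aggregating the spatial variables per subnet. A minimal Python sketch using the standard `ipaddress` module follows; the fixed `prefix_len` parameter, the function names, and the reading of an active-IP ratio as "distinct observed addresses divided by subnet size" are hypothetical choices, not the definitions used in the thesis. With real segmentation data, the fixed prefix length would be replaced by a lookup against the actual list of subnets.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(ips, prefix_len=24):
    """Group IPv4 address strings by the enclosing subnet of a given prefix
    length (a hypothetical stand-in for real segmentation data)."""
    groups = defaultdict(list)
    for ip in ips:
        # strict=False lets a host address define its enclosing network
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[subnet].append(ip)
    return dict(groups)


def active_ip_ratio(active_ips, subnet):
    """Hypothetical active-IP ratio: distinct observed addresses in the
    subnet divided by the subnet's total number of addresses."""
    inside = {ip for ip in active_ips if ipaddress.ip_address(ip) in subnet}
    return len(inside) / subnet.num_addresses


ips = ["10.1.1.5", "10.1.1.9", "10.1.2.7"]
print(len(group_by_subnet(ips, 24)), len(group_by_subnet(ips, 16)))  # prints 2 1
```

Changing `prefix_len` from 24 to 16 merges the two groups into one, which shows how strongly the spatial variables depend on where the region boundaries are drawn.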


From the document: A Study on Detecting Hidden Malicious Websites Using Network Data Mining Techniques (以網路資料探勘技術偵測隱藏惡意網站之研究)
