
Raw Traffic Transformation

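In this study, raw traffic transformation means converting the raw data of network connections into Netflow format, extracting feature data from each individual flow, and then using those features to build the machine-learning models. This section describes how the flows are produced and how their feature values are extracted.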

5.3.1 Network Flow (Netflow)

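Netflow is a format defined by Cisco. It aggregates all connection records that share the same connection attributes, such as IP addresses, ports, and protocol, within a given time window and treats them as a single flow. Compared with loose, fragmented packet data, flows make it easier to monitor and manage connections that belong to particular services or serve particular purposes. This study uses the Netflow V5 format, in which a flow is identified by the following seven connection attributes.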

1. Ingress interface (SNMP ifIndex)
2. Source IP address
3. Destination IP address
4. IP protocol
5. Source port for UDP or TCP, 0 for other protocols
6. Destination port for UDP or TCP, type and code for ICMP, or 0 for other protocols
7. IP Type of Service
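The grouping rule above can be illustrated with a short sketch. It assumes a hypothetical, already-decoded `Packet` record and simply buckets packets by the seven key fields, accumulating packet and byte counts. It deliberately ignores the active/inactive timeouts and TCP FIN/RST handling that a real exporter such as Softflowd uses to close flows, so it shows the keying idea only, not Softflowd's logic.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, simplified packet record (fields assumed for illustration)."""
    ts: float       # capture timestamp in seconds
    in_if: int      # ingress interface (SNMP ifIndex)
    src_ip: str
    dst_ip: str
    proto: int      # IP protocol number: 6 = TCP, 17 = UDP, 1 = ICMP
    src_port: int
    dst_port: int   # for ICMP, assumed to already hold the packed type/code value
    tos: int
    length: int     # packet size in bytes


def flow_key(p: Packet) -> tuple:
    """Return the seven Netflow V5 key fields, zeroing ports as the list above specifies."""
    tcp_udp = p.proto in (6, 17)
    src_port = p.src_port if tcp_udp else 0
    dst_port = p.dst_port if (tcp_udp or p.proto == 1) else 0
    return (p.in_if, p.src_ip, p.dst_ip, p.proto, src_port, dst_port, p.tos)


def aggregate(packets):
    """Group packets by flow key; no timeouts are applied in this sketch."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        f = flows[flow_key(p)]
        f["packets"] += 1
        f["bytes"] += p.length
        if f["first"] is None:
            f["first"] = p.ts
        f["last"] = p.ts
    return dict(flows)
```

Note that the two directions of a TCP conversation produce two separate flows, because source and destination are part of the key.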

Once the connection data have been processed, the Version 5 header and record format are as shown in Figure 19.


Figure 19 Netflow Version 5 format defined by Cisco (source: [21])
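To make the V5 layout concrete, the sketch below decodes one exported datagram with Python's `struct` module. The field order and sizes (a 24-byte header followed by 48-byte records, in network byte order) follow the commonly published Cisco V5 layout; readers should check them against Figure 19. The function name `parse_v5` and the dictionary keys are illustrative choices, not part of any tool used in this study.

```python
import ipaddress
import struct

# Netflow V5 in network byte order: a 24-byte header followed by
# `count` fixed-size 48-byte flow records.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_v5(datagram: bytes):
    """Decode one Netflow V5 export datagram into (header, list of records)."""
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = V5_HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"not a Netflow V5 datagram (version={version})")
    header = {"version": version, "count": count, "sys_uptime": sys_uptime,
              "unix_secs": unix_secs, "unix_nsecs": unix_nsecs,
              "flow_sequence": flow_sequence}
    records = []
    for i in range(count):
        offset = V5_HEADER.size + i * V5_RECORD.size
        (src, dst, nexthop, in_if, out_if, d_pkts, d_octets, first, last,
         src_port, dst_port, _pad1, tcp_flags, proto, tos,
         src_as, dst_as, src_mask, dst_mask, _pad2) = V5_RECORD.unpack_from(datagram, offset)
        records.append({
            "src_ip": str(ipaddress.IPv4Address(src)),
            "dst_ip": str(ipaddress.IPv4Address(dst)),
            "input": in_if, "output": out_if,
            "packets": d_pkts, "bytes": d_octets,
            "first": first, "last": last,  # router uptime (ms) at first/last packet
            "src_port": src_port, "dst_port": dst_port,
            "tcp_flags": tcp_flags, "proto": proto, "tos": tos,
        })
    return header, records
```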

5.3.2 The Softflowd Tool

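Softflowd [22] is an open-source tool that converts raw network traffic recorded by capture tools, such as pcap files written by tcpdump, into Netflow records. The conversion follows Cisco's Netflow definition, and the records are exported in that format. The data before and after conversion are shown in the figures below.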

Figure 20 Raw data in a tcpdump file


Figure 21 Netflow data after conversion by softflowd

5.3.3 Feature Extraction from the Raw DARPA Traffic

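In this study, Softflowd is used to turn the raw connection data into Netflow records, and its source code was modified so that the features of each individual flow are exported as a feature dataset in the same pass that builds the flows. Feature data of the assembled flows, such as the packet count and the byte count, are dumped while softflowd filters and processes the individual packet connections. In addition, to support the HMM proposed in Chapter 4, the size and inter-arrival time (IAT) of every packet in a flow are also exported, serving as sequential features for building the HMM.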
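The modified Softflowd code itself (Figure 22) is written in C and is not reproduced in the text, so the following Python sketch only illustrates the sequential feature described above: given the timestamps and sizes of the packets in one flow, it produces a (size, IAT) pair per packet in arrival order. How the first packet's IAT is defined is not stated in the text; this sketch assumes 0.0.

```python
from typing import List, Tuple


def size_iat_sequence(packets: List[Tuple[float, int]]) -> List[Tuple[int, float]]:
    """Map (timestamp, size) pairs of one flow to (size, IAT) pairs in arrival order.

    Assumption: the first packet has no predecessor, so its IAT is set to 0.0.
    """
    ordered = sorted(packets, key=lambda p: p[0])
    seq: List[Tuple[int, float]] = []
    prev_ts = None
    for ts, size in ordered:
        iat = 0.0 if prev_ts is None else ts - prev_ts
        seq.append((size, iat))
        prev_ts = ts
    return seq
```

For example, packets captured at 1.0 s, 1.25 s and 2.0 s with sizes 60, 1500 and 40 bytes give the sequence (60, 0.0), (1500, 0.25), (40, 0.75).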

Figure 22 Part of the code added to Softflowd


Figure 23 Flows and their feature dataset printed by the modified Softflowd

5.4. Features

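Feature values are the data from which machine-learning algorithms build their models. This study uses two feature sets: the features already defined by NSL-KDD, and a set of DARPA features that we defined ourselves. Because DARPA provides only raw traffic, after converting it to Netflow we defined the DARPA features with reference to those of other datasets. Besides basic features such as packets, bytes, and protocol, we added time-based and connection-based features, which capture how often the same IP addresses and ports recur within a recent time window or among a fixed number of recent connections. Finally, for building the HMM, we defined sequential packet features that export the size and inter-arrival time (IAT) of each packet in a flow.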

5.4.1 NSL-KDD

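According to the official definitions, the features fall into three main categories, basic features, content features, and traffic features. Their names and descriptions are listed below.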

| Feature name | Description | Type |
|---|---|---|
| duration | length (number of seconds) of the connection | continuous |
| protocol_type | type of the protocol, e.g. tcp, udp, etc. | discrete |
| service | network service on the destination, e.g. http, telnet, etc. | discrete |
| src_bytes | number of data bytes from source to destination | continuous |
| dst_bytes | number of data bytes from destination to source | continuous |
| flag | normal or error status of the connection | discrete |
| land | 1 if connection is from/to the same host/port; 0 otherwise | discrete |
| wrong_fragment | number of "wrong" fragments | continuous |
| urgent | number of urgent packets | continuous |

Table 4 Basic features

| Feature name | Description | Type |
|---|---|---|
| hot | number of "hot" indicators | continuous |
| num_failed_logins | number of failed login attempts | continuous |
| logged_in | 1 if successfully logged in; 0 otherwise | discrete |
| num_compromised | number of "compromised" conditions | continuous |
| root_shell | 1 if root shell is obtained; 0 otherwise | discrete |
| su_attempted | 1 if "su root" command attempted; 0 otherwise | discrete |
| num_root | number of "root" accesses | continuous |
| num_file_creations | number of file creation operations | continuous |
| num_shells | number of shell prompts | continuous |
| num_access_files | number of operations on access control files | continuous |
| num_outbound_cmds | number of outbound commands in an ftp session | continuous |
| is_hot_login | 1 if the login belongs to the "hot" list; 0 otherwise | discrete |
| is_guest_login | 1 if the login is a "guest" login; 0 otherwise | discrete |

Table 5 Content features

| Feature name | Description | Type |
|---|---|---|
| count | number of connections to the same host as the current connection in the past two seconds | continuous |
| serror_rate | % of connections that have "SYN" errors | continuous |
| rerror_rate | % of connections that have "REJ" errors | continuous |
| same_srv_rate | % of connections to the same service | continuous |
| diff_srv_rate | % of connections to different services | continuous |
| srv_count | number of connections to the same service as the current connection in the past two seconds | continuous |
| srv_serror_rate | % of connections that have "SYN" errors | continuous |
| srv_rerror_rate | % of connections that have "REJ" errors | continuous |
| srv_diff_host_rate | % of connections to different hosts | continuous |

Table 6 Traffic features
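NSL-KDD ships these traffic features precomputed, but the two-second window behind `count`, `srv_count` and `serror_rate` is easy to misread, so here is an illustrative recomputation. It is not the original KDD'99 feature-construction code. The `Conn` record and its `syn_error` flag are hypothetical, and counting the current connection as part of its own window is an assumption.

```python
from collections import deque
from typing import List, NamedTuple


class Conn(NamedTuple):
    ts: float         # connection start time in seconds
    dst_host: str
    service: str
    syn_error: bool   # hypothetical flag: the connection ended with a SYN error


def two_second_features(conns: List[Conn], window: float = 2.0):
    """Compute count, srv_count and serror_rate over a sliding time window.

    Connections no more than `window` seconds older than the current one are kept;
    the current connection is included in its own counts (an assumption).
    """
    recent: deque = deque()
    out = []
    for c in sorted(conns, key=lambda x: x.ts):
        while recent and c.ts - recent[0].ts > window:
            recent.popleft()                      # drop connections outside the window
        recent.append(c)
        same_host = [r for r in recent if r.dst_host == c.dst_host]
        same_srv = [r for r in recent if r.service == c.service]
        out.append({
            "count": len(same_host),
            "srv_count": len(same_srv),
            # serror_rate is taken over the same-host connections counted above
            "serror_rate": sum(r.syn_error for r in same_host) / len(same_host),
        })
    return out
```

The rate features are fractions of the `count` (or `srv_count`) connections, which is why the same window is reused rather than scanned again.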

5.4.2 DARPA

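The DARPA dataset consists only of raw connection traffic. This study draws on the features defined in NSL-KDD and in the work of P. Gogoi [23]: after the raw traffic is converted into network flows, the features listed in the table below are computed for each sample.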

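Of particular note are the features numbered 30 and above: the size and inter-arrival time of the first packet in the flow, then the size and inter-arrival time of the second packet, and so on up to the n-th packet. These sequential packet features are used to build the hidden Markov model of Chapter 4.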
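The column layout described above (size of packet 1, IAT of packet 1, size of packet 2, and so on) can be sketched as a flattening step. The text does not say how flows with fewer or more than n packets are handled, so the truncation and zero padding below are assumptions made only for illustration, and `sequential_features` is a hypothetical name.

```python
from typing import List, Tuple


def sequential_features(seq: List[Tuple[int, float]], n: int, pad: float = 0.0) -> List[float]:
    """Flatten per-packet (size, IAT) pairs into [sz1, iat1, sz2, iat2, ..., szn, iatn].

    Assumption: flows longer than n packets are truncated, shorter ones padded with `pad`.
    """
    vec: List[float] = []
    for i in range(n):
        if i < len(seq):
            size, iat = seq[i]
            vec.extend([float(size), iat])
        else:
            vec.extend([pad, pad])
    return vec
```

A fixed width keeps every sample in the same tabular layout; an HMM trainer could equally consume the unpadded sequence directly.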

| No. | Feature name | Description | Type |
|---|---|---|---|
| 1 | duration | length of the flow | continuous |
| 2 | protocol-type | type of the protocol, e.g. tcp, udp, etc. | discrete |
| 3 | src-packets | number of packets from source to destination | continuous |
| 4 | dst-packets | number of packets from destination to source | continuous |
| 5 | #pkts | number of all packets | continuous |
| 6 | #pkt-p | number of all packets with payload | continuous |
| 7 | src-byte | number of data bytes from source to destination | continuous |
| 8 | dst-byte | number of data bytes from destination to source | continuous |
| 9 | bytes | number of all data bytes | continuous |
| 10 | pay_bytes | number of all payload bytes | continuous |
| 11 | maxsz | maximum size of all packets | continuous |
| 12 | minsz | minimum size of all packets | continuous |
| 13 | avgsz | average size of all packets | continuous |
| 14 | stdsz | standard deviation of all packet sizes | continuous |
| 15 | maxpy | maximum size of all payloads | continuous |
| 16 | minpy | minimum size of all payloads | continuous |
| 17 | avgpy | average size of all payloads | continuous |
| 18 | stdpy | standard deviation of all payload sizes | continuous |
| 19 | avgIAT | average IAT of all packets | continuous |
| 20 | src_flag | connection flags from source to destination | continuous |
| 21 | dst_flag | connection flags from destination to source | continuous |
| 22 | srcip-time | number of flows from the same src IP in the last T sec | continuous |
| 23 | dstip-time | number of flows to the same dst IP in the last T sec | continuous |
| 24 | srcport-time | number of flows from the same src port in the last T sec | continuous |
| 25 | dstport-time | number of flows to the same dst port in the last T sec | continuous |
| 26 | srcip-conn | number of flows from the same src IP in the last N flows | continuous |
| 27 | dstip-conn | number of flows to the same dst IP in the last N flows | continuous |
| 28 | srcport-conn | number of flows from the same src port in the last N flows | continuous |
| 29 | dstport-conn | number of flows to the same dst port in the last N flows | continuous |
| 30 | szX1 | size of packet X1 | continuous |
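Two groups of the features above lend themselves to a short sketch: the per-flow size statistics (features 11 to 14 and 19) and the connection-based counts over the last N flows (features 26 to 29). The sketch assumes population standard deviation, averages IAT over the gaps between consecutive packets, and excludes the current flow from its own last-N window; the text does not fix these details, and the `Flow` record is hypothetical.

```python
import statistics
from collections import deque
from typing import Dict, List, NamedTuple


def size_stats(sizes: List[int], timestamps: List[float]) -> Dict[str, float]:
    """maxsz, minsz, avgsz, stdsz and avgIAT for one flow.

    Assumptions: population standard deviation; avgIAT averages the gaps
    between consecutive packets and is 0.0 for a single-packet flow.
    """
    ts = sorted(timestamps)
    iats = [b - a for a, b in zip(ts, ts[1:])]
    return {
        "maxsz": max(sizes),
        "minsz": min(sizes),
        "avgsz": statistics.fmean(sizes),
        "stdsz": statistics.pstdev(sizes),
        "avgIAT": statistics.fmean(iats) if iats else 0.0,
    }


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def conn_based(flows: List[Flow], n: int) -> List[Dict[str, int]]:
    """srcip-conn, dstip-conn, srcport-conn and dstport-conn over the previous n flows.

    Flows are assumed to be sorted by start time already; the current flow
    is not counted in its own window.
    """
    window: deque = deque(maxlen=n)   # keeps only the n most recent flows
    out = []
    for f in flows:
        out.append({
            "srcip-conn": sum(w.src_ip == f.src_ip for w in window),
            "dstip-conn": sum(w.dst_ip == f.dst_ip for w in window),
            "srcport-conn": sum(w.src_port == f.src_port for w in window),
            "dstport-conn": sum(w.dst_port == f.dst_port for w in window),
        })
        window.append(f)
    return out
```

The time-based counterparts (features 22 to 25) follow the same pattern with a window bounded by T seconds instead of N flows.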

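Weka is open-source software developed by the Machine Learning Group at the University of Waikato. Implemented in Java, it provides a complete set of machine-learning methods, including data pre-processing, classification, regression, clustering, association rules, and visualization. Besides a graphical interface and a command-line mode for ordinary users, it exposes its library through a Java API for writing one's own programs. Because Weka is open source, users with specific needs can also modify its source code to develop their own machine-learning algorithms.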
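One common way to hand a feature dataset to Weka is its ARFF text format. The text does not say which input format was used in this study, so the writer below is only a sketch under that assumption: columns listed in `nominal` become nominal attributes, all others are declared NUMERIC, and attribute names are quoted defensively. The relation name and the `class` label column in the usage are hypothetical.

```python
from typing import Dict, List, Sequence


def to_arff(relation: str, columns: Sequence[str], rows: List[Dict[str, object]],
            nominal: Dict[str, Sequence[str]]) -> str:
    """Render feature rows as an ARFF document that Weka can load."""
    lines = [f"@RELATION {relation}", ""]
    for col in columns:
        if col in nominal:
            # nominal attribute: the allowed values are listed in braces
            lines.append(f"@ATTRIBUTE '{col}' {{{','.join(nominal[col])}}}")
        else:
            lines.append(f"@ATTRIBUTE '{col}' NUMERIC")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(row[col]) for col in columns))
    return "\n".join(lines) + "\n"
```

For instance, `to_arff("darpa", ["duration", "protocol-type", "class"], [{"duration": 1.5, "protocol-type": "tcp", "class": "normal"}], {"protocol-type": ["tcp", "udp", "icmp"], "class": ["normal", "attack"]})` yields a file whose single data line is `1.5,tcp,normal`.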

Figure 24 The Weka logo
