Entropy-Based Profiling of Network Traffic for Detection of Security Attacks

Student: Jyun-De He
Advisor: Dr. Tsern-Huei Lee

A Thesis
Submitted to Department of Communication Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Communication Engineering

June 2009
Hsinchu, Taiwan


Abstract

Network security has become a major concern in recent years. In this research, we present an entropy-based network traffic profiling scheme for detecting security attacks. The proposed scheme consists of two stages. The first stage systematically constructs the probability distribution of Relative Uncertainty for normal network traffic behavior. The second stage uses the Chi-Square Goodness-of-Fit Test, which measures how much two probability distributions differ, to detect abnormal network activities: the probability distribution of the Relative Uncertainty for short-term network behavior is compared with that of the long-term profile constructed in the first stage. We evaluate the proposed scheme on DoS attacks with a dataset derived from KDD CUP 1999. Experimental results show that the scheme achieves high accuracy and low computational complexity if the features are selected appropriately.


List of Tables

Table 1. 23 Features of the Dataset.
Table 2. Confusion Matrix and Performance-Evaluation Method.
Table 3. The Maximum Accuracy of Features Larger Than 90%.
Table 4. Correlation Coefficient Matrix.

List of Figures

Fig. 1. Chi-Square Distribution with $df = 7$ and $\alpha = 0.05$.
Fig. 2. The Relative Uncertainty Based Distribution of Protocol Types.
Fig. 3. The Process of Chi-Square Test Based Anomaly Detection.
Fig. 4. Mean Manhattan Distance vs. Length of Relative Uncertainty Monitor-Window.
Fig. 5. Accuracy Rate at Different Significance Level.
Fig. 6. True Positive Rate at Different Significance Level.
Fig. 7. False Positive Rate at Different Significance Level.
Fig. 8. 0.5% Significance Level.
Fig. 9. 0.1% Significance Level.
Fig. 10. 0.01% Significance Level.
Fig. 11. Receiver Operating Characteristic Curve.
Fig. 12. ROC Curve of Feature C.
Fig. 13. ROC Curve of Feature D.
Fig. 14. ROC Curve of Feature M.
Fig. 15. ROC Curve of Feature N.
Fig. 16. ROC Curve of Feature R.

Chapter 1. Introduction

With the rapid growth of the Internet, the size and complexity of Internet traffic data keep increasing. Meanwhile, the damage caused by cyber attacks on the Internet is becoming more and more severe. Network security is therefore an important issue for network users. Traditional network protection mechanisms such as firewalls are not sufficient to detect today's fast-changing attacks. Intrusion detection systems are among the major tools that have recently been developed to detect and prevent different types of attacks.

The techniques adopted in intrusion detection are generally classified into two types: misuse detection and anomaly detection. Misuse detection detects attacks by matching signatures. For accurate detection, the signature database of a misuse detection system must be updated frequently, and such systems are in general unable to detect new security attacks. Anomaly detection first profiles normal behavior and then compares network activities with the normal behavior profiles to detect possible security attacks. It is based on the observation that network activities during attacks are often quite different from activities under normal usage. Statistics such as the mean, the variance, or even the probability distribution have been adopted as metrics for detecting attacks [1]. Compared with misuse detection, the major advantage of anomaly detection is that it does not require a database of signatures and can detect and prevent the outbreak of new attacks.

Kim et al. [2] proposed an optimized intrusion detection system using Principal Component Analysis (PCA) and a Back-propagation Neural Network (BNN) based on a Genetic Algorithm (GA). Their research seeks not only to reduce the dimension of the features but also to identify the intrinsic feature set. They used the KDD CUP 1999 data to validate the approach for detecting DoS attacks. Their results show that the feature dimension decreases to 10 and the highest detection rate is about 91.00%. In this thesis, we present a new scheme which achieves higher accuracy with lower complexity.

One important step of anomaly detection is data-processing (or data-profiling), a process which transforms original Internet packet information (e.g., protocol type, service type, port number, IP address) into “traffic behavior patterns” [3]. There are many possible methods for data-profiling. In our research, we use an entropy-based scheme to create traffic behavior patterns for our data-profiling system and analyze them with the Chi-Square Goodness-of-Fit Test [4]. The entropy-based scheme describes the distribution of a specific number of packets, and the detection module then applies the Chi-Square Goodness-of-Fit Test to the profiled data to detect attacks.

Chapter 2 introduces the background, including entropy, Relative Uncertainty, and the Chi-Square Goodness-of-Fit Test. Chapter 3 presents our proposed scheme and explains the details of the procedure. Chapter 4 describes the dataset used in the experiments. Chapter 5 contains the simulation results. Finally, Chapter 6 draws conclusions.


Chapter 2. Background

A. Entropy and Relative Uncertainty

Entropy is a measure of the “observational variety” contained in the data [5]. Consider a random variable $X$ that may take $N_X$ discrete values. If we randomly observe $X$ for $m$ times, the observations induce a probability distribution on $X$,

$$p(x_i) = m_i / m, \quad x_i \in X \tag{1}$$

where $m_i$ represents the number of times we observe $X$ taking the value $x_i$.

The entropy of $X$ is defined as

$$H(X) = -\sum_{x_i \in X} p(x_i) \log_2 p(x_i) \tag{2}$$

$$0 \le H(X) \le H_{max}(X) = \log_2 \min\{N_X, m\} \tag{3}$$

where $H_{max}(X)$ is the maximum entropy and, by convention, $0 \log_2 0 = 0$ (unobserved possibilities do not enter the measure).

In [6], the Relative Uncertainty (RU) is defined as the standardized entropy and is given by

$$RU(X) = \frac{H(X)}{H_{max}(X)} = \frac{H(X)}{\log_2 \min\{N_X, m\}}, \quad 0 \le RU(X) \le 1. \tag{4}$$

Obviously, if all the observed values are the same, i.e., $p(x) = 1$ for some $x \in X$, then we have $RU(X) = 0$. On the other hand, if all the observed values are different, meaning that there is the highest level of variety in the observed data, then it holds that $RU(X) = 1$. In general, $RU(X) \ll 1$ indicates that the data distribution is more skewed, and $RU(X) \approx 1$ means that the values of the observed data are close to being uniformly distributed. In Chapter 3 we use the above definitions and properties to convert original packet information into behavior profiles.
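The definitions in equations (1) to (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the thesis: the function names `entropy` and `relative_uncertainty` are hypothetical, and returning 0 when $H_{max}(X) = 0$ (a single observation or a single possible value) is an assumption, since the text does not define RU in that case.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical entropy H(X) in bits, following equation (2)."""
    m = len(values)
    counts = Counter(values)
    # Only observed values appear in `counts`, so the convention
    # 0 * log2(0) = 0 is satisfied automatically.
    return sum(-(c / m) * math.log2(c / m) for c in counts.values())


def relative_uncertainty(values, n_x):
    """RU(X) = H(X) / log2(min(N_X, m)), following equation (4).

    `n_x` is N_X, the number of values the variable may take.
    """
    m = len(values)
    h_max = math.log2(min(n_x, m))
    if h_max == 0:
        # Assumption: with one observation or one possible value there is
        # no variety to measure, so RU is reported as 0.
        return 0.0
    return entropy(values) / h_max


# Nine packets with three possible protocol types (N_X = 3).
packets = ["tcp"] * 6 + ["udp"] * 2 + ["icmp"]
ru = relative_uncertainty(packets, n_x=3)
```

For the nine packets above, `ru` lies strictly between 0 and 1, which reflects a skewed but not degenerate mix of protocol types.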

B. Chi-Square Goodness-of-Fit Test

The Chi-Square Goodness-of-Fit Test is a hypothesis test which compares two probability distributions to decide how much they differ [4]. The definitions needed for our study are as follows.

The null and alternative hypotheses for the test are:

$H_0$: The variable has the specified distribution, and

$H_a$: The variable does not have the specified distribution.

The Chi-Square Goodness-of-Fit Test computes the test statistic

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \tag{5}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency from the regular distribution for event $i$. The significance level $\alpha$ is a threshold which is decided based on the extent of computer vulnerability. For highly secure computer networks, $\alpha$ is chosen to be small so that results are statistically significant at level $\alpha$. The degree of freedom is given by $df = I - 1$, where $I$ is the number of possible values for the variable. The degree of freedom determines the exact shape of a chi-square distribution. Given a significance level $\alpha$, there is a corresponding threshold (the critical value) which decides whether the null hypothesis is rejected. In other words, if the chi-square value of two distributions computed by equation (5) exceeds the threshold, then the two distributions are declared to be different at significance level $\alpha$.

Fig. 1. Chi-Square Distribution with $df = 7$ and $\alpha = 0.05$.

For example, in Fig. 1 the degree of freedom is 7, the significance level is 0.05, and the corresponding threshold is 14.067. If we perform the Chi-Square Test and obtain a value 20.0, then, at the 5% significance level, the data provide sufficient evidence to conclude that the observed distribution differs from the expected distribution.
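The decision rule in this example can be illustrated with a short Python sketch of the chi-square statistic. The function name `chi_square_statistic` and the observed and expected counts are made up for illustration; only the threshold 14.067 (for 7 degrees of freedom at the 5% significance level) is taken from the text.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all events i.

    Every expected frequency must be positive.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Threshold quoted in the text for df = 7 and a 5% significance level.
THRESHOLD_DF7_ALPHA_005 = 14.067

# Hypothetical counts over 8 events (df = 8 - 1 = 7), 200 samples in total.
observed = [15, 35, 25, 25, 15, 35, 25, 25]
expected = [25] * 8

stat = chi_square_statistic(observed, expected)   # 4 events deviate by 10: 4 * 100 / 25 = 16.0
reject_h0 = stat > THRESHOLD_DF7_ALPHA_005        # True: 16.0 exceeds 14.067
```

With these hypothetical counts the statistic exceeds the threshold, so the observed distribution would be declared different from the expected one at the 5% level.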


Chapter 3. Our Proposed Scheme

This chapter presents the main ideas of data-profiling and of anomaly detection with the Chi-Square Goodness-of-Fit Test. A general Internet packet header includes information such as protocol type, port number, and IP address. Such information can be used to derive statistics related to the behavior of a network, and that behavior can be categorized as normal or abnormal. Abnormal behavior may be regarded as an attack or intrusion. Two issues arise. How do we systematically build profiles of the behavior of a network system? And how do we identify attacks from the profiled data?

A. Relative Uncertainty Based Distribution

To address the first issue, we develop a methodology which uses the concept of Relative Uncertainty. As an example, assume that there are three types of protocols, TCP, UDP, and ICMP, which are recorded in packet headers. For a series of packets, we then observe the sequence of protocol types as a time series. This time series of protocol types can be transformed into a time series of Relative Uncertainty. As mentioned before, the Relative Uncertainty represents the observational variety in the network traffic. A major advantage of using Relative Uncertainty for data profiling is that it is dimensionless: it can expose the same information hidden in many features simultaneously, regardless of the different units of those features.

Fig. 2. The Relative Uncertainty Based Distribution of Protocol Types.

Fig. 2 shows a protocol type series extracted from the headers of observed packets. The Relative Uncertainty is calculated every $N$ packets. (In Fig. 2, we have $N = 9$.) A series of Relative Uncertainty values is obtained after this process. The value of $N$ is determined as follows.

We assume that two adjacent values in the Relative Uncertainty series should not differ much for normal behavior. Therefore, the Mean Manhattan Distance, which describes the absolute difference between adjacent values in the Relative Uncertainty series, is adopted in determining the value of $N$. Define the Manhattan Distance of the Relative Uncertainty series as

$$MD_{(j,j+1)} = \sum_{k=1}^{K} \left| RU_j^k - RU_{j+1}^k \right| \tag{6}$$

where $K$ is the number of dimensions (or features) and $RU_j^k$ is the $j$-th Relative Uncertainty value of feature $k$, and the Mean Manhattan Distance as

$$Mean\ MD = \frac{1}{J-1} \sum_{j=1}^{J-1} MD_{(j,j+1)} \tag{7}$$

where $J$ is the total number of indices in the series. The value of $N$ is selected to minimize the Mean Manhattan Distance. After the RU series is generated, we construct the probability distribution of the series as the long-term profile of network behavior.
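The selection of $N$ described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the helper names are hypothetical, the RU series is computed over consecutive non-overlapping blocks of $N$ observations, the mean is taken over the adjacent pairs of the series, and the candidate values of $N$ are supplied by the caller.

```python
import math
from collections import Counter


def relative_uncertainty(values, n_x):
    """RU(X) = H(X) / log2(min(N_X, m)); 0 if the maximum entropy is 0."""
    m = len(values)
    h_max = math.log2(min(n_x, m))
    if h_max == 0:
        return 0.0
    h = sum(-(c / m) * math.log2(c / m) for c in Counter(values).values())
    return h / h_max


def ru_series(values, n, n_x):
    """RU of each consecutive, non-overlapping block of n observations."""
    return [relative_uncertainty(values[i:i + n], n_x)
            for i in range(0, len(values) - n + 1, n)]


def mean_manhattan_distance(ru_by_feature):
    """Mean over adjacent pairs of the Manhattan Distance across features.

    ru_by_feature[k][j] holds the j-th RU value of feature k; all series
    are assumed to have the same length (at least 2).
    """
    j_total = len(ru_by_feature[0])
    if j_total < 2:
        raise ValueError("need at least two RU values per feature")
    distances = [sum(abs(series[j] - series[j + 1]) for series in ru_by_feature)
                 for j in range(j_total - 1)]
    return sum(distances) / len(distances)


def choose_window_length(features, n_x_by_feature, candidates):
    """Return the candidate N with the smallest Mean Manhattan Distance."""
    def score(n):
        return mean_manhattan_distance(
            [ru_series(values, n, n_x)
             for values, n_x in zip(features, n_x_by_feature)])
    return min(candidates, key=score)
```

As a toy usage example, `choose_window_length([list("aabbababaabbabab")], [2], [2, 4])` returns 4: blocks of four symbols always contain two of each symbol, so the RU series is constant, whereas blocks of two alternate between RU values of 0 and 1.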

B. Chi-Square Test Based Anomaly Detection

To decide whether or not network behavior is normal during a specific time period, we collect network activities during that period, construct its profile (i.e., the distribution of the RU series), and compare the profile with that derived from normal behavior. We adopt the well-known Chi-Square Test to compare the two distributions.

In our proposed approach, the normal behavior profile (the long-term profile) is constructed off-line with data collected over a long period of time without any attack. Since it is built from a long observation, the profile is likely to be a stable distribution. The short-term profile, in contrast, models dynamic behavior: it is generated by monitoring the behavior of a network system during a short, specific period of time.


Fig. 3. The Process of Chi-Square Test Based Anomaly Detection.

Fig. 3 shows the process of Chi-Square Test based anomaly detection. Here the expected distribution is the long-term profile and the observed distribution is the short-term profile. Assume the expected distribution has been generated by observing the normal activities of a network system over a long time. We first apply a sliding window to compute the values of Relative Uncertainty from network activities collected online. The computed Relative Uncertainty values are then used to construct the observed distributions, which are compared with the expected distribution by using the Chi-Square Goodness-of-Fit Test. The process gives a sequence of chi-square values. If a chi-square value is greater than the pre-determined threshold, the activities during the period over which that value is computed are regarded as abnormal.
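The detection loop described above might look like the following Python sketch. Several details are assumptions made for illustration and are not specified in the text: RU values are binned into equal-width intervals on [0, 1], the number of bins (8, giving 7 degrees of freedom) and the window length and step are hypothetical parameters, and a bin that is empty in the long-term profile but occupied in the short-term window is treated as an immediate anomaly.

```python
import math


def histogram(ru_values, bins):
    """Count RU values (each in [0, 1]) per equal-width bin."""
    counts = [0] * bins
    for ru in ru_values:
        counts[min(int(ru * bins), bins - 1)] += 1
    return counts


def chi_square(observed, expected):
    """Chi-square statistic; bins with zero expected frequency are special-cased."""
    total = 0.0
    for o, e in zip(observed, expected):
        if e == 0:
            if o > 0:
                # Assumption: a value never seen in the long-term profile
                # is treated as maximally surprising.
                return math.inf
            continue
        total += (o - e) ** 2 / e
    return total


def detect(long_term_ru, online_ru, window, threshold, bins=8, step=1):
    """Return one True/False flag (abnormal or not) per sliding-window position."""
    profile = histogram(long_term_ru, bins)
    n = len(long_term_ru)
    flags = []
    for start in range(0, len(online_ru) - window + 1, step):
        observed = histogram(online_ru[start:start + window], bins)
        # Scale the long-term profile to the size of the short-term window.
        expected = [window * c / n for c in profile]
        flags.append(chi_square(observed, expected) > threshold)
    return flags


# Toy data: the long-term profile splits evenly between low and high RU.
long_term = [0.1] * 50 + [0.9] * 50
online = [0.1, 0.9] * 10 + [0.9] * 20
flags = detect(long_term, online, window=20, threshold=14.067, step=20)
```

In this toy run the first window matches the profile (statistic 0) and the second contains only high RU values (statistic 20), so only the second window is flagged.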


Chapter 4. Data Set

We use the KDD CUP 1999 data set [7], built for a worldwide competition on designing intrusion detection systems. The data set has 41 features which can be grouped into 3 categories: Basic Features, which can be extracted from the packet header without inspecting the payload; Content Features, which are generated by accessing the payload of the original packet; and Time-based Traffic Features, which are computed using a 2-second time window.

In our study, we focus on the denial-of-service (DoS) attack, which is characterized by an explicit attempt by attackers to prevent legitimate users of a service from using that service. The basic and time-based traffic features are suitable for detecting DoS attacks [8]. Therefore, we select 23 features from the basic features and time-based traffic features, as listed in Table 1.

Table 1. 23 Features of the Dataset.

| Label | Feature | Type of attribute |
|---|---|---|
| A | protocol_type | symbolic |
| B | service | symbolic |
| C | src_bytes | numerical |
| D | dst_bytes | numerical |
| E | count | numerical |
| F | srv_count | numerical |
| G | serror_rate | numerical |
| H | srv_serror_rate | numerical |
| I | rerror_rate | numerical |
| J | srv_rerror_rate | numerical |
| K | same_srv_rate | numerical |
| L | diff_srv_rate | numerical |
| M | srv_diff_host_rate | numerical |
| N | dst_host_count | numerical |
| O | dst_host_srv_count | numerical |
| P | dst_host_same_srv_rate | numerical |
| Q | dst_host_diff_srv_rate | numerical |
| R | dst_host_same_src_port_rate | numerical |
| S | dst_host_srv_diff_host_rate | numerical |
| T | dst_host_serror_rate | numerical |
| U | dst_host_srv_serror_rate | numerical |
| V | dst_host_rerror_rate | numerical |
| W | dst_host_srv_rerror_rate | numerical |


Chapter 5. Simulation Results

In this chapter, we evaluate the performance of our proposed behavior-based anomaly detection algorithm on the KDD CUP 1999 data set. First of all, we determine the value of $N$ that minimizes the Mean Manhattan Distance. We require the number of elements in the Relative Uncertainty series of the long-term profile to be at least 100, because the Chi-Square Goodness-of-Fit Test assumes a large sample size. The result is $N = 24$.

Fig. 4. Mean Manhattan Distance vs. the Length of Relative Uncertainty Monitor-Window. (Plot of the Mean Manhattan Distance of the 23 features against the length of the RU monitor-window, with $N = 24$ marked on the horizontal axis.)

Table 2. Confusion Matrix and Performance-Evaluation Method.

| Prediction outcome | Actual value: Bad | Actual value: Good | Total |
|---|---|---|---|
| Bad | (A) True Positive | (C) False Positive | (A) + (C) |
| Good | (B) False Negative | (D) True Negative | (B) + (D) |
| Total | (A) + (B) | (C) + (D) | (A) + (B) + (C) + (D) |

True Positive Rate (TPR) = A / (A+B)

False Positive Rate (FPR) = C / (C+D)

Accuracy (ACC) = (A+D) / (A+B+C+D)

Table 2 defines True Positive, False Positive, False Negative, True Negative, True Positive Rate (detection rate), False Positive Rate, and Accuracy. To evaluate our proposed scheme, we use one feature of the set at a time in this simulation. The top six features ranked by accuracy are src_bytes (C), dst_bytes (D), srv_diff_host_rate (M), dst_host_count (N), dst_host_same_src_port_rate (R), and dst_host_srv_diff_host_rate (S). These features can be used to detect DoS attacks effectively.
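The three measures of Table 2 translate directly into code. The sketch below uses a hypothetical function name and made-up counts purely for illustration; they are not results from the thesis.

```python
def evaluation_rates(tp, fn, fp, tn):
    """Return (TPR, FPR, ACC) from the four cells of the confusion matrix.

    tp = (A) True Positive, fn = (B) False Negative,
    fp = (C) False Positive, tn = (D) True Negative.
    """
    tpr = tp / (tp + fn)                    # A / (A + B)
    fpr = fp / (fp + tn)                    # C / (C + D)
    acc = (tp + tn) / (tp + fn + fp + tn)   # (A + D) / (A + B + C + D)
    return tpr, fpr, acc


# Made-up counts: 100 attack periods and 100 normal periods.
tpr, fpr, acc = evaluation_rates(tp=90, fn=10, fp=5, tn=95)
# tpr = 0.9, fpr = 0.05, acc = 0.925
```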

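Table 3. The Maximum Accuracy of Features Larger Than 90%. Columns are grouped by significance level (0.5%, 0.1%, 0.01%).

| Feature | 0.5%: ACC (%) | 0.5%: TPR (%) | 0.5%: FPR (%) | 0.1%: ACC (%) | 0.1%: TPR (%) | 0.1%: FPR (%) | 0.01%: ACC (%) | 0.01%: TPR (%) | 0.01%: FPR (%) |
|---|---|---|---|---|---|---|---|---|---|
| C | 94.28 | 98.45 | 22.60 | 94.89 | 98.21 | 18.56 | 95.43 | 98.24 | 15.94 |
| D | 95.17 | 98.94 | 20.02 | 95.91 | 98.71 | 15.35 | 96.55 | 98.64 | 11.86 |
| M | 94.03 | 97.68 | 20.74 | 94.59 | 97.69 | 18.01 | 95.18 | 97.55 | 14.44 |
| N | 94.80 | 97.00 | 14.33 | 94.95 | 96.57 | 11.76 | 95.00 | 95.65 | 7.71 |
| R | 95.97 | 98.11 | 12.71 | 96.01 | 97.93 | 11.81 | 96.19 | 98.51 | 13.26 |
| S | 94.20 | 97.30 | 18.63 | 94.40 | 96.92 | 16.03 | 94.44 | 96.60 | 14.51 |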

Table 3 shows the accuracy, true positive rate, and false positive rate of the features at different significance levels. The results show that the accuracy increases as the significance level decreases. Note that a smaller significance level results in a larger threshold, which decreases the false positive rate and increases the false negative rate. In our experiment, the false negative rate increases by about 1% and the false positive rate decreases by 3% to 4%.

Table 4. Correlation Coefficient Matrix.

|   | C | D | M | N | R | S |
|---|---|---|---|---|---|---|
| C | 1.0000 | 0.7448 | 0.6512 | 0.8037 | 0.7739 | 0.7082 |
| D | 0.7448 | 1.0000 | 0.8192 | 0.7259 | 0.6960 | 0.6242 |
| M | 0.6512 | 0.8192 | 1.0000 | 0.6717 | 0.6366 | 0.5863 |
| N | 0.8037 | 0.7259 | 0.6717 | 1.0000 | 0.9036 | 0.8684 |
| R | 0.7739 | 0.6960 | 0.6366 | 0.9036 | 1.0000 | 0.8483 |
| S | 0.7082 | 0.6242 | 0.5863 | 0.8684 | 0.8483 | 1.0000 |

Table 4 shows the correlation coefficient matrix computed from the Relative Uncertainty time series of the six features listed in Table 3. The features are highly correlated with each other (all coefficients exceed 0.58), so they carry largely redundant information. Therefore, using the single feature with the highest accuracy should suffice for detection of DoS attacks.
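A matrix like Table 4 can be computed from the RU time series of the selected features as sketched below. The text does not state which correlation coefficient is used; the sketch assumes the Pearson coefficient, and the function names and the short toy series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Both series must vary; a constant series has no defined correlation.
    return cov / (norm_x * norm_y)


def correlation_matrix(series_by_feature):
    """Symmetric matrix of pairwise Pearson coefficients."""
    return [[pearson(a, b) for b in series_by_feature]
            for a in series_by_feature]


# Toy RU series for three hypothetical features.
toy = [
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.40, 0.60, 0.80],   # rises together with the first series
    [0.40, 0.30, 0.20, 0.10],   # falls while the first series rises
]
matrix = correlation_matrix(toy)
```

In the toy example the first two series are perfectly correlated (coefficient close to 1) and the third is perfectly anti-correlated with the first (coefficient close to -1).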

The true positive rate of our proposed scheme is higher than that (i.e., 91%) of the scheme presented in [2]. Moreover, our scheme uses only one feature. Our study shows that transforming the original data sequence into a sequence of Relative Uncertainties could be an effective solution for detecting network attacks with low computational complexity.


Chapter 6. Conclusion

In this thesis, we proposed a novel, two-stage approach for detecting network attacks. In the first stage, normal behavior profiles are constructed based on Relative Uncertainty. In the second stage, the Chi-Square Goodness-of-Fit Test is used to compare the distribution obtained from behavior profiling with the distribution of network activities collected online. We demonstrated the effectiveness of our proposed scheme on DoS attacks with the KDD CUP 1999 dataset. Simulation results show that our proposed scheme achieves lower complexity and a higher detection rate than the scheme presented in [2]. Based on the experimental results, we believe that the proposed scheme could be a good choice for network behavior profiling and attack detection.


Bibliography

[1] T.-Q. Zhu and P. Xiong, “Optimization of membership functions in anomaly detection based on fuzzy data mining,” in Proc. International Conference on Machine Learning and Cybernetics (ICMLC), 2005.

[2] D. S. Kim, H.-N. Nguyen, T. Thein, and J. S. Park, “An Optimized Intrusion Detection System Using PCA and BNN,” in Proc. 6th Asia-Pacific Symposium on Information and Telecommunication Technologies, pp. 356-359, Nov. 2005.

[3] K. Xu, F. Wang, S. Bhattacharyya, and Z.-L. Zhang, “A Real-time Network Traffic Profiling System,” in Proc. Dependable Systems and Networks (DSN), 2007.

[4] R. Goonatilake, A. Herath, S. Herath, and J. Herath, “Intrusion Detection Using the Chi-square Goodness-of-fit Test for Information Assurance, Network, Forensics and Software Security,” Journal of Computing Sciences in Colleges (JCSC), vol. 23, no. 1, pp. 255-263, October 2007.

[5] T. Cover and J. Thomas, Elements of Information Theory, ser. Wiley Series in Telecommunications. New York: Wiley, 1991.

[6] K. Xu and Z.-L. Zhang, “Internet Traffic Behavior Profiling for Network Security Monitoring,” IEEE/ACM Transactions on Networking, vol. 16, no. 6, December 2008.

[7] http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

[8] M. F. Abdollah, A. H. Yaacob, S. Sahib, I. Mohamad, and M. F. Iskandar, “Revealing the Influence of Feature Selection for Fast Attack Detection,” IJCSNS International Journal of Computer Science and


Appendix

Table 5. Brief Description of the Feature Set.

| Label | Name of attribute | Description | Type of attribute |
|---|---|---|---|
| A | protocol_type | Protocol type (TCP or UDP) | symbolic |
| B | service | Network service on the destination (e.g., HTTP, FTP) | symbolic |
| C | src_bytes | Number of source bytes transferred | numerical |
| D | dst_bytes | Number of destination bytes transferred | numerical |
| E | count | Number of connections to the same host as the current connection in the past two seconds | numerical |
| F | srv_count | Number of connections to the same service as the current connection in the past two seconds | numerical |
| G | serror_rate | Percent of connections to the same host that have “SYN” errors | numerical |
| H | srv_serror_rate | Percent of connections to the same service that have “SYN” errors | numerical |
| I | rerror_rate | Percent of same-host connections that have “REJ” (reject) errors | numerical |
| J | srv_rerror_rate | Percent of same-service connections that have “REJ” errors | numerical |
| K | same_srv_rate | Percent of same-host connections to the same service | numerical |
| L | diff_srv_rate | Percent of same-host connections to different services | numerical |
| M | srv_diff_host_rate | Percent of same-service connections to different hosts | numerical |
| N | dst_host_count | Number of connections to the same host (as the current connection) in the past two seconds, from destination to host | numerical |
| O | dst_host_srv_count | Number of connections to the same service (as the current connection) in the past two seconds, from the same destination to host | numerical |
| P | dst_host_same_srv_rate | Percent of same host-to-destination connections to the same service | numerical |
| Q | dst_host_diff_srv_rate | Percent of same host-to-destination connections to different services | numerical |
| R | dst_host_same_src_port_rate | Percent of the same host-to-destination connections to the same source port | numerical |
| S | dst_host_srv_diff_host_rate | Percent of connections to the same service coming from different hosts | numerical |
| T | dst_host_serror_rate | Percent of connections to the same host (as the current connection), from the same destination, that have “SYN” errors | numerical |
| U | dst_host_srv_serror_rate | Percent of connections to the same service (as the current connection), from the same destination, that have “SYN” errors | numerical |
| V | dst_host_rerror_rate | Percent of connections to the same host (as the current connection), from the same destination, that have “REJ” errors | numerical |
| W | dst_host_srv_rerror_rate | Percent of connections to the same service (as the current connection), from the same destination, that have “REJ” errors | numerical |

Fig. 5. Accuracy Rate at Different Significance Level.

Fig. 6. True Positive Rate at Different Significance Level.

Fig. 7. False Positive Rate at Different Significance Level.

Fig. 8. 0.5% Significance Level.

Fig. 9. 0.1% Significance Level.

Fig. 10. 0.01% Significance Level.

Fig. 11. Receiver Operating Characteristic (ROC) Curve.

In Fig. 11, the diagonal line divides the ROC space into regions of better and worse classification. Points above the diagonal indicate good classification results (better than random guessing), while points below it indicate poor results.

Fig. 12. ROC Curve of Feature C. (Axes: False Positive Rate vs. True Positive Rate; curves for the 0.5%, 0.1%, and 0.01% significance levels.)

Fig. 13. ROC Curve of Feature D.

Fig. 14. ROC Curve of Feature M.