Simulation Results - An Entropy-Based Network Behavior Profiling Algorithm for Detecting Internet Attacks

In this chapter, we evaluate the performance of our proposed behavior-based anomaly detection algorithm on the KDD 1999 data set. First of all, we determine the value of N, the length of the Relative Uncertainty monitor window, that minimizes the Mean Manhattan Distance.

We require the Relative Uncertainty series of the long-term profile to contain at least 100 elements, because the Chi-Square Goodness-of-Fit Test assumes a large sample size. The result is N = 24.
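The window-length selection above is a constrained minimization, which can be sketched as follows. This is a minimal illustration under assumptions the text does not state: each feature is summarized as a normalized histogram, the Manhattan distance between two histograms is the sum of absolute bin differences, and the Mean Manhattan Distance is that distance averaged over features. The function names and the candidate table passed to `choose_window_length` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two equal-length histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_manhattan(pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average Manhattan distance over (profile, observed) pairs, one pair per feature."""
    return sum(manhattan(p, q) for p, q in pairs) / len(pairs)


def choose_window_length(
    candidates: Dict[int, Tuple[float, int]], min_elements: int = 100
) -> int:
    """Return the window length N with the smallest mean distance, considering
    only candidates whose long-term profile has at least `min_elements` RU
    values (the large-sample requirement of the Chi-Square test).

    `candidates` maps N -> (mean Manhattan distance, number of RU elements).
    """
    feasible = {n: d for n, (d, count) in candidates.items() if count >= min_elements}
    if not feasible:
        raise ValueError("no window length satisfies the sample-size constraint")
    return min(feasible, key=feasible.get)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not the thesis measurements.
    table = {10: (0.110, 500), 24: (0.080, 208), 50: (0.085, 100), 100: (0.075, 50)}
    print(choose_window_length(table))  # 24: N = 100 is closer but has too few RU values
    print(mean_manhattan([([0.5, 0.5], [0.25, 0.75]), ([1.0, 0.0], [1.0, 0.0])]))  # 0.25
```

Filtering before minimizing keeps the sample-size requirement a hard constraint rather than a penalty term, matching the order of reasoning in the paragraph above.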

Chapter 5. Simulation Results

[Figure: The Mean Manhattan Distance of 23 features. x-axis: Length of RU Monitor-Window (0 to 1000); y-axis: Mean Manhattan Distance (0.075 to 0.12).]

Fig. 4. Mean Manhattan Distance vs. the Length of Relative Uncertainty Monitor-Window.

Table 2 gives the definitions of True Positive, False Positive, False Negative, True Negative, True Positive Rate (detection rate), False Positive Rate, and Accuracy. To evaluate the proposed scheme, we use one feature at a time in this simulation. Ranked by accuracy, the top six features are src_bytes (C), dst_bytes (D), srv_diff_host_rate (M), dst_host_count (N), dst_host_same_src_port_rate (R), and dst_host_srv_diff_host_rate (S). These features can be used to detect DoS attacks effectively.
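Table 2 is not reproduced in this section. Assuming it uses the standard confusion-matrix definitions with attacks as the positive class, the metrics and the one-feature-at-a-time ranking can be sketched as below. The `Metrics` class and the `evaluate` and `rank_features` helpers are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Metrics:
    tpr: float       # True Positive Rate (detection rate) = TP / (TP + FN)
    fpr: float       # False Positive Rate = FP / (FP + TN)
    accuracy: float  # (TP + TN) / (TP + TN + FP + FN)


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Compute detection metrics from confusion-matrix counts (attack = positive)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one attack and one normal record")
    total = tp + fp + fn + tn
    return Metrics(
        tpr=tp / (tp + fn),
        fpr=fp / (fp + tn),
        accuracy=(tp + tn) / total,
    )


def rank_features(results: Dict[str, Metrics]) -> List[str]:
    """Feature labels (e.g. "C", "D") sorted by accuracy, best first."""
    return sorted(results, key=lambda f: results[f].accuracy, reverse=True)


if __name__ == "__main__":
    print(evaluate(tp=90, fp=5, fn=10, tn=95))
    # Metrics(tpr=0.9, fpr=0.05, accuracy=0.925)
```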

Table 3. The Maximum Accuracy of Features Larger Than 90%.

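Values in %. Acc. = accuracy, TPR = true positive rate, FPR = false positive rate; the significance level decreases from level 1 to level 3.

           Level 1                  Level 2                  Level 3
Feature    Acc.    TPR     FPR      Acc.    TPR     FPR      Acc.    TPR     FPR
D          95.17   98.94   20.02    95.91   98.71   15.35    96.55   98.64   11.86
M          94.03   97.68   20.74    94.59   97.69   18.01    95.18   97.55   14.44
N          94.80   97.00   14.33    94.95   96.57   11.76    95.00   95.65    7.71
R          95.97   98.11   12.71    96.01   97.93   11.81    96.19   98.51   13.26
S          94.20   97.30   18.63    94.40   96.92   16.03    94.44   96.60   14.51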

Table 3 shows the accuracy, true positive rate, and false positive rate of the features at different significance levels. The results show that accuracy increases as the significance level decreases. Note that a smaller significance level yields a larger threshold, which lowers the false positive rate but raises the false negative rate. In our experiment, the false negative rate increases by less than 1%, while the false positive rate decreases by about 3 to 4%.
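The link between a smaller significance level and a larger threshold works as follows. The test rejects the profile when the chi-square statistic exceeds the upper-alpha quantile of the chi-square distribution, and that quantile grows as alpha shrinks. The sketch below approximates the quantile with the Wilson-Hilferty transformation using only the Python standard library. The approximation, the function name, and the 9 degrees of freedom in the example are assumptions for illustration, not values from the thesis. An exact quantile could come from a statistics library such as `scipy.stats.chi2.ppf`.

```python
from statistics import NormalDist


def chi2_critical_value(alpha: float, dof: int) -> float:
    """Approximate upper-alpha critical value of the chi-square distribution.

    Wilson-Hilferty approximation (assumed here, not taken from the thesis):
        x ~= k * (1 - 2/(9k) + z * sqrt(2/(9k)))**3,
    where k is the degrees of freedom and z is the standard normal
    (1 - alpha) quantile.
    """
    if not 0.0 < alpha < 1.0 or dof < 1:
        raise ValueError("alpha must be in (0, 1) and dof >= 1")
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


if __name__ == "__main__":
    # Significance levels of the appendix figures (0.5%, 0.1%, 0.01%).
    for alpha in (0.005, 0.001, 0.0001):
        print(f"alpha={alpha}: threshold ~ {chi2_critical_value(alpha, dof=9):.2f}")
```

Running the loop shows the threshold rising as alpha falls. Fewer normal windows then exceed it, which lowers the false positive rate, and some attack windows also fall below it, which raises the false negative rate.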

Table 4. Correlation Coefficient Matrix.

           C        D        M        N        R        S
C     1.0000   0.7448   0.6512   0.8037   0.7739   0.7082
D     0.7448   1.0000   0.8192   0.7259   0.6960   0.6242
M     0.6512   0.8192   1.0000   0.6717   0.6366   0.5863
N     0.8037   0.7259   0.6717   1.0000   0.9036   0.8684
R     0.7739   0.6960   0.6366   0.9036   1.0000   0.8483
S     0.7082   0.6242   0.5863   0.8684   0.8483   1.0000

Table 4 shows the correlation coefficient matrix computed from the Relative Uncertainty time series of the six features listed in Table 3. The pairwise coefficients range from 0.5863 (M and S) to 0.9036 (N and R), so the features are highly correlated with one another. In other words, the single feature with the highest accuracy should suffice for detecting DoS attacks.

The true positive rate of our proposed scheme is higher than that of the scheme presented in [2] (91%). Moreover, our scheme uses only one feature. Our study shows that transforming the original data sequence into a sequence of Relative Uncertainties could be an effective, low-complexity way to detect network attacks.
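The transformation into Relative Uncertainties can be sketched as follows. This section does not restate the definition, so the sketch assumes the form used in [3] and [6]: for the m values observed in one monitor window, RU = H(X) / log(min(N_X, m)), where H(X) is the empirical entropy and N_X is the number of possible values of the feature. The function names, the use of non-overlapping windows, and returning RU = 0 when the maximum entropy is zero are assumptions of this illustration.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def relative_uncertainty(window: Sequence[Hashable], n_possible: Optional[int] = None) -> float:
    """RU of one window: empirical entropy divided by its maximum possible value.

    Assumed definition (after Xu et al.): RU = H(X) / log(min(N_X, m)),
    with m = window size and N_X = number of possible feature values
    (defaults to m when unknown, e.g. for byte counts).
    """
    m = len(window)
    if m == 0:
        raise ValueError("empty window")
    counts = Counter(window)
    # H(X) = sum p * log(1/p); fsum keeps the summation exact where possible.
    entropy = math.fsum((c / m) * math.log(m / c) for c in counts.values())
    n_x = m if n_possible is None else n_possible
    max_entropy = math.log(min(n_x, m))
    return 0.0 if max_entropy == 0 else entropy / max_entropy


def ru_series(records: Sequence[Hashable], window: int,
              n_possible: Optional[int] = None) -> List[float]:
    """Split a feature's records into non-overlapping windows of `window`
    records and return one RU value per complete window."""
    if window < 1:
        raise ValueError("window must be positive")
    return [
        relative_uncertainty(records[i:i + window], n_possible)
        for i in range(0, len(records) - window + 1, window)
    ]


if __name__ == "__main__":
    # Toy sequence: a window of distinct values, then a flood of one value.
    traffic = [1, 2, 3, 4] + [7, 7, 7, 7]
    print(ru_series(traffic, window=4))  # [1.0, 0.0]
```

Normalizing by the maximum entropy keeps every RU value in [0, 1] regardless of window size, so RU series from different features can be compared and correlated directly.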

Chapter 6. Conclusion

In this thesis, we proposed a novel two-stage approach for detecting network attacks. In the first stage, normal behavior profiles are constructed based on Relative Uncertainty. In the second stage, the Chi-Square Goodness-of-Fit Test is applied to compare the distribution obtained from behavior profiling with the distribution of network activity collected online.
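The two stages can be sketched end to end. Stage one turns the long-term profile's RU values into bin proportions. Stage two bins the RU values of an online window and compares them with those proportions using Pearson's chi-square statistic and a critical value for the chosen significance level. The bin edges, the critical value, and all function names are assumptions for illustration; this section does not give the thesis's binning or degrees of freedom.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values into len(edges) + 1 bins delimited by the sorted `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def build_profile(profile_ru: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Stage 1: expected bin proportions from the long-term RU profile."""
    counts = histogram(profile_ru, edges)
    total = sum(counts)
    return [c / total for c in counts]


def chi_square_statistic(observed: Sequence[int], proportions: Sequence[float]) -> float:
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, proportions):
        expected = n * p
        if expected == 0:
            if o > 0:
                return math.inf  # RU values in a bin the profile never produced
            continue
        stat += (o - expected) ** 2 / expected
    return stat


def is_anomalous(online_ru: Sequence[float], proportions: Sequence[float],
                 edges: Sequence[float], critical_value: float) -> bool:
    """Stage 2: flag the online window when the statistic exceeds the threshold."""
    observed = histogram(online_ru, edges)
    return chi_square_statistic(observed, proportions) > critical_value


if __name__ == "__main__":
    edges = [0.25, 0.5, 0.75]  # four RU bins (assumed)
    profile = build_profile([0.1, 0.3, 0.3, 0.6, 0.6, 0.6, 0.8, 0.8], edges)
    critical = 11.34           # about the chi-square (3 dof) value at alpha = 1%
    print(is_anomalous([0.3, 0.6, 0.6, 0.8], profile, edges, critical))  # False
    print(is_anomalous([0.05] * 8, profile, edges, critical))            # True
```

The toy counts are far smaller than the large samples the test assumes; they only show the data flow between the two stages.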

We demonstrated the effectiveness of our proposed scheme with the KDD 1999 dataset for DoS attacks. Simulation results show that our proposed scheme achieves lower complexity and higher accuracy than previous schemes. Based on the experimental results, we believe that the proposed scheme could be a good choice for network behavior profiling and attack detection.

Bibliography

[1] T.-Q. Zhu and P. Xiong, “Optimization of membership functions in anomaly detection based on fuzzy data mining,” in Proc. International Conference on Machine Learning and Cybernetics (ICMLC), 2005.

[2] D. S. Kim, H.-N. Nguyen, T. Thein, and J. S. Park, “An Optimized Intrusion Detection System Using PCA and BNN,” in Proc. 6th Asia-Pacific Symposium on Information and Telecommunication Technologies, pp. 356-359, 10 Nov. 2005.

[3] K. Xu, F. Wang, S. Bhattacharyya, and Z.-L. Zhang, “A Real-time Network Traffic Profiling System,” in Proc. International Conference on Dependable Systems and Networks (DSN), 2007.

[4] R. Goonatilake, A. Herath, S. Herath, and J. Herath, “Intrusion Detection Using the Chi-square Goodness-of-fit Test for Information Assurance, Network, Forensics and Software Security,” Journal of Computing Sciences in Colleges (JCSC), vol. 23, no. 1, pp. 255-263, Oct. 2007.

[5] T. Cover and J. Thomas, Elements of Information Theory, ser. Wiley Series in Telecommunications. New York: Wiley, 1991.

[6] K. Xu and Z.-L. Zhang, “Internet Traffic Behavior Profiling for Network Security Monitoring,” IEEE/ACM Transactions on Networking, vol. 16, no. 6, Dec. 2008.

[7] KDD Cup 1999 Data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

[8] M. F. Abdollah, A. H. Yaacob, S. Sahib, I. Mohamad, and M. F. Iskandar, “Revealing the Influence of Feature Selection for Fast Attack Detection,” International Journal of Computer Science and Network Security (IJCSNS), vol. 8, no. 8, Aug. 2008.

Appendix

Table 5. Brief Description of the Feature Set.

Label  Name of attribute              Description                                                                  Type of attribute
A      protocol_type                  Protocol type (TCP, UDP, or ICMP)                                            symbolic
B      service                        Network service on the destination (e.g., HTTP, FTP)                         symbolic
C      src_bytes                      Number of source bytes transferred                                           numerical
D      dst_bytes                      Number of destination bytes transferred                                      numerical
E      count
G      serror_rate                    Percentage of connections to the same host that have “SYN” errors            numerical
H      srv_serror_rate                Percentage of connections to the same service that have “SYN” errors         numerical
I      rerror_rate                    Percentage of same-host connections that have “REJ” (reject) errors          numerical
J      srv_rerror_rate                Percentage of same-service connections that have “REJ” errors                numerical
K      same_srv_rate                  Percentage of same-host connections to the same service                      numerical
L      diff_srv_rate                  Percentage of same-host connections to different services                    numerical
...    ...                            ... connections to the same service                                          numerical
Q      dst_host_diff_srv_rate         Percentage of same host-to-destination connections to different services     numerical
R      dst_host_same_src_port_rate    Percentage of same host-to-destination connections to the same source port   numerical
S      dst_host_srv_diff_host_rate    Percentage of connections to the same service coming from different hosts    numerical
T      dst_host_serror_

Fig. 5. Accuracy Rate at Different Significance Levels.

Fig. 6. True Positive Rate at Different Significance Levels.

Fig. 7. False Positive Rate at Different Significance Levels.

Fig. 8. 0.5% Significance Level.

Fig. 9. 0.1% Significance Level.

Fig. 10. 0.01% Significance Level.

Fig. 11. Receiver Operating Characteristic (ROC) Curve.

In Fig. 11, the diagonal line, which corresponds to random guessing, divides the ROC space into regions of better and worse classification. Points above the diagonal indicate better-than-random classification results, while points below it indicate worse-than-random results.
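Where an operating point falls in ROC space follows directly from its rates: a point (FPR, TPR) lies above the diagonal exactly when its true positive rate exceeds its false positive rate. The sketch below classifies points this way. The function name is hypothetical. The example uses the feature D rows of Table 3, converted from percentages to fractions.

```python
def roc_region(fpr: float, tpr: float) -> str:
    """Classify an ROC point relative to the diagonal TPR = FPR (random guessing)."""
    if not (0.0 <= fpr <= 1.0 and 0.0 <= tpr <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if tpr > fpr:
        return "above"  # better than random guessing
    if tpr < fpr:
        return "below"  # worse than random guessing
    return "on"         # equivalent to random guessing


if __name__ == "__main__":
    # Feature D at the three significance levels of Table 3.
    for fpr, tpr in [(0.2002, 0.9894), (0.1535, 0.9871), (0.1186, 0.9864)]:
        print(roc_region(fpr, tpr))  # prints "above" for each point
```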

In the document: An Entropy-Based Network Behavior Profiling Algorithm for Detecting Internet Attacks (pp. 24-43)