INTRODUCTION - A Decision Support Mechanism for Detecting Outliers in Dynamic Environments

National Chengchi University (國立政治大學)

CHAPTER 1 INTRODUCTION

1.1 Background and Motivation

Achieving brain-like intelligence is one of the fundamental goals of computational intelligence. Two remarkable properties of such intelligence are the ability to adapt to a non-stationary environment and the ability to learn incrementally from noisy and incomplete data (Sendhoff et al., 2009). Two further aspects of brain-like intelligence mentioned by Bezdek (1994) are high computation speed and a low error rate comparable to that of human beings. Nowadays many companies face the challenge of a dynamic market environment whose data are non-stationary and subject to concept drift, such as stock market indices, traffic flows in computer networks, and purchase orders from customers.

The term “concept drifting” means that concepts are not stable and change with time (Tsymbal, 2004). That is, as time passes, the trend embedded in the observed data usually changes. Tsymbal (2004) notes that a concept drifting environment makes learning a model from data a complicated task. Masud et al. (2010) also point out that data are not only subject to concept drift but also potentially infinite, especially in time series data. He (2011) claims that an intelligent model should be able to modify its knowledge or concepts based on the new data distribution. Nevertheless, Bifet et al. (2011) remind us that the model should adapt to concept drift as soon as possible while not being affected by noise.

In the literature, many scholars have proposed incremental learning approaches to cope with a changing environment (Buschermöhle, Schoenke & Brockmann, 2012). Elwell and Polikar (2011) point out that the incremental learning technique can not only learn new concepts but also keep existing and still relevant concepts in the trained model. In addition, incremental learning tries to discard concepts that are no longer relevant. For example, Widmer & Kubat (1996) address the concept drifting problem with an incremental learning strategy based on a moving window, which keeps only the latest and most relevant data. However, dealing with outliers in a concept drifting environment makes this task more complicated.
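The moving window idea can be illustrated with a minimal sketch. It is not the algorithm of Widmer & Kubat (1996); the function name `moving_window_means`, the window size, and the toy stream are assumptions chosen only to show how a fixed-size window forgets old data and follows a drifting level.

```python
from collections import deque


def moving_window_means(stream, window_size):
    """Yield the mean of the most recent `window_size` observations.

    Hypothetical helper for illustration: the window mean stands in for
    whatever model would be retrained on the window contents.
    """
    window = deque(maxlen=window_size)  # the oldest item is dropped automatically
    for x in stream:
        window.append(x)
        yield sum(window) / len(window)


# Toy stream (assumed data) whose level drifts from 1 to 10.
stream = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
means = list(moving_window_means(stream, window_size=3))
print(means)
```

Once three post-drift observations have arrived, the window holds no pre-drift data, so the estimate reflects only the new concept.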

Tsaih and Cheng (2009, page 162) define outliers as “the observations far away from the fitting function deduced from a subset of the given observations.” The side effects of outliers have long been discussed in many fields. For instance, Chen and Liu (1993) point out that outliers diminish forecast accuracy in time series data. Tolvi (2002) takes the side effects of outliers into consideration when predicting monthly stock market index returns with an ARMA model. The results show that data sets from which the outliers, identified with an autoregressive model, have been removed yield better predictions. Fitting observations that include outliers can reduce the effectiveness of the fitting function, because outliers, with their large fitting deviances, have a strong influence on model estimation. Thus, Olson and Shi (2007) point out that outlier detection is a critical process in data cleansing, and that data cleansing is a very important step before modelling the data.
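The definition quoted above, and the influence of outliers on model estimation, can be made concrete with a small sketch. It assumes a straight-line fitting function estimated by ordinary least squares, which the source does not prescribe; `fit_line`, `far_from_fit`, the threshold, and the data are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def far_from_fit(points, subset, threshold):
    """Return the points whose deviance from the line fitted on `subset` exceeds `threshold`."""
    slope, intercept = fit_line(subset)
    return [(x, y) for x, y in points if abs(y - (slope * x + intercept)) > threshold]


# Assumed toy data following y = 2x, except for one anomalous observation at x = 3.
data = [(0, 0.0), (1, 2.0), (2, 4.0), (3, 20.0), (4, 8.0)]

# Fitting on all observations: the anomalous point pulls the slope away from 2.
slope_all, _ = fit_line(data)

# Fitting on a subset (here simply the first three observations, an assumption)
# and measuring every observation against that fit.
flagged = far_from_fit(data, subset=data[:3], threshold=3.0)
print(slope_all, flagged)
```

In this toy case the single anomalous point shifts the slope estimated from all observations to about 3.4, while the fit on the subset recovers a slope of 2 and leaves only that point far from the fitting function.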

Hodge and Austin (2004) define the term “outlier detection” as detecting and removing anomalous instances from data. A variety of outlier detection techniques aim to identify the instances that deviate considerably from the majority of the data and then to purify the data by removing them.

Outlier detection has also been discussed in many fields, such as intrusion detection in network security, fraud detection in financial analysis, and fault detection in engineering systems. In network management, it is necessary to prevent unauthorized individuals from intruding into a private network or web server to damage the information system and steal confidential documents.

1.2 Research Question

Zimek, Campello and Sander (2014) point out that outlier detection in a concept drifting environment where the form of the fitting function is unknown is a truly challenging problem. Some scholars use distance-based methods, while others build their models on clustering or density features. Nevertheless, most of these methods do not address the concept drifting environment.

This study derives a decision support mechanism (DSM) for effectively detecting outliers in the concept drifting environment. Specifically, the derived DSM is designed to help detect intrusions in network security. The DSM identifies the resulting type of every instance and then outputs a small number of outlier candidates, so that the decision maker can double-check whether each candidate is a true outlier. Because only a small number of outlier candidates are output, the DSM is expected to save the decision maker’s time.

1.3 Research Method

This study first derives an outlier detection DSM based upon the work of Huang et al. (2014). They propose an envelope module that uses not only the deviance information but also the order information to distinguish potential outliers. In brief, their study implements an algorithm that effectively detects anomalous patterns in a non-changing environment without any assumption about the form of the fitting function.

This study adopts not only resistant learning with the envelope module (Huang et al., 2014) but also an incremental learning strategy with the moving window technique. The moving window technique is what allows resistant learning and incremental learning to be integrated for solving the outlier detection problem in the concept drifting environment. In this research, the term “resistant learning and incremental learning” means that a resistant learning algorithm learns the trend of the data stream from a training set that changes as time passes.
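The combination described above can be sketched in a deliberately simplified form. This is not the resistant learning algorithm or the envelope module of Huang et al. (2014): a window median and a median-absolute-deviation band are swapped in as a stand-in for the resistant fit and its envelope, and the name `window_outlier_candidates`, the window size, the envelope width, and the data are all assumptions.

```python
from collections import deque
from statistics import median


def window_outlier_candidates(stream, window_size, envelope_width):
    """Flag observations lying outside a robust band fitted on a moving window.

    Simplified stand-in: the window median plays the role of the fitted trend,
    and `envelope_width` times the median absolute deviation plays the role
    of the envelope. Both choices are assumptions for illustration.
    """
    window = deque(maxlen=window_size)  # the training set changes as time passes
    candidates = []
    for t, x in enumerate(stream):
        if len(window) == window_size:
            center = median(window)
            mad = median([abs(v - center) for v in window])
            # Guard against a zero-width band when the window has no spread.
            if abs(x - center) > envelope_width * max(mad, 1e-9):
                candidates.append((t, x))
        # Every observation enters the window so that the fit can follow drift;
        # the median keeps a few anomalous values from dominating the fit.
        window.append(x)
    return candidates


# Assumed toy stream with one anomalous value at position 5.
stream = [10, 12, 9, 11, 8, 50, 10, 12, 9, 11]
candidates = window_outlier_candidates(stream, window_size=5, envelope_width=5)
print(candidates)
```

Only the flagged observations would be passed to the decision maker for checking, which mirrors the intent of outputting a small number of outlier candidates.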

1.4 Purpose and Contribution

This study proposes an outlier detection DSM that helps cope with intrusion detection in network security in a concept drifting environment. This work places great emphasis on the incremental learning strategy, since it must cope with the concept drifting problem in time series data such as network flow logs and financial data. Furthermore, the DSM uses an unsupervised learning technique in which both the fitting function and the target value are unknown. Here the target value is the resulting type, either non-outlier or outlier. We expect this work to become a solid foundation for future research and applications, not only improving the efficiency of adapting to the concept drifting environment but also increasing the accuracy of detecting anomalous patterns.

In future work, the proposed DSM will be applied to real-world applications. As noted above, we hope this DSM can be adopted in an intrusion detection system (IDS), that is, a system that detects, identifies and responds to unauthorized or abnormal behaviors on an information system (Joo, Hong & Han, 2003). In an IDS, the DSM is expected to distinguish between intrusions and normal activities. Because a serious incident can have a huge impact, an IDS with a proper algorithm or DSM is essential for network intrusion detection. Although sophisticated IDSs have been developed to detect intrusions, new challenges arise from new and possibly unknown network behaviors, so new approaches are needed to cope with new threats. Maggi et al. (2009) also point out that web applications now encounter the concept drifting problem. In the concept drifting environment, the zero-day attack is an even more critical challenge for IDSs (Bilge & Dumitras, 2012). The behavior of web applications changes frequently and significantly, so a proper method for meeting this challenge is highly desirable. In sum, the proposed DSM is expected to be applied to real applications and to help the decision maker detect outliers effectively and efficiently.

Furthermore, it can be combined with other tools as an ensemble detector (Zimek, Campello and Sander, 2014), or work with a semi-supervised learning strategy based on the decision maker’s historical determinations or pre-defined target values. If it performs well, we expect that it can be applied as a DSM in more fields.

1.5 Content Organization

In chapter 1, we describe the research problem of providing a DSM for outlier detection in a concept drifting environment. Chapter 1 also covers the research background, motivation, research problem definition, purpose, and expected contribution. Chapter 2, the literature review, discusses related work. The fields investigated in this study include concept drifting, outlier detection, resistant learning, the envelope module, the moving window technique, and zero-day attacks. In chapter 3, we present the proposed DSM in detail, including what the decision maker needs to do in coordination with the proposed DSM. Chapter 4 describes both the experiment design and the experiment results, and evaluates the performance of the DSM. The final chapter, chapter 5, consists of the conclusion, the contributions of this study, and future research topics.

In the document: A Decision Support Mechanism for Detecting Outliers in Dynamic Environments - NCCU Academic Hub (政大學術集成) (pages 9-15)