
適用於動態環境中偵測離群值之決策支援機制 (A Decision Support Mechanism for Outlier Detection in the Concept Drifting Environment)


國立政治大學資訊管理學系 (National Chengchi University, Department of Management Information Systems)
碩士學位論文 (Master's Thesis)

指導教授 (Advisor): 蔡瑞煌 博士 (Dr. Rua-Huan Tsaih)

適用於動態環境中偵測離群值之決策支援機制
A Decision Support Mechanism for Outlier Detection in the Concept Drifting Environment

研究生 (Graduate Student): 林哲緯
中華民國 104 年 6 月 (June 2015)

A Decision Support Mechanism for Outlier Detection in the Concept Drifting Environment

Abstract

Outliers are observations far away from the fitting function that is deduced from the bulk of the given observations. Detecting them has recently become an important issue. Since the nature of data in the current era has become more concept-drifting, outlier detection has become more challenging. To address this challenge, this study develops a decision support mechanism (DSM) for coping with the outlier detection problem in the concept-drifting environment. Specifically, this study derives a DSM for identifying potential intrusions in network security. The proposed DSM has the following features: (1) the implementation of the resistant learning concept via adaptive single-hidden layer feed-forward neural networks, (2) the implementation of the incremental learning concept via the moving window technique, and (3) efficiency and effectiveness, in that the decision maker has to review a much smaller number of samples while obtaining better outlier detection accuracy. An experiment is designed to justify the proposed DSM. Experiment results show that the performance of the proposed DSM is very promising.

Keywords: outlier detection, concept drifting, moving window, neural networks, decision support

適用於動態環境中偵測離群值之決策支援機制

摘要 (Abstract)

Recently, outlier detection has become an important and challenging research issue. From a set of given observations, we can deduce a fitting function, and the outliers are determined by their distance from this fitting function. This issue is even more difficult in today's environment: data sources are now mostly dynamic and unstable, so current data exhibit the characteristic of concept drifting.

This study therefore proposes an innovative decision support mechanism that helps the decision maker detect outliers in data from a dynamic, concept-drifting environment. Specifically, in the domain of network security, this study aims to identify potentially anomalous or hostile behaviors through the derived decision support mechanism.

The decision support mechanism derived in this study has the following features:
(1) it implements the concept of resistant learning via an adaptive single-hidden layer feed-forward neural network (SLFN);
(2) it realizes the strategy of incremental learning via a moving window mechanism;
(3) it provides decision support that is both efficient and effective: it yields good detection results while presenting only a small number of potential outliers to the decision maker.

An experiment is conducted for validation, and the results show that this decision support mechanism is very promising.

關鍵字 (Keywords): outlier detection, concept drifting, moving window, neural networks, decision support

謝辭 (Acknowledgements)

These two years at Chengchi University have been a great inspiration in my life, and many beautiful flowers in my life have blossomed one by one. Thanks to the company of many good teachers and friends, my life has been able to keep racing toward its highest and most beautiful peaks.

For the completion of this thesis, I must first thank Professor 蔡瑞煌, who patiently guided me through every step of my learning, devoting a great deal of time and effort so that I, as his student, could learn as quickly as possible. Many times, when I clung to my immature ideas or mistaken views, he always patiently cleared away my rubble and handed precious knowledge to me bit by bit. Through many moving moments, besides completing this thesis, he also taught me how to be a student.

During my oral defense, I thank Professor 莊皓鈞, Professor 黃彥男, and 黃馨瑩 for their many suggestions, which gave me more confidence in writing the thesis and helped me make a great breakthrough in the experiment design. I am also grateful to Professors 李有仁, 陳春龍, 陳恭, 蔡炎龍, 郁方, and 周彥君 for their teaching in their professional fields, from which I benefited greatly; to all of you I extend my deepest thanks.

Along the way, I thank my fellow graduate students for their company. In particular 維正 and 永樂: you have truly been both mentors and friends, offering me much guidance and advice in my studies and research life and enriching my life in many ways. I am also grateful to our great class representatives, 小巫 and 承霖, who devoted so much effort and sweat for everyone; to my labmate 冠伶; to 齊佑 and Hazel, who once strove alongside me; to 彥亨 and 雅筠 of our half-shared lab; to the ever-supportive 瑞祥; and to 柏崴, from whom I learned so much. With your company and encouragement on the road of learning, we left many beautiful footprints together.

I am also grateful to the many benefactors I met at NCCU: 書勳, 慈君, 曉菁, the faraway KES, 銘棋, 智惠, 蕙榕, 管姊, 世慶哥, and teaching assistants 詩晴 and 雨儒 — thank you for your guidance along the way. And to the many dear teachers and friends I have had since childhood: 呂德珊, 康道賢, 李文郎, 馬堪泉, 洪淑玲, 呂世宏, 陳育毅, 贊名, 立根, 嘉宏, 學群, 宜娟, and others — thank you for your guidance and companionship in my life!

In addition, I am deeply grateful to the teachers of the NCCU and Chung-Hsing Fu-Ching societies and of the adult classes, who cared for me like parents: 金枝, 雅芬, 凱煌, 阮芬, 惠娥, 儷文, 慧姿, 素君, 玟湘, 清茵, 思文, 惠卿, 慧青, class leader 建華, and many other teachers who guided my learning and growth. I also thank the many fellow companions walking this road of happiness together: 雪如姐, 玉婷姐, 佳容姐, 維騰, 志宏, 昱升, 禹丞, 柏凱, 國弼, 宸宇, 彥澄, 柏宏, 宗平, 旭杰, 志賢, 元翰, 善閔, 忠奎, 曼芸, 玠穎, 奕廷, 其佑, and many others — learning together with you is true happiness.

Most important of all, I thank my family: my dear father, mother, younger brother, younger sister, grandfather, and grandmother. Thank you for your meticulous care and concern throughout my journey of growth, and for always giving me the warmest company and encouragement. Every bit of growth in my life has been silently supported by you; because of you, I have had this wonderful learning environment, these precious steps of growth, and the person I am today. I do not know how many tears and how much sweat you shed behind the scenes to give me the best of everything. I am deeply grateful, and I hope I can truly repay the profound kindness of your upbringing.

Finally, with utmost sincerity, I thank my most important Master 常師父, my guru, and the Buddhas and Bodhisattvas for their long-standing devotion, which has transformed my life. I also thank the teachers along the way and all the mother-like sentient beings who have shown me kindness. May all sentient beings, and you who are reading this thesis, find the teachers and direction of your lives; may your lives be fulfilled!

林哲緯 敬筆 (Respectfully written by 林哲緯)

INDEX

FIGURE INDEX
TABLE INDEX
CHAPTER 1 INTRODUCTION
  1.1 Background and Motivation
  1.2 Research Question
  1.3 Research Method
  1.4 Purpose and Contribution
  1.5 Content Organization
CHAPTER 2 LITERATURE REVIEW
  2.1 Concept Drifting
  2.2 Outlier Detection
  2.3 Envelope Module
  2.4 Moving Window
  2.5 Zero-Day Attack
CHAPTER 3 THE PROPOSED DECISION SUPPORT MECHANISM
CHAPTER 4 EXPERIMENT DESIGN AND RESULTS
  4.1 Experiment Design
  4.2 Performance Evaluation
CHAPTER 5 CONCLUSION & FUTURE WORK
REFERENCE

FIGURE INDEX

Figure 1: Moving window concept in our research
Figure 2: The proposed DSM's moving window
Figure 3: The 95th simulation set
Figure 4: The implementation of moving windows in our experiment
Figure 5: The 24th simulation set
Figure 6: The 23rd simulation set
Figure 7: The 37th simulation set
Figure 8: The 45th simulation set
Figure 9: The 56th simulation set
Figure 10: The 49th simulation set
Figure 11: The 97th simulation set
Figure 12: The 95th simulation set's moving windows when M=2
Figure 13: The 95th simulation set's moving windows when M=13
Figure 14: The 95th simulation set's moving windows when M=16

TABLE INDEX

Table 1: Classification of concept drifting methods
Table 2: The outlier detection with the envelope module
Table 3: The proposed DSM
Table 4: The possible experiment outcome measurement
Table 5: The experiment result of the 95th simulation set
Table 6: The whole experiments' performance
Table 7: The detecting zero-day attack's performance

CHAPTER 1 INTRODUCTION

1.1 Background and Motivation

Achieving brain-like intelligence is one of the fundamental goals of computational intelligence. Two remarkable properties of such intelligence are the ability to adapt to a non-stationary environment and the ability to learn incrementally from noisy and incomplete data (Sendhoff et al., 2009). Two further aspects of brain-like intelligence mentioned by Bezdek (1994) are high computation speed and an error rate as low as that of human beings. Nowadays many companies face the challenge of a dynamic market environment in which the data are concept-drifting and non-stationary, such as stock market indices, traffic flow in computer networks, and purchase orders from customers.

The term "concept drifting" means that concepts are not stable and change with time (Tsymbal, 2004). That is, as time passes, the trend embedded in the observation data usually changes. Tsymbal (2004) notes that a concept-drifting environment makes learning a model from data a complicated task. Masud et al. (2010) also point out that data streams are not only concept-drifting but also potentially infinite, especially for time series data. He (2011) claims that an intelligent model should have the capability to modify its knowledge or concepts based on the new data distribution. Nevertheless, Bifet et al. (2011) remind us that such a model should adapt to concept drift as soon as possible while not being affected by noise.

From the literature review, many scholars have proposed incremental learning approaches to cope with a changing environment (Buschermöhle, Schoenke & Brockmann, 2012). Elwell and Polikar (2011) point out that an incremental learning

technique can not only learn new concepts but also retain the existing, still-relevant concepts in the training model. In addition, incremental learning also tries to drop unrelated concepts. Thus, Widmer and Kubat (1996) solve the concept drifting problem with an incremental learning strategy via a moving window, which keeps the latest and most relevant data in the window. However, dealing with outliers in a concept drifting environment makes this work more complicated.

Tsaih and Cheng (2009, page 162) define outliers as "the observations far away from the fitting function deduced from a subset of the given observations." The side effects of outliers have been discussed for a long time and in many fields. For instance, Chen and Liu (1993) point out that outliers diminish forecast accuracy in time series data. Tolvi (2002) takes the side effect of outliers into consideration while predicting monthly stock market index returns via an ARMA model; the results show that the data sets without outliers, identified by an autoregressive procedure, yield better predictions. Fitting the observations together with the outliers can decrease the effectiveness of the fitting function, because the outliers, with their high fitting deviances, have a large influence on model estimation. Thus, Olson and Shi (2007) point out that outlier detection is a critical process in data cleansing, and that data cleansing is a very important step before modelling the data.

Hodge and Austin (2004) define the term "outlier detection" as detecting and removing anomalous instances from data. A variety of outlier detection techniques aim to identify the instances that deviate considerably from most of the data and then purify the data.

Outlier detection has also been discussed in many fields, such as intrusion detection in network security, fraud detection in financial analysis, and fault detection in

engineering systems. In network management, it is necessary to prevent unauthorized individuals from intruding into private networks or web servers to damage information systems or steal confidential documents.

1.2 Research Question

Zimek, Campello and Sander (2014) point out that outlier detection in a concept drifting environment, where the fitting function form is unknown, is a truly challenging problem. Some scholars use distance-based methods, and others develop models based on clustering or density features. Nevertheless, most of these methods do not address the concept drifting environment.

This study derives a decision support mechanism (DSM) for effectively detecting outliers in the concept drifting environment. Specifically, the derived DSM is designed to help detect intrusions in network security. The DSM identifies the resulting type of every instance and then outputs a small number of outlier candidates, so that the decision maker only has to double-check whether each outlier candidate is a true outlier or not. Because only a small number of outlier candidates are output, the DSM is expected to save the decision maker's time.

1.3 Research Method

This study first derives an outlier detection DSM based upon the work of Huang et al. (2014). They propose an envelope module that adopts not only deviance information but also order information to distinguish potential outliers. In brief, their study

has implemented an algorithm that detects anomalous patterns effectively in a non-changing environment, with no assumption about the form of the fitting function.

This study adopts not only resistant learning (with the envelope module) (Huang et al., 2014) but also an incremental learning strategy (with the moving window technique). By implementing the incremental learning strategy through the moving window technique, we can integrate resistant learning with incremental learning to solve the outlier detection problem in the concept drifting environment. The phrase "resistant learning and incremental learning" in this research means that a resistant learning algorithm is used to learn, from the training set, the trend of the data stream, where the training set changes as time passes.

1.4 Purpose and Contribution

This study proposes an outlier detection DSM that helps cope with intrusion detection in network security in a concept drifting environment. This work places great emphasis on implementing an incremental learning strategy, since it needs to cope with the concept drifting problem in time series data, such as network flow logs, financial data, etc. Furthermore, this DSM also implements an unsupervised learning technique, in which the fitting function and the target value are unknown. Here the target value is the resulting type, either non-outlier or outlier. We expect this work to become a solid foundation for future researchers and applications, improving not only the efficiency of adapting to the concept drifting environment but also the accuracy of detecting anomalous patterns.

In future work, the proposed DSM will be implemented to cope with real-world applications. As elaborated above, we hope this DSM can be adopted in intrusion detection systems (IDS). An intrusion detection system is a system with the functions of detecting, identifying and responding to unauthorized or abnormal behaviors on an information system (Joo, Hong & Han, 2003). The purpose of applying it in an IDS is to distinguish between intrusions and normal activities. In particular, a fatal problem may have a huge impact, so an IDS with a proper algorithm or DSM is very necessary for network intrusion detection. Especially in this era, even though sophisticated IDSs have been developed to detect intrusions, new challenges arise from new network behaviors that may be unknown, so new approaches are needed to cope with the new threats. Maggi et al. (2009) also point out that web applications nowadays encounter the concept drifting problem. In the concept drifting environment, the zero-day attack is an even more critically challenging problem for an IDS (Bilge & Dumitras, 2012). The behavior of web applications changes frequently and significantly, so there is a very real need to develop a proper method to meet this challenge. In sum, the proposed DSM is expected to be applied to real applications, helping the decision maker detect outliers in an effective and efficient way.

Furthermore, it can be combined with other tools as an ensemble detector (Zimek, Campello and Sander, 2014), or work with a semi-supervised learning strategy based on the decision maker's historical determinations or pre-defined target values. If it does well, we expect it can be applied to more fields as a DSM.

1.5 Content Organization

In Chapter 1, we elaborate the current research problem of providing a DSM for the outlier detection problem in a concept drifting environment. Chapter 1 also covers the research background and motivation, the research problem definition, and the purpose and proposed contribution. Chapter 2, the literature review, discusses related work by others; the fields investigated in this study include concept drifting, outlier detection, resistant learning, the envelope module, the moving window technique, and the zero-day attack. In Chapter 3, we present the proposed DSM in detail; the elaboration also explains what the decision maker needs to do in coordination with the proposed DSM. In Chapter 4, both the experiment design and the experiment results are described, and we evaluate the performance of the DSM. The final chapter, Chapter 5, consists of the conclusion, our study's contributions, and future research topics.

CHAPTER 2 LITERATURE REVIEW

2.1 Concept Drifting

Concept drifting means that concepts are not stable and change with time (Tsymbal, 2004). Some scholars divide this problem into several types. For example, Stanley (2003) classifies concept drifting into two types: sudden concept drifting and gradual concept drifting. Furthermore, according to the rate of change, gradual concept drifting can be divided into moderate and slow drifts. Scholars sometimes regard a novel class evolving in the data stream as the concept evolution problem (Masud et al., 2010). Each type has its own suitable methods. In sum, the challenge of concept drifting or concept evolution is that most algorithms find it difficult to identify the hidden context in the time-evolving trends of a data stream (Tsymbal, 2004), which occurs when the implicit concepts of the data stream change through time. In brief, a system that handles the concept drift problem should have four desired properties (Bifet et al., 2011): (1) adapt to concept drift as soon as possible; (2) not be affected by noise, meaning that the system should distinguish noise from changes, being robust to noise but adaptive to changes; (3) fit the data's nature, since some data involve recurring contexts, so the system has to recognize and react to that nature; and (4) adapt with limited resources, since the system has limitations such as time constraints and memory constraints. All four of these objectives need to be fulfilled.

In order to build a model in a concept drifting environment, Gama et al. (2014) hold that learning under concept drift requires not only updating the predictive

model with new patterns but also forgetting old information. The reason to forget or exclude old information is that some old information faces the data expiration problem (Wang et al., 2003). Thus Widmer and Kubat (1996) suggest using incremental learning to deal with this problem, because an incremental algorithm only looks at the new observations to modify its current hypothesis. Since incremental learning can not only learn new concepts but also keep the existing and still-relevant ones, Elwell and Polikar (2011) integrate incremental learning with an ensemble classifier system to deal with the concept drifting problem. Krawczyk and Woźniak (2014) implement not only incremental learning but also a forgetting mechanism to build a classifier model for concept drift in a data stream environment.

Because there are multiple solutions for a wide range of situations and problems, Bifet et al. (2011) categorize the methods for handling the concept drifting problem into the four types shown in Table 1. The main goal of all four types is to select the right training data and to retrain or adjust the model incrementally. The forgetting method and the detector method use a single classifier, while the others use an ensemble that maintains some memory. On the other hand, the forgetting method and the dynamic ensemble method adapt at every step, while the others detect a change and follow it up with a trigger. Below we review the meaning and features of each type.

Table 1: Classification of concept drifting methods (types from Bifet et al., 2011)

Forgetting method
  Meaning: The method adapts itself at every step with a fixed-size window; in other words, it retains only a fixed amount of data to build the model.
  Feature: Memory and computational complexity are crucial to a machine learning system; discarding out-of-date data lowers both (Krawczyk & Woźniak, 2014). The method gives higher weights to new data (Lin, 2013), so it is appropriate for sudden drift (Bifet et al., 2011).

Detector method
  Meaning: If a detector indicates a change in the observation data, the model adapts itself with sliding windows of variable size.
  Feature: The method can adjust the window size to an appropriate length based on the relevance and importance of the incoming data; for instance, it can straightforwardly shrink the training set. However, windowing may fail when a slow change lasts longer than the window size (Gama et al., 2014). It is likewise appropriate for sudden drift (Bifet et al., 2011).

Contextual method
  Meaning: The method implements dynamic integration or meta-learning strategies, and uses the results of the sub-classifiers of an ensemble to decide whether to update the model.
  Feature: The method should be able to develop an expectation that is likely relevant to the next concept (Widmer & Kubat, 1996). It is suitable for recurring concept drift, such as a four-season cycle (Bifet et al., 2011; Gama et al., 2014).

Dynamic ensemble method
  Meaning: The method uses an adaptive and dynamic ensemble that contains many models and makes decisions by dynamically weighted voting.
  Feature: The method features a diversity control mechanism and implements internal drift detection to speed up adaptation (Gama et al., 2014). Owing to its adapting at every step, it is suitable for gradual drift (Bifet et al., 2011).

With a view to excluding out-of-date concepts, Widmer and Kubat (1996) adopt the forgetting method: they solve the incremental learning problem with a moving window technique that keeps only the latest and relevant data in the window. Storkey (2009) advises using a transfer learning method, because it considers only partially related training scenarios; by associating the training set with only related data, it is expected to provide better predictions. Moreover, if the training set contains anomalies or outliers, they may cause false judgments or even negatively impact accuracy (Castelo-Fernández et al., 2010). Outlier detection is therefore an important problem that also needs to be solved immediately; we place emphasis on the impact of outliers and the associated techniques in Section 2.2. In this and the following paragraphs, we review related work on incremental learning and adaptive learning.

In financial markets, most data streams are non-stationary and infinite (Basu and Meckesheimer, 2007). Some domains call this "time series data": as the term says, a sequence of continuous data in time order, such as the stock market index or the stock price of a company. In other words, each element of the data stream is matched with a time stamp (date, hour, minute, etc.).

A feature of time series data is that the fitting function may not be clear; it may even be unknown and change frequently. Fortunately, we can consider data that are closer in the timeline to be more correlated with the newer time-evolving trend (Basu and Meckesheimer, 2007). This idea is similar to the forgetting method in Table 1: because of the higher similarity to the current concept, we can give higher weight to data closer in the timeline. Based on this feature, Basu and Meckesheimer (2007) used the median of a fixed number of

neighborhood elements and a threshold to judge whether an element is an outlier or not.

Another popular approach is to use classifier methods to identify or detect concept drift. Here we review some of them; most of the ensemble classifiers discussed below are built from multiple methods.

In order to solve the concept drifting problem, Wang et al. (2003) attribute it to the data expiration problem, which means that the model built from the training set is no longer consistent with the current concepts; that is why some expired data need to be discarded. To address this, they propose a weighted ensemble classifier based on C4.5, the RIPPER rule learner and the Naïve Bayesian method, and apply the classifier to credit card fraud data. Their results show that their solution can reduce the error rate to approximately 11%.

Masud et al. (2010) propose a two-phase method to handle the concept-evolution problem. Their k-NN (k-nearest neighbor) based classifier trains the data stream into n classification models. They then judge whether another element in the data stream is an outlier or not by the weight between the classifiers and the test element. If the element is an outlier, it is marked as an F-outlier and temporarily appended to a buffer. Notably, they employ a slack space and decision boundary as an adaptive threshold; with this adaptive threshold, the false alarm rate can be lowered. In the second phase, if the total quantity in the buffer meets a criterion, the classifier invokes a novel class detection procedure using the Gini coefficient to determine whether a novel class has emerged. Regrettably, this approach only handles concept evolution well and cannot handle concept drifting well. Further, it cannot tell whether an F-outlier is noise or not.

In their continuing research, Masud et al. (2011) add a time constraint to wait for more test instances in order to discover similarities before deciding whether to perform the correlation function and classify a novel class. The measure used to observe the F-outliers' similarity changes to the q-neighborhood silhouette coefficient (q-NSC), a unified measure of cohesion and separation. But this method still only handles the concept evolution problem.

Some scholars build criteria for updating the model. Lanquillon and Renz (1999) propose adapting the model when the prediction output does not meet statistical quality control. The first criterion evaluates an expected error rate; the second observes whether the fraction of classification decisions falls below a given threshold. Their research also implements a mechanism that retrains the model regularly whether or not the data stream's concepts change. However, their method only suits non-radical changes, and their experiments cover only topic detection and tracking, i.e., detecting whether input text correlates with the previously trained model.

As shown above, there are many methods for dealing with the concept drifting problem in many kinds of settings. But we still face the problem that a training set containing anomalies or outliers may cause false judgments, so outlier detection is an important problem that also needs to be solved immediately. Tsymbal (2004) argues that some algorithms may overreact to outliers or noise; in some cases, algorithms may erroneously discover outliers or noise as a new concept. In order to solve this problem, many researchers have come up with many kinds of techniques, reviewed next.

2.2 Outlier Detection

In this study, outliers are defined as the observations far away from the fitting function deduced from a subset of the given observations, where the fitting function form is adaptive during the learning process (Tsaih and Cheng, 2009, page 162); the fitting function is adaptive because of the resistant learning mechanism.

In addition, there are two classical definitions, whose characterizations of outliers we summarize here. Hawkins (1980) holds that an outlier is generated by a different mechanism. Barnett and Lewis (1994) consider the main feature of an outlier to be its inconsistency with the remainder of the main set of data. However, the definition of outliers varies across domains; each definition depends on the particular problem, algorithm or application. Outlier detection is also an important and challenging task for a wide range of applications, including fraud protection, intrusion detection, target marketing, statistical outlier detection, etc. The reasons why outlier detection has become an urgent need were discussed in Chapter 1: mostly, the side effects of outliers may harm our prediction or classification models. In this section, we focus on outlier detection techniques.

In academic fields, many studies use various techniques to classify data as outliers or non-outliers. The following is related work on outlier detection techniques; each domain has many solutions to this problem. We survey artificial intelligence techniques, split into evolutionary algorithms, clustering techniques and artificial neural networks.

With a view to evolutionary algorithms, for example, Crawford and Wainwright

(1995) and Banerjee (2012) both use genetic algorithms to detect outliers. Crawford and Wainwright combine a genetic algorithm with three outlier diagnostics; their results show the best combination to be the genetic algorithm with Cook's squared distance formula (Cook and Weisberg, 1982). But their experiments were performed only on small cases, and one of the data sets did not perform very well; Crawford and Wainwright inferred that the genetic algorithm with Cook's squared distance formula does not work exactly in some data sets. Banerjee's (2012) follow-up research combines the genetic algorithm with Euclidean distance, owing to its density-based distance feature.

Srinoy (2007) implements a supervised two-phase method to cope with intrusion detection in network security. The method uses particle swarm optimization to select features for a support vector machine that classifies intrusions from other activity. The results show that this method performs better than Fuzzy c-Means, pure particle swarm optimization and a pure support vector machine. Although the method performs well with labeled data, it cannot cope with unlabeled data; in other words, it may not be usable for unsupervised problems.

There are several supervised learning methods for the outlier detection problem, but unsupervised methods still lag behind (Ferdous and Maeda, 2006). Next, we focus on unsupervised learning methods for outlier detection.

Many scholars solve outlier detection with clustering techniques; the main reason is, as Barnett and Lewis (1994) noted, that the main feature of an outlier is its inconsistency with the remainder of the data set, so the data distribution may place the majority of normal data close together and the suspicious outliers far from the majority. Ferdous and Maeda (2006) implement peer group analysis (PGA) to cope

with fraud detection in financial time series data. PGA is an unsupervised technique whose mechanism identifies peer groups for every target object; furthermore, it concentrates more on local patterns than on global models. They conduct a series of experiments and also connect some results to visual evidence.

Yoon, Kwon and Bae (2007) use the k-means clustering method to detect outliers in software measurement data. Their approach uses k-means to categorize the data into k groups, where the value of k is decided by the Cubic Clustering Criterion of Warren's research (1983). The last step of this approach exports an outlier candidate report, which still needs to be reviewed by a domain expert.

Many studies deal with outlier detection problems using artificial neural network (ANN) techniques. Sykacek (1997) uses a neural network with sigmoid activations, trained by Bayesian inference, to cope with outlier detection problems. Hawkins, Williams and Baxter (2002) use replicator neural networks (RNN) to measure whether an instance is an outlier or not. The RNN they propose is a feed-forward multi-layer perceptron with three hidden layers between the input and output layers. Before training the RNN, they transfer all columns to quantitative measures. While the RNN is being trained, its weights are adjusted to minimize the mean square error.

In the surveys mentioned above, the neural networks do not deal with the resistant learning problem. Most of the techniques used are pre-specified and fixed during the training process. Consequently, those neural networks can only adjust or tune the

weights, whereas our proposed algorithm can add hidden nodes to pick up the trend of the data. So those works do not solve the resistant learning problem.

Resistant learning is largely similar to robust learning: both are expected to cope with uncertain situations. In the following paragraphs we discuss the differences and similarities between them.

With the resistant learning mechanism, the fitting function form is adaptive during the learning process. In the artificial neural network field, robust procedures tune the neural network model's weights and thresholds, but they do not change the model's structure; in other words, the neural network model is fixed under robust procedures. In contrast, resistant learning can not only tune the network model's weights and thresholds but also modify the neural network model's structure. Tsaih and Cheng (2009) summarize that robust procedures are those whose results are not influenced significantly by violations of the model assumptions (such as when the errors are normally distributed), while resistant procedures are those whose numerical results are not influenced significantly by outlying observations.

Usually, when estimation is mentioned, the response $y$ is modeled in the function form $f(\mathbf{x}, \mathbf{w}) + \delta$, where $\mathbf{w}$ is the parameter vector and $\delta$ is the error term. The function $f$ is pre-defined and fixed during the process of deriving values for its associated $\mathbf{w}$ from a set of given observations $\{(\mathbf{x}^1, y^1), \ldots, (\mathbf{x}^N, y^N)\}$, with $y^c$ being the observed response corresponding to the $c$th observation with explanatory variables $\mathbf{x}^c$. The least squares estimator (LSE) is a popular method for performing the estimation.

If $\hat{\mathbf{w}}$ signifies the estimate of $\mathbf{w}$, then the LSE is defined to be the $\hat{\mathbf{w}}$ that minimizes $\sum_{c=1}^{N} (e^c)^2$, where

$$e^c = y^c - f(\mathbf{x}^c, \mathbf{w}). \qquad (1)$$

Obviously, an outlier yields a larger error $e^c$, which means that $\mathbf{x}^c$ is far away from the fitting function $f$. A tiny pre-specified error value $\varepsilon = 10^{-6}$ is used.

In order to deal with outlier detection via resistant learning, Tsaih and Cheng (2009) implement an adaptive SLFN. The SLFN's fitting function is defined as:

$$f(\mathbf{x}) \equiv w_0^o + \sum_{i=1}^{p} w_i^o \tanh\Big(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j\Big) \qquad (2)$$

$$a_i(\mathbf{x}) \equiv \tanh\Big(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j\Big) \qquad (3)$$

where $\tanh(x) \equiv \frac{e^x - e^{-x}}{e^x + e^{-x}}$; $m$ is the number of explanatory variables $x_j$; $\mathbf{x} \equiv (x_1, x_2, \ldots, x_m)^T$; $p$ is the number of adopted hidden nodes; $w_{i0}^H$ is the bias value of the $i$th hidden node; the superscript $H$ throughout the paper refers to quantities related to the hidden layer; $w_{ij}^H$ is the weight between the $j$th explanatory variable $x_j$ and the $i$th hidden node; $w_0^o$ is the bias value of the output node; the superscript $o$ throughout the paper refers to quantities related to the output layer; and $w_i^o$ is the weight between the $i$th hidden node and the output node. In this study, a character in bold represents a column vector, a matrix, or a set, and the superscript $T$ indicates transposition. Furthermore, let $\mathbf{w}_i^H \equiv (w_{i0}^H, w_{i1}^H, w_{i2}^H, \ldots, w_{im}^H)^T$;

$\mathbf{w}^o \equiv (w_0^o, w_1^o, w_2^o, \ldots, w_p^o)^T$; $\mathbf{w}^H \equiv \begin{pmatrix} \mathbf{w}_1^H \\ \mathbf{w}_2^H \\ \vdots \\ \mathbf{w}_p^H \end{pmatrix}$; and $\mathbf{w} \equiv \begin{pmatrix} \mathbf{w}^H \\ \mathbf{w}^o \end{pmatrix}$.

Through this SLFN, the input information $\mathbf{x}$ is first transformed into $\mathbf{a} \equiv (a_1, a_2, \ldots, a_p)^T$, and the corresponding value of $f$ is generated from $\mathbf{a}$ rather than $\mathbf{x}$. In other words, given the observation $\mathbf{x}$, all of the corresponding values of the hidden nodes are first calculated as $a_i \equiv \tanh(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j)$ for all $i$, and the corresponding value $f(\mathbf{x})$ is then calculated as $f(\mathbf{x}) = g(\mathbf{a}) \equiv w_0^o + \sum_{i=1}^{p} w_i^o a_i$.

Tsaih and Cheng (2009) propose a resistant learning outlier detection algorithm with a tiny pre-specified $\varepsilon$ value of $10^{-6}$, and they deduce a function form via an SLFN trained on a given subset. The resistant learning procedure allows the SLFN to adapt its weights dynamically during training. They also perform both robustness analysis and deletion diagnostics. The idea of the robustness analysis, proposed by Rousseeuw and Van Driessen (2006), features deriving an (initial) subset of m+1 reference observations to fit the linear regression model, ordering the residuals of all N observations at each stage, and then augmenting the reference subset gradually based upon the smallest-trimmed-sum-of-squared-residuals principle. In the deletion diagnostics, the diagnostic quantity is the number of pruned hidden nodes when one observation is excluded from the reference pool; that means the SLFN excludes a potential outlier at an early stage, preventing the SLFN from learning it.

Above all, the weight-tuning mechanism, the recruiting mechanism, and the reasoning mechanism allow the SLFN to adapt dynamically during the process and, at the same time, explore an acceptable nonlinear relationship between the explanatory variables and the response in the presence of outliers.
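To make the notation above concrete, the following is a minimal numpy sketch of the SLFN forward pass in equations (1)-(3). The array layout (hidden biases in the first column of `w_H`, the output bias in `w_o[0]`) is our own illustrative convention, not something specified in the thesis.

```python
import numpy as np

def slfn_forward(x, w_H, w_o):
    """SLFN output f(x) of eq. (2), using the hidden activations a_i of eq. (3).

    x   : shape (m,)     explanatory variables
    w_H : shape (p, m+1) row i holds (w_i0^H, w_i1^H, ..., w_im^H)
    w_o : shape (p+1,)   (w_0^o, w_1^o, ..., w_p^o)
    """
    a = np.tanh(w_H[:, 0] + w_H[:, 1:] @ x)  # a_i = tanh(w_i0^H + sum_j w_ij^H x_j)
    return w_o[0] + w_o[1:] @ a              # f(x) = w_0^o + sum_i w_i^o a_i

def squared_residual(x, y, w_H, w_o):
    """(e^c)^2 with e^c = y^c - f(x^c, w), as in eq. (1)."""
    return (y - slfn_forward(x, w_H, w_o)) ** 2
```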

2.3 Envelope Module

Huang et al. (2014) propose an envelope bulk mechanism integrated with the SLFN to cope with the outlier detection problem. This outlier detection algorithm is performed with an envelope bulk whose width is $2\varepsilon$. Owing to the envelope module, $\varepsilon$ is changed from a tiny value ($10^{-6}$) to a non-tiny value (1.96); the value 1.96 is chosen by analogy with the 5% significance level, given that the distribution is normal. The standard for deciding whether an instance is an outlier is that the instance's residual is greater than $\varepsilon \cdot \gamma \cdot \sigma$, where $\sigma$ is the standard deviation of the residuals of the current reference observations and $\gamma$ is a constant equal to or greater than 1.0, depending on the user's stringency in outlier detection. The smaller the $\gamma$ value, the more stringent the outlier detection. Furthermore, if stricter requirements are needed, the $\varepsilon$ value can also be modified to an appropriate value.

In brief, the envelope module wraps the response elements seen as inliers in the envelope; conversely, the responses regarded as outliers are not wrapped in the envelope. The number of inliers is decided by $\varepsilon$ and $\gamma$: the stricter the parameters, the fewer inliers inside the envelope and, correspondingly, the more potential outliers determined by the envelope module.
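As a small illustration of this criterion, the following sketch (the function name and arguments are our own, hypothetical ones) flags an instance whose residual exceeds ε·γ·σ:

```python
import numpy as np

def is_outlier_candidate(residual, reference_residuals, eps=1.96, gamma=1.0):
    """Flag an instance whose |residual| exceeds eps * gamma * sigma, where sigma
    is the standard deviation of the current reference observations' residuals."""
    sigma = np.std(reference_residuals)
    return abs(residual) > eps * gamma * sigma
```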

Table 2: The outlier detection with the envelope module (adapted from Huang et al., 2014). There are N training data.

Step 1: Use the first m+1 reference observations in the training data set to set up an acceptable SLFN estimate with one hidden node. Set n = m+2.
Step 2: If n > N*(1 - k), STOP.
Step 3.1: Use the obtained SLFN to calculate the squared residuals regarding all N training data.
Step 3.2: Present the n reference observations (x^c, y^c) that are the ones with the smallest n squared residuals among the current squared residuals of all N training data.
Step 4: If all of the smallest n squared residuals are less than ε (the envelope width), then go to Step 7; otherwise, there is one and only one squared residual that is larger than ε.
Step 5: Set w̃ = w.
Step 6: Apply the gradient descent mechanism to adjust the weights w of the SLFN. Use the obtained SLFN to calculate the squared residuals regarding all training data. Then one of the following two cases occurs:
  (1) If the envelope of the obtained SLFN does contain at least n observations, then go to Step 7.
  (2) If the envelope of the obtained SLFN does not contain at least n observations, then set w = w̃ and apply the augmenting mechanism to add extra hidden nodes to obtain an acceptable SLFN estimate.
Step 7: Implement the pruning mechanism to delete all of the potentially irrelevant hidden nodes; set n + 1 → n; go to Step 2.

As stated in Huang et al. (2014), the envelope module in Table 2 executes the following procedure.

In Step 1, we use the first m+1 reference observations to set up an acceptable SLFN; we then set n = m+2.

Step 2 is the stopping criterion. In this adapted version, k can be read as the percentage of potential outliers. Clearly, at least (1-k)% of the data will be

wrapped into the envelope. For example, if there are approximately at least 95% non-outliers and at most 5% outliers, the SLFN will take 95% of the data into consideration while the SLFN is being built.

In Step 3, we calculate the squared residuals and determine the input sequence of the reference observations at this stage. The input sequence is determined by the residuals between the observations and the current SLFN, which has already learned n-1 data. The squared residuals are calculated at every stage, and the input sequence of reference observations changes according to the squared residuals.

The modeling procedure implemented by Steps 6 and 7 adjusts the number of hidden nodes adopted in the SLFN estimate and the associated w, so as to evolve the fitting function f and its envelope to contain at least n observations at the nth stage.

That is, at the nth stage, Step 3 presents the n reference observations that have the smallest n squared residuals among the current N squared residuals and that are used to evolve the fitting function. Step 3 adopts the concept of forward selection, ordering the residuals of all N observations and then augmenting the reference subset gradually by including extra observations one by one to determine the input sequence of the reference observations. With this, some of the reference observations at the early stages might not stay in the set of reference observations at the later stages, although most of them will.

The modeling procedure implemented by Steps 6 and 7 requires proper values of w and p so that the obtained envelope contains at least n observations at the end of the nth stage. Specifically, at the beginning of Step 6, the gradient descent mechanism

is applied to adjust the weights w. Step 6 (2) restores the w̃ that was stored in Step 5. Thus, we return to the previous SLFN estimate, and then the augmenting mechanism proposed by Tsaih and Cheng (2009) recruits two extra hidden nodes to obtain an acceptable SLFN estimate. In order to decrease the complexity of the fitting function f, the reasoning mechanism (Windham, 1995) is adopted in Step 7 to delete potentially irrelevant hidden nodes.

The envelope module results in a fitting function with an envelope that includes the majority of the training data, and the outliers are expected to be included only at the later stages. In this study, we call the instances left to these stages potential outliers. Then, we use the deviance information as extra information to decide whether a potential outlier needs to be regarded as an outlier candidate. Specifically, we adopt both the deviance information and the order information to identify the potential outliers.

Regarding the order information for identifying the outliers, we treat the last k% of the data as potential outliers. Namely, if n ≥ N*(1 - k) AND the residual is greater than ε, then the data point is recorded as an outlier candidate.

The setting of the ε value depends on the user's perception of the data and its associated outliers. For example, suppose the perception is that the error is normally distributed with a mean of 0 and a variance of 1, and that the outliers are the points whose residuals are greater than 1.96 in absolute value; this perception is similar to the setting in regression analysis that corresponds to a 5% significance level, given that the error terms follow the normal distribution. Then the user can set the ε value of the proposed envelope module to 1.96 and define the outliers as the points that have residuals greater than 1.96.
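To summarize the control flow of Table 2, the following is a minimal Python skeleton under stated assumptions: the callables `fit_initial`, `gradient_descent`, `augment` and `prune` are hypothetical placeholders for the weight-tuning, recruiting and pruning mechanisms of Tsaih and Cheng (2009), and the model object is assumed to expose `predict` and `copy` methods. It sketches the loop structure only; it is not the authors' implementation.

```python
import numpy as np

def envelope_contains(model, X, y, eps, n):
    """True if at least n observations have squared residuals below eps."""
    return np.sum((y - model.predict(X)) ** 2 < eps) >= n

def envelope_module(X, y, k, eps, fit_initial, gradient_descent, augment, prune):
    """Sketch of the Table 2 outer loop."""
    N, m = X.shape
    model = fit_initial(X[: m + 1], y[: m + 1])            # Step 1: one hidden node
    n = m + 2
    while n <= N * (1 - k):                                # Step 2: stopping criterion
        sq_res = (y - model.predict(X)) ** 2               # Step 3.1: all N residuals
        ref = np.argsort(sq_res)[:n]                       # Step 3.2: n smallest
        if sq_res[ref].max() >= eps:                       # Step 4: one falls outside
            saved = model.copy()                           # Step 5: remember weights
            model = gradient_descent(model, X[ref], y[ref])  # Step 6: tune w
            if not envelope_contains(model, X, y, eps, n):
                model = augment(saved, X[ref], y[ref])     # Step 6(2): recruit nodes
        model = prune(model, X[ref], y[ref])               # Step 7: drop idle nodes
        n += 1
    return model
```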

2.4 Moving Window

Looking back at the concept drifting problem, many incremental learning methods implement the moving window technique. For example, Widmer and Kubat (1996) solve the data expiration problem with the moving window technique, restricting queries to a window of data from a certain period of time: when new data arrive, old data "expire." This method is exactly suitable for continuous data streams. Note that the data should be input sequentially in chronological order.

We also defined the term "moving window" in Chapter 1 as choosing a specified size n uniformly over a "moving window" of the last elements in a data stream (Babcock, Datar and Motwani, 2002). This definition from Babcock, Datar and Motwani (2002) is classified as a sequence-based window. Another type is called the timestamp-based window, which is based on a time interval, selecting all elements of the data stream from the last n units of the time interval.

This moving window mechanism has been adopted in many fields. For example, database management systems, which deal largely with continuous data streams, use continuous queries to prevent the database from overloading (Babu and Widom, 2001); in addition, this query method helps with network performance adjustments.

Another example comes from Navvab et al. (2012), who use a robust artificial neural network technique with the moving window concept to build a dynamic prediction model applied to crude oil fouling behavior prediction. In their implementation of the moving window, the model is updated whenever a new data block is

received. This arrangement helps the model catch the slow change of the dynamic trend. They developed a reliable research model and also recommended that the model be applied for monitoring concept or trend change problems. But it is still a robust model, not a resistant model.

In sum, most moving windows store the most recent data in a first-in-first-out data structure. Gama et al. (2014) observe that this feature reflects two processes in incremental learning: (1) the learning process, which updates the model based on the latest data, and (2) the forgetting process, which discards the data moving out of the window.

However, in a concept drifting environment, the difficult challenge is to select an appropriate type of moving window and a proper window size. Consider two situations. (1) Is a large window suitable for adapting to a high-changing-rate environment? Probably not, because the data related to the current concept may not stay relevant for long. (2) Is a short window proper for stable periods? We cannot be sure, since a too-short window worsens the performance of the system. Relatively, a large window may give better performance in stable periods, but it reacts slowly to concept changes. So the appropriate choice mostly depends on the data's nature and the problem's requirements.
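As a small illustration of a sequence-based window, the following sketch (the class name is our own, hypothetical one) retains only the most recent `size` elements in first-in-first-out order:

```python
from collections import deque

class SequenceBasedWindow:
    """Sequence-based moving window (Babcock, Datar and Motwani, 2002):
    keeps only the last `size` elements of the stream."""

    def __init__(self, size):
        # maxlen makes the deque evict the oldest element automatically,
        # i.e., the "forgetting process" of Gama et al. (2014).
        self.buffer = deque(maxlen=size)

    def push(self, item):
        self.buffer.append(item)   # learning process: admit the newest data

    def contents(self):
        return list(self.buffer)   # current window contents, oldest first
```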

Figure 1: Moving window concept in our research

In Figure 1, we show the conceptual moving window technique adopted in our research. Clearly, some data are discarded as time passes: the mechanism discards the out-of-date data and retains the up-to-date data. In some applications, as in our research, the testing block may have no overlap with the training data. In our proposed DSM in Table 3, we consider that some outliers may be present in the training set, and we try to exclude them at an early stage in each window. Once we obtain the SLFN, we measure the instances against the fitting function and then output an outlier candidate list to the decision maker.

2.5 Zero-Day Attack

Another interesting issue in computer applications is the zero-day attack or vulnerability. A zero-day attack is a cyber-attack exploiting a vulnerability that has

not been disclosed publicly (Bilge & Dumitras, 2012). In practical applications, there is almost no defense against a zero-day attack. Furthermore, an IDS with a signature-based scanning method can hardly detect it successfully while the attack remains publicly unknown.

Viewed as concept drifting in the network security problem, it is as if new attacks with new tactics invade the existing IDS over time. Because the existing IDS is ignorant of the new attacks or zero-day attacks, Song, Takakura and Kwon (2008) suggest that applying unsupervised learning techniques to this kind of problem can help detect previously "unseen" attacks. In their research, they summarize the new attacks' features and apply an unsupervised learning technique to a one-class SVM in order to detect zero-day attacks.

Wrótniak and Woźniak (2013) argue that using a combined classifier is a better solution, since rebuilding the IDS takes a long time and costs too much. They build an individual classifier whenever a new concept or anomaly is detected with a sufficient number of examples, and then add this new classifier to the combined classifier to form a new combined classifier. They consider this much faster and cheaper than rebuilding a new classifier, but the method rests on the assumption that a sufficient amount of data on the same concept or anomaly is available. However, the dynamic change of network intrusions really makes the task more complicated to solve.

CHAPTER 3 THE PROPOSED DECISION SUPPORT MECHANISM

In order to provide decision support for the outlier detection problem in the concept drifting environment, we implement the incremental learning strategy together with the resistant learning algorithm via the envelope module. Furthermore, in the implementation of the incremental learning mechanism, we choose the sequence-based moving window for our DSM to handle the data expiration problem. As time passes, the older time series data are no longer learned by the envelope module; in contrast, the incoming time series data are taken into consideration.

Table 3: The proposed DSM. There are one training block and one testing block in each window. M is the index of the current window, N is the sample size of the training block, and B is the sample size of the testing block.

Step 1.1: Use the resistant learning algorithm with the envelope module stated in Table 2 to learn the training block {(x(M-1)*B+1, y(M-1)*B+1), (x(M-1)*B+2, y(M-1)*B+2), ..., (x(M-1)*B+N, y(M-1)*B+N)} to obtain an acceptable SLFN.
Step 1.2: Output the outlier candidates.
Step 2.1: Use the obtained SLFN to detect whether there are outliers within the testing block {(x(M-1)*B+N+1, y(M-1)*B+N+1), (x(M-1)*B+N+2, y(M-1)*B+N+2), ..., (xM*B+N, yM*B+N)}.
Step 2.2: Output the outlier candidates.
Step 3: If there are more data, set M → M+1 and GO TO Step 1; otherwise, stop the DSM.

Note that (1) we adapt the envelope module's k as the percentage of potential outliers in this study, and (2) we set the first window's M to 1.

In Step 1.1, we use the N training data of the current window to obtain an acceptable SLFN via the envelope module of Huang et al. (2014). This step lets the SLFN learn a fitting function f wrapped with an envelope, and the obtained fitting function is

a non-linear function. Furthermore, we also get the training block's result, including the order information and the deviance information. Here, the "order information" is the order in which the SLFN learned the observations up to stage n-1, where n is the smallest integer such that n > N*(1 - k); we are sure that n-1 data will be wrapped into the envelope.

In Step 1.2, the last N*k data, the potential outliers, have their deviance information examined. The "deviance information" is the distance between the fitting function f and each individual observation. Thus we can examine the last N*k data, which may not be wrapped into the envelope. If a potential outlier's deviation is larger than the acceptable threshold ε, we output this potential outlier as an outlier candidate to the decision maker.

Briefly, the DSM examines each potential outlier's deviance information to determine whether to output it as an outlier candidate: if the deviation is larger than ε, the DSM outputs the potential outlier as an outlier candidate to the decision maker.

In Step 2.1, we use the SLFN obtained via the envelope module to test the instances of the testing block, whose size is B, and determine whether each instance in the testing block is an outlier candidate. We distinguish the instances based on the distance between the fitting function f obtained by the SLFN and each individual observation. In Step 2.2, if an instance's deviation is larger than the width of the envelope, ε, the DSM outputs this instance as an outlier candidate to the decision maker. Notice that the outlier candidates may come not only from the training block but also from the testing block.

Step 3 is the stopping criterion. If there are more incoming data, the DSM

In our proposed application, we believe a visualization approach lets the decision maker intuitively evaluate the potential outliers and easily compare them with the non-outliers. Under certain circumstances, the DSM plots the data together with the envelope in each window for decision support; generally speaking, the data in this case can be displayed in numeric form. With the SLFN learned from the training block, we get a fitting function f. Then, we use the fitting function f and the given ε to chart the envelope, and we plot all of the instances in the current window, including the training block and the testing block, on the chart. In addition, each instance's color and shape can differ based on the distinction made by the envelope module. The training block consists of N elements, and the testing block consists of B elements. Clearly, the testing block is just the incoming data whose size equals the step size B. That is why the training block at the Mth movement is {(x_{(M-1)*B+1}, y_{(M-1)*B+1}), (x_{(M-1)*B+2}, y_{(M-1)*B+2}), ..., (x_{(M-1)*B+N}, y_{(M-1)*B+N})} and the testing block is {(x_{(M-1)*B+N+1}, y_{(M-1)*B+N+1}), (x_{(M-1)*B+N+2}, y_{(M-1)*B+N+2}), ..., (x_{M*B+N}, y_{M*B+N})}. Assume that x_i ≠ x_j when i ≠ j. With the above material, the envelope wraps the non-outliers inside its scope; in contrast, an instance whose residual is bigger than ε lies outside the envelope and is treated as an outlier candidate no matter which block it belongs to. In the end, the DSM reports this outlier candidate to the decision maker.
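To make the above per-window chart concrete, the following sketch draws the fitting function, the envelope boundaries f ± ε, and the instances colored by their distinction. It assumes a univariate x, that numpy and matplotlib are available, and the hypothetical predict helper introduced earlier.

import numpy as np
import matplotlib.pyplot as plt

def plot_window(slfn, window, epsilon):
    # window: the current window's instances, training block plus testing block
    xs = np.array([x for x, _ in window], dtype=float)
    ys = np.array([y for _, y in window], dtype=float)
    fitted = np.array([predict(slfn, x) for x in xs])
    order = np.argsort(xs)                       # draw the curves left to right
    plt.plot(xs[order], fitted[order], "k-", label="fitting function f")
    plt.plot(xs[order], fitted[order] + epsilon, "k--", label="envelope")
    plt.plot(xs[order], fitted[order] - epsilon, "k--")
    outside = np.abs(ys - fitted) > epsilon      # outside the envelope?
    plt.scatter(xs[~outside], ys[~outside], c="blue", marker="o", label="non-outlier")
    plt.scatter(xs[outside], ys[outside], c="red", marker="x", label="outlier candidate")
    plt.legend()
    plt.show()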

Figure 2: The proposed DSM's moving window

Here we use Figure 2 to illustrate the moving window mechanism in our proposed DSM. In Step 1, we use the first N items of each window as the training block to obtain an SLFN with the envelope module. The envelope wraps at least the first N*(1-k) instances inside its boundary as non-outliers. The envelope module also outputs the order information and the deviance information, based on which the DSM examines the potential outliers and outputs the outlier candidates to the decision maker first. In Step 2, the DSM uses the SLFN learned from the training block via the envelope module to distinguish whether each instance in the testing block is an outlier candidate.

With a view to decision support, the main job of this DSM is to provide the outlier candidate list it identifies to the decision maker as early as possible. The decision maker then needs to double-check that list. In other words, the decision maker merely needs to check the outlier candidates output by this DSM and can omit the non-outliers the DSM has identified.

In this study, we suppose that any outlier in the time series data may impose a tremendous impact on the system or its results. For this reason, we try to develop a properly strict rule so as to achieve the proposed scenario.

For example, in network security, if any suspicious intrusion enters the system, the consequences may be extremely serious. Nevertheless, the manager may face users whose behavior changes with time, meaning that the concept drifting environment makes this work even more difficult. Considering the fatal outcome of a flagrant attack, it is certainly important to meet this challenge of distinguishing between suspicious intrusions and normal activities.

The goal of our study remains to provide decision-making support to the decision maker for outlier detection in the concept drifting environment. We believe that we can lower the number of outlier candidates the decision maker needs to review; in other words, we improve not only the efficiency but also the effectiveness of detecting outliers in the concept drifting environment.

With a view to meeting the conditions discussed above, we develop the following decision support rule:

Any instance that has been identified as an outlier candidate will be recognized as an outlier candidate forever.

Now, we elaborate this proposed rule in more detail. Owing to the concept drifting problem, we use the moving window technique to cope with data expiration. Taking the effect of the moving window into consideration, the sliding size determines how many times an instance appears in a training block, and that number may be greater than one, so an instance may be identified as an outlier candidate many times. In our proposed DSM, once an instance is first distinguished as an outlier candidate, whether in the testing block or in the training block, the outlier candidate is output to the decision maker as early as possible.

More precisely, the DSM outputs each outlier candidate to the decision maker at the earliest time it is found, and only that once. For example, if an instance is distinguished as an outlier candidate in the testing stage, we report it to the decision maker in the current window. In the next window, this instance may still be distinguished as an outlier candidate by the proposed DSM, but that does not matter: we do not report the instance again, because the decision maker has already checked it. Based on the decision maker's previous determination, the system may already have reacted to this outlier. Consider another situation: an instance has been output as an outlier candidate while in the testing block, yet in the next window it is not distinguished as an outlier candidate again. We still do not output this instance to the decision maker again, for the same reason: the decision maker has reviewed it before.

Take yet another situation as an example. Suppose the DSM has never identified an instance as an outlier candidate before. If the DSM identifies the instance as an outlier candidate in the current window, it outputs the instance as an outlier candidate to the decision maker, no matter how many times the instance has been regarded as a non-outlier before. A minimal implementation of this reporting rule is sketched below.

In each stage, the decision maker needs to double-check merely the outlier candidates and evaluate which ones are real outliers and which ones are just normal instances. According to the decision maker's evaluation, the system may then have to react to the real outliers.
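The reporting rule can be implemented with a simple set of already-reported indices. The following minimal sketch builds on the hypothetical is_outlier_candidate helper introduced earlier:

reported = set()                     # indices already output to the decision maker

def report_candidates(slfn, block, offset, epsilon):
    # report each outlier candidate once, at the earliest window that finds it
    for i, (x, y) in enumerate(block, start=offset):
        if i in reported:
            continue                 # already checked by the decision maker
        if is_outlier_candidate(slfn, x, y, epsilon):
            reported.add(i)          # recognized as an outlier candidate forever
            print("outlier candidate at index %d" % i)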

Concisely, this section has provided the details of the proposed DSM. There are many parameters, like N, B, ε, and k, among others, that need to be set depending on the data nature and the specific requirements of the detection. Actually, the kind of moving window can also be changed; for example, one can build a time-stamp window with a flexible length, as sketched below. Moreover, if the user adopts a detector method to cope with the concept drifting problem, the evaluation model may have to adjust the window size dynamically. Thus most of the parameters can differ with the demands or data features of different applications.
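As one hedged illustration of a time-stamp window with flexible length, the training block could be selected by a time horizon instead of a fixed count. The (timestamp, x, y) record layout here is an assumption of this sketch:

def timestamp_window(data, now, horizon):
    # data: a list of (timestamp, x, y) triples in chronological order; keep
    # only the instances whose timestamp falls within the given horizon
    return [(x, y) for (t, x, y) in data if now - horizon <= t <= now]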

CHAPTER 4 EXPERIMENT DESIGN AND RESULTS

4.1 Experiment Design
This research strives to provide decision support for the outlier detection problem in the concept drifting environment. We therefore conduct an experiment to justify the proposed DSM and evaluate its results. To correspond to the proposed application environment, we build a concept drifting environment in which the data consist of time series data. The proposed DSM is then applied to 100 simulation sets so as to evaluate the effectiveness of detecting outliers. For each simulation run, we use the geometric Brownian motion models (Tsay, 2014) stated in (4) and (5) to generate a former period of the first 150 instances and a later period of the next 50 instances, respectively. The 200 instances are in chronological order, so they can be viewed as time series data. With 200 instances in each simulation set and 100 simulation sets in total, we gain 20,000 instances in total.

With a view to evaluating the performance of the proposed DSM in detecting outliers in the concept drifting environment, we make each simulation set contain at least 2 theoretical outliers while generating the experiment data: one in the first 150 instances, and the other in the next 50 instances. These 2 theoretical outliers belong to different concepts.

X_{t+1} = X_t · exp(0.005 − 0.00001²/2 + 0.00001 · W_t), 1 ≤ t ≤ 150    (4)

X_{t+1} = X_t · exp(−0.003 − 0.00002²/2 + 0.00002 · W_t), 151 ≤ t ≤ 200    (5)

Y_t = X_t + ε_t, t ≥ 0    (6)
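The following Python sketch generates one simulation set according to equations (4) through (6); it assumes numpy is available, and the exact index handling is an assumption of this illustration.

import numpy as np

def simulate_one_set(seed=None):
    rng = np.random.RandomState(seed)
    X = [5.0]                              # initial value set to 5
    for t in range(1, 201):
        if t <= 150:                       # former period, equation (4)
            mu, sigma = 0.005, 0.00001
        else:                              # later period, equation (5)
            mu, sigma = -0.003, 0.00002
        W = rng.normal(0.0, 1.0)           # Wiener random noise from N(0, 1)
        X.append(X[-1] * np.exp(mu - sigma ** 2 / 2.0 + sigma * W))
    X = np.array(X[1:])                    # the 200 instances X_1, ..., X_200
    eps = rng.normal(0.0, 2.0, size=200)   # error term with standard deviation 2
    Y = X + eps                            # equation (6)
    return X, Y, eps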

First of all, we denote the initial value as X_0 and set it to 5. W_t, called the Wiener random process, is the random noise generated from N(0, 1); that is, its distribution is normal, its mean is 0, and its standard deviation is 1.

As stated in equation (6), we add an error term ε_t to X_t to get Y_t. ε_t has a normal distribution with mean 0 and standard deviation 2. In this simulation experiment, we set the error term's standard deviation to 2 because X_t's standard deviation is close to 1.4; the larger error scale prevents the non-outliers and the outliers from being too similar for the proposed DSM to separate.

The outlier threshold is set at 2.5 standard deviations of the error term, that is, 2 * 2.5 = 5. Based on this threshold, we use the following rule for defining theoretical outliers and theoretical non-outliers: if ε_t < 5, the sample is treated as a theoretical non-outlier; if ε_t ≥ 5, the sample is treated as a theoretical outlier. Based on this criterion, we gain 281 theoretical outliers in total within the 20,000 instances, approximately 1.4%. Each simulation set has at least 2 theoretical outliers belonging to different concepts: one in the first 150 instances and the other in the next 50 instances.

In order to set up a concept drifting environment, the parameter values differ between the two periods. In the former period, where 1 ≤ t ≤ 150, X_t is generated by equation (4); in the later period, where 151 ≤ t ≤ 200, X_t is generated by equation (5). This means the concept changes after t = 151. The former period's trend rises with time; on the other hand, the volatility becomes larger in the later period. Note that the data were generated using Python, version 2.7.
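Under this criterion, labeling a generated set reduces to a threshold test on the error terms. A small sketch, reusing simulate_one_set from above (the seed value is arbitrary):

def label_theoretical_outliers(eps, threshold=5.0):
    # error terms reaching the 2.5-sigma threshold (2 * 2.5 = 5) are outliers
    return eps >= threshold                # boolean array, True = outlier

X, Y, eps = simulate_one_set(seed=0)
labels = label_theoretical_outliers(eps)
print("theoretical outliers in this set: %d" % labels.sum())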

In Figure 3, we take the 95th simulation set as an example. The x-axis is time, displayed in chronological order, and the y-axis is the value Y of each instance. The instances in the figure are shown in two colors according to the value of ε_t: the blue ones, whose ε_t does not exceed 5, are theoretical non-outliers; in contrast, the red ones can obviously be regarded as theoretical outliers, because their ε_t equals or exceeds 5. This 95th simulation set has 4 theoretical outliers: 3 in the former period and 1 in the later period. The figure clearly shows that the trend rises where 1 ≤ t ≤ 150 and drops where 151 ≤ t ≤ 200; the trend change is obvious.

Figure 3: The 95th simulation set

In Figure 4, we illustrate the experiment's sequence-based moving window: the window slides by 5 instances at a time, with the instances given sequentially; that is to say, we set B to 5.
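A Figure-3-style plot can be produced with the sketch below, which scatters the instances in chronological order and colors them by the theoretical outlier criterion (matplotlib assumed):

import matplotlib.pyplot as plt

def plot_simulation_set(Y, labels):
    # red marks theoretical outliers; blue marks theoretical non-outliers
    t = range(1, len(Y) + 1)
    colors = ["red" if flag else "blue" for flag in labels]
    plt.scatter(t, Y, c=colors, s=12)
    plt.xlabel("t (chronological order)")
    plt.ylabel("Y")
    plt.show()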
