CHAPTER 2 LITERATURE REVIEW
2.1 Concept Drifting
Concept drifting means that concepts are not stable and change over time (Tsymbal, 2004). Some scholars divide this problem into several types.
For example, Stanley (2003) classifies concept drifting into two types: sudden concept drifting and gradual concept drifting. Furthermore, according to the rate of change, gradual concept drifting can be divided into moderate and slow drifts. Some scholars treat a novel class emerging in the data stream as a concept evolution problem (Masud et al., 2010). Each type may call for its own methods. In sum, the challenge of concept drifting or concept evolution is that most algorithms have difficulty identifying the hidden context behind the time-evolving trends of a data stream (Tsymbal, 2004), a difficulty that arises when the implicit concepts of the data stream change over time. In brief, a system that handles the concept drift problem should have four properties (Bifet et al., 2011): (1) Adapt to concept drift as soon as possible. (2) Be unaffected by noise. The system should distinguish noise from changes, remaining robust to noise but adaptive to changes. (3) Fit the nature of the data. Some data contain reoccurring contexts, so the system has to recognize and react to them. (4) Adapt with limited resources. The system operates under limits such as time constraints and memory constraints. In sum, a system must fulfill all four of these objectives.
To build a model in a concept drifting environment, Gama et al. (2014) argue that learning under concept drift requires not only updating the predictive
model with new patterns but also forgetting old information. The reason to forget or exclude old information is that some of it may suffer from the data expiration problem (Wang et al., 2003). Thus Widmer & Kubat (1996) suggest using incremental learning to deal with this problem, because an incremental algorithm looks only at the new observations to modify its current hypothesis. Since incremental learning can learn new concepts while keeping existing concepts that are still relevant, Elwell and Polikar (2011) integrate incremental learning with an ensemble classifier system to deal with the concept drifting problem. Krawczyk & Woźniak (2014) implement not only incremental learning but also a forgetting mechanism for building a classifier model that deals with concept drifting in a data stream environment.
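The combination of incremental updating and forgetting described above can be illustrated with a minimal sketch. It is not the algorithm of any of the cited authors: the class name `ForgettingMean` and the `decay` parameter are assumptions introduced for illustration, and the "model" is only a running mean in which old observations lose weight exponentially.

```python
class ForgettingMean:
    """Incremental estimate of a stream's mean that gradually forgets old data."""

    def __init__(self, decay=0.9):
        # decay in (0, 1) is an assumed parameter: smaller values forget faster.
        self.decay = decay
        self.weighted_sum = 0.0
        self.weight_total = 0.0

    def update(self, x):
        # Only the new observation is needed; the past is summarized in two numbers,
        # and each update shrinks the influence of everything seen earlier.
        self.weighted_sum = self.decay * self.weighted_sum + x
        self.weight_total = self.decay * self.weight_total + 1.0

    def predict(self):
        if self.weight_total == 0.0:
            return 0.0
        return self.weighted_sum / self.weight_total


model = ForgettingMean(decay=0.5)
for value in [0.0] * 20 + [10.0] * 5:  # the concept shifts from 0 to 10
    model.update(value)

plain_mean = sum([0.0] * 20 + [10.0] * 5) / 25
```

After the shift, `model.predict()` is close to the new level of 10, whereas the plain mean over all 25 observations is still 2.0, which shows why forgetting matters when the concept changes.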
Because multiple solutions exist for a wide range of situations and problems, Bifet et al. (2011) categorize the methods for handling the concept drifting problem into four types, as shown in Table 1. The main goal of all four types is to select the right training data and to retrain or adjust the model incrementally. The forgetting method and the detector method use a single classifier, while the other two use an ensemble that maintains some memory. On the other hand, the forgetting method and the dynamic ensemble method adapt at every step, while the other two first detect change and adapt only when a change triggers it. Here we review the meaning and features of each type.
Table 1: Classification of concept drifting methods

Type: Forgetting method
Meaning: This method adapts itself at every step with a fixed-size window. In other words, it retains only a fixed amount of data to build the model.
Feature: Memory and computational complexity are crucial to a machine learning system. By discarding out-of-date data, we can lower both (Krawczyk & Woźniak, 2014). This method gives higher weights to new data (Lin, 2013), so it is appropriate for sudden drift (Bifet et al., 2011).

Type: Detector method
Meaning: If the detector indicates that the observed data have changed, the model adapts itself using sliding windows of variable size.
Feature: This method can adjust the window size to an appropriate length based on the relevance and importance of the incoming data. For instance, the model can straightforwardly shrink the training set. However, windowing may fail when a slow change lasts longer than the window size (Gama et al., 2014). This method is appropriate for sudden drift (Bifet et al., 2011).

Type: Contextual method
Meaning: This method implements dynamic integration or meta-learning strategies and uses the results of the sub-classifiers of an ensemble to decide whether to update the model.
Feature: The method should be able to develop an expectation of which concept is likely to be relevant next (Widmer & Kubat, 1996). It is suitable for environments with reoccurring concept drift, such as the four-season cycle (Bifet et al., 2011; Gama et al., 2014).

Type: Dynamic ensemble method
Meaning: This method uses an adaptive, dynamic ensemble that contains many models and makes decisions by dynamically weighted voting.
Feature: This method features a diversity control mechanism and implements internal drift detection to speed up adaptation (Gama et al., 2014). Because it adapts at every step, it is suitable for gradual drift (Bifet et al., 2011).
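To make the detector method in Table 1 more concrete, the following sketch shrinks a variable-size window when a simple detector signals a change. It is an illustrative simplification, not a published detector: the function name `adapt_window`, the half-versus-half comparison, and the `min_half` and `threshold` parameters are all assumptions.

```python
from statistics import mean


def adapt_window(errors, min_half=5, threshold=0.3):
    """Return the part of `errors` (0/1 prediction errors, oldest first) to keep."""
    half = len(errors) // 2
    if half < min_half:
        return list(errors)  # too little data to judge
    older, recent = errors[:half], errors[half:]
    # Assumed detector: compare the error rate of the two halves of the window.
    if abs(mean(recent) - mean(older)) > threshold:
        return list(recent)  # change detected: shrink the window
    return list(errors)      # no change: keep the full window


stable = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
drifted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1]

kept_stable = adapt_window(stable)
kept_drifted = adapt_window(drifted)
```

For the stable stream the whole window is kept, while for the drifted stream only the recent half survives. The sketch also exposes the weakness noted in Table 1: a change slow enough to keep the two halves within `threshold` of each other would never trigger the detector.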
To exclude out-of-date concepts, Widmer & Kubat (1996) adopt the forgetting method. They address the incremental learning problem with a moving window technique that keeps only the latest and most relevant examples in the window.
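A fixed-size moving window of this kind can be sketched in a few lines. The example below is a generic illustration rather than the algorithm of Widmer & Kubat (1996): the class name `WindowedMajorityClassifier`, the window size, and the majority-vote "model" are assumptions chosen to keep the sketch short.

```python
from collections import Counter, deque


class WindowedMajorityClassifier:
    """Predicts the most frequent label among the latest `window_size` examples."""

    def __init__(self, window_size=5):
        # A deque with maxlen silently drops the oldest example when it is full,
        # which implements the forgetting step.
        self.window = deque(maxlen=window_size)

    def learn(self, label):
        self.window.append(label)

    def predict(self):
        if not self.window:
            return None
        return Counter(self.window).most_common(1)[0][0]


clf = WindowedMajorityClassifier(window_size=5)
for label in ["up"] * 10 + ["down"] * 4:  # the concept switches from "up" to "down"
    clf.learn(label)

prediction = clf.predict()
```

Although "up" is still the majority over the whole stream, the window holds only the last five labels, so the prediction follows the new concept.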
Storkey (2009) advises using transfer learning, because this method considers only partially related training scenarios. Because the training set is associated only with related data, it is expected to provide better predictions. Moreover, an anomaly or outlier in the training set may cause false judgments or even reduce accuracy (Castelo-Fernández et al., 2010). Outlier detection is therefore also an important problem that needs to be solved. We emphasize the impact of outliers and the associated techniques in Section 2.2. So far, we have discussed a number of incremental learning and adaptive learning methods. Next we review some related work.
In financial markets, most data streams are non-stationary and unbounded (Basu and Meckesheimer, 2007). Some domains call such data “time series data”: as the name suggests, a sequence of continuous data ordered by time, for example the stock price of a market or of a company. In other words, each element of the data stream can be matched with a time stamp (date, hour, minute, etc.).
With time series data, the fitting function may be unclear, or even unknown and frequently changing. Fortunately, data that are closer on the timeline may be more strongly correlated with the newer time-evolving trend (Basu and Meckesheimer, 2007). This idea is similar to the forgetting method in Table 1. Because such data are more similar to the current concept, we can give higher weight to data that are closer on the timeline. Based on this feature, Basu and Meckesheimer (2007) used the median of a fixed number of
neighboring elements and a threshold to judge whether an element is an outlier.
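The median-and-threshold idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of Basu and Meckesheimer (2007): the function name `is_outlier`, the two-sided neighborhood of `k` elements on each side, and the value of `threshold` are all chosen here for the example.

```python
from statistics import median


def is_outlier(series, index, k=2, threshold=3.0):
    """Flag series[index] if it deviates from the median of its neighbors.

    Up to k elements on each side are used (fewer at the ends of the series);
    the element itself is excluded from the median.
    """
    lo = max(0, index - k)
    hi = min(len(series), index + k + 1)
    neighbors = series[lo:index] + series[index + 1:hi]
    if not neighbors:
        return False
    return abs(series[index] - median(neighbors)) > threshold


prices = [10.0, 10.2, 10.1, 25.0, 10.3, 10.2, 10.4]
flags = [i for i in range(len(prices)) if is_outlier(prices, i)]
```

Only the spike at position 3 is flagged. Because the median is insensitive to a single extreme value, the spike does not cause its own neighbors to be flagged, which a mean-based rule might do.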
Another popular approach is to use classifiers to identify or detect concept drift. Here we review some classifier methods. Most of the ensemble classifiers discussed below are built with multiple methods.
Wang et al. (2003) attribute the concept drifting problem to the data expiration problem, which means that the model built from the training set is no longer consistent with the current concepts. That is why expired data need to be discarded. To address this problem, they propose a weighted ensemble classifier based on C4.5, the RIPPER rule learner, and the Naïve Bayesian method, and apply it to credit card fraud data. Their results show that the solution can reduce the error rate to approximately 11%.
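The general idea of a weighted ensemble can be sketched as below. The weighting rule used here, accuracy on the most recent chunk of labeled data, is an assumption standing in for whatever weighting Wang et al. (2003) actually use, and the three threshold rules are hypothetical stand-ins for trained base learners such as C4.5, RIPPER, or Naïve Bayes.

```python
from collections import defaultdict


def accuracy(classifier, chunk):
    """Fraction of (x, y) pairs in `chunk` that `classifier` labels correctly."""
    correct = sum(1 for x, y in chunk if classifier(x) == y)
    return correct / len(chunk)


def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among the classifiers' votes."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)


# Hypothetical base learners: two trained on an expired concept, one on the current one.
def old_rule_a(amount):
    return "fraud" if amount > 100 else "ok"


def old_rule_b(amount):
    return "fraud" if amount > 120 else "ok"


def new_rule(amount):
    return "fraud" if amount > 500 else "ok"


classifiers = [old_rule_a, old_rule_b, new_rule]
recent_chunk = [(50, "ok"), (150, "ok"), (200, "ok"), (300, "ok"), (600, "fraud")]

weights = [accuracy(c, recent_chunk) for c in classifiers]
prediction = weighted_vote(classifiers, weights, 250)
```

An unweighted majority vote would label the amount 250 as fraud, because two of the three rules still reflect the expired concept. Weighting by performance on recent data lets the single up-to-date rule outvote them.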
Masud et al. (2010) propose a two-phase method to handle the concept-evolution problem. Their k-NN (k-nearest neighbor) based classifier trains n classification models from the data stream. In the first phase, they judge whether each incoming element of the data stream is an outlier by the weight between the classifiers and the test element. If the element is an outlier, it is marked as an F-outlier and temporarily appended to a buffer. In particular, they report using a slack space around the decision boundary as an adaptive threshold, which can lower the false alarm rate. In the second phase, once the number of elements in the buffer meets a criterion, the classifier invokes a novel class detection procedure that uses the Gini coefficient to determine whether a novel class has emerged. Regrettably, this approach handles concept evolution well but does not handle concept drifting well. Further, it cannot tell whether an F-outlier is noise.
In follow-up research, Masud et al. (2011) add a time constraint that waits for more test instances in order to discover similarities, and then decide whether to perform the correlation function and classify the instances as a novel class. The measure of similarity among F-outliers in this research is changed to the q-neighborhood silhouette coefficient (q-NSC), a unified measure of cohesion and separation. However, this method still handles only the concept evolution problem.
Some scholars establish criteria for updating the model. Lanquillon and Renz (1999) propose adapting the model when the prediction output fails statistical quality control. The first criterion evaluates an expected error rate. The second observes whether the fraction of classification decisions is below a given threshold. Their research also implements a mechanism that retrains the model regularly regardless of whether the concepts of the data stream change. However, the method suits only non-radical changes, and it was tested only on topic detection and tracking, that is, detecting whether an input text correlates with the previously trained model.
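The first criterion, monitoring the error rate against an expected value, can be sketched with a simple control limit. This is a generic quality-control illustration, not the exact test of Lanquillon and Renz (1999): the function name `needs_update`, the binomial standard deviation, and the three-sigma limit are assumptions.

```python
import math


def needs_update(errors, expected_error, n_sigma=3.0):
    """Return True when the batch error rate exceeds the upper control limit.

    `errors` is a list of 0/1 flags (1 = misclassified) for one batch.
    """
    n = len(errors)
    observed = sum(errors) / n
    # Standard deviation of an error rate estimated from n independent decisions.
    sigma = math.sqrt(expected_error * (1.0 - expected_error) / n)
    return observed > expected_error + n_sigma * sigma


in_control = [1] * 10 + [0] * 90      # 10% errors, as expected
out_of_control = [1] * 30 + [0] * 70  # 30% errors after a change

update_first = needs_update(in_control, expected_error=0.10)
update_second = needs_update(out_of_control, expected_error=0.10)
```

With an expected error rate of 0.10 and batches of 100 decisions, the control limit is 0.19, so only the second batch triggers an update. A slow change that raises the error rate by a small amount per batch could stay under the limit for a long time, which is consistent with the limitation noted above.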
As shown above, many methods exist for dealing with the concept drifting problem in many kinds of problems. However, we still face the problem that an anomaly or outlier in the training set may cause false judgments. Outlier detection is therefore also an important problem that needs to be solved. Tsymbal (2004) argues that some algorithms may overreact to outliers or noise. In some cases, algorithms may erroneously identify outliers or noise as a concept. To solve this problem, researchers have proposed many kinds of techniques.