
CHAPTER 2 LITERATURE REVIEW

2.2 Outlier Detection

National Chengchi University


In this study, outliers are defined as the observations that lie far away from the fitting function deduced from a subset of the given observations, where the form of the fitting function adapts during the learning process (Tsaih and Cheng, 2009, p. 162). This adaptivity is provided by the resistant learning mechanism.

In addition, there are two classical definitions, and we summarize the features of outliers that each of them emphasizes. Hawkins (1980) regards an outlier as an observation generated by a different mechanism. Barnett and Lewis (1994) consider the main feature of an outlier to be its inconsistency with the remainder of the data set. However, the definition of outliers varies across domains: each definition depends on the particular problem, algorithm, or application. Outlier detection is also an important and challenging task for a wide range of applications, including fraud protection, intrusion detection, target marketing, and statistical outlier detection. The reasons why outlier detection has become an urgent need were discussed in Chapter 1.

In most cases, the side effect of outliers is that they distort our prediction or classification models. In this section, we focus on outlier detection techniques.

In academic fields, numerous studies apply a variety of techniques to classify data as outliers or non-outliers. The following is a review of related work on outlier detection techniques. Each domain has many solutions to this problem; we survey solutions based on artificial intelligence techniques and group them into evolutionary algorithms, clustering techniques, and artificial neural networks.

With regard to evolutionary algorithms, Crawford and Wainwright (1995) and Banerjee (2012) both use genetic algorithms to detect outliers. Crawford and Wainwright combined a genetic algorithm with three outlier diagnostics, and the results showed that the best combination was the genetic algorithm with Cook's squared distance formula (Cook and Weisberg, 1982). However, the experiment was performed only on small cases, and the method did not perform well on one of the data sets; Crawford and Wainwright inferred that the genetic algorithm with Cook's squared distance formula does not work well on some data sets. In continuing research, Banerjee (2012) combined a genetic algorithm with the Euclidean distance because of its density-based distance feature.
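Cook's squared distance measures how much a fitted regression changes when a single observation is removed, which is why it can serve as the fitness signal for an outlier search. The following Python sketch assumes the standard Cook's distance for a simple linear regression $y = b_0 + b_1 x$ with the closed-form leverage $h_c = 1/N + (x_c - \bar{x})^2 / S_{xx}$. The function name `cooks_distance` and the sample data are hypothetical, and the genetic-algorithm search that Crawford and Wainwright wrap around the diagnostic is not shown.

```python
def cooks_distance(xs, ys):
    """Cook's distance of every observation in a simple linear regression y = b0 + b1 * x."""
    n = len(xs)
    p = 2  # number of regression parameters: intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square
    distances = []
    for x, e in zip(xs, residuals):
        h = 1.0 / n + (x - x_bar) ** 2 / sxx  # leverage: diagonal entry of the hat matrix
        distances.append(e ** 2 / (p * s2) * h / (1.0 - h) ** 2)
    return distances


# Hypothetical data: five points close to y = 2x, plus one gross outlier at x = 6.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.0, 9.9, 30.0]
d = cooks_distance(xs, ys)
print(max(range(len(d)), key=d.__getitem__))  # 5: the outlier is the most influential point
```

In a genetic-algorithm setting, a diagnostic like this would typically be evaluated on candidate subsets proposed by the search rather than once on the full data set.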

Srinoy (2007) implemented a supervised two-phase method to cope with intrusion detection in network security. The method uses particle swarm optimization to select features for a support vector machine, which then separates intrusions from other records. The results show that this method performs better than fuzzy c-means, pure particle swarm optimization, and a pure support vector machine. Although the method performs well with labeled data, it cannot cope with unlabeled data; in other words, it cannot be applied to unsupervised problems.

There are several supervised learning methods for the outlier detection problem, but unsupervised methods still lag behind (Ferdous and Maeda, 2006). In the following, we focus on unsupervised learning methods for the outlier detection problem.

Many scholars address outlier detection with clustering techniques. The main reason is the feature mentioned by Barnett and Lewis (1994): an outlier is inconsistent with the remainder of the data set. Consequently, the distribution of the data tends to place the majority of normal observations close together and the suspicious outliers far away from that majority. Ferdous and Maeda (2006) implement peer group analysis (PGA) to cope with fraud detection in financial time series data. PGA is an unsupervised technique whose mechanism identifies a peer group for every target object. Furthermore, PGA concentrates more on local patterns than on global models. Ferdous and Maeda conduct a series of experiments and present results that are supported by visual evidence.
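A minimal sketch of the peer-group idea follows, in Python. It is a simplified illustration, not Ferdous and Maeda's implementation: choosing peers by Euclidean distance over an initial training window, the fixed peer-group size `k`, and scoring each later time step by the target's standardized deviation from its peer-group mean are all assumptions, and `peer_group_scores` and the account data are hypothetical. Because each target is compared only with its own peers, the score reflects local rather than global behavior.

```python
import math


def peer_group_scores(series, k, train_len):
    """Simplified peer group analysis (illustrative assumptions only).

    series    : dict mapping an object id to its list of values over time
    k         : number of peers per target object
    train_len : number of leading time steps used to choose the peers
    Returns a dict mapping each object id to its standardized deviation from
    its peer-group mean at every time step after the training window.
    """
    def distance(a, b):
        # Euclidean distance over the training window only
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a[:train_len], b[:train_len])))

    scores = {}
    for target, values in series.items():
        others = [oid for oid in series if oid != target]
        peers = sorted(others, key=lambda oid: distance(values, series[oid]))[:k]
        target_scores = []
        for t in range(train_len, len(values)):
            peer_values = [series[oid][t] for oid in peers]
            mean = sum(peer_values) / len(peer_values)
            std = math.sqrt(sum((v - mean) ** 2 for v in peer_values) / len(peer_values))
            target_scores.append(abs(values[t] - mean) / max(std, 1e-9))  # guard against zero spread
        scores[target] = target_scores
    return scores


# Hypothetical accounts: A, B and C move together; D follows them, then jumps at the last step.
series = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.1, 3.1, 4.1, 5.1],
    "C": [0.9, 1.9, 2.9, 3.9, 4.9],
    "D": [1.0, 2.0, 3.0, 4.0, 20.0],
}
scores = peer_group_scores(series, k=2, train_len=3)
print(max(scores, key=lambda oid: scores[oid][-1]))  # object with the largest final deviation
```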

Yoon, Kwon and Bae (2007) use the k-means clustering method to detect outliers in software measurement data. Their approach categorizes the data into k groups, where the value of k is decided by the Cubic Clustering Criterion from Warren's research (1983). The final step of this approach exports an outlier candidate report, which still needs to be reviewed by a domain expert.
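The clustering-then-report workflow can be sketched in Python as follows. Several parts are assumptions rather than Yoon, Kwon and Bae's method: `k` is passed in directly instead of being chosen by the Cubic Clustering Criterion, the centroids are seeded deterministically by farthest-first traversal, and candidates are ranked by their distance to the nearest centroid. The function names and the sample points are hypothetical; the returned list plays the role of the outlier candidate report that a domain expert would still review.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point, then keep
    adding the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style k-means; returns the final centroids."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids


def outlier_candidates(points, k, top_n):
    """Rank points by distance to their nearest centroid; the top_n farthest form the report."""
    centroids = kmeans(points, k)

    def score(p):
        return min(math.dist(p, c) for c in centroids)

    return sorted(points, key=score, reverse=True)[:top_n]


# Hypothetical 2-D measurements: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
print(outlier_candidates(points, k=2, top_n=1))  # [(5, 20)]
```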

Many studies address outlier detection problems with artificial neural network (ANN) techniques. Sykacek (1997) used a neural network with sigmoid activations to cope with outlier detection problems, where the network is trained by Bayesian inference. Hawkins, Williams and Baxter (2002) use replicator neural networks (RNNs) to measure whether an instance is an outlier. The proposed RNN is a feed-forward multi-layer perceptron with three hidden layers between the input and output layers. Before training the RNN, the authors transform all columns into quantitative measures. During training, the weights of the RNN are adjusted to minimize the mean squared error.
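Once a replicator network has been trained to reproduce its inputs, each record can be scored by how poorly it is reproduced. The Python sketch below assumes the commonly described RNN outlier factor, the mean squared error between a record's $n$ attributes and the network's reconstruction of them. The training of the three-hidden-layer network is omitted, so the `reconstructions` values, the function names, and the data are hypothetical stand-ins for the output of a trained RNN.

```python
def outlier_factor(record, reconstruction):
    """Mean squared reconstruction error of one record over its n attributes."""
    n = len(record)
    return sum((x - o) ** 2 for x, o in zip(record, reconstruction)) / n


def rank_by_outlier_factor(records, reconstructions):
    """Return record indices sorted from most to least outlying, plus the factors."""
    factors = [outlier_factor(r, o) for r, o in zip(records, reconstructions)]
    order = sorted(range(len(records)), key=lambda i: factors[i], reverse=True)
    return order, factors


# Hypothetical replicator outputs: the network reproduces the first two records
# well but fails on the third, which therefore receives the largest outlier factor.
records = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7], [0.9, 0.1, 0.8]]
reconstructions = [[0.21, 0.39, 0.60], [0.30, 0.52, 0.69], [0.40, 0.45, 0.55]]
order, factors = rank_by_outlier_factor(records, reconstructions)
print(order[0])  # 2
```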

The neural networks in the studies surveyed above do not deal with the resistant learning problem. In most of these techniques, the network structure is pre-specified and fixed during the training process. As a result, those neural networks can only adjust or tune their weights, whereas the algorithm proposed in this study can add hidden nodes to pick up the trend of the data. Therefore, these works do not solve the resistant learning problem.

Resistant learning is largely similar to robust learning: both are expected to cope with uncertain situations.

The following paragraph discusses the differences and similarities between robust learning and resistant learning.

With the resistant learning mechanism, the form of the fitting function adapts during the learning process. In the artificial neural network field, robust procedures tune the weights and thresholds of a neural network model but do not change its structure; in other words, the structure of the model stays fixed under robust procedures. In contrast, resistant learning can not only tune the weights and thresholds but also modify the structure of the neural network model. Tsaih and Cheng (2009) summarize that robust procedures are those whose results are not influenced significantly by violations of the model assumptions (such as the assumption that the errors are normally distributed), and resistant procedures are those whose numerical results are not influenced significantly by outlying observations.
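The notion of a resistant numerical result can be illustrated with two familiar location estimators. This Python snippet illustrates the definition only and is not part of Tsaih and Cheng's method; the data are hypothetical. A single outlying observation moves the sample mean considerably while leaving the median unchanged, so in this sense the median is resistant and the mean is not.

```python
from statistics import mean, median

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = [1.0, 2.0, 3.0, 4.0, 100.0]  # the last value replaced by an outlier

print(mean(clean), median(clean))                # 3.0 3.0
print(mean(contaminated), median(contaminated))  # 22.0 3.0
```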

Usually, in estimation, the response $y$ is modeled as $y = f(\mathbf{x}, \mathbf{w}) + \delta$, where $\mathbf{w}$ is the parameter vector and $\delta$ is the error term. The function $f$ is pre-defined and fixed while the values of its associated $\mathbf{w}$ are derived from a set of given observations $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$, with $y_c$ being the observed response corresponding to the $c$th observation with explanatory variables $\mathbf{x}_c$. The least squares estimator (LSE) is a popular method for performing this estimation.


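Obviously, an outlier tends to have a larger residual $e_c \equiv y_c - f(\mathbf{x}_c, \mathbf{w})$, which means that the observation $(\mathbf{x}_c, y_c)$ lies far away from the fitting function $f$. Tsaih and Cheng (2009) also use a tiny pre-specified error value $\varepsilon = 10^{-6}$.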
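For the special case of a linear fitting function with one explanatory variable, $f(x, \mathbf{w}) = w_0 + w_1 x$ (an assumption made to keep the example short), the LSE and the residuals $e_c$ can be computed in closed form. The Python sketch below, with hypothetical function names and data, fits the line and ranks the observations by $|e_c|$; the single outlier obtains the largest residual.

```python
def least_squares_line(xs, ys):
    """Least squares estimates (w0, w1) for the model y = w0 + w1 * x + delta."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_bar - w1 * x_bar
    return w0, w1


def residuals(xs, ys, w0, w1):
    """e_c = y_c - f(x_c, w) for every observation c."""
    return [y - (w0 + w1 * x) for x, y in zip(xs, ys)]


# Hypothetical observations lying on y = 2x, except the third, which is an outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 20.0, 8.0, 10.0]
w0, w1 = least_squares_line(xs, ys)
e = residuals(xs, ys, w0, w1)
ranking = sorted(range(len(e)), key=lambda c: abs(e[c]), reverse=True)
print(round(w0, 6), round(w1, 6), ranking[0])  # 2.8 2.0 2
```

Note that the fitted line (intercept 2.8 instead of 0) is itself pulled toward the outlier, because the LSE uses every observation.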

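To deal with outlier detection through resistant learning, Tsaih and Cheng (2009) implement an adaptive single-hidden-layer feed-forward neural network (SLFN). The fitting function of the SLFN is defined as

$$f(\mathbf{x}) \equiv w_0^o + \sum_{i=1}^{p} w_i^o \tanh\left(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j\right),$$

where the superscript $o$ refers to quantities related to the output layer and the superscript $H$ to quantities related to the hidden layer; $p$ is the number of hidden nodes, $m$ is the number of explanatory variables, and $w_i^o$ is the weight between the $i$th hidden node and the output node. In this study, a character in bold represents a column vector, a matrix, or a set, and the superscript T indicates transposition. Furthermore, let $\mathbf{w}_i^H \equiv (w_{i0}^H, w_{i1}^H, w_{i2}^H, \ldots, w_{im}^H)^T$.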


Through this SLFN, the input information $\mathbf{x}$ is first transformed into $\mathbf{a} \equiv (a_1, a_2, \ldots, a_p)^T$, and the corresponding value of $f$ is generated from $\mathbf{a}$ rather than from $\mathbf{x}$.

In other words, given the observation $\mathbf{x}$, the values of all hidden nodes are first calculated as

$$a_i \equiv \tanh\left(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j\right) \quad \text{for all } i,$$

and the corresponding value $f(\mathbf{x})$ is then calculated as

$$f(\mathbf{x}) = g(\mathbf{a}) \equiv w_0^o + \sum_{i=1}^{p} w_i^o a_i.$$
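The two-step computation of $\mathbf{a}$ and $f(\mathbf{x}) = g(\mathbf{a})$ can be sketched directly in Python. The layout is an assumption for illustration: row $i$ of `hidden_weights` holds $\mathbf{w}_i^H = (w_{i0}^H, \ldots, w_{im}^H)$ with the bias first, and `output_weights` holds $(w_0^o, w_1^o, \ldots, w_p^o)$. The function name and the weight values are hypothetical, and only the forward pass is shown, not the weight-tuning or recruiting procedures.

```python
import math


def slfn_forward(x, hidden_weights, output_weights):
    """Forward pass of the SLFN: returns the hidden activations a and the output f(x).

    x              : [x_1, ..., x_m]
    hidden_weights : p rows, row i = [w_i0^H, w_i1^H, ..., w_im^H] (w_i0^H is the bias)
    output_weights : [w_0^o, w_1^o, ..., w_p^o] (w_0^o is the output bias)
    """
    # a_i = tanh(w_i0^H + sum_j w_ij^H * x_j)
    a = [
        math.tanh(w_i[0] + sum(w_ij * x_j for w_ij, x_j in zip(w_i[1:], x)))
        for w_i in hidden_weights
    ]
    # f(x) = g(a) = w_0^o + sum_i w_i^o * a_i
    f = output_weights[0] + sum(w_o * a_i for w_o, a_i in zip(output_weights[1:], a))
    return a, f


# Hypothetical weights: m = 2 inputs, p = 2 hidden nodes.
hidden_weights = [[0.1, 0.5, -0.3],
                  [-0.2, 0.8, 0.4]]
output_weights = [0.05, 1.2, -0.7]
a, f = slfn_forward([1.0, 2.0], hidden_weights, output_weights)
print(len(a), f)
```

In this representation, adding a hidden node amounts to appending a row to `hidden_weights` and an entry to `output_weights`, which increases $p$ by one; the rules that decide when to recruit a node and how to initialize its weights are outside this sketch.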

Tsaih and Cheng (2009) propose a resistant learning outlier detection algorithm with a tiny pre-specified value $\varepsilon = 10^{-6}$, in which the function form is deduced via an SLFN trained on a given subset of the observations. With the resistant learning procedure, the SLFN can adapt its weights dynamically during training.

In addition, they perform both robustness analysis and deletion diagnostics.

The idea of the robustness analysis, proposed by Rousseeuw and Van Driessen (2006), is to derive an (initial) subset of m+1 reference observations to fit the linear regression model, order the residuals of all N observations at each stage, and then augment the reference subset gradually based on the principle of the smallest trimmed sum of squared residuals. In the deletion diagnostics, this idea is employed with the number of pruned hidden nodes as the diagnostic quantity when one observation is excluded from the reference pool. As a result, the SLFN excludes potential outliers at an early stage, which prevents it from learning them.
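The stage-wise augmentation described above can be sketched for the simplest case of a linear regression with one explanatory variable ($m = 1$, so the initial subset holds $m + 1 = 2$ observations). This Python code is an illustration under stated assumptions, not Rousseeuw and Van Driessen's algorithm or Tsaih and Cheng's SLFN procedure: the initial pair is chosen by brute force as the pair whose fitted line minimizes the median squared residual over all $N$ observations, and at each stage the reference subset becomes the $n + 1$ observations with the smallest squared residuals, which is the subset minimizing the trimmed sum of squared residuals for the current fit. The function names and data are hypothetical; observations that enter last are the natural outlier candidates.

```python
from itertools import combinations


def fit_line(points):
    """Least squares line (b0, b1) through the given (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    if sxx == 0:  # all x equal: fall back to a horizontal line
        return y_bar, 0.0
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - b1 * x_bar, b1


def squared_residuals(data, b0, b1):
    return [(y - (b0 + b1 * x)) ** 2 for x, y in data]


def forward_order(data):
    """Indices of data in the order they first enter the reference subset."""
    n = len(data)

    def median_sq_residual(pair):
        b0, b1 = fit_line([data[i] for i in pair])
        return sorted(squared_residuals(data, b0, b1))[n // 2]

    # Initial subset of m + 1 = 2 observations (assumption: best median fit over all pairs).
    pairs = [p for p in combinations(range(n), 2) if data[p[0]][0] != data[p[1]][0]]
    subset = list(min(pairs, key=median_sq_residual))
    order = list(subset)
    while len(subset) < n:
        b0, b1 = fit_line([data[i] for i in subset])
        r = squared_residuals(data, b0, b1)
        ranked = sorted(range(n), key=lambda i: r[i])  # order the residuals of all N observations
        subset = ranked[: len(subset) + 1]  # smallest trimmed sum of squared residuals
        for i in subset:
            if i not in order:
                order.append(i)
    return order


# Hypothetical data on y = 2x + 1, except observation 3, which is an outlier.
data = [(1, 3), (2, 5), (3, 7), (4, 30), (5, 11), (6, 13), (7, 15), (8, 17)]
print(forward_order(data))  # the outlier (index 3) enters last
```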

Above all, the weight-tuning mechanism, the recruiting mechanism, and the reasoning mechanism allow the SLFN to adapt dynamically during the learning process and, at the same time, to explore an acceptable nonlinear relationship between the explanatory variables and the response in the presence of outliers.
