Efficient and Accurate Anomaly Identification Using Reduced Metric Space in Cloud Computing Systems
Qiang Guan, Ziming Zhang, and Song Fu, University of North Texas
Introduction
Anomaly detection is a vital element of operations in large-scale datacenters.
◦Detecting patterns in a given data set that do not conform to an established normal behavior.
Challenges
Continuous monitoring and large system scale lead to an overwhelming volume of data collected by health monitoring tools.
The large number of metrics that are measured makes the data model extremely complex.
◦High metric dimensionality will cause low detection accuracy and high computational complexity.
This paper
Presents a metric selection framework for online anomaly detection in utility clouds.
◦Select the most essential metrics by applying metric selection and extraction methods.
◦Identify anomalies using an incremental clustering approach.
◦Implement a prototype and evaluate the performance.
Dimensionality Reduction
Transforms the collected health-related performance data to a new metric space with only the most important metrics preserved.
In this paper:
◦Metric selection using mutual information.
◦Metric extraction by metric space combination and separation.
Metric Selection
Select the best subset of the original metric set based on mutual information.
◦The mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables.
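To make the quantity concrete, here is a minimal sketch that estimates mutual information from paired samples using the standard definition I(X; Y) = Σ p(x, y) log2( p(x, y) / (p(x) p(y)) ). It assumes the metric values are already discrete; continuous metrics would first need to be binned, which is not shown. The function name mutual_information is illustrative and not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two equal-length sequences of discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) p(y)) written with raw counts
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi
```

Two identical binary sequences with balanced values give 1 bit, while two independent sequences give 0 bits.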
Metric Selection(Cont.)
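\[ \max \mathrm{relevance}(S), \quad \mathrm{relevance}(S) = \frac{1}{|S|} \sum_{m_i \in S} I(m_i; c) \]
\[ \min \mathrm{redundancy}(S), \quad \mathrm{redundancy}(S) = \frac{1}{|S|^2} \sum_{m_i, m_j \in S} I(m_i; m_j) \]
\[ \max \mathrm{dependency}(S), \quad \mathrm{dependency}(S) = \mathrm{relevance}(S) - \mathrm{redundancy}(S) \]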
However, finding the optimal metric subset is NP-hard.
Incremental Search Method
Given S_{k-1}, try to select the k-th metric that maximizes dependency() from the remaining metrics in (M − S_{k-1}).
→ S_1 ⊂ S_2 ⊂ ... ⊂ S_n
\[ \max_{m_i \in M - S_{k-1}} \left[ I(m_i; c) - \frac{1}{k-1} \sum_{m_j \in S_{k-1}} I(m_i; m_j) \right] \]
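The greedy step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: metrics are discrete-valued and stored in a dict keyed by name, the score is relevance minus mean redundancy with the metrics already selected, and ties are broken alphabetically. The names incremental_select and mutual_information are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def incremental_select(metrics, labels, k):
    """Greedily build S_1, S_2, ..., S_k and return the metric names in S_k.

    metrics: dict mapping metric name -> list of discrete samples
    labels:  list of class labels c, one per sample
    """
    relevance = {m: mutual_information(v, labels) for m, v in metrics.items()}
    remaining = set(metrics)
    selected = []

    def score(m):
        # The first pick has no selected metrics to be redundant with.
        if not selected:
            return relevance[m]
        redundancy = sum(
            mutual_information(metrics[m], metrics[s]) for s in selected
        ) / len(selected)
        return relevance[m] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With four classes encoded by two bits, a metric that duplicates an already selected bit scores lower than the metric carrying the other bit, so the duplicate is skipped.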
Incremental Search Method(Cont.)
Sn*
◦Find the range of i where the cross-validation error err_i has small mean and small variance.
◦err* = min(err_i)
◦n* is the smallest i for which S_i attains err*.
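The last two bullets amount to a small lookup, sketched here. The sketch assumes the cross-validation errors have already been computed and restricted to the low-mean, low-variance range, which is not modeled. The function name smallest_best_size is hypothetical.

```python
def smallest_best_size(errors):
    """Return (n_star, err_star) for a list where errors[i - 1] is the
    cross-validation error of the nested subset S_i."""
    if not errors:
        raise ValueError("errors must be non-empty")
    err_star = min(errors)
    # list.index returns the first match, i.e. the smallest subset size.
    n_star = errors.index(err_star) + 1
    return n_star, err_star
```

Picking the smallest i keeps the metric set as compact as possible when several subset sizes reach the same minimum error.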
Metric Extraction
Creates new metrics by transformation or combination of the original metrics.
Two methods:
◦Metric space combination
◦Metric space separation
Metric Space Combination
Dataset D = [x1, x2, …, xL]
Record xi = [x1,i, x2,i, …, xn,i]^T
Covariance matrix of D: V = DD^T
Calculate the eigenvalues {λi} of V and sort them in descending order.
Choose n’ by:
\[ \frac{\sum_{i=1}^{n'} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge \theta, \quad \theta \in (0, 1) \]
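One common reading of this step is to keep the smallest number of leading eigenvalues whose cumulative share of the total reaches a threshold in (0, 1). The sketch below assumes that reading; the function name choose_num_components and the parameter name threshold are hypothetical, and the eigenvalues are taken as already computed.

```python
def choose_num_components(eigenvalues, threshold):
    """Return the smallest n' such that the n' largest eigenvalues account
    for at least `threshold` of the eigenvalue sum."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie in (0, 1)")
    ordered = sorted(eigenvalues, reverse=True)  # descending order
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalue sum must be positive")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)
```

A higher threshold keeps more of the original variance at the cost of a larger metric space.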
Metric Space Combination(Cont.)
The corresponding n’ eigenvectors are the new metrics.
Apply the Gram-Schmidt orthogonalization process to compute the eigenvectors {ej}.
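For reference, a minimal sketch of Gram-Schmidt orthogonalization on plain Python lists. It uses the modified variant (projections are subtracted from the running vector) for numerical stability and drops vectors that are linearly dependent on earlier ones. How the slides feed it into the eigenvector computation is not specified, so this shows only the orthogonalization step itself. The function name gram_schmidt and the tolerance tol are hypothetical.

```python
import math


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis spanning the input vectors."""
    basis = []
    for v in vectors:
        w = [float(a) for a in v]
        for e in basis:
            # Remove the component of w along the unit vector e.
            proj = sum(a * b for a, b in zip(w, e))
            w = [a - proj * b for a, b in zip(w, e)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:  # skip (near-)dependent vectors
            basis.append([a / norm for a in w])
    return basis
```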
Metric Space Separation
Separate desired data from mixed data.
Record x = [x1, x2, …, xL]^T
Component e = [e1, e2, …, en’]^T
x = Ae → e = Wx
Find an optimal transformation matrix W so that {ej} are maximally independent.
Metric Space Separation(Cont.)
Independent component analysis (ICA)
◦A computational method for separating a multivariate signal into additive subcomponents.
◦A special case of blind source separation.
Incremental Clustering
Data points are considered one at a time and assigned to existing groups without affecting the existing groups significantly.
◦“A data point goes into the nearest group if the Euclidean distance between this point and the centroid of the group is smaller than δ; else create a new group.”
◦Update the centroid after a new point comes in.
◦Adjust δ if cloud operators find a false positive, i.e., a point that is normal but assigned to an anomaly group.
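The assignment rule can be sketched as below. This is an illustration under assumptions, not the authors' prototype: the centroid is updated as a running mean of its members (the slides only say it is updated), δ is passed in as a fixed value rather than adjusted by operators, and the function name incremental_cluster is hypothetical. How groups are then labeled normal or anomalous is not covered.

```python
import math


def incremental_cluster(points, delta):
    """Assign each point to the nearest group if it is closer than delta to
    that group's centroid; otherwise start a new group.

    Returns (assignments, centroids), where assignments[i] is the group
    index of points[i].
    """
    centroids, counts, assignments = [], [], []
    for p in points:
        best, best_dist = None, float("inf")
        for idx, c in enumerate(centroids):
            d = math.dist(p, c)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best, best_dist = idx, d
        if best is not None and best_dist < delta:
            counts[best] += 1
            n = counts[best]
            # Running-mean update: only this group's centroid moves.
            centroids[best] = [
                ci + (pi - ci) / n for ci, pi in zip(centroids[best], p)
            ]
            assignments.append(best)
        else:
            centroids.append([float(x) for x in p])
            counts.append(1)
            assignments.append(len(centroids) - 1)
    return assignments, centroids
```

Because each point is processed once and only one centroid changes per point, the cost per new sample grows with the number of groups rather than with the number of points seen so far.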
Experiment Setting
362 servers.
Each server hosts up to ten VMs.
Benchmarks:
◦RUBiS distributed online service benchmark
◦MapReduce jobs
Fault injection
◦CPU, memory, disk, and network faults.
Experiment Setting(Cont.)
Monitoring tools
◦sysstat: runtime performance data in Dom0
◦Modified perf: performance counters from hypervisor.
Total 518 metrics.
◦182 + 336
◦However, only 406 are non-constant.
Monitor every minute from 2011/01/20 to 2011/08/11.
Metric Selection Result
406→14
◦Metric space reduced by 96.6%
Metric Extraction Results
Metric extraction and metric selection vs. metric extraction only.
Detection Precision
Conclusion
Anomaly detection is important.
◦Self-managing cloud resources and enhancing system dependability.
They present a metric selection framework with metric selection and extraction mechanisms.
The selected and extracted metric set contributes to highly efficient and accurate anomaly detection.