
Efficient and Accurate Anomaly Identification Using Reduced Metric Space in Cloud Computing Systems


Academic year: 2022


(1)


Qiang Guan, Ziming Zhang, and Song Fu
University of North Texas

(2)

Introduction

Anomaly detection is a vital element of operations in large-scale datacenters.

Anomaly detection: detecting patterns in a given data set that do not conform to an established normal behavior.

(3)

Challenges

Continuous monitoring and large system scale lead to an overwhelming volume of data collected by health monitoring tools.

The large number of measured metrics makes the data model extremely complex.

High metric dimensionality causes low detection accuracy and high computational complexity.

(4)

This paper

Presents a metric selection framework for online anomaly detection in utility clouds.

Selects the most essential metrics by applying metric selection and extraction methods.

Identifies anomalies using an incremental clustering approach.

Implements a prototype and evaluates its performance.

(5)

Dimensionality Reduction

Transforms the collected health-related performance data into a new metric space in which only the most important metrics are preserved.

In this paper:

Metric selection using mutual information.

Metric extraction by metric space combination and separation.

(6)

Metric Selection

Selects the best subset of the original metric set based on mutual information.

The mutual information of two random variables is a quantity that measures their mutual dependence.
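A minimal sketch of a plug-in mutual information estimate, in bits, for two discrete metric series of equal length. The slides do not say how the authors estimate mutual information for continuous metrics; discretizing (binning) the metrics first and the name `mutual_information` are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences.

    Assumption: continuous metrics have already been discretized (e.g. binned).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), with counts rescaled by n
        mi += p_ab * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi
```

Two identical binary series share one full bit of information, while two independent ones share none.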

(7)

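Metric Selection (Cont.)

Let S be a candidate metric subset with metrics m_i ∈ S, and let c be the target (class) variable.

$$\max \; \mathrm{relevance}(S, c), \qquad \mathrm{relevance}(S, c) = \frac{1}{|S|} \sum_{m_i \in S} I(m_i; c)$$

$$\min \; \mathrm{redundancy}(S), \qquad \mathrm{redundancy}(S) = \frac{1}{|S|^2} \sum_{m_i, m_j \in S} I(m_i; m_j)$$

$$\max \; \mathrm{dependency}(S), \qquad \mathrm{dependency}(S) = \mathrm{relevance}(S, c) - \mathrm{redundancy}(S)$$

However, finding the optimal metric subset is NP-hard.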
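To make the three criteria concrete, here is a hedged sketch that evaluates dependency(S) from precomputed mutual information values. The inputs `rel` (rel[i] = I(m_i; c)) and `mi` (mi[i][j] = I(m_i; m_j)) and both function names are hypothetical. The exhaustive search scores every size-k subset, C(n, k) of them, which shows why an exact search does not scale to hundreds of metrics.

```python
from itertools import combinations
from typing import Sequence, Tuple


def dependency(subset: Sequence[int], rel: Sequence[float],
               mi: Sequence[Sequence[float]]) -> float:
    """dependency(S) = relevance(S, c) - redundancy(S), from precomputed MI values."""
    k = len(subset)
    if k == 0:
        raise ValueError("subset must be non-empty")
    relevance = sum(rel[i] for i in subset) / k
    # Average over all ordered pairs (i, j) in S, including i == j (the 1/|S|^2 form).
    redundancy = sum(mi[i][j] for i in subset for j in subset) / (k * k)
    return relevance - redundancy


def best_subset_exhaustive(rel: Sequence[float], mi: Sequence[Sequence[float]],
                           k: int) -> Tuple[int, ...]:
    """Score every size-k subset: C(n, k) evaluations, infeasible for large n."""
    return max(combinations(range(len(rel)), k),
               key=lambda s: dependency(s, rel, mi))
```

With rel = [0.9, 0.8, 0.2] and a table in which metrics 0 and 1 are strongly redundant (mi[0][1] = 0.7), the best pair is (0, 2): metric 2 is less relevant on its own but adds almost no redundancy.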

(8)

Incremental Search Method

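Given S_{k-1}, select the k-th metric that maximizes dependency() from the remaining metrics in M − S_{k-1}. This produces nested subsets S_1 ⊂ S_2 ⊂ … ⊂ S_n. The selection criterion for the k-th metric is:

$$\max_{m_j \in M - S_{k-1}} \left[ I(m_j; c) - \frac{1}{k-1} \sum_{m_i \in S_{k-1}} I(m_j; m_i) \right]$$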
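A sketch of the incremental (greedy) search, assuming precomputed `rel` and `mi` tables as before (hypothetical names). Each step adds the remaining metric with the best relevance-minus-average-redundancy score, so the prefixes of the returned list are the nested subsets S_1 ⊂ S_2 ⊂ … ⊂ S_n.

```python
from typing import List, Sequence


def mrmr_incremental(rel: Sequence[float], mi: Sequence[Sequence[float]],
                     n: int) -> List[int]:
    """Greedy incremental search; selected[:k] is S_k."""
    remaining = set(range(len(rel)))
    selected: List[int] = []

    def score(j: int) -> float:
        if not selected:
            return rel[j]  # k = 1: no redundancy term yet
        # I(m_j; c) - 1/(k-1) * sum over m_i in S_{k-1} of I(m_j; m_i)
        return rel[j] - sum(mi[j][i] for i in selected) / len(selected)

    while remaining and len(selected) < n:
        # sorted() makes ties resolve to the smallest index deterministically
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each step costs one pass over the remaining metrics, so building S_1 through S_n stays polynomial instead of enumerating subsets.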

(9)

Incremental Search Method (Cont.)

Choosing the final subset S_{n*}:

Find the range of i where the cross-validation error err_i has a small mean and a small variance.

err* = min_i(err_i)

n* is the smallest i for which S_i attains err*.
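A small sketch of picking n* from cross-validation results. It assumes `fold_errors[i]` holds the per-fold errors for S_{i+1} and takes err_i as their mean; this layout and the function name are hypothetical. The per-subset mean and variance are returned so the stable range of i can be inspected.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def choose_n_star(fold_errors: Sequence[Sequence[float]]
                  ) -> Tuple[int, float, List[Tuple[float, float]]]:
    """Return (n*, err*, [(mean, variance) for each S_i])."""
    stats = [(mean(e), pvariance(e)) for e in fold_errors]
    means = [m for m, _ in stats]
    err_star = min(means)
    # list.index finds the first (smallest) i that attains err*
    n_star = means.index(err_star) + 1
    return n_star, err_star, stats
```

Taking the smallest i that reaches err* keeps the metric set as small as possible without giving up accuracy.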

(10)

Metric Extraction

Creates new metrics by transforming or combining the original metrics.

Two methods:

Metric space combination

Metric space separation

(11)

Metric Space Combination

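Dataset D = [x_1, x_2, …, x_L]

Record x_i = [x_{1,i}, x_{2,i}, …, x_{n,i}]^T

Covariance matrix of D: V = DD^T (assuming each metric is mean-centered; the 1/L scale factor does not change the eigenvectors)

Calculate the eigenvalues {λ_i} of V and sort them in descending order.

Choose n' as the smallest number of leading eigenvalues that retains a fraction τ of the total variance:

$$\frac{\sum_{i=1}^{n'} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge \tau, \qquad \tau \in (0, 1)$$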
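A NumPy sketch of this step, assuming D is stored as an n × L array (one row per metric, one column per record) and τ is a user-chosen threshold; `choose_n_prime` is a hypothetical name. Because V is symmetric, `numpy.linalg.eigh` applies; it returns eigenvalues in ascending order, so they are re-sorted in descending order.

```python
import numpy as np


def choose_n_prime(D: np.ndarray, tau: float):
    """Return (n', eigenvalues, eigenvectors), eigenvalues sorted descending."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    Dc = D - D.mean(axis=1, keepdims=True)  # center each metric (row)
    V = Dc @ Dc.T                           # n x n, proportional to the covariance
    eigvals, eigvecs = np.linalg.eigh(V)    # ascending order for symmetric V
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative ratio reaches tau, converted to a count
    n_prime = int(np.searchsorted(ratio, tau)) + 1
    return n_prime, eigvals, eigvecs
```

Projecting the centered data onto the leading eigenvectors, e.g. `vecs[:, :n_prime].T @ (D - D.mean(axis=1, keepdims=True))`, yields the combined metrics. When one metric is an exact multiple of another, a single component already retains nearly all of the variance.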

(12)

Metric Space Combination (Cont.)

The n' eigenvectors corresponding to the largest eigenvalues define the new metrics.

Apply the Gram-Schmidt orthogonalization process to obtain an orthonormal set of eigenvectors {e_j}.
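For a symmetric matrix such as V, eigenvectors belonging to distinct eigenvalues are already orthogonal; Gram-Schmidt matters when eigenvalues repeat and a solver returns a non-orthogonal basis for the shared eigenspace. A minimal modified Gram-Schmidt sketch in NumPy (the function name and the `eps` cutoff are hypothetical):

```python
import numpy as np


def gram_schmidt(vectors: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Orthonormalize the columns of `vectors` (modified Gram-Schmidt).

    Columns that are numerically dependent on earlier ones are dropped.
    """
    cols = np.asarray(vectors, dtype=float)
    basis = []
    for v in cols.T:
        w = v.copy()
        for e in basis:
            w -= (e @ w) * e  # remove the component along each accepted direction
        norm = np.linalg.norm(w)
        if norm > eps:
            basis.append(w / norm)
    if not basis:
        return np.zeros((cols.shape[0], 0))
    return np.column_stack(basis)
```

Subtracting projections one at a time from the running vector (rather than from the original) is the "modified" variant, which is numerically more stable than classical Gram-Schmidt.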

(13)

Metric Space Separation

Separate desired data from mixed data.

Record x = [x_1, x_2, …, x_L]^T

Component e = [e_1, e_2, …, e_{n'}]^T

x = Ae → e = Wx

Find an optimal transformation matrix W such that the components {e_j} are maximally independent.

(14)

Metric Space Separation (Cont.)

Independent component analysis (ICA)

A computational method for separating a multivariate signal into additive subcomponents.

A special case of blind source separation.
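The slides do not name a specific ICA algorithm. As a stand-in, the sketch below implements FastICA (Hyvärinen's fixed-point algorithm) with the log-cosh contrast (g = tanh) and deflation, in plain NumPy; the function names and parameters are assumptions. It whitens x, then extracts the rows of W one at a time, so that e = Wx has maximally non-Gaussian, and hence approximately independent, components.

```python
import numpy as np


def _whiten(X: np.ndarray):
    """Center the rows of X (features x samples) and whiten them."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # whitening matrix
    return K @ Xc, K


def fastica_deflation(X: np.ndarray, n_components: int,
                      max_iter: int = 500, tol: float = 1e-8, seed: int = 0):
    """Return (components e, unmixing matrix for the centered x)."""
    Z, K = _whiten(X)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, n))
    for p in range(n_components):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)
            # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w
            w_new = (Z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            # Deflation: keep w orthogonal to the rows already found
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if done:
                break
        W[p] = w
    return W @ Z, W @ K
```

For example, mixing `sin(2t)` and `sign(sin(3t))` with a 2 × 2 matrix A and passing the mixture in recovers both signals up to sign and scale, which is the inherent ambiguity of ICA.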

(15)

Incremental Clustering

Data points are considered one at a time and assigned to existing groups without significantly affecting those groups.

“A data point goes into the nearest group if the Euclidean distance between this point and the centroid of the group is smaller than δ; otherwise, a new group is created.”

Update the group's centroid after each new point is added.

Adjust δ if cloud operators find false positives.

False positive: a normal point that was assigned to an anomaly group.
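A minimal Python sketch of the clustering rule quoted above, assuming points are numeric vectors and each centroid is the running mean of its members. The class name is hypothetical, and how groups are labeled normal or anomalous (e.g. from operator feedback) is outside this sketch; adjusting δ corresponds to changing `delta`.

```python
import math
from typing import List, Sequence


class IncrementalClusterer:
    """Join the nearest group if its centroid is closer than delta, else open a new group."""

    def __init__(self, delta: float):
        if delta <= 0:
            raise ValueError("delta must be positive")
        self.delta = delta
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def add(self, point: Sequence[float]) -> int:
        """Assign `point` to a group and return that group's index."""
        if self.centroids:
            dists = [math.dist(point, c) for c in self.centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] < self.delta:
                # Running-mean update: the centroid moves by (point - centroid) / count
                self.counts[best] += 1
                n = self.counts[best]
                self.centroids[best] = [c + (p - c) / n
                                        for c, p in zip(self.centroids[best], point)]
                return best
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Each point is compared only against the current centroids, so the per-point cost grows with the number of groups, not with the number of points seen so far.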

(16)

Experiment Setting

362 servers.

Each server hosts up to ten VMs.

Benchmarks:

RUBiS distributed online service benchmark

MapReduce jobs

Fault injection

CPU, memory, disk, and network faults.

(17)

Experiment Setting (Cont.)

Monitoring tools

sysstat: runtime performance data in Dom0.

Modified perf: performance counters from the hypervisor.

518 metrics in total (182 + 336).

However, only 406 of them are non-constant.

Metrics were collected every minute from 2011/01/20 to 2011/08/11.
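A constant metric carries no information for selection (its mutual information with any other variable is zero), so it can be dropped before selection. A hypothetical NumPy helper, assuming the data are stored one row per metric with a parallel list of metric names:

```python
import numpy as np


def drop_constant_metrics(D, names):
    """Keep only rows (metrics) whose values vary; return (filtered D, kept names)."""
    D = np.asarray(D, dtype=float)
    keep = (D.max(axis=1) - D.min(axis=1)) > 0  # a varying metric has max > min
    return D[keep], [name for name, k in zip(names, keep) if k]
```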

(18)

Metric Selection Result

406→14

Metric space reduced by 96.6%
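The reduction percentage follows directly from the metric counts:

```latex
1 - \frac{14}{406} = \frac{392}{406} \approx 0.9655 \approx 96.6\%
```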

(19)

Metric Extraction Results

Metric extraction with metric selection vs. metric extraction only.

(20)

Detection Precision

(21)

Conclusion

Anomaly detection is important for self-managing cloud resources and enhancing system dependability.

The authors present a metric selection framework with metric selection and extraction mechanisms.

The selected and extracted metric set contributes to highly efficient and accurate anomaly detection.

