A Scalable Complex Event Analytical System with Incremental Episode Mining over Data Streams


(1)

A Scalable Complex Event Analytical System with Incremental Episode Mining over Data Streams

This study is conducted under the “Complex Event Processing Decision-Making Analysis Platform - Smart Sootblower System of Fired Power Plant” of the Institute for Information Industry, Taiwan.

Jerry C.C. Tseng, Jia-Yuan Gu, Vincent S. Tseng*

PF Wang, Ching-Yu Chen, Chu-Feng Li

National Cheng-Kung University, Taiwan

(2)

Outline

• Introduction
• Related Work
• The Proposed System
• Experimental Evaluation
• Conclusions & Future Works

(3)

Introduction

(4)

Background of the Work

• The work aims to extract more valuable information for finding the best time to trigger the sootblower.
• A sootblower is a device for removing the soot that is deposited on the furnace tubes of a boiler during combustion.
• Problems caused by soot:
  - Reduced efficiency
  - Soot fires

2016/10/7 National Cheng-Kung University, Taiwan 3

(5)

The Goals (Requirements)

• Develop an integrated system for analyzing complex event data streams.
• Use the Lambda Architecture to account for both latency and accuracy at the same time.
• Make the system scalable to massive data streams using Apache Spark and Apache Spark Streaming.
• Validate the system's performance in terms of both efficiency and accuracy on a real dataset.

(6)

Related Work

(7)

Episode Patterns


H. Mannila, H. Toivonen and A. I. Verkamo, “Discovering Frequent Episodes in Sequences,” Data Mining and Knowledge Discovery, Vol. 1, Issue 3, pp. 259-289, 1997.

(8)

CEAS (Complex Event Analytical System)


[Figure: CEAS (Complex Event Analytical System) architecture. Complex event sequences pass through three stages:
  Data Preprocessing: Data Collecting, Data Cleaning, Data Transforming
  Pattern Mining: Sliding Window, Episode Mining, Pattern Update
  Rule Management: Rule Detection, Rule Pool, Rule Evaluation
Other diagram labels: Prediction, Application Platform, JSON.]

Jerry C.C. Tseng, Jia-Yuan Gu, and Vincent S. Tseng, “A Novel Complex-Events Analytical System Using Episode Pattern Mining Techniques,” IScIDE 2015, Suzhou, China, June 14-16, 2015.

(9)

The Lambda Architecture


[Figure: the Lambda Architecture. Diagram labels:
  New Data Stream
  BATCH LAYER: All Data, Precompute Views, Batch update
  SERVING LAYER: Batch views (View 1, View 2, ..., View N)
  SPEED LAYER: Stream Processing, Increment Views, Incremental computing, Real-time views (View 1, View 2, ..., View N)
  Merge, Query]

Sources: http://lambda-architecture.net/ and http://en.wikipedia.org/wiki/Lambda_architecture

(10)

The Proposed System

(11)

SICEM (Scalable Incremental Complex Event Mining)


[Figure: SICEM architecture. Complex event data streams pass through four layers:
  Preprocessing Layer: Table Splitting, Event Clustering, Table Merging, Dimension Reduction, SAX Transformation, Sequencing, Segmentation
  Batch Layer: Batch Updating, Batch Episode Mining
  Speed Layer: Speed Updating, Delta Episode Mining (deltas Δ1, Δ2, ..., Δn)
  Merge Layer: Pattern Merging, Rule Generation (Candidate Patterns)
Other diagram labels: Rule Pool, Application Platform, JSON.]

(12)

Modules in Preprocessing Layer

• Event Clustering
• Table Merging
• Table Splitting
• Dimension Reducing
• SAX Transformation
• Sequencing
• Segmentation
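The slides name SAX Transformation as a preprocessing module but do not show how it is configured. As a rough illustration only, the sketch below follows the standard SAX (Symbolic Aggregate approXimation) recipe: z-normalize the series, average it over equal-length segments (PAA), then map each average to a letter using breakpoints that split the standard normal distribution into equally likely regions. The function name `sax_transform`, the segment count, and the four-letter alphabet are assumptions for this example, not values taken from SICEM.

```python
import math
from statistics import NormalDist


def sax_transform(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (one symbol per segment)."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # Step 1: z-normalize; a constant series maps to all zeros.
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]

    # Step 2: Piecewise Aggregate Approximation, the mean of each segment.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [sum(z[lo:hi]) / (hi - lo) for lo, hi in zip(bounds, bounds[1:])]

    # Step 3: breakpoints cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Step 4: each PAA value becomes the symbol of the region it falls into.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Hypothetical sensor readings, reduced to a four-symbol word.
word = sax_transform([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)
```

Turning continuous sensor readings into a short symbolic word is what lets them be treated as discrete events alongside the sootblower event records.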

(13)

Modules in Batch Layer

• Batch Updating
• Batch Episode Mining

(14)

Modules in Speed Layer

• Speed Updating
• Delta Episode Mining

(15)

Modules in Merge Layer

• Pattern Merging
• Rule Generation


Sample rules in JSON format:

{
  "RULES": [
    { "LHS": ["(A)", "(B)"], "RHS": ["(C)"], "sup": 0.50, "conf": 0.80 },
    { "LHS": ["(B,C)", "(A,B,E)", "(D)"], "RHS": ["(A,D,E)"], "sup": 0.33, "conf": 0.75 }
  ]
}
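A consumer of these rules might read them as follows. This is a minimal sketch, assuming each string such as "(A,B,E)" denotes a set of event labels and that "sup" and "conf" are the rule's support and confidence. The helper names `parse_itemset`, `load_rules`, and `filter_rules` are hypothetical and are not part of SICEM.

```python
import json

# The two sample rules from the slide, written as strict JSON.
SAMPLE = """
{ "RULES": [
  { "LHS": ["(A)", "(B)"], "RHS": ["(C)"], "sup": 0.50, "conf": 0.80 },
  { "LHS": ["(B,C)", "(A,B,E)", "(D)"], "RHS": ["(A,D,E)"], "sup": 0.33, "conf": 0.75 }
] }
"""


def parse_itemset(text):
    """Turn a string such as "(A,B,E)" into a frozenset of event labels."""
    return frozenset(label.strip() for label in text.strip("()").split(","))


def load_rules(document):
    """Parse the rule document into a list of plain dictionaries."""
    rules = []
    for raw in json.loads(document)["RULES"]:
        rules.append({
            "lhs": [parse_itemset(s) for s in raw["LHS"]],
            "rhs": [parse_itemset(s) for s in raw["RHS"]],
            "sup": raw["sup"],
            "conf": raw["conf"],
        })
    return rules


def filter_rules(rules, min_conf):
    """Keep only the rules whose confidence reaches min_conf."""
    return [rule for rule in rules if rule["conf"] >= min_conf]


rules = load_rules(SAMPLE)
strong = filter_rules(rules, min_conf=0.8)
print(len(rules), len(strong))
```

Note that Python's `json` module rejects a trailing comma after the last array element, so the rule list has to be strict JSON before it can be loaded this way.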

(16)

Algorithms

• The algorithms of our proposed system comprise three parts:
  - BatchEpisodeMining finds the patterns over the whole data.
  - DeltaEpisodeMining finds the delta patterns from the newly arriving data in the speed layer.
  - Once DeltaEpisodeMining is done, Pattern-merging matches the patterns found by BatchEpisodeMining with those found by DeltaEpisodeMining.
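The three-part flow can be illustrated with a deliberately simplified stand-in. This is not the authors' BatchEpisodeMining or DeltaEpisodeMining. It assumes a pattern is just an ordered pair of events (a before b) inside a fixed-width sliding window, and that support is the fraction of windows containing the pair. The names `mine_windows` and `merge_patterns`, the window width, and the toy sequences are all assumptions for this sketch.

```python
from collections import Counter
from itertools import combinations


def mine_windows(events, width):
    """Count, per ordered pair (a, b), the windows in which a occurs before b."""
    counts = Counter()
    n_windows = max(len(events) - width + 1, 0)
    for start in range(n_windows):
        window = events[start:start + width]
        # combinations() preserves sequence order, so (a, b) means a precedes b;
        # the set makes sure a pair is counted at most once per window.
        counts.update({(a, b) for a, b in combinations(window, 2) if a != b})
    return counts, n_windows


def merge_patterns(batch, delta, min_sup):
    """Add batch and delta counts and keep pairs whose merged support >= min_sup."""
    (batch_counts, batch_windows), (delta_counts, delta_windows) = batch, delta
    total = batch_windows + delta_windows
    if total == 0:
        return {}
    merged = batch_counts + delta_counts
    return {p: c / total for p, c in merged.items() if c / total >= min_sup}


batch = mine_windows(list("ABCABC"), width=3)  # historical data (batch layer)
delta = mine_windows(list("ABDAB"), width=3)   # newly arrived data (speed layer)
frequent = merge_patterns(batch, delta, min_sup=0.5)
print(frequent)
```

The point of the split is that only the new data is scanned when it arrives, while the batch counts are reused. This sketch ignores windows that span the boundary between old and new data, so its merged counts can differ from a full re-mine of the combined sequence.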

(17)

DeltaEpisodeMining Pseudo Code

(18)

Experimental Evaluation

(19)

Streaming Datasets

• The dataset consists of two parts:
  - Sensor data, collected at a fired power plant over three months
  - Event data, comprising about 53 thousand event records of sootblower operation

(20)

Experimental Environment

Hardware specification:

Role     Qty   CPU                                      Memory
master   1     Intel i7-2600 @3.5GHz, 1600 MHz          32GB
slave    6     Intel Celeron G1620 @2.7GHz, 1600 MHz    8GB


• Linux Ubuntu 12.04.5 LTS
• Cloudera CDH 5.3.8 with Cloudera Manager 5.3
• Apache Spark and Apache Spark Streaming 1.2
• Java programming

(21)

Experimental Results (1/2)

(22)

Experimental Results (2/2)

min_sup      0.8      0.75     0.7      0.65     0.6
Baseline     3820     5655     9968     16408    27457
Proposed     3820     5638     9952     16307    27253
Match Rate   100%     99.70%   99.84%   99.38%   99.26%
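The slide does not define the match rate, but the reported percentages are consistent with dividing the Proposed count by the Baseline count. The short check below recomputes them under that assumption. The function name `match_rate` is hypothetical, and the numbers are copied from the table.

```python
MIN_SUP = [0.8, 0.75, 0.7, 0.65, 0.6]
BASELINE = [3820, 5655, 9968, 16408, 27457]
PROPOSED = [3820, 5638, 9952, 16307, 27253]


def match_rate(proposed, baseline):
    """Percentage of the baseline count reproduced by the proposed system."""
    return round(100.0 * proposed / baseline, 2)


rates = [match_rate(p, b) for p, b in zip(PROPOSED, BASELINE)]
for min_sup, rate in zip(MIN_SUP, rates):
    print(f"min_sup={min_sup}: {rate:.2f}%")
```

Under this reading, the match rate stays above 99% across all five thresholds.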

(23)

Conclusions & Future Works

(24)

Review of Our Works

• We proposed an analytical system named SICEM for incremental mining of complex events with episode mining over data streams.
• We developed a series of modules and three algorithms within the four major components, namely the preprocessing, batch, speed, and merge layers.
• The experimental results showed that the proposed system takes both efficiency and accuracy into account.
• We used Apache Spark and Apache Spark Streaming as the development framework to make the system scalable to massive complex event data streams.

(25)

Future Works

• We plan to develop a module for better control of the process flow of the complex event sequences in data streams.
• We plan to design an interface that makes the system easier to use.
• We will try to build a cluster with more nodes for experiments.

(26)

Q & A

• More discussions are welcome via email.
  - [email protected]
• If I can’t give you a good answer now, we can continue the discussion via email.


This study is conducted under the “Complex Event Processing Decision-Making Analysis Platform - Smart Sootblower System of Fired Power Plant” of the Institute for Information Industry, which is subsidized by the Ministry of Economic Affairs of the Republic of China.
