A Scalable Complex Event Analytical System with Incremental Episode Mining over Data Streams
This study was conducted under the “Complex Event Processing Decision-Making Analysis Platform - Smart Sootblower System of Fired Power Plant” project of the Institute for Information Industry, Taiwan.
Jerry C.C. Tseng, Jia-Yuan Gu, Vincent S. Tseng*
PF Wang, Ching-Yu Chen, Chu-Feng Li
National Cheng-Kung University, Taiwan
Outline
• Introduction
• Related Work
• The Proposed System
• Experimental Evaluation
• Conclusions & Future Work
Introduction
Background of the Work
• We aim to obtain more valuable information for finding the best time to trigger the sootblower.
• A sootblower is a device that removes the soot deposited on the furnace tubes of a boiler during combustion.
• Problems caused by soot:
  - Reduced efficiency
  - Soot fires
2016/10/7 National Cheng-Kung University, Taiwan
The Goals (Requirements)
• Develop an integrated system for analyzing complex event data streams.
• Use the Lambda Architecture to take both latency and accuracy into account at the same time.
• Make the system scalable to massive data streams using Apache Spark and Apache Spark Streaming.
• Validate the system's performance in terms of both efficiency and accuracy on a real dataset.
Related Work
Episode Patterns
H. Mannila, H. Toivonen and A. I. Verkamo, “Discovering Frequent Episodes in Sequences,” Data Mining and Knowledge Discovery, Vol. 1, Issue 3, pp. 259-289, 1997.
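Mannila et al. measure how frequent an episode is by sliding a window of fixed width over the event sequence and counting the fraction of windows in which the episode occurs (the WINEPI approach). The sketch below illustrates this for serial episodes only. It assumes integer timestamps and requires strictly increasing times between consecutive episode events; it is a simplified illustration, not the exact episode miner used in CEAS or SICEM, and the names `occurs_serial` and `window_frequency` are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type); lists are assumed sorted by timestamp


def occurs_serial(episode: Sequence[str], events: List[Event], start: int, end: int) -> bool:
    """True if the event types of `episode` occur in order, with strictly
    increasing timestamps, among the events whose time lies in [start, end)."""
    i, last_t = 0, None
    for t, etype in events:
        if not (start <= t < end):
            continue
        # Greedy earliest match: taking the earliest valid event for each step is optimal.
        if etype == episode[i] and (last_t is None or t > last_t):
            i, last_t = i + 1, t
            if i == len(episode):
                return True
    return False


def window_frequency(episode: Sequence[str], events: List[Event],
                     t_start: int, t_end: int, win: int) -> float:
    """Fraction of windows [t, t + win), for t = t_start - win + 1 .. t_end - 1,
    that contain the serial episode (windows may stick out of [t_start, t_end))."""
    starts = range(t_start - win + 1, t_end)
    hits = sum(occurs_serial(episode, events, t, t + win) for t in starts)
    return hits / len(starts)


events = [(1, "A"), (2, "B"), (3, "C"), (5, "A"), (6, "B")]
print(window_frequency(("A", "B"), events, t_start=1, t_end=7, win=3))  # → 0.5
```

Counting windows that partially stick out of the sequence follows the WINEPI definition, so that events near the start and end of the sequence are observed in as many windows as events in the middle.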
CEAS (Complex Event Analytical System)
[Figure: CEAS architecture. Complex event sequences pass through Data Preprocessing (data collecting, data cleaning, data transforming), Pattern Mining (sliding window, episode mining, pattern update) and Rule Management (rule detection, rule pool, rule evaluation, prediction); results reach the application platform in JSON.]
Jerry C.C. Tseng, Jia-Yuan Gu, and Vincent S. Tseng, “A Novel Complex-Events Analytical System Using Episode Pattern Mining Techniques,” IScIDE 2015, Suzhou, China, June 14-16, 2015.
The Lambda Architecture
[Figure: The Lambda Architecture. A new data stream feeds both the batch layer (all data, batch update, precomputed batch views 1..N) and the speed layer (stream processing, incremental computing, real-time increment views 1..N); the serving layer merges the batch views and real-time views to answer queries.]
Sources: http://lambda-architecture.net/, http://en.wikipedia.org/wiki/Lambda_architecture
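To make the batch/speed/serving split concrete, here is a toy sketch for a single kind of query, per-event-type counts. Everything in it (`ToyLambda`, `ingest`, `run_batch`, `query`) is hypothetical and in-memory; in a real deployment the batch layer would run on a framework such as Apache Spark and the speed layer on a stream processor. The batch view is recomputed from the full master data set, the real-time view is updated incrementally for data the last batch run has not yet seen, and a query merges the two.

```python
from collections import Counter


class ToyLambda:
    """Hypothetical, in-memory illustration of the Lambda Architecture for event counts."""

    def __init__(self) -> None:
        self.master_data: list = []              # batch layer: append-only master data set
        self.batch_view: Counter = Counter()     # serving layer: result of the last batch run
        self.realtime_view: Counter = Counter()  # speed layer: data since the last batch run

    def ingest(self, event_type: str) -> None:
        # New data goes to the master data set and incrementally updates the real-time view.
        self.master_data.append(event_type)
        self.realtime_view[event_type] += 1

    def run_batch(self) -> None:
        # Recompute the batch view from all data; the real-time view is now covered, so reset it.
        self.batch_view = Counter(self.master_data)
        self.realtime_view.clear()

    def query(self, event_type: str) -> int:
        # Merge step: precomputed batch view plus incremental real-time view.
        return self.batch_view[event_type] + self.realtime_view[event_type]


lam = ToyLambda()
for e in ["A", "A", "B"]:
    lam.ingest(e)
lam.run_batch()
lam.ingest("A")
print(lam.query("A"))  # → 3
```

Resetting the real-time view after each batch run is a simplification: data that arrives while a batch job is running would need more careful bookkeeping.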
The Proposed System
SICEM (Scalable Incremental Complex Event Mining)
[Figure: SICEM architecture. Complex event data streams enter the Preprocessing Layer (table splitting, event clustering, table merging, dimension reduction, SAX transformation, sequencing, segmentation). The Batch Layer performs batch updating and batch episode mining; the Speed Layer performs speed updating and delta episode mining on increments Δ1, Δ2, ..., Δn. The Merge Layer performs pattern merging of the candidate patterns and rule generation into the rule pool, which serves rules to the application platform in JSON.]
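The preprocessing layer includes a SAX Transformation module. SAX (Symbolic Aggregate approXimation, by Lin, Keogh et al.) turns a numeric series, such as a sensor reading, into a short string of symbols that later steps can treat as discrete events. A minimal sketch follows. The segment count and the four-letter alphabet are hypothetical parameters, since the slides do not give SICEM's settings, and the sketch assumes the series length is a multiple of the segment count.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
from typing import Sequence


def sax(series: Sequence[float], n_segments: int, alphabet: str = "abcd") -> str:
    """Convert a numeric series into a SAX word of length n_segments."""
    n = len(series)
    if n_segments <= 0 or n == 0 or n % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is a positive multiple of n_segments")
    # 1. z-normalize (a constant series maps to all zeros)
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # 2. Piecewise Aggregate Approximation: mean of each equal-length segment
    size = n // n_segments
    paa = [fmean(z[i:i + size]) for i in range(0, n, size)]
    # 3. Breakpoints split N(0, 1) into len(alphabet) equiprobable regions
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    # 4. Map each PAA value to the symbol of the region it falls in
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))  # → abcd
```

Equiprobable Gaussian breakpoints make each symbol roughly equally likely for z-normalized data, which keeps the resulting event alphabet balanced for episode mining.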
Modules in Preprocessing Layer
• Event Clustering
• Table Merging
• Table Splitting
• Dimension Reduction
• SAX Transformation
• Sequencing
• Segmentation
Modules in Batch Layer
• Batch Updating
• Batch Episode Mining
Modules in Speed Layer
• Speed Updating
• Delta Episode Mining
Modules in Merge Layer
• Pattern Merging
• Rule Generation
Sample rules in JSON format:
{ "RULES": [
    { "LHS": ["(A)", "(B)"], "RHS": ["(C)"], "sup": 0.50, "conf": 0.80 },
    { "LHS": ["(B,C)", "(A,B,E)", "(D)"], "RHS": ["(A,D,E)"], "sup": 0.33, "conf": 0.75 },
    …
] }
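The sample rules pair a sequence of itemsets (LHS) with a following itemset (RHS), annotated with support and confidence. One common way to derive such episode rules, shown here as an assumption rather than SICEM's documented procedure, is to split each frequent episode into its prefix and its last element and compute conf = sup(episode) / sup(prefix), where "sup" is taken to be the support of the whole episode. The function `generate_rules` and its input format are hypothetical; the example supports were chosen so that one generated rule has the same LHS, RHS, sup and conf as the first sample rule above.

```python
import json
from typing import Dict, Tuple

Episode = Tuple[str, ...]  # sequence of itemsets, e.g. ("(A)", "(B)", "(C)")


def generate_rules(supports: Dict[Episode, float], min_conf: float) -> str:
    """Build prefix -> last-itemset rules from episode supports, as JSON."""
    rules = []
    for episode, sup in supports.items():
        if len(episode) < 2:
            continue
        lhs, rhs = episode[:-1], episode[-1:]
        lhs_sup = supports.get(lhs)
        if not lhs_sup:  # LHS support unknown or zero: confidence is undefined
            continue
        conf = sup / lhs_sup
        if conf >= min_conf:
            rules.append({"LHS": list(lhs), "RHS": list(rhs),
                          "sup": round(sup, 2), "conf": round(conf, 2)})
    return json.dumps({"RULES": rules})


supports = {("(A)",): 0.8, ("(A)", "(B)"): 0.625, ("(A)", "(B)", "(C)"): 0.5}
print(generate_rules(supports, min_conf=0.75))
```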
Algorithms
• The algorithms of our proposed system comprise three parts:
  - BatchEpisodeMining finds the patterns of the whole data.
  - DeltaEpisodeMining finds the delta patterns from the newly arriving data in the speed layer.
  - Once DeltaEpisodeMining is done, Pattern-merging matches the patterns found by BatchEpisodeMining and DeltaEpisodeMining.
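The slides say that Pattern-merging matches the patterns found by BatchEpisodeMining and DeltaEpisodeMining, but the merge rule itself is not given here. Assuming that each layer reports window counts per episode together with the number of windows it scanned, one simple merge is to add the counts and re-apply min_sup to the combined total, as in the hypothetical `merge_patterns` sketch below.

```python
from collections import Counter
from typing import Dict, Tuple

Episode = Tuple[str, ...]


def merge_patterns(batch_counts: Dict[Episode, int], batch_windows: int,
                   delta_counts: Dict[Episode, int], delta_windows: int,
                   min_sup: float) -> Dict[Episode, float]:
    """Combine per-episode window counts from the batch and speed layers and
    keep the episodes whose combined support reaches min_sup."""
    total_windows = batch_windows + delta_windows
    merged = Counter(batch_counts)
    merged.update(delta_counts)  # Counter.update adds counts; a missing episode counts as 0
    return {ep: cnt / total_windows
            for ep, cnt in merged.items()
            if cnt / total_windows >= min_sup}


batch = {("A", "B"): 60, ("B", "C"): 30}
delta = {("A", "B"): 10, ("C", "D"): 15}
print(merge_patterns(batch, 100, delta, 20, min_sup=0.2))
```

If the batch layer keeps counts only for episodes that were frequent in the batch data, treating a missing count as zero (as this sketch does) can underestimate an episode's combined support, so an incremental merge of this kind can differ from a full recomputation.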
DeltaEpisodeMining Pseudo Code
Experimental Evaluation
Streaming Datasets
• The dataset consists of two parts:
  - sensor data collected at a thermal power plant over three months
  - event data, comprising about 53,000 records of sootblower operations
Experimental Environment
Hardware specification:
Role     Qty   CPU                            Memory
master   1     Intel i7-2600 @3.5GHz          32GB, 1600 MHz
slave    6     Intel Celeron G1620 @2.7GHz    8GB, 1600 MHz
• Linux Ubuntu 12.04.5 LTS
• Cloudera CDH 5.3.8 with Cloudera Manager 5.3
• Apache Spark and Apache Spark Streaming 1.2
• Java programming
Experimental Results (1/2)
Experimental Results (2/2)
min_sup      0.8     0.75     0.7      0.65     0.6
Baseline     3820    5655     9968     16408    27457
Proposed     3820    5638     9952     16307    27253
Match Rate   100%    99.70%   99.84%   99.38%   99.26%
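The Match Rate row agrees with Proposed divided by Baseline at every min_sup value; this reading is inferred from the numbers rather than stated on the slide. The short check below recomputes it from the table.

```python
min_sup = [0.8, 0.75, 0.7, 0.65, 0.6]
baseline = [3820, 5655, 9968, 16408, 27457]
proposed = [3820, 5638, 9952, 16307, 27253]


def match_rates(baseline, proposed):
    """Ratio of the proposed system's count to the baseline count per min_sup."""
    return [p / b for b, p in zip(baseline, proposed)]


for s, r in zip(min_sup, match_rates(baseline, proposed)):
    print(f"min_sup={s}: {r:.2%}")
```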
Conclusions & Future Work
Review of Our Work
• We proposed an analytical system named SICEM for incremental complex event mining with episode mining over data streams.
• We developed a series of modules and three algorithms within four major components, namely the preprocessing, batch, speed and merge layers.
• The experimental results showed that the proposed system takes both efficiency and accuracy into account.
• We used Apache Spark and Apache Spark Streaming as the development framework to make the system scalable to massive complex event data streams.
Future Work
• We plan to develop a module that gives better control over the process flow of complex event sequences in data streams.
• We plan to design a user interface that makes the system easier to use.
• We will try to build a cluster with more nodes for experiments.
Q & A
• Further discussion is welcome via email.
• If I can’t give you a good answer now, we can continue the discussion via email as well.
This study was conducted under the “Complex Event Processing Decision-Making Analysis Platform - Smart Sootblower System of Fired Power Plant” project of the Institute for Information Industry, which is subsidized by the Ministry of Economic Affairs of the Republic of China.