
Incremental maintenance of ontology-exploiting association rules


Academic year: 2021


(1)

Incremental Maintenance of Ontology-Exploiting Association Rules

Ming-Cheng Tseng¹, Wen-Yang Lin² and Rong Jeng³

¹,³ Institute of Information Engineering, I-Shou University, Taiwan

² Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan

(2)

Outline

Introduction

Problem description

The proposed algorithm

Performance evaluation

Conclusions

(3)

Introduction

Motivation

In general, there exist lots of semantic relationships (domain knowledge) among items

It is natural to incorporate domain ontology into the process of data mining to explore more innovative rules

The source databases are changing over time

E.g., insertion, deletion, modification

The discovered knowledge (rules) has to be updated to

(4)

Introduction (cont.)

Association rules

Given:

A database of customer transactions

Each transaction is a set of items

Find all rules X => Y that correlate the presence of one set of items X with another set of items Y

Example:

(5)

Introduction (cont.)

Strong association rules

Given:

User’s specified constraints

Minimum support (min_sup)

Minimum confidence (min_conf)

Finding rules X => Y with support and confidence larger than the user’s specified minimum values

Example:

min_sup = 25%, min_conf = 50%

(6)

Introduction (cont.)

Frequent itemsets (patterns) mining

The association mining problem can be reduced to the problem of mining frequent itemsets, i.e., itemsets with support larger than min_sup

Example

min_sup = 25%, min_conf = 50%

Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)

sup({Sony VAIO, HP LaserJet 1300}) = 30%
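The definitions of support, confidence, and strong rules can be illustrated with a minimal Python sketch. The six transactions are the ones used in the deck's Problem Description example; the function names (`count`, `support`, `confidence`, `is_strong`) are illustrative, and treating the thresholds as inclusive (`>=`) is an assumption rather than something the slides specify.

```python
from typing import FrozenSet, Iterable, List

# Six customer transactions from the deck's Problem Description example.
DB: List[FrozenSet[str]] = [
    frozenset({"IBM TP", "Epson EPL", "Toner Cartridge"}),
    frozenset({"Sony VAIO", "IBM TP", "Epson EPL"}),
    frozenset({"IBM TP", "HP DeskJet", "Ink Cartridge"}),
    frozenset({"HP DeskJet"}),
    frozenset({"IBM TP", "HP DeskJet", "Ink Cartridge"}),
    frozenset({"Sony VAIO", "Ink Cartridge"}),
]


def count(itemset: Iterable[str], db: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t)


def support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, db) / len(db)


def confidence(x: Iterable[str], y: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """count(X ∪ Y) / count(X); defined as 0.0 when X never occurs."""
    x_items = frozenset(x)
    x_count = count(x_items, db)
    if x_count == 0:
        return 0.0
    return count(x_items | frozenset(y), db) / x_count


def is_strong(x, y, db, min_sup: float, min_conf: float) -> bool:
    """Assumption: thresholds are inclusive (>=)."""
    xy = frozenset(x) | frozenset(y)
    return support(xy, db) >= min_sup and confidence(x, y, db) >= min_conf
```

For instance, `IBM TP => HP DeskJet` is contained in 2 of the 6 transactions and `IBM TP` in 4, so its confidence is 2/4; with min_sup = 25% and min_conf = 50% the sketch reports it as strong.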

(7)

Introduction (cont.)

Ontology

W3C Web Ontology Working Group

“An ontology formally defines a common set of terms that are used to describe and represent a domain knowledge.”

e.g., taxonomy: a kind of ontology presenting classification relationship among objects

[Figure: example taxonomy over Vegetable, Non-root Vegetable, Fruit, Tomato, Carrot, Kale, Pickle, Apple, Papaya]

(8)

Introduction (cont.)

Ontology-exploiting association rules

[Figure: item ontology with classification and composition relationships. Nodes: PC, Desktop PC, Notebook, Sony VAIO, Gateway GE, IBM TP, Printer, HP DeskJet, Epson EPL, Memory, Hard Disk, RAM 256MB, RAM 512MB, S 60GB, IBM 60GB, Ink Cartridge, Photo Conductor, Toner Cartridge]

IBM 60GB HD => HP DeskJet

(9)

Problem Description

Incremental maintenance of ontology-exploiting association rules

Given:

A database of customer transactions DB

An incremental database db

An item ontology T

Discovered frequent itemsets in DB, L

minimum support, ms, and minimum confidence, mc

Find all frequent itemsets in UD = DB + db w.r.t. ms

Construct all strong rules from the frequent itemsets w.r.t. m

(10)

Problem Description (cont.)

-- Example

Customer transactions DB

TID  Purchased Items
1    IBM TP, Epson EPL, Toner Cartridge
2    Sony VAIO, IBM TP, Epson EPL
3    IBM TP, HP DeskJet, Ink Cartridge
4    HP DeskJet
5    IBM TP, HP DeskJet, Ink Cartridge
6    Sony VAIO, Ink Cartridge

Item ontology G

[Figure: item ontology G with classification and composition relationships. Nodes: PC, Sony VAIO, IBM TP, Printer, HP DeskJet, Epson EPL, RAM 256MB, IBM 60GB, S 60GB, Ink Cartridge, Photo Conductor, Toner Cartridge]

Discovered frequent itemsets L

L1                                 Count
{Printer}                          5
{PC}                               5
{IBM TP}                           4
{RAM 256MB*}                       5
{IBM 60GB*}                        4

L2 & L3                            Count
{Printer, PC}                      4
{Printer, IBM TP}                  4
{Printer, RAM 256MB*}              4
{Printer, IBM 60GB*}               4
{RAM 256MB*, IBM 60GB*}            4
{Printer, RAM 256MB*, IBM 60GB*}   4

(11)

Problem Description (cont.)

Example

Customer transactions DB

TID  Purchased Items
1    IBM TP, Epson EPL, Toner Cartridge
2    Sony VAIO, IBM TP, Epson EPL
3    IBM TP, HP DeskJet, Ink Cartridge
4    HP DeskJet
5    IBM TP, HP DeskJet, Ink Cartridge
6    Sony VAIO, Ink Cartridge

Incremental transactions db

TID  Items Purchased
7    Toner Cartridge
8    IBM TP, HP DeskJet, IBM 60GB, Toner Cartridge
9    IBM 60GB, Toner Cartridge

Item ontology G

[Figure: item ontology G with classification and composition relationships. Nodes: PC, Sony VAIO, IBM TP, Printer, HP DeskJet, Epson EPL, RAM 256MB, IBM 60GB, S 60GB, Ink Cartridge, Photo Conductor, Toner Cartridge]

minsup = 70%

Updated frequent itemsets L’

(12)

Basic scheme

An Apriori-based maintenance algorithm

Employing a bottom-up, level-wise searching strategy

Starting from frequent 1-itemsets, L1, then L2, …, Lk, etc.

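[Figure: lattice of itemsets over items A, B, C, D. 1-itemsets: A, B, C, D; 2-itemsets: AB, AC, AD, BC, BD, CD; 3-itemsets: ABC, ABD, ACD, BCD; 4-itemset: ABCD]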
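The bottom-up, level-wise search can be illustrated with the classic Apriori candidate-generation step. The sketch below is a generic version of that step, not the IMARO code itself; the function name `apriori_gen` and the representation of itemsets as frozensets are assumptions made for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Set


def apriori_gen(prev_level: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    Join step: union two (k-1)-itemsets that differ in exactly one item.
    Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
    """
    candidates: Set[FrozenSet[str]] = set()
    for a in prev_level:
        for b in prev_level:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # the two itemsets do not differ by exactly one item
            subsets = combinations(union, len(a))
            if all(frozenset(s) in prev_level for s in subsets):
                candidates.add(union)
    return candidates


# Usage on the A, B, C, D lattice: if every 2-itemset is frequent,
# every 3-itemset becomes a candidate for the next level.
level2 = {frozenset(p) for p in combinations("ABCD", 2)}
level3_candidates = apriori_gen(level2)
```

If some 2-itemset is infrequent, every 3-itemset containing it is dropped before any counting takes place, which is what keeps the level-wise search tractable.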

(13)

The Proposed Algorithm – IMARO (cont.)

Notation  Definition
DB        Original database
db        Incremental database
UD        Updated database, UD = DB + db
T         Item ontology
ED        Extension of DB with extended items in T
ed        Extension of db with extended items in T
UE        Updated extended database, UE = ED + ed

(14)

Example

(15)

The Proposed Algorithm – IMARO (cont.)

Note on database extension

A component item may also exist as a primitive item itself

To clarify the meaning of associations involving such an item, we have to differentiate the role this item plays

e.g., IBM TP => Ink Cartridge can be read as either:

buy an IBM TP notebook, also buy an Ink Cartridge

buy an IBM TP notebook, also buy a product composed of Ink Cartridge

TID  Purchased Items
5    IBM TP, HP DeskJet, Ink Cartridge

TID  Primitive Items                      Extended Items
5    IBM TP, HP DeskJet, Ink Cartridge*   PC, RAM 256MB, IBM 60GB, Printer, Ink
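Database extension can be sketched as follows. The ontology fragment below is hypothetical: it contains only the classification and composition links that can be read off the TID 5 example (IBM TP is a PC composed of RAM 256MB and IBM 60GB; HP DeskJet is a Printer composed of Ink Cartridge), and the names `GENERALIZATION`, `COMPONENTS`, and `extend_transaction` are illustrative. Primitive and extended items are returned as two separate sets, so an item such as Ink Cartridge can appear in both roles.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical ontology fragment inferred from the TID 5 example only.
GENERALIZATION: Dict[str, str] = {   # classification: item -> parent class
    "IBM TP": "PC",
    "HP DeskJet": "Printer",
}
COMPONENTS: Dict[str, List[str]] = {  # composition: item -> its components
    "IBM TP": ["RAM 256MB", "IBM 60GB"],
    "HP DeskJet": ["Ink Cartridge"],
}


def extend_transaction(items: Iterable[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (primitive items, extended items) for one transaction.

    Extended items are all generalizations (following the classification
    chain upward) plus the direct components of each purchased item.
    """
    primitive = frozenset(items)
    extended = set()
    for item in primitive:
        parent = GENERALIZATION.get(item)
        while parent is not None:  # climb the classification hierarchy
            extended.add(parent)
            parent = GENERALIZATION.get(parent)
        extended.update(COMPONENTS.get(item, []))
    return primitive, frozenset(extended)


primitive, extended = extend_transaction(["IBM TP", "HP DeskJet", "Ink Cartridge"])
```

Keeping the two sets apart is one simple way to preserve the role distinction: here Ink Cartridge is both a purchased (primitive) item and a component of HP DeskJet.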

(16)

The Proposed Algorithm – IMARO (cont.)

Process flow for updating frequent k-itemsets

(17)

The Proposed Algorithm – IMARO (cont.)

Frequent/infrequent itemsets inference

Conditions: L^ED, L^ed

Case  Result in UE  Action
1     freq.         no
2     undetd.       compare sup_UD(A) with ms
3     undetd.       scan DB
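One common way to realize this kind of inference, in the style of FUP-like incremental miners, is sketched below. The mapping of cases to membership in the frequent itemsets of the original and incremental extended databases is an assumption based on that general technique, not something taken from the slides, and the function and parameter names are hypothetical. The idea is that an itemset frequent in both parts is frequent in the union, one infrequent in both parts is infrequent in the union, and only the mixed cases need extra work.

```python
from typing import Optional


def infer_status(
    count_orig: Optional[int],  # stored count in ED, or None if A was not frequent there
    count_inc: int,             # count of A obtained by scanning ed
    size_orig: int,             # |ED|
    size_inc: int,              # |ed|
    ms: float,                  # minimum support as a fraction
) -> str:
    """Return 'frequent', 'infrequent', or 'rescan' for itemset A in UE.

    Assumption: an itemset is frequent when count >= ms * database size.
    """
    frequent_in_inc = count_inc >= ms * size_inc
    if count_orig is not None:
        if frequent_in_inc:
            return "frequent"  # frequent in both parts: no further action
        # Old count is known, so the updated support can be compared with ms.
        total = count_orig + count_inc
        return "frequent" if total >= ms * (size_orig + size_inc) else "infrequent"
    if frequent_in_inc:
        return "rescan"        # old count unknown: the original data must be scanned
    return "infrequent"        # infrequent in both parts implies infrequent in the union
```

Only the `rescan` outcome requires touching the original data, which is why pruning candidates before that step matters for the overall run time.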

(18)

The Proposed Algorithm – IMARO (cont.)

Optimization 1: Candidate pruning

Any candidate itemset that contains both an item and any one of its extensions (generalized item or component) is pruned.

[Figure: item ontology with classification and composition relationships. Nodes: PC, Sony VAIO, IBM TP, Printer, HP DeskJet, Epson EPL, RAM 256MB, IBM 60GB, S 60GB, Ink Cartridge, Photo Conductor, Toner Cartridge]

e.g., {Epson EPL, Printer}
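The pruning rule can be sketched as a simple filter over candidates. The `EXTENSIONS` map below is a hypothetical fragment of the ontology (each item mapped to its generalizations and components), and the function names are illustrative. This simplified version compares item names literally and does not model the primitive-versus-component role distinction made elsewhere in the deck.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical fragment: item -> its extensions (generalized items and components).
EXTENSIONS: Dict[str, Set[str]] = {
    "Epson EPL": {"Printer"},
    "HP DeskJet": {"Printer"},
    "IBM TP": {"PC", "RAM 256MB", "IBM 60GB"},
}


def contains_own_extension(candidate: Iterable[str]) -> bool:
    """True if some item in the candidate appears together with one of its extensions."""
    items = set(candidate)
    return any(EXTENSIONS.get(item, set()) & items for item in items)


def prune_candidates(candidates: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Drop every candidate that pairs an item with one of its own extensions."""
    return [c for c in candidates if not contains_own_extension(c)]


kept = prune_candidates([
    frozenset({"Epson EPL", "Printer"}),  # pruned: Printer generalizes Epson EPL
    frozenset({"Printer", "PC"}),         # kept: neither item extends the other
    frozenset({"IBM TP", "RAM 256MB"}),   # pruned: RAM 256MB is a component of IBM TP
])
```

Such candidates carry no information, because a transaction containing the item always contains its extensions once the database has been extended.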

(19)

The Proposed Algorithm – IMARO (cont.)

The extension of an item can be added only if that item does appear in at least one candidate itemset being counted currently

[Figure: item ontology with classification and composition relationships. Nodes: PC, Sony VAIO, IBM TP, Printer, HP DeskJet, Epson EPL, RAM 256MB, IBM 60GB, S 60GB, Ink Cartridge, Photo Conductor, Toner Cartridge]

(20)

Performance Evaluation

IMARO is compared with applying our proposed algorithms, AROC and AROS, to the whole database DB + db with T

Test data

A synthetic dataset generated by the IBM data generator with an artificially built ontology

Parameter  Description                      Default value
|DB|       Number of original transactions  200,000
|t|        Average size of transactions     20
N          Number of items                  362
R          Number of groups                 30
L          Number of levels                 4

(21)

Performance Evaluation (cont.)

Varying minimum supports

[Figure: run time (sec., log scale) versus ms (%), from 1 to 3.5, for AROC, AROS, and IMARO]

(22)

Performance Evaluation (cont.)

Varying incremental transaction size

[Figure: run time (sec.) versus number of incremental transactions (x 10,000)]

(23)

Conclusions

We have investigated the problem of updating ontology-exploiting association rules when new transactions are inserted into the database

An Apriori-based algorithm is proposed

Other issues

More complicated semantic relationships and knowledge

Non-uniform minimum support

Generalized item or composite item occurs more frequently

Towards a total solution for evolving environments

Ontology evolution, database update

Interactive refinement of support constraints

(24)

Thanks for your attention!

(25)

Conclusions (cont.)

Taxonomy of semantic relationships

(26)

Related Work

Comparison with previous work

Model of incremental maintenance of association rules

Contributors             Type of database update                Type of ontology
Srikant & Agrawal, 1995  none                                   classification
Han & Fu, 1995           none                                   classification
Cheung et al., 1996      insertion                              classification
Cheung et al., 1997      insertion, deletion and modification   none
Jea et al., 2003         none                                   composition
