(1)

Recent Studies in Privacy-Preserving Data Mining

Leon S.L. Wang
Department of Information Management, National University of Kaohsiung

(2)

Outline

Data Mining – a quick glance

Privacy-Preserving Data Mining (PPDM)

 Objective, common practices, possible attacks

Current Research Areas

 K-Anonymity, Utility, Distributed Privacy, Association Rule Hiding

Recent Studies

(3)

Data Mining 1

Market basket analysis (Association Rules)

 "If a customer purchases diapers, then he will very likely purchase beer."

Sequences (Sequential Patterns)

 "A customer who bought an iPod three months ago is likely to order an iPhone within one month."

(4)

Data Mining 2 - Classification

Training Data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes

Training Data + Classification Algorithm → Classifier (Model)

(5)

Data Mining 3 - Classification (cont.)

Testing Data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Classifier + Unseen Data (Jeff, Professor, 4) → Tenured?
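A minimal sketch of this train/test flow in Python; the one-rule model below ("tenured iff Professor or years > 6") is the rule used in the textbook version of this example and should be treated as an assumption here:

```python
# Training data: (NAME, RANK, YEARS, TENURED)
train = [("Mike", "Assistant Prof", 3, "no"),
         ("Mary", "Assistant Prof", 7, "yes")]
# Testing data from the table above.
test = [("Tom", "Assistant Prof", 2, "no"),
        ("Merlisa", "Associate Prof", 7, "no"),
        ("George", "Professor", 5, "yes"),
        ("Joseph", "Assistant Prof", 7, "yes")]

def classify(rank, years):
    """Classifier (model) consistent with the training data:
    IF rank = 'Professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# The model fits the training data ...
assert all(classify(r, y) == t for _, r, y, t in train)
# ... and is then evaluated on the testing data (Merlisa is misclassified).
correct = sum(classify(r, y) == t for _, r, y, t in test)
print(f"test accuracy: {correct}/{len(test)}")    # 3/4
# Unseen data: (Jeff, Professor, 4) -> Tenured?
print("Jeff tenured?", classify("Professor", 4))  # yes
```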

(6)

Data Mining 4 - Clustering

• Unsupervised learning: finds "natural" groupings of instances given unlabeled data
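As a minimal illustration, a toy k-means implementation in plain Python (the points and k = 2 are made up for the example):

```python
import random

def kmeans(points, k, iters=20):
    """Toy k-means: group unlabeled 2-D points into k 'natural' clusters."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

random.seed(0)
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, k=2)
print(centers)  # typically two centers near (1.3, 1.3) and (8.3, 8.3)
```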

(7)

Data Mining 5

Data mining:

 Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases

Alternative names:

 Knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging

(8)

Privacy Preserving Data Mining 1

Motivating example – Group Insurance Commission (GIC): the MA governor's medical record was found in the released data.

(9)

Privacy Preserving Data Mining 2

Motivating example – Group Insurance Commission (cont.)

Hospital Patient Data (Name, ID are hidden):

DOB      Sex     Zipcode  Disease
1/21/76  Male    53715    Heart Disease
4/13/86  Female  53715    Hepatitis
2/28/76  Male    53703    Bronchitis
1/21/76  Male    53703    Broken Arm
4/13/86  Female  53706    Flu
2/28/76  Female  53706    Hang Nail

Voter Registration Data (public info):

Name   DOB      Sex     Zipcode
Andre  1/21/76  Male    53715
Beth   1/10/81  Female  55410
Carol  10/1/44  Female  90210
Dan    2/21/84  Male    02174
Ellen  4/19/72  Female  02237

 Linking the two tables on (DOB, Sex, Zipcode): Andre has heart disease!
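The linking attack is just a join on the quasi-identifiers (DOB, Sex, Zipcode). A minimal sketch in plain Python over the two tables above:

```python
# Published "anonymized" hospital data: quasi-identifiers + disease.
hospital = [
    ("1/21/76", "Male",   "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male",   "53703", "Bronchitis"),
    ("1/21/76", "Male",   "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]
# Public voter registration data: name + the same quasi-identifiers.
voters = [
    ("Andre", "1/21/76", "Male",   "53715"),
    ("Beth",  "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan",   "2/21/84", "Male",   "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]
# Re-identification: join the two tables on (DOB, Sex, Zipcode).
for name, dob, sex, zipcode in voters:
    matches = [d for hd, hs, hz, d in hospital if (hd, hs, hz) == (dob, sex, zipcode)]
    if len(matches) == 1:                    # a unique match re-identifies the record
        print(f"{name} has {matches[0]}!")   # -> "Andre has Heart Disease!"
```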

(10)

Privacy Preserving Data Mining 3

Motivating example – A Face Is Exposed for AOL Searcher No. 4417749

"Buried in a list of 20 million Web search queries collected by AOL and recently released on the Internet is user No. 4417749. The number was assigned by the company to protect the searcher's anonymity, but it was not much of a shield." – New York Times, August 9, 2006

User No. 4417749 was Thelma Arnold.

(11)

Privacy Preserving Data Mining 4

Motivating example – America Online (AOL)

~650k users, 3-month period, ~20 million queries released

No names, no SSNs, no driver license #, no credit card #

The user, ID 4417749, was found to be Thelma Arnold, a 62-year-old woman living in Georgia.

Loss of privacy for users, damage to AOL, and significant damage to academics who depend on such data.

(12)

Privacy Preserving Data Mining 5

Motivating example – Netflix Prize

 In October 2006, Netflix announced the $1-million Netflix Prize for improving its movie recommendation system.

 Netflix publicly released a dataset containing 100 million movie ratings of 18,000 movies, created by 500,000 Netflix subscribers over a period of 6 years.

(13)

Privacy Preserving Data Mining 6

Motivating example – Association Rules

Supplier: ABC Paper Company
Retailer: XYZ Supermarket Chain

1. XYZ allows ABC to access its customer DB
2. ABC predicts XYZ's inventory needs & offers reduced prices

(14)

Privacy Preserving Data Mining 7

Supplier ABC discovers (through data mining):

 Sequence: Cold remedies -> Facial tissue
 Association: (Skim milk, Green paper)

Supplier ABC runs a coupon marketing campaign:

 "50 cents off skim milk when you buy ABC products"

(15)

Privacy Preserving Data Mining 1 - Objective

Privacy

 The state of being private; the state of not being seen by others

Database security

 To prevent loss of privacy due to viewing/disclosing of unauthorized data

PPDM

 To extract useful knowledge from data while preventing the disclosure of private information

(16)

Privacy Preserving Data Mining 2 - Common Practices

Limiting access

 Control access to the data
 Used by the secure DBMS community

"Fuzz" the data

 Force aggregation into daily records instead of individual transactions, or slightly alter data values

(17)

Privacy Preserving Data Mining 3 - Common Practices

Eliminate unnecessary groupings

 The first 3 digits of SSNs are assigned sequentially by issuing office
 Clustering the high-order bits of a "unique identifier" is therefore likely to group similar data elements
 Instead, assign unique identifiers randomly

Augment the data

 Populate the phone book with extra, fictitious people in non-obvious ways
 Return correct info when asked about one individual, but incorrect info when asked about all individuals in a department

(18)

Privacy Preserving Data Mining 4 - Common Practices

Audit

 Detect misuse by legitimate users
 Administrative or criminal disciplinary action may be initiated

(19)

Privacy Preserving Data Mining 5 - Possible Attacks

Linking attacks (Sweeney IJUFKS '02)

 Re-identification
 Identity linkage (k-anonymity)
 Attribute linkage (l-diversity)

Hospital Patient Data (Name, ID are hidden) vs. Voter Registration Data (public info): the same two tables as in the GIC example above, joined on (DOB, Sex, Zipcode).

(20)

Privacy Preserving Data Mining 6 - Possible Attacks

Corruption attacks (Tao ICDE '08, Chaytor ICDM '09)

 Background knowledge; Perturbed generalization (PG)

(21)

Privacy Preserving Data Mining 7 - Possible Attacks

Differential privacy (Dwork ICALP '06)

 Add noise so that the difference between any query's output on 30 records and on any 29 of them is very small (within a differential).
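A minimal sketch of the idea in Python (the Laplace mechanism for a counting query; a count has sensitivity 1, so noise of scale 1/ε suffices — the data and ε here are made up):

```python
import random

def laplace_noise(scale):
    # Laplace(0, b) equals the difference of two Exponential(1/b) draws.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(records, predicate, epsilon):
    """Answer a counting query with Laplace noise. Removing any single
    record changes the true count by at most 1 (sensitivity 1), so the
    noisy answers on 30 records and on any 29 of them are nearly
    indistinguishable."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [25, 31, 27, 45, 29, 22, 38, 26]
print(private_count(ages, lambda age: age < 29, epsilon=0.5))
```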

(22)

Privacy Preserving Data Mining 8 - Possible Attacks

Realistic adversaries (Machanavajjhala VLDB '09)

 Weak privacy:
  l-diversity, t-closeness
  Adversaries need to know very specific information
 Strong privacy:
  Differential privacy
  Adversaries need to know all information except the victim
 Epsilon-privacy:
  Adversary's knowledge can vary and learn

(23)

Privacy Preserving Data Mining 9 - Possible Attacks

Structural attacks (Zhou VLDB '09)

 Degree attack: knowing Bob has 4 friends => look for vertices of degree 4

(24)

Privacy Preserving Data Mining 10 - Possible Attacks

Structural attacks (Zhou VLDB '09)

 Sub-graph attack (one match): knowing Bob's friends and friends' friends => Vertex 7 is Bob

(25)

Privacy Preserving Data Mining 11 - Possible Attacks

Structural attacks (Zhou VLDB '09)

 Sub-graph attack (k-match): knowing Bob's friends => 8 matches, but they share common labels 6 & 7 => still uniquely identifies vertex 7 as Bob

(26)

Privacy Preserving Data Mining 12 - Possible Attacks

Structural attacks (Zhou VLDB '09)

 Hub fingerprint attacks
  Some hubs have been identified; the adversary knows the distance between Bob and the hubs

(27)

Privacy Preserving Data Mining 13 - Possible Attacks

Non-structural attacks (Bhagat VLDB '09)

 Label attack
 Interaction graph

(28)

Privacy Preserving Data Mining 14 - Possible Attacks

Non-structural attacks (Bhagat VLDB '09)

(29)

Privacy Preserving Data Mining 15 - Possible Attacks

Non-structural attacks (Bhagat VLDB '09)

(30)

Privacy Preserving Data Mining 16 - Possible Attacks

Active attacks (Backstrom WWW '07)

 Plant a subgraph H into G and connect it to targeted nodes (add new nodes and edges)
 Recover H from G & identify the targeted nodes' identities and relationships
 Walk-based (largest H), cut-based (smallest H)

(31)

Privacy Preserving Data Mining 17 - Possible Attacks

Passive attacks (no nodes, no edges added)

 Starting from a coalition of friends (nodes) in the anonymized graph G, discover the existence of edges among the users to whom they are linked

Semi-passive attacks (add edges only, no nodes)

 From existing nodes in G, add fake edges to targeted nodes

(32)

Privacy Preserving Data Mining 18 - Possible Attacks

Intersection attacks (Puttaswamy CoNEXT '09)

 Two users, A and B, are compromised;
 A queries the server for the visitors of "website xyz";
 B queries the server for the visitors of "website xyz";
 intersecting the two answers can pinpoint a common user.

(33)

Privacy Preserving Data Mining 19 - Possible Attacks

Intersection attacks

 StarClique (add latent edges)
 The graph evolution process for a node: the node first selects a subset of its neighbors, then builds a clique with the members of this subset, and finally connects the clique members with all the non-clique members in the neighborhood. Latent or virtual edges are added in the process.

(34)

Privacy Preserving Data Mining 20 - Possible Attacks

Relationship attacks (Liu SDM '09)

 Sensitive edge weights, e.g., transaction expenses in a business network
 Reveal the shortest path between source and sink, e.g., A -> D

(35)

Privacy Preserving Data Mining 21 - Possible Attacks

Relationship attacks (Liu SDM '09)

 Preserve the shortest path, e.g., A -> D
 Minimize perturbation of path length and path weight

(36)

Privacy Preserving Data Mining 22 - Possible Attacks

Relationship attacks (Liu ICIS '10)

 Preserve the shortest path, e.g., A -> D

K-anonymous weight privacy

 The blue edge group and the green edge group satisfy 4-anonymous weight privacy with a weight-difference bound of 10.
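A minimal sketch of checking such a group in Python (the weights, k = 4, and the bound 10 are illustrative; the exact definition in Liu ICIS '10 may differ in detail):

```python
def is_k_anonymous_weight_group(weights, k, bound):
    """A group of edge weights is k-anonymous (in the sense sketched
    above) if it contains at least k edges and all pairwise weight
    differences stay within the given bound."""
    return len(weights) >= k and max(weights) - min(weights) <= bound

blue_group = [12, 15, 18, 20]    # hypothetical edge weights
green_group = [40, 43, 45, 50]
for name, group in [("blue", blue_group), ("green", green_group)]:
    print(name, is_k_anonymous_weight_group(group, k=4, bound=10))  # both True
```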

(37)

Privacy Preserving Data Mining 23 - Possible Attacks

Relationship attacks (Das ICDE '10)

 Preserve linear properties, e.g., shortest paths
 The ordering of the five edge weights (x5, x1, x4, x3, x2) is preserved after naive anonymization, where x1=(v1, v2), x2=(v1, v4), x3=(v1, v3), x4=(v2, v4), x5=(v3, v4)
 The minimum-cost spanning tree is preserved: {(v1, v2), (v2, v4), (v1, v3)}
 The shortest path from v1 to v4 is changed.
 The ordering of the edge weights is still exposed. For example, v3 and v4 are best friends while v1 and v4 are not such good friends.

(38)

Privacy Preserving Data Mining 24 - Possible Attacks

Location Privacy (Papadopoulos VLDB '10)

 Preserving the privacy of the user's location in location-based services (LBS)

(39)

Privacy Preserving Data Mining 25 - Possible Attacks

Location Privacy (Papadopoulos VLDB '10)

 Location obfuscation
  Send an additional set of "dummy" queries alongside the actual query
 Data transformation
  An encrypted query is sent to the LBS
 PIR-based location privacy
  PIR-based queries are sent to the LBS server and retrieve blocks without the server discovering which blocks are requested

(40)

Privacy Preserving Data Mining 26 - Possible Attacks

Inference through data mining attacks

 Addition and/or deletion of items so that Sensitive Association Rules (SAR) will be hidden
 Generalization and/or suppression of items so that the confidence of SAR will be lower than ρ

[Diagram: the original database D_O is mined (DM) to produce the rule set R_O]

(41)

Privacy Preserving Data Mining 27 - Possible Attacks

ρ-uncertainty (Cao VLDB '10)

 Given a transaction dataset, sensitive items Is, and uncertainty level ρ, the objective is to make the confidence of every sensitive association rule less than ρ, i.e., Conf(χ → α) < ρ, where χ ⊆ I and α ∈ Is.

If Alice knows Bob bought b1, then she knows Bob also bought {a1, b2, α, …}, where Is = {α, …}

(42)

Privacy Preserving Data Mining 28 - Possible Attacks

ρ-uncertainty (Cao VLDB '10)

 Given a transaction dataset, sensitive items Is, uncertainty level ρ = 0.7, and a hierarchy of non-sensitive items, the data is published after suppression

(43)

Current Research Areas 1

Privacy-Preserving Data Publishing

 K-anonymity
 Tries to prevent re-identification

Utility-Based Privacy Preservation

Distributed Privacy with Adversarial Collaboration

Privacy-Preserving Applications

 Association rule hiding

(44)
(45)

Privacy Preserving Data Publishing 1

Hospital Patient Data (Name, ID are hidden) vs. Voter Registration Data (public info): as in the GIC example above, joining on (DOB, Sex, Zipcode) reveals that Andre has heart disease!

(46)

Privacy Preserving Data Publishing 2

(47)

Privacy Preserving Data Publishing 3

"Fuzz" the data

 k-anonymity: at least k tuples in each group
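A minimal sketch of this in Python, reusing the toy hospital table: suppress Sex and truncate Zipcode to a 3-digit prefix, then verify that every quasi-identifier group contains at least k tuples (the generalization rules are illustrative, not from a specific paper):

```python
from collections import Counter

def generalize(record):
    dob, sex, zipcode = record
    # Suppress Sex and truncate Zipcode to its 3-digit prefix.
    return (dob, "*", zipcode[:3])

def is_k_anonymous(table, k):
    # Every quasi-identifier group must contain at least k tuples.
    return min(Counter(table).values()) >= k

records = [
    ("1/21/76", "Male",   "53715"),
    ("4/13/86", "Female", "53715"),
    ("2/28/76", "Male",   "53703"),
    ("1/21/76", "Male",   "53703"),
    ("4/13/86", "Female", "53706"),
    ("2/28/76", "Female", "53706"),
]
generalized = [generalize(r) for r in records]
print(is_k_anonymous(records, 2))      # False: every raw tuple is unique
print(is_k_anonymous(generalized, 2))  # True: each group now has 2 tuples
```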

(48)

Utility-Based Privacy Preservation 1

(49)

Utility-Based Privacy Preservation 2

(50)

Utility-Based Privacy Preservation 3

Q1: "How many customers under age 29 are there in the data set?"

Q2: "Is an individual with age=25, education=Bachelor, Zip Code=53712 a target customer?"

Answers from Table 2: "2"; "Y"

(51)

Distributed Privacy with Adversarial Collaboration

Input privacy (2): D1 + D2 + D3 → DM → R_O

(52)

Association rule mining

Input: D_O, min_supp, min_conf
Output: R_O

D_O:
TID  Items
T1   ABC
T2   ABC
T3   ABC
T4   AB
T5   A
T6   AC

min_supp = 33%, min_conf = 70%

R_O:
1  B=>A   (66%, 100%)
2  C=>A   (66%, 100%)
3  B=>C   (50%, 75%)
4  C=>B   (50%, 75%)
5  AB=>C  (50%, 75%)
6  AC=>B  (50%, 75%)
7  BC=>A  (50%, 100%)
8  C=>AB  (50%, 75%)
9  B=>AC  (50%, 75%)

|A|=6, |B|=4, |C|=4, |AB|=4, |AC|=4, |BC|=3, |ABC|=3
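These numbers can be reproduced with a brute-force miner in Python (not an actual Apriori implementation; note the slides print 4/6 as 66% while Python rounds it to 67%):

```python
from itertools import combinations

db = {"T1": "ABC", "T2": "ABC", "T3": "ABC", "T4": "AB", "T5": "A", "T6": "AC"}

def support_count(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(1 for items in db.values() if set(itemset) <= set(items))

min_supp, min_conf = 0.33, 0.70
items = sorted(set("".join(db.values())))
# Enumerate every rule X => Y with disjoint, non-empty X and Y.
for size in range(2, len(items) + 1):
    for itemset in combinations(items, size):
        for lhs_size in range(1, size):
            for lhs in combinations(itemset, lhs_size):
                rhs = tuple(i for i in itemset if i not in lhs)
                supp = support_count(itemset) / len(db)
                conf = support_count(itemset) / support_count(lhs)
                if supp >= min_supp and conf >= min_conf:
                    print(f"{''.join(lhs)} => {''.join(rhs)} ({supp:.0%}, {conf:.0%})")
```

Running this prints exactly the nine rules listed above.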

(53)

Input: D_O, X (items to be hidden on LHS), min_supp, min_conf
Output: D_M

min_supp = 33%, min_conf = 70%, X = {C}

D_M (T1 modified from ABC to AC):
TID  Items
T1   AC
T2   ABC
T3   ABC
T4   AB
T5   A
T6   AC

Rules in D_M:
1   B=>A   (50%, 100%)
2   C=>A   (66%, 100%)
3   B=>C   (33%, 66%)   lost
4   C=>B   (33%, 50%)   hidden
5   AB=>C  (33%, 66%)   lost
6   AC=>B  (33%, 50%)   hidden
7   BC=>A  (33%, 100%)
8   C=>AB  (33%, 50%)   hidden
9   B=>AC  (33%, 66%)   lost
10  A=>B   (50%, 50%)
11  A=>C   (50%, 66%)
12  A=>BC  (33%, 33%)

|A|=6, |B|=3, |C|=4, |AB|=3, |AC|=4, |BC|=2, |ABC|=2

(54)

Association Rule Hiding 3 - Side effects

Hiding failure, lost rules, new rules

[Diagram: set relationships among the original rules R and the rules mined after hiding; ② lost rules: R_h ~ R]

(55)

Association Rule Hiding 4

Output privacy: D_O → DM → R_O → modification → R_M, where R_M = R_O − R_H

• Input privacy (1): D_O → modification → D'_O → DM → R_M

(56)

Recent Studies 1 - Informative Association Rule Set (IRS)

Informative association rules

 Input: D_O, min_supp, min_conf, X = {C}
 Output: {C=>A (66%, 100%), C=>B (50%, 75%)}

D_O:
TID  Items
T1   ABC
T2   ABC
T3   ABC
T4   AB
T5   A
T6   AC

Mined rules:
1  B=>A   (66%, 100%)
2  C=>A   (66%, 100%)
3  B=>C   (50%, 75%)
4  C=>B   (50%, 75%)
5  AB=>C  (50%, 75%)
6  AC=>B  (50%, 75%)
7  BC=>A  (50%, 100%)
8  C=>AB  (50%, 75%)
9  B=>AC  (50%, 75%)

Rules #6, 7, 8 predict the same RHS {A, B} as #2, 4.

(57)

Recent Studies 2 - Hiding IRS 1 (LHS)

Input: D_O, X (items to be hidden on LHS), min_supp, min_conf
Output: D_M

Same example as above (min_supp = 33%, min_conf = 70%, X = {C}): T1 is modified from ABC to AC, so rules 4, 6, 8 are hidden and rules 3, 5, 9 are lost.

(58)

Recent Studies 3 - Proposed Algorithms

Strategy:

To lower the confidence of a given rule X => Y, either

 Increase the support of X, but not of XY, OR
 Decrease the support of XY (or of both X and XY)

Conf(X => Y) = support(XY) / support(X)
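A minimal sketch of the second option in Python (greedily deleting the RHS item from transactions that support the whole rule until the confidence drops below min_conf; this illustrates the strategy, not the exact ISL/DSR/DSC algorithms):

```python
db = {"T1": set("ABC"), "T2": set("ABC"), "T3": set("ABC"),
      "T4": set("AB"), "T5": set("A"), "T6": set("AC")}

def confidence(lhs, rhs):
    lhs, both = set(lhs), set(lhs) | set(rhs)
    n_lhs = sum(1 for t in db.values() if lhs <= t)
    n_both = sum(1 for t in db.values() if both <= t)
    return n_both / n_lhs if n_lhs else 0.0

def hide_rule(lhs, rhs, min_conf):
    """Lower Conf(lhs => rhs) below min_conf by removing the RHS item
    from transactions that support the whole rule."""
    for tid, items in db.items():
        if confidence(lhs, rhs) < min_conf:
            break
        if set(lhs) | set(rhs) <= items:
            items -= set(rhs)   # delete the RHS item from this transaction
            print(f"removed {rhs} from {tid}; Conf = {confidence(lhs, rhs):.0%}")

hide_rule("C", "B", min_conf=0.70)   # hides C => B (confidence was 75%)
```

On the toy database this removes B from T1, which is exactly the D_M shown in the earlier hiding example.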

(59)

Recent Studies 4 - Proposed Algorithms

Multiple scans of the database (Apriori-based)

 Increase Support of LHS First (ISL)
 Decrease Support of RHS First (DSR)

One scan of the database

 Decrease Support and Confidence (DSC)
 Proposes a pattern-inversion tree to store the data

Maintenance of hiding informative association rule sets

(60)

Recent Studies 5 - Analysis

For multiple-scan algorithms:

Time effects

 DSR is faster than ISL
 because its set of candidate transactions is smaller

Database effects

(61)

Recent Studies 6 - Analysis

Side effects

 DSR: no hiding failure (0%), few new rules (5%), and some lost rules (11%)
 ISL: some hiding failure (12.9%), many new rules

(62)

Recent Studies 7 - Proposed Algorithms

One-scan algorithm DSC

 Pattern-inversion tree

TID  Items
T1   ABC
T2   ABC
T3   ABC
...

[Figure: pattern-inversion tree with root Root and nodes A:6:[T5], B:4:[T4], C:1:[T6]]

(63)

Recent Studies 8 - Proposed Algorithms

One-scan algorithm DSC

 Time effects, database effects

(64)

Recent Studies 9 - Proposed Algorithms

(65)

Recent Studies 10 - Analysis

For the single-scan algorithm:

DSC is O(2|D| + |X|·l2·K·log K),
 where |X| is the number of items in X, l2 is the maximum number of large 2-itemsets, and K is the maximum number of iterations of the DSC algorithm.

SWA is O(((n1−1)·n1/2)·|D|·Kw),
 where n1 is the initial number of restrictive rules in the database D and Kw is the chosen window size.

SWA has a higher order of complexity, O(l2²·|X|²·|D|²), if Kw ≈ |D|.

(66)

Recent Studies 11 - Maintenance of Hiding IRS

Maintenance of hiding informative association rule sets:

TID  D    D'
T1   111  001
T2   111  111
T3   111  111
T4   110  110
...

ΔD:
TID  Items
T7   101
T8   101
T9   110

D+ = D + ΔD

(67)

Recent Studies 12 - Maintenance of Hiding IRS

Maintenance of hiding informative association rule sets:

TID  D+   (DSC)  (MSI)
T1   111  111    001
T2   111  111    111
T3   111  111    111
T4   110  110    110
T5   100  100    100
T6   101  001    001
T7   101  001    001
T8   101  101    101
T9   110  110    110

(68)

Recent Studies 13 - Maintenance of Hiding IRS

(69)

Recent Studies 14 - Maintenance of Hiding IRS

(70)

Recent Studies 15 - K-anonymity and K^m-anonymity

K-anonymity

 Every record has k−1 other records identical on the quasi-identifiers (QIs)

K^m-anonymity

 The support count of every m-itemset is ≥ k

Domain Generalization Hierarchy (DGH); data types
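A minimal sketch of checking K^m-anonymity in Python (the transactions mirror the 6×3 table on the next slide; k = 3 and m = 2 are illustrative):

```python
from itertools import combinations

def is_km_anonymous(transactions, k, m):
    """K^m-anonymity: every m-itemset that occurs in the data must have
    support count >= k, so an adversary who knows up to m items of a
    person cannot narrow that person down to fewer than k records."""
    universe = sorted(set().union(*transactions))
    for itemset in combinations(universe, m):
        count = sum(1 for t in transactions if set(itemset) <= t)
        if 0 < count < k:
            return False
    return True

original = [{"e1", "e2"}, {"e3"}, {"e1", "e3"},
            {"e1"}, {"e1"}, {"e1", "e3"}]
anonymized = [{"e1"}, {"e1", "e3"}, {"e1", "e3"},
              {"e1"}, {"e1"}, {"e1", "e3"}]
print(is_km_anonymous(original, k=3, m=2))    # False: {e1, e2} occurs only once
print(is_km_anonymous(anonymized, k=3, m=2))  # True
```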

(71)

Recent Studies 16 - K-anonymity on transaction data with minimal addition/deletion of items

3-anonymity with groups {T1, T4, T5}, {T2, T3, T6}:

Original:           Anonymized:
     e1 e2 e3            e1 e2 e3
T1   1  1  0        T1   1  0  0
T2   0  0  1        T2   1  0  1
T3   1  0  1        T3   1  0  1
T4   1  0  0        T4   1  0  0
T5   1  0  0        T5   1  0  0
T6   1  0  1        T6   1  0  1

(72)

Recent Studies 17 - K-anonymity on transaction data with minimal addition/deletion of items

Two candidate groupings:

{T1, T2, T4, T5} and {T3, T6}:
     a1 a2 a3            a1 a2 a3
T1   1  1  1        T3   1  0  1
T2   1  1  0        T6   0  1  0
T4   0  1  1
T5   1  0  0

{T1, T2, T3, T4} and {T5, T6}:
     a1 a2 a3            a1 a2 a3
T1   1  1  1        T5   1  0  0
T2   1  1  0        T6   0  1  0
T3   1  0  1
T4   0  1  1

(73)

Recent Studies 18 - K-anonymity on transaction data with minimal addition/deletion of items

Proposes an O(log k)-approximation solution to this problem.

(74)

Recent Studies 19 - K-anonymity and K^m-anonymity

Data type: Relational + DGH

 K-anonymity — Generalization & suppression: Samarati TKDE '01, Sweeney IJUFKS '02; Clustering: Li DAWAK '06, Byun DASFAA '07; Mohammed KDD '09 (LKC-privacy, top-down)

Data type: Relational, no DGH

 K-anonymity — Approximation algorithms (minimize suppression, clustering): Park SIGMOD '07; Clustering: Aggarwal PODS '06 (r-gather)

(75)

Recent Studies 20 - Graph and Spatial Data

Data type: Graph + DGH

 Campan PinKDD '08, SaNGreeA

Data type: Graph, no DGH

 1. Kuramochi DMKD '05: find frequent patterns in a large sparse graph
 2. Liu SIGMOD '08: k-degree anonymity (k nodes with the same degree); random, small-world, scale-free, prefuse, Enron, powergrid, and co-author graphs
 3. Chang VLDB '09: predictive anonymity
 4. Zou VLDB '09: k-automorphism against multiple structural attacks; prefuse & co-author graphs; Erdos-Renyi and scale-free models generated with Pajek software
 5. Bhagat VLDB '09: class-based anonymity
 6. Puttaswamy CoNEXT '09: intersection attacks

(76)

Recent Studies 21 - Graph and Spatial Data

Data type: Graph, edge weight

 1. Liu SDM '09: anonymize sensitive edge weights of an undirected graph; maintain shortest paths; Gaussian randomization, greedy perturbation; EIES and synthetic datasets
 2. Liu ICIS '10: k-anonymous weighted edges on a directed graph; k edges with weight differences less than a given bound
 3. Das ICDE '10: anonymize directed edge weights; keep linear properties; generate linear inequalities for shortest paths; LP solver; models of Flickr, LiveJournal, Orkut, YouTube

(77)

Recent Studies 22 - K-anonymity and K^m-anonymity

Major issues

 Large data volume
 High dimensionality
 Sparseness

Approaches

 With DGH: generalization or suppression, bottom-up or top-down
 Without DGH: clustering

(78)

Discussions 1

From relational and set data to graph data

 Privacy-preserving social networking
 Privacy-preserving collaborative filtering

(79)

Discussions 2

From centralized data to distributed data

Distributed databases

 Horizontally partitioned
  Grocery shopping data collected by different supermarkets
  Credit card databases of two different credit unions
  "Fraudulent customers often have ...

(80)

Discussions 3

From centralized data to distributed data

Distributed databases

 Vertically partitioned

(81)

Discussions 4

From data privacy to information privacy

 Hiding aggregated information
  Car dealer inventory: hide total stock, not individual queries
  Airline: hide the total seats left, to prevent terrorists from choosing a less crowded flight

(82)

References

Some websites

 Privacy-Preserving Data Mining
 Privacy Preserving Data Mining: Models and Algorithms (http://www.springerlink.com/content/978-0-387-70991-8)
 http://www.springer.com/west/home?SGWID=4-102-22-52496494-0&changeHeader=true
 http://www.cs.umbc.edu/~kunliu1/research/privacy_review.html
 http://www.cs.ualberta.ca/~oliveira/psdm/pub_by_year.html
 KDnuggets: www.kdnuggets.com
