(1)

Social Networks and

Privacy Preservation

(Part II)

Leon S.L. Wang (王學亮)

(2)

Outline

Overview of Social Networks (Part I)

 Analysis
 Extraction, Construction
 Application
 Challenge

Privacy Preservation (Part II)

 Privacy-Preserving Data Publishing (PPDP)
 Privacy-Preserving Network Publishing (PPNP)
 Privacy-Preserving Data Mining (PPDM)

(3)

Querying public information is an activity of daily life. But…

(Figure: clients querying public information through the Cloud)

(4)

Attacks from client side (un-trusted clients)

Retrieve information, analyze, re-identify personal information, then attack.

 Relational data privacy – PPDP
 Set data (transaction data) privacy – PPDP
 Graph data (social network data) privacy – PPNP, PPSN
 Edge weight privacy – PPNP, PPSN

(5)

Attacks from server side (un-trusted servers)

Based on client's queries, identify personal information (location, identity), then attack.

 Location-based service privacy
 Obfuscation: cloaking and anonymity
 Data transformation: incremental NN, encrypted databases
 PIR-based (Private Information Retrieval)

(6)

Objective

To protect privacy of users and information (PPNP)

 Identity disclosure
 Link disclosure

 Attribute disclosure

(7)

Recent Studies

PPDP, PPNP, PPDM

 Motivating Examples, Objective, Common Practices, Possible Attacks

Some Research Areas

 K-Anonymity, Utility Issues, Distributed Privacy, Association Rule Hiding

(8)

Privacy Preserving Data Publishing

1

Motivating example – Group Insurance Commission: they found the MA governor's medical record

(9)

Privacy Preserving Data Publishing

2

Hospital Patient Data (Name, ID are hidden):

DOB      Sex     Zipcode  Disease
1/21/76  Male    53715    Heart Disease
4/13/86  Female  53715    Hepatitis
2/28/76  Male    53703    Bronchitis
1/21/76  Male    53703    Broken Arm
4/13/86  Female  53706    Flu
2/28/76  Female  53706    Hang Nail

Vote Registration Data (public info):

Name   DOB      Sex     Zipcode
Andre  1/21/76  Male    53715
Beth   1/10/81  Female  55410
Carol  10/1/44  Female  90210
Dan    2/21/84  Male    02174
Ellen  4/19/72  Female  02237

Motivating examples – Group Insurance

(10)

Privacy Preserving Data Publishing

3

Motivating examples – A Face Is Exposed for AOL Searcher No. 4417749

Buried in a list of 20 million Web search queries collected by AOL and recently released on the Internet is user No. 4417749. The number was assigned by the company to protect the searcher's anonymity, but it was not much of a shield. – New York Times, August 9, 2006.

Thelma Arnold's identity was betrayed by AOL records of her Web searches, like ones for her dog…

(11)

Privacy Preserving Data Publishing

4

Motivating examples – American On Line

~650k users, 3-month period, ~20 million queries released

No names, no SSNs, no driver license #s, no credit card #s

The user, ID 4417749, was found to be Thelma Arnold, a 62-year-old woman living in Georgia.

Loss of privacy for users, damage to AOL, significant damage to academics who depend on such data.

(12)

Privacy Preserving Data Publishing

5

Motivating examples – Netflix Prize

 In October of 2006, Netflix announced the $1-million Netflix Prize for improving their movie recommendation system.

 Netflix publicly released a dataset containing 100 million movie ratings of 18,000 movies, created by 500,000 Netflix subscribers over a period of 6 years.

 Anonymization – replacing usernames with random identifiers.

 It was shown that 84% of the subscribers could be uniquely identified by an attacker who knew 6 out of 8 of the movies a subscriber had rated.

(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)

Privacy Preserving Network Publishing

7

You have control over what information you want to share and whom you want to connect with.

You do not have a comprehensive and accurate idea of the information you have explicitly and implicitly disclosed.

Setting online privacy is time-consuming, and many of you choose to accept the default settings.

Eventually you lose control… and you are

(21)

Privacy Preserving Data Mining

1

Motivating examples – Association Rules

Supplier: ABC Paper Company
Retailer: XYZ Supermarket Chain

1. Allow ABC to access XYZ's customer DB
2. Predict XYZ's inventory needs & offer reduced prices

(22)

Privacy Preserving Data Mining

2

Supplier ABC discovers (thru data mining):

 Sequence: Cold remedies -> Facial tissue
 Association: (Skim milk, Green paper)

Supplier ABC runs coupon marketing campaign:

 "50 cents off skim milk when you buy ABC products"

Results:

 Customers buy Green paper from ABC
 Lower sales of Green paper for XYZ (Bad)

(23)

Privacy Preservation

1

- Objective

Privacy

 The state of being private; the state of not being seen by others

Database security

 To prevent loss of privacy due to viewing/disclosing unauthorized data

Privacy Preservation

(24)

Privacy Preservation

2

- Common Practices

Limiting access

 Control access to the data

 Used by secure DBMS community

“Fuzz” the data

 Forcing aggregation into daily records instead of individual transactions or slightly altering data values

(25)

Privacy Preservation

3

- Common Practices

Eliminate unnecessary groupings

 The first 3 digits of SSNs are assigned by office, sequentially
 Clustering high-order bits of a "unique identifier" is likely to group similar data elements
 Unique identifiers should be assigned randomly

Augment the data

 Populate the phone book with extra, fictitious people in non-obvious ways
 Return correct info when asked about one individual, but return incorrect info when asked about all individuals in

(26)

Privacy Preservation

4

- Common Practices

Audit

 Detect misuse by legitimate users

 Administrative or criminal disciplinary action may be initiated

(27)

Privacy Preservation

5

- Possible Attacks

Linking attacks (Sweeney IJUFKS '02)

 Re-identification

 Identity linkage (K-anonymity)

 Attribute linkage (l-diversity)

Hospital Patient Data (Name, ID are hidden):

DOB      Sex     Zipcode  Disease
1/21/76  Male    53715    Heart Disease
4/13/86  Female  53715    Hepatitis
2/28/76  Male    53703    Bronchitis
1/21/76  Male    53703    Broken Arm
4/13/86  Female  53706    Flu
2/28/76  Female  53706    Hang Nail

Vote Registration Data (public info):

Name   DOB      Sex     Zipcode
Andre  1/21/76  Male    53715
Beth   1/10/81  Female  55410
Carol  10/1/44  Female  90210
Dan    2/21/84  Male    02174
Ellen  4/19/72  Female  02237
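The linking attack can be sketched as a join of the two tables on the quasi-identifiers (DOB, Sex, Zipcode). The records are the slide's example; the `link` helper is an illustrative name of mine.

```python
# Sketch of the linking attack: join the published hospital table with the
# public voter list on the quasi-identifier triple (DOB, Sex, Zipcode).
hospital = [  # (DOB, Sex, Zipcode, Disease) -- names and IDs removed
    ("1/21/76", "Male",   "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male",   "53703", "Bronchitis"),
    ("1/21/76", "Male",   "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]
voters = [  # (Name, DOB, Sex, Zipcode) -- public information
    ("Andre", "1/21/76", "Male",   "53715"),
    ("Beth",  "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan",   "2/21/84", "Male",   "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]

def link(voters, hospital):
    """Re-identify every voter whose quasi-identifiers match exactly one patient row."""
    out = {}
    for name, dob, sex, zipc in voters:
        diseases = [d for (hd, hs, hz, d) in hospital
                    if (hd, hs, hz) == (dob, sex, zipc)]
        if len(diseases) == 1:      # unique match => identity disclosure
            out[name] = diseases[0]
    return out

reidentified = link(voters, hospital)
print(reidentified)
```

Only Andre's quasi-identifiers match a single hospital row, so his diagnosis is disclosed.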

(28)

Privacy Preservation

6

- Possible Attacks

Corruption attacks (Tao ICDE '08, Chaytor ICDM '09)

 Background knowledge; Perturbed generalization (PG)

(29)

Privacy Preservation

7

- Possible Attacks

Differential privacy (Dwork ICALP '06)

 Add noise so that the difference between any query output over 30 records and over any 29 of those records will be very small (within a differential).
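A minimal sketch of the standard Laplace mechanism, the usual way this guarantee is realized for a count query: the count has sensitivity 1, so Laplace(1/ε) noise is added. The ε value and seed are illustrative choices, not from the slides.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count, epsilon, rng):
    """epsilon-DP answer to a count query (sensitivity 1 => scale 1/epsilon)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
val = noisy_count(30, epsilon=1.0, rng=rng)   # 30-record vs 29-record answers differ by ~1 unit of noise
print(val)
```

With the noise scale tied to 1/ε, the output distributions on 30 records and on any 29 of them differ by at most a factor of e^ε.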

(30)

Privacy Preservation

8

- Possible Attacks

(31)

Privacy Preservation

9

- Possible Attacks

(32)

Privacy Preservation

10

- Possible Attacks

Structural attacks (Zhou VLDB '09)

 Degree attack: knowing Bob has 4 friends => vertex 7 is Bob
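The degree attack amounts to finding vertices whose degree is unique in the published graph. The edge list below is an illustrative assumption in which vertex 7 is the only vertex of degree 4, mirroring the slide.

```python
from collections import Counter

# Sketch of a degree attack on a naively anonymized graph (illustrative edges).
edges = [(1, 2), (2, 3), (3, 7), (4, 7), (5, 7), (6, 7), (1, 6)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

deg_freq = Counter(degree.values())
# A vertex whose degree occurs only once is re-identifiable from
# degree knowledge alone ("Bob has 4 friends => vertex 7 is Bob").
reidentifiable = sorted(v for v, d in degree.items() if deg_freq[d] == 1)
print(reidentifiable)
```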

(33)

Privacy Preservation

11

- Possible Attacks

Structural attacks (Zhou VLDB '09)

 Sub-graph attack (one match): knowing Bob's friends and friends' friends => vertex 7 is Bob

(34)

Privacy Preservation

12

- Possible Attacks

Structural attacks (Zhou VLDB '09)

 Sub-graph attack (k-match): knowing Bob's friends => 8 matches, but they share common labels 6 & 7 => still uniquely identifies vertex 7 as Bob

(35)

Privacy Preservation

13

- Possible Attacks

Structural attacks (Zhou VLDB '09)

 Hub fingerprint attacks
 Some hubs have been identified; the adversary knows the distances between Bob and the hubs

(36)

Privacy Preservation

14

- Possible Attacks

Non-structural attacks (Bhagat VLDB '09)

 Label attack

Interaction graph

(37)

Privacy Preservation

15

- Possible Attacks

Non-structural attacks (Bhagat VLDB '09)

(38)

Privacy Preservation

16

- Possible Attacks

Non-structural attacks (Bhagat VLDB '09)

(39)

Privacy Preservation

17

- Possible Attacks

Active attacks (Backstrom WWW '07)

 Plant a subgraph H into G and connect it to targeted nodes (add new nodes and edges)
 Recover H from G & identify the targeted nodes' identities and relationships
 Walk-based (largest H), cut-based (smallest H)

(40)

Privacy Preservation

18

- Possible Attacks

Passive attacks (no nodes, no edges added)

 Start from a coalition of friends (nodes) in the anonymized graph G; discover the existence of edges among users to whom they are linked

Semi-passive attacks (add edges only, no nodes)

 From existing nodes in G, add fake edges to targeted nodes

(41)

Privacy Preservation

19

- Possible Attacks

Intersection attacks (Puttaswamy CoNEXT '09)

 Two users, A and B, are compromised
 A queries the server for the visitors of "website xyz"
 B queries the server for the visitors of "website xyz"
 The intersection is C only (privacy leak)
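The attack reduces to a set intersection over the two compromised views. The visitor sets below are illustrative assumptions.

```python
# Sketch of the intersection attack: compromised users A and B each learn
# which of their own contacts visited "website xyz"; intersecting the two
# views isolates a single user.
visitors_seen_by_A = {"C", "D", "E"}   # visitors among A's contacts
visitors_seen_by_B = {"C", "F", "G"}   # visitors among B's contacts

leaked = visitors_seen_by_A & visitors_seen_by_B
print(leaked)   # only C is in both views, so C's visit is exposed
```

Defenses such as StarClique (next slides) blur these views by adding latent edges, enlarging the intersection.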

(42)

Privacy Preservation

20

- Possible Attacks

Intersection attacks

 StarClique (add latent edges)
 The graph evolution process for a node: first select a subset of its neighbors; then build a clique with the members of this subset; finally, connect the clique members with all the non-clique members in the neighborhood. Latent or virtual edges are added in the process.

(43)

Privacy Preservation

21

- Possible Attacks

Relationship attacks (Liu SDM '09)

 Sensitive edge weights, e.g. transaction expenses in a business network
 Reveal the shortest path between source and sink, e.g., A -> D

(44)

Privacy Preservation

22

- Possible Attacks

Location Privacy (Papadopoulos VLDB '10)

 Preserving the privacy of the user's location in

(45)

Privacy Preservation

23

- Possible Attacks

Inference through data mining attacks

 Addition and/or deletion of items so that Sensitive Association Rules (SAR) will be hidden
 Generalization and/or suppression of items so that the confidence of SAR will be lower than ρ

D_O --DM--> R_O
D_O --Modification--> D_M --DM--> R_M = R_O - R_H

(46)

Privacy Preservation

24

- Possible Attacks

ρ-uncertainty (Cao VLDB '10)

 Given a transaction dataset, sensitive items Is, and uncertainty level ρ, the objective is to make the confidence of every sensitive association rule less than ρ, i.e., Conf(χ -> α) < ρ, where χ ⊆ I and α ∈ Is.

If Alice knows Bob bought b1, then she knows Bob also bought {a1, b2, α, …}, where Is = {α, …}
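A small checker for the ρ-uncertainty condition can make the definition concrete. The toy transactions, the sensitive item name `alpha`, and ρ = 0.6 are illustrative assumptions, not the paper's data.

```python
# Sketch: every sensitive rule chi -> alpha (alpha in Is) must have
# confidence below rho; this reports the rules that violate the condition.
def confidence(transactions, chi, alpha):
    """Conf(chi -> alpha) over a list of transaction sets."""
    body = [t for t in transactions if chi <= t]
    if not body:
        return 0.0
    return sum(1 for t in body if alpha in t) / len(body)

def violations(transactions, sensitive, rho):
    """Sensitive rules with singleton antecedents whose confidence reaches rho."""
    items = sorted(set().union(*transactions) - sensitive)
    return [(x, a) for a in sorted(sensitive) for x in items
            if confidence(transactions, {x}, a) >= rho]

T = [{"b1", "alpha"}, {"b1", "alpha"}, {"b1", "b2"}, {"a1"}]
bad = violations(T, sensitive={"alpha"}, rho=0.6)
print(bad)   # b1 -> alpha has confidence 2/3 >= rho
```

A sanitizer would then suppress or generalize items (as on the next slide) until `violations` returns an empty list.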

(47)

Privacy Preservation

25

- Possible Attacks

ρ-uncertainty (Cao VLDB '10)

 Given a transaction dataset, sensitive items Is, uncertainty level ρ = 0.7, and a hierarchy of non-sensitive items, the published data after suppression

(48)

Privacy Preservation

26

- Possible Attacks

(49)

Privacy Preservation

27

- Possible Attacks

(50)

Privacy Preservation

28

- Possible Attacks

(51)

Current Research Areas

1

Privacy-Preserving Data Publishing

 K-anonymity
 Try to prevent re-identification

Utility-based Privacy-Preserving

Distributed Privacy with Adversarial Collaboration

Privacy-Preserving Application

 Association rules hiding

Privacy-Preserving Network Publishing

(52)
(53)

Privacy Preserving Data Publishing

1

Hospital Patient Data (Name, ID are hidden):

DOB      Sex     Zipcode  Disease
1/21/76  Male    53715    Heart Disease
4/13/86  Female  53715    Hepatitis
2/28/76  Male    53703    Bronchitis
1/21/76  Male    53703    Broken Arm
4/13/86  Female  53706    Flu
2/28/76  Female  53706    Hang Nail

Vote Registration Data (public info):

Name   DOB      Sex     Zipcode
Andre  1/21/76  Male    53715
Beth   1/10/81  Female  55410
Carol  10/1/44  Female  90210
Dan    2/21/84  Male    02174
Ellen  4/19/72  Female  02237

(54)

Privacy Preserving Data Publishing

2

K-Anonymity for linking attacks

(55)

Privacy Preserving Data Publishing

3

“Fuzz” the data

k-anonymity, at least k tuples in one group
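The "fuzzing" step can be sketched on the slide's quasi-identifiers: generalize DOB to birth year and zip code to a 3-digit prefix, then check that every group holds at least k rows. The generalization choices here are illustrative.

```python
from collections import Counter

rows = [  # (DOB, Sex, Zipcode) from the hospital example
    ("1/21/76", "Male",   "53715"),
    ("4/13/86", "Female", "53715"),
    ("2/28/76", "Male",   "53703"),
    ("1/21/76", "Male",   "53703"),
    ("4/13/86", "Female", "53706"),
    ("2/28/76", "Female", "53706"),
]

def generalize(dob, sex, zipc):
    return (dob.split("/")[-1], sex, zipc[:3])   # (birth year, sex, zip prefix)

def is_k_anonymous(groups, k):
    """True when every generalized group contains at least k tuples."""
    return all(size >= k for size in groups.values())

groups1 = Counter(generalize(*r) for r in rows)
print(is_k_anonymous(groups1, 2))   # False: ('76', 'Female', '537') appears once

# Generalize further by suppressing Sex entirely: now every group has >= 2 rows.
groups2 = Counter((year, z) for (year, _, z) in (generalize(*r) for r in rows))
print(is_k_anonymous(groups2, 2))   # True
```

The two passes show the usual trade-off: more generalization buys k-anonymity at the cost of utility.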

(56)

Utility-based Privacy-Preserving

1

(57)

Utility-based Privacy-Preserving

2

(58)

Utility-based Privacy-Preserving

3

Q1: "How many customers under age 29 are there in the data set?"

Q2: "Is an individual with age=25, education=Bachelor, Zip Code=53712 a target customer?"

Table 2 answers: "2"; "Y"

(59)

Distributed Privacy with Adversarial Collaboration

Input privacy (2)

D1 + D2 + D3 --DM--> R_O
(60)

Association rule mining

Input: D_O, min_supp, min_conf
Output: R_O

D_O:
TID  Items
T1   ABC
T2   ABC
T3   ABC
T4   AB
T5   A
T6   AC

min_supp = 33%, min_conf = 70%
Supports: |A|=6, |B|=4, |C|=4, |AB|=4, |AC|=4, |BC|=3, |ABC|=3

R_O:
1   B=>A   (66%, 100%)
2   C=>A   (66%, 100%)
3   B=>C   (50%, 75%)
4   C=>B   (50%, 75%)
5   AB=>C  (50%, 75%)
6   AC=>B  (50%, 75%)
7   BC=>A  (50%, 100%)
8   C=>AB  (50%, 75%)
9   B=>AC  (50%, 75%)
10  A=>B   (66%, 66%)  -- not an AR
11  A=>C   (66%, 66%)  -- not an AR
12  A=>BC  (50%, 50%)  -- not an AR
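The slide's numbers can be reproduced with a brute-force miner over the six transactions; the helper names are mine, the data and thresholds are the slide's.

```python
from itertools import combinations

transactions = [set("ABC"), set("ABC"), set("ABC"), set("AB"), set("A"), set("AC")]
n = len(transactions)

def supp(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / n

def mine_rules(min_supp, min_conf):
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in map(set, combinations(items, size)):
            if supp(itemset) < min_supp:
                continue
            for k in range(1, size):
                for lhs in map(set, combinations(sorted(itemset), k)):
                    rhs = itemset - lhs
                    conf = supp(itemset) / supp(lhs)
                    if conf >= min_conf:
                        found.append(("".join(sorted(lhs)), "".join(sorted(rhs)),
                                      round(supp(itemset), 2), round(conf, 2)))
    return found

R_O = mine_rules(1 / 3, 0.7)
for rule in R_O:
    print(rule)   # 9 rules, matching rules 1-9 on the slide
```

A=>B, A=>C, and A=>BC are found frequent but fail min_conf = 70%, matching the slide's "Not AR" entries.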

(61)

Input: D_O, X (items to be hidden on LHS), min_supp, min_conf
Output: D_M

min_supp = 33%, min_conf = 70%, X = {C}

D_M (one supporting transaction sanitized: ABC -> AC):
TID  Items
T1   AC
T2   ABC
T3   ABC
T4   AB
T5   A
T6   AC

Supports: |A|=6, |B|=3, |C|=4, |AB|=3, |AC|=4

1   B=>A   (50%, 100%)
2   C=>A   (66%, 100%)
3   B=>C   (33%, 66%)   lost
4   C=>B   (33%, 50%)   hidden
5   AB=>C  (33%, 66%)   lost
6   AC=>B  (33%, 50%)   hidden
7   BC=>A  (33%, 100%)
8   C=>AB  (33%, 50%)   hidden
9   B=>AC  (33%, 66%)   lost
10  A=>B   (50%, 50%)

(Rules with C on the LHS whose confidence fell below min_conf are hidden; non-sensitive rules that fell below min_conf are lost; the slide additionally marks two rules "try".)
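One sanitization step consistent with these numbers: to hide rules with C on the left-hand side, delete B from one transaction supporting C => B, pushing its confidence below min_conf = 70%. Which transaction to modify is my illustrative choice; the resulting confidences match the slide.

```python
def conf(transactions, lhs, rhs):
    """Confidence of lhs => rhs over a list of transaction sets."""
    body = [t for t in transactions if lhs <= t]
    return sum(1 for t in body if rhs <= t) / len(body)

D_O = [set("ABC"), set("ABC"), set("ABC"), set("AB"), set("A"), set("AC")]
c_before = conf(D_O, {"C"}, {"B"})   # 3/4 = 0.75 -> C => B is minable

D_M = [t.copy() for t in D_O]
D_M[0].discard("B")                  # sanitize T1: ABC -> AC
c_after = conf(D_M, {"C"}, {"B"})    # 2/4 = 0.5 -> below min_conf, hidden

print(c_before, c_after)
```

The same deletion is what lowers B=>C, AB=>C, and B=>AC below 70% as side effects (the "lost" rules).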

(62)

Association Rule Hiding

3

- Side effects

Hiding failure, lost rules, new rules

① Hiding Failure: sensitive rules R_h that are still mined from D_M
② Lost Rules: non-sensitive rules (~R_h) that can no longer be mined
③ New Rules: rules mined from D_M that were not in the original R

(63)

Association Rule Hiding

4

Output privacy

D_O --DM--> R_O
D_O --Modification--> D_M --DM--> R_M = R_O - R_H

• Input privacy (1)

D_O --Modification--> D'_O --DM--> R_O

(64)

Privacy-Preserving Network Publishing

1

K-anonymity

 K-degree, neighborhood, automorphism, k-isomorphism, k-symmetry, k-security, k-obfuscation

Generalization

 Clustering nodes into supernodes

Randomization

 Statistically add/delete/switch edges

Other works

 Edge-weighted graphs, privacy scores

Output Perturbation

(65)

Privacy-Preserving Network Publishing

2

(66)

Privacy-Preserving Network Publishing

3

(67)

Privacy-Preserving Network Publishing

4

(68)
(69)

Privacy-Preserving Network Publishing

6

(70)

Privacy-Preserving Network Publishing

7

(71)

Privacy-Preserving Network Publishing

7

(72)

Privacy-Preserving Network Publishing

8

(73)

Privacy-Preserving Network Publishing

9

(74)

Privacy-Preserving Network Publishing

10

Relationship attacks (Liu SDM '09)

 Preserving the shortest path, e.g. A -> D
 Minimum perturbation on path length, path weight

(75)

Privacy-Preserving Network Publishing

11

Relationship attacks (Liu ICIS '10)

 Preserving the shortest path, e.g. A -> D

K-anonymous weight privacy

 The blue edge group and the green edge group satisfy 4-anonymous weight privacy with threshold 10.
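A sketch of the grouping step under an assumed reading of the definition: every edge weight must fall in a group of at least k weights whose spread is at most a threshold (10 in the slide's 4-anonymous example). The weight values are illustrative.

```python
def group_weights(weights, theta):
    """Greedily group sorted weights so each group spans at most theta."""
    ws = sorted(weights)
    groups, start = [], 0
    for i in range(1, len(ws) + 1):
        # close the current group when the next weight would exceed the span
        if i == len(ws) or ws[i] - ws[start] > theta:
            groups.append(ws[start:i])
            start = i
    return groups

weights = [1, 3, 5, 8, 21, 24, 27, 30]   # illustrative edge weights
groups = group_weights(weights, theta=10)
k_anonymous = all(len(g) >= 4 for g in groups)
print(groups, k_anonymous)
```

Each published weight is then indistinguishable within its group of at least k edges, analogous to the blue/green groups on the slide.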

(76)

Privacy-Preserving Network Publishing

12

Relationship attacks (Das ICDE '10)

 Preserving linear properties, e.g., shortest paths
 The ordering of the five edge weights is preserved after naïve anonymization:

x5 ≤ x1 ≤ x4 ≤ x3 ≤ x2, where x1=(v1, v2), x2=(v1, v4), x3=(v1, v3), x4=(v2, v4), x5=(v3, v4)

The minimum-cost spanning tree is preserved: {(v1, v2), (v2, v4), (v1, v3)}

The shortest path from v1 to v4 is changed.

 The ordering of the edge weights is still exposed. For example, v3 and v4 are best friends and v1 and v4 are not so good friends.

(77)

Privacy-Preserving Network Publishing

13

(78)

Location Privacy

1

Location Privacy (Papadopoulos VLDB '10)

 Location obfuscation
 Send an additional set of "dummy" queries, in addition to the actual
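The dummy-query idea can be sketched in a few lines: submit the real location together with k-1 dummies so the LBS server cannot tell which query is genuine. Uniform perturbation within a small span is my simplifying assumption; real schemes choose plausible dummy locations more carefully.

```python
import random

def obfuscated_queries(real, k, rng, span=0.05):
    """Return k location queries: the real one plus k-1 nearby dummies."""
    lat, lon = real
    dummies = [(lat + rng.uniform(-span, span), lon + rng.uniform(-span, span))
               for _ in range(k - 1)]
    queries = dummies + [real]
    rng.shuffle(queries)    # hide the real query's position in the batch
    return queries

rng = random.Random(7)
real = (25.033, 121.565)    # illustrative coordinates
qs = obfuscated_queries(real, k=5, rng=rng)
print(len(qs), real in qs)
```

From the server's view each of the k queries is equally likely to be the actual location, at the cost of k times the query traffic.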

(79)

Location Privacy

2

Location Privacy (Papadopoulos VLDB '10)

 Data transformation

(80)

Location Privacy

3

PIR-based location privacy

 PIR-based queries are sent to the LBS server and retrieve blocks without the server discovering which blocks are requested

(81)

Discussions

1

From relational and set data to graph data

 Privacy-preserving social networking
 Privacy-preserving collaborative filtering

(82)

Discussions

2

From centralized data to distributed data

Distributed databases

 Horizontally partitioned
 Grocery shopping data collected by different supermarkets
 Credit card databases of two different credit unions
 "fraudulent customers often have

(83)

Discussions

3

From centralized data to distributed data

Distributed databases

 Vertically partitioned

(84)

Discussions

4

From data privacy to information privacy

 Hiding aggregated information
 Car dealer inventory: hide total stock, not individual queries
 Airline: hide total seats left, to prevent terrorists from choosing a less crowded flight

(85)

Discussions

5

How to quantify the graph utility-privacy tradeoff, especially for (dynamic) rich graphs?

 Existing data publication techniques do not provide guarantees on the accuracy of graph analysis
 How to define utility?
 Scalability is always an issue.

Differential-privacy-preserving social network mining

(86)

Tutorials

 Privacy in data systems, Rakesh Agrawal, PODS 03
 Privacy preserving data mining, Chris Clifton, PKDD 02, KDD 03
 Models and methods for privacy preserving data publishing and analysis, Johannes Gehrke, ICDM 05, ICDE 06, KDD 06
 Cryptographic techniques in privacy preserving data mining, Helger Lipmaa, PKDD 06
 Randomization based privacy preserving data mining, Xintao Wu, PKDD 06
 Privacy in data publishing, Johannes Gehrke & Ashwin Machanavajjhala, S&P 09
 Anonymized data: generation, models, usage, Graham Cormode & Divesh Srivastava, SIGMOD 09
 A tutorial of privacy-preservation of graphs and social networks, Xintao Wu, Xiaowei Ying, PAKDD 2011
 Privacy-aware data management in information networks, Michael Hay, Kun Liu,
