General Terms

WWW, Semantic Web, Database

∗Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WIMS’11, May 25-27, 2011, Sogndal, Norway. Copyright © 2011 ACM 978-1-4503-0148-0/11/05... $10.00

Keywords

data sharing and integration, semantics-enabled policy, privacy protection, query rewriting, ontology and rule

1. INTRODUCTION

Large enterprises spend a great deal of time and money on data (or information) integration [3]. Data integration is the problem of combining data from autonomous and heterogeneous sources and providing users with a unified view of these data through a so-called global (or mediated) schema. The global schema is a reconciled view of the information that provides query services to end users. The design of a data integration system is a very complex task that involves several different issues: the heterogeneity of the data sources, the relation between the global schema and the data sources, limitations on the mechanisms for accessing the sources, how to process queries expressed on the global schema, etc. [11].

Three approaches have been proposed to model a set of source descriptions that specify the semantic mapping between the source schema and the global schema. The first one, called global-as-view (GAV), requires that each concept in the global schema is expressed as a query over the data sources. GAV deals with the case in which stable data sources contain details not present in the global schema, so it is not suited to dynamically adding or deleting data sources.

The second one, called local-as-view (LAV), requires the global schema to be specified independently from the sources. The source descriptions between the stable global schema, such as an ontology, and the dynamic data sources are established by defining each concept in the data sources as a view over the global schema [10] [26]. LAV descriptions handle the case in which the global schema contains details that are not present in every data source.

The third one, called global-local-as-view (GLAV), is a source description that combines the expressive power of both GAV and LAV, allowing flexible schema definitions independent of the particular details of the data sources [14] [30]. The data integration system uses these different source descriptions to reformulate a user query into a query over the source schemas. However, data sharing and integration are hampered by legitimate and widespread privacy concerns, so it is critical to develop techniques that enable the integration and sharing of data without losing a user’s privacy [12].

Privacy protection policies represent a long-term promise made by an enterprise to its users and are determined by business practice and legal concerns. It is undesirable to change an enterprise’s promises to customers every time an internal access control rule changes. If possible, we should enable the integration of Platform for Privacy Preferences (P3P) and Enterprise Privacy Authorization Language (EPAL) policies to provide accountable and transparent information processing for data owners to revise their data usage permissions [2].

Although many organizations post online privacy policies, they must realize that simply posting a privacy policy on their websites does not guarantee true compliance with existing legislation. Following the OECD’s Fair Information Principles (FIPs)¹, an organization should provide norms for processing personal information that cover its collection, retention, use, disclosure, and destruction. An organization must also be accountable for the information it possesses and should declare the purposes of information usage before collection. Moreover, an organization should collect personal information with an individual’s consent and disclose personal information only for previously identified purposes.

In this paper we address the following research issues; more detailed modelling and implementation are presented in later sections.

• Data sharing and protection services are considered across a large number of servers. The incentive for using the virtual platform (VP) is to avoid solving the complex pair-wise problem of ontology matching and rule integration between these servers. A unified global data sharing and protection service can therefore be achieved at the VP.

• Privacy protection policies are expressed as a combination of ontology and rule, i.e. O + R, where the ontology O includes a TBox schema and ABox instances, and the rules R include a deductive rule set (RS) and facts (F). Data sharing and protection in multiple servers are achieved through a combination of semantics-enabled formal protection policies (FPP).

• The challenge of designing a semantic privacy protection model is to ensure the soundness and completeness of data sharing and protection in multiple servers. For the soundness criterion, we do not allow unintended data to be released to data users through the global policy schema (GPS) at the VP; otherwise, the privacy protection policies are violated. For the completeness criterion, we do not miss any eligible shared data when a user asks for a data request service at the VP. Therefore, the shareable data obtained at the VP should equal the data obtained directly from each server.
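The two criteria can be read as set inclusions between what the VP returns and what the servers would share directly. The following is a minimal Python sketch under that reading; it assumes data items are plain identifiers, and the function names and sample values are hypothetical rather than part of the model.

```python
from typing import Dict, Set


def shareable_union(per_server: Dict[str, Set[str]]) -> Set[str]:
    """Union of the shareable data reported directly by each server."""
    result: Set[str] = set()
    for items in per_server.values():
        result |= items
    return result


def is_sound(vp_result: Set[str], per_server: Dict[str, Set[str]]) -> bool:
    """Soundness: the VP releases nothing beyond what the servers permit."""
    return vp_result <= shareable_union(per_server)


def is_complete(vp_result: Set[str], per_server: Dict[str, Set[str]]) -> bool:
    """Completeness: the VP misses no data that some server would share."""
    return shareable_union(per_server) <= vp_result


# Hypothetical example data.
servers = {"server1": {"d1", "d2"}, "server2": {"d3"}}
vp_answer = {"d1", "d2", "d3"}
sound_and_complete = is_sound(vp_answer, servers) and is_complete(vp_answer, servers)
```

When both checks hold, the VP result equals the union of the per-server results, which is the equality stated in the completeness criterion.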

Each enterprise server declares P3P privacy protection policies that take into account the FIPs criteria (see Figure 1). EPAL policies corresponding to the P3P policies are then established at each site [24]. For each data request, the data handling and usage controls are based on the EPAL policies. However, P3P and EPAL lack formal and unambiguous semantics for specifying privacy protection policies, so they offer only limited policy enforcement and auditing support for software agents. One of the research challenges for the online privacy protection problem is to develop a privacy management framework and a formal semantics language that empower agents to enforce privacy protection policies. Agents must avoid any policy violation for each data request. We attempt to establish a semantic privacy protection model to address this issue. Each server shares its collected data with other servers without breaking the original data usage commitment to its clients [25].

¹ See http://www.privacyrights.org/ar/fairinfo.htm

The contributions of this paper are twofold. We first offer a three-layer semantic privacy-preserving model which encompasses and extends the existing work on data sharing and integration by using a combination of ontology and rule for the representation of privacy protection policies. In particular, we define a formal policy that uses ontologies for privacy protection concept descriptions and rules for data query and access control services. Then we focus on solving the soundness and completeness of the query rewriting problem using a perfect ontology merging and a perfect rule integration from the local formal protection policies. For each possible data query at the VP, we briefly demonstrate how the soundness and completeness criteria for privacy protection data integration can be achieved using this semantics-enabled privacy-preserving model.

The paper is organized as follows. In Section 2, we present a semantic privacy-preserving model as a framework for data sharing and integration services. In Section 3, we define a formal policy combination as an integration of formal policies from autonomous data sources. Each formal policy is composed of ontologies and rules for each independent data source. A privacy protection policy is a type of formal policy used for specifying a data usage constraint from a data owner. In Section 4, we formally define a formal policy combination in terms of ontology mapping, merging, and alignment. Then we demonstrate how a perfect rule integration is used for query rewriting at the VP corresponding to each local schema. In Section 6, we briefly prove the soundness and completeness of privacy-preserving data sharing and integration based on this semantic privacy-preserving model. We conclude with related work and discussion in the last two sections.

2. A PRIVACY-PRESERVING MODEL

A semantic privacy protection model is proposed with three layers. The bottom layer provides data sources from relational databases. The middle layer provides a semantics-enabled local schema for each independent service domain. The top layer is served at the VP, which provides a unified global view of privacy-preserving data sharing and integration services (see Figure 2).

We have a merged global ontology schema created by mapping and aligning local ontology schemas with a LAV source description from multiple local schemas in the middle layer.

Figure 1: A semantic privacy protection model extended from the integration of P3P and EPAL for data sharing and protection in multiple servers

The idea of using description logic (DL) to model the local and global schemas is to exploit the ontology’s abstract concept representation and reasoning capabilities. A query is defined as an SQWRL datalog rule in the SWRL-based policy to access a global ontology [31]. Each SQWRL data service query for a global ontology at the VP is mapped to multiple queries, expressed as SQWRL datalog rules, one for each local schema. This is a LAV query rewriting service, which has been investigated in databases but is largely unexplored in the context of DL-based ontologies [14].
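The mapping of one global query to several local queries can be illustrated with a deliberately simplified Python sketch. It only renames predicates per server and is not a full LAV view-based rewriting algorithm; the server names, local predicate names, and the `rewrite` function are all hypothetical.

```python
from typing import Dict, List, Tuple

# An atom of a conjunctive query: (predicate name, argument variables).
Atom = Tuple[str, Tuple[str, ...]]

# Hypothetical mapping from global-schema terms to each server's local terms.
MAPPING: Dict[str, Dict[str, str]] = {
    "server1": {"Data": "CustomerRecord", "hasOptInPurpose": "allowedFor"},
    "server2": {"Data": "Profile"},  # no local counterpart for hasOptInPurpose
}


def rewrite(global_query: List[Atom]) -> Dict[str, List[Atom]]:
    """Produce one local query per server that can express every atom.

    Servers lacking a mapping for some predicate are skipped, since they
    cannot answer the query as stated.
    """
    local_queries: Dict[str, List[Atom]] = {}
    for server, terms in MAPPING.items():
        if all(pred in terms for pred, _ in global_query):
            local_queries[server] = [(terms[pred], args) for pred, args in global_query]
    return local_queries


query = [("Data", ("?d",)), ("hasOptInPurpose", ("?d", "?p"))]
rewritten = rewrite(query)
```

In this toy setting only `server1` receives a local query, because `server2` has no local term for `hasOptInPurpose`.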

2.1 Formal Privacy Protection Policy

A policy’s explicit representation in terms of ontologies or rules depends on the underlying logic foundation of the policy language. If policies are created from a DL-based policy language, such as Rein or KAoS, then ordinary policies are shown as a TBox schema and ABox instances. Otherwise, if policies are created from an LP-based policy language, such as EPAL or Protune, ordinary policies are a set of rules, with unary, binary, or ternary predicates, together with facts [5].

In the SemPIF framework [21], we define a Policy Interchange Format (PIF) that follows the W3C O + R standards [6] and strives to provide a mechanism for agents to preserve different policy syntax and semantics throughout policy integration and interchange. In addition, agents can use meta-PIF, which provides further management and reconciliation services for multiple PIF-enabled policies across various domains. In this paper, we apply the SemPIF framework to privacy-preserving data integration through a combination of formal policies.

A formal policy (FP) is a declarative expression corresponding to a human legal norm that can be executed in a computer system without causing any semantic ambiguity. An FP is created from a policy language (PL), and this PL is shown as a combination of an ontology language and a rule language. Therefore, an FP is composed of ontologies O and rules R, where ontologies are created from an ontology language and rules are created from a rule language.

A formal protection policy (FPP) is an FP that aims at representing and enforcing resource protection principles, where the structure of resources is modelled as ontologies O and the resource protection is shown as rules R.

A privacy protection policy shown as an FPP is a combination of ontologies and rules, i.e., O + R, where DL-based ontologies, such as OWL-DL ontologies, provide a well-defined structured data model for data sharing, while Logic Program (LP)-based rules, such as datalog rules, provide further expressive power for data query and protection. There are numerous O + R combinations available for designing privacy protection policies, such as SWRL [20] and OWL2 RL [17].

Each O + R combination implies what expressive power we can extract from ontologies for the rules and vice versa.

SWRL is one of the O + R semantic web languages suitable for policy representation in the privacy protection model, but it is not an exclusive selection. Other O + R combinations, such as CARIN and OWL2 RL, are also possible for modeling a formal privacy protection policy whenever their underlying theoretical foundations and development tools are available. We make full use of the SWRLTab development tools and the SQWRL OWL-DL query language [31] in Protégé to model and enforce semantic privacy protection policies.

We face a research challenge of combining SWRL-based privacy protection policies from multiple servers while ensuring the soundness and completeness criteria of data sharing and protection. Another challenge is to resolve incompatibilities in policy syntax and semantics when we allow policy combination across multiple servers. SWRL is based on classical first-order logic (FOL) semantics, which mitigates possible semantic and syntactic inconsistencies when policies come from different servers.

Figure 2: A semantic privacy protection model

But we still face a background policy inconsistency problem when default policy assumptions vary between servers. For example, one server uses an open policy assumption, where no explicit opt-out for data usage means opt-in, but another server uses a closed policy assumption, where no explicit opt-in for data usage means opt-out. We avoid this kind of policy inconsistency by requesting all sites to use a uniform policy assumption and to collect opt-in data usage choices from users whenever multiple policies are integrated.
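The difference between the two default assumptions can be made concrete with a small Python sketch. It assumes a user's recorded choice is either an explicit opt-in, an explicit opt-out, or absent; the function name and string values are hypothetical.

```python
from typing import Optional


def is_usage_allowed(choice: Optional[str], assumption: str) -> bool:
    """Decide whether data usage is allowed under a default policy assumption.

    choice:     "opt-in", "opt-out", or None when the user stated nothing.
    assumption: "open" (silence means opt-in) or "closed" (silence means opt-out).
    """
    if assumption not in ("open", "closed"):
        raise ValueError(f"unknown policy assumption: {assumption!r}")
    if choice == "opt-in":
        return True
    if choice == "opt-out":
        return False
    if choice is None:
        # Only here do the two assumptions disagree.
        return assumption == "open"
    raise ValueError(f"unknown choice: {choice!r}")


# The same silent user is treated differently by the two kinds of server.
silent_on_open_server = is_usage_allowed(None, "open")
silent_on_closed_server = is_usage_allowed(None, "closed")
```

Explicit choices give the same answer under both assumptions, which is why collecting explicit opt-in choices removes the inconsistency.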

Previous studies of policy combination did not consider the problem of merging multiple schemas and integrating access control rules from multiple servers [4] [28]. In this paper we propose a semantic privacy protection model that allows flexibly combining the TBoxes of privacy protection policies without moving ABox instances from their original data sources until a data request service is initiated (see Figure 3). Therefore, the global ontology TBox schema and rules created at the VP have the latest incoming data from each server when a user asks a query.

Data integration aims at providing unified and transparent access to a set of autonomous and heterogeneous data sources. The semantic privacy protection model, which provides a global ontology schema for data sharing, is similar to the data integration problem solved by the DL-Lite_A ontologies shown in [8]. Here we also focus on data protection, in addition to data sharing and integration.

The goal of ontology-based data integration in DL-Lite_A is to provide a uniform access mechanism to a set of heterogeneous relational database sources, freeing the user from having to know where the data are, how they are stored, and how they can be accessed. The idea is based on decoupling information access from relational data storage, so users only access the conceptual layer shown as an ontology, while the relational data layer, hidden from users, manages the data.

We extend DL-Lite_A and use it as a part of our semantic privacy protection model. We have three layers of data sharing and integration infrastructure instead of the two layers shown in DL-Lite_A, so we face a research challenge of ontology merging and rule integration from the middle layer to the top layer when we enforce a privacy protection policy (see Figure 3).

A semantic privacy protection model is composed of three main components:

• In the top layer at the VP, we have a global policy schema (GPS), including a global ontology schema (GS), e.g. a TBox, aligned and merged from several local schemas (LS), and a set of rules integrated from the middle layer. The VP provides conceptual data access and protection services that give users a unified conceptual “global view” with access control power for each data request.

• Ontology-based data sources are external, independent, and heterogeneous, and each local ontology is combined with logic program (LP)-based rules for each server in the middle layer.

• A mapping language (ML) semantically links the GS and the integrated rule set in the top layer to each server’s ontology LS and privacy protection rules in the middle layer.

3. A FORMAL POLICY COMBINATION

A formal policy combination (FPC) in a global policy schema (GPS) allows data sharing as an integration of FPs from a variety of servers.

Figure 3: A virtual platform for ontology mapping, merging, and rule integration from multiple servers

FPC = ⊕_i FP_i, with FP_i = O_i + R_i,

where

i is the index of a server,

⊕ is the operator for formal policy combination, which comprises an operator for ontology mapping and merging over the O_i and an operator for rule integration over the R_i.

In a semantic privacy protection model, a formal protection policy combination (FPPC) allows data sharing and protection across servers: the rules integrated from the R_i provide data query and protection services over the ontology merged from the O_i.
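A formal policy combination can be pictured as combining the ontology part and the rule part of each server's policy separately. The Python sketch below is only an illustration under strong assumptions: axioms and rules are represented as plain strings, and a simple set union stands in for the ontology mapping and merging operator and for the rule integration operator, both of which involve alignment work that is not modelled here. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class FormalPolicy:
    """A formal policy FP = O + R, with axioms and rules kept as strings."""
    ontology: FrozenSet[str]
    rules: FrozenSet[str]


def combine(policies: Iterable[FormalPolicy]) -> FormalPolicy:
    """Combine per-server policies; set union is a stand-in for merging."""
    policies = list(policies)
    merged_ontology = frozenset().union(*(p.ontology for p in policies))
    integrated_rules = frozenset().union(*(p.rules for p in policies))
    return FormalPolicy(merged_ontology, integrated_rules)


# Hypothetical policies from two servers.
fp1 = FormalPolicy(frozenset({"Data", "Purpose"}), frozenset({"r1"}))
fp2 = FormalPolicy(frozenset({"Data", "Datauser"}), frozenset({"r2"}))
fpc = combine([fp1, fp2])
```

Keeping the two parts separate mirrors the O + R structure of a formal policy, so a real merging or integration procedure could replace either union independently.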

3.1 FPP for Privacy Protection

A privacy protection policy is a type of FPP. We designed an ontology that declares the FIPs’ attributes as classes in an FPP (see Figure 4). The attributes purpose, datauser, data, obligation, and action allow people to specify the constraints of privacy protection policies using related property chains.

Constraint properties are of type owl:ObjectProperty and specify the feasible domain and range classes of the above attributes. For example, the property hasOptInPurpose has its domain and range classes shown as follows:

⊤ ⊑ ∀hasOptInPurpose.Data, ⊤ ⊑ ∀hasOptInPurpose⁻.Purpose.

Then a datalog rule, in the SWRL-based policy representation, allows us to use a property chain to combine the two feasible classes together:

hasOptInPurpose.Data(?data) ∧ hasOptInPurpose⁻.Purpose(?purpose) → hasOptInPurpose(?data, ?purpose)    (1)

Similarly, a hasOptInDatauser property has its domain and range classes shown as follows:

⊤ ⊑ ∀hasOptInDatauser.Data, ⊤ ⊑ ∀hasOptInDatauser⁻.Datauser.

Then another datalog rule allows us to use another property chain to combine another two feasible classes together:

hasOptInDatauser.Data(?data) ∧ hasOptInDatauser⁻.Datauser(?datauser) → hasOptInDatauser(?data, ?datauser)    (2)

Based on (1) and (2), we obtain a feasible set of ABox instances: combinations of data, purpose, and datauser that the original data owner has permitted, so that a particular type of datauser may ask for a data set with a permitted purpose. When a server collects a customer’s data, the promise of data usage is kept if a data user’s identity and usage purpose are verified successfully. Otherwise, the data are kept secret, without the data user being aware of them.
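The way (1) and (2) jointly yield feasible combinations can be sketched in Python as a join on the shared data item. This assumes the two properties are available as sets of pairs; the sample instances and the function name are hypothetical, and the sketch does not use SWRL or SQWRL itself.

```python
from typing import Set, Tuple

# Hypothetical ABox assertions for the two properties.
# hasOptInPurpose(?data, ?purpose)
has_opt_in_purpose: Set[Tuple[str, str]] = {
    ("email", "marketing"),
    ("address", "delivery"),
}
# hasOptInDatauser(?data, ?datauser)
has_opt_in_datauser: Set[Tuple[str, str]] = {
    ("email", "partner"),
    ("address", "courier"),
}


def feasible_triples(
    purposes: Set[Tuple[str, str]], datausers: Set[Tuple[str, str]]
) -> Set[Tuple[str, str, str]]:
    """Join the two properties on the data item they share."""
    return {
        (data, purpose, datauser)
        for (data, purpose) in purposes
        for (other_data, datauser) in datausers
        if data == other_data
    }


feasible = feasible_triples(has_opt_in_purpose, has_opt_in_datauser)
```

A request is then admissible only if its (data, purpose, datauser) combination appears in `feasible`; a data item with an opt-in purpose but no opt-in datauser produces no triple at all.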

These are easily extended to the other two attributes, action and obligation, to complete the FIPs’ privacy protection criteria. An ordinary data user is allowed to ask for a query service with action = read at the VP. The other actions, such as delete or modify, are only allowed in the middle layer: for a system administrator who deletes a user’s data to satisfy the obligation of a data retention period, or for a data owner who updates his or her own profile data.

Figure 4: A partial ontology schema for OECD FIPs’ attributes shown as owl:Class, and constraints shown as owl:Property

3.2 Data Request Services

A server declares its privacy policy in P3P before a data owner’s data is collected. Once a user accepts a server’s privacy declaration policy, the data usage constraints are specified as in Figure 5, where the FIPs’ five attributes (?d, ?p, ?du, ?a, ?o) for data, purpose, datauser, action, and obligation are classes, and hasOptInDatauser, hasOptInPurpose, etc., are properties that chain the usage constraints between attributes. For each data request service, the initial feasible parameter input set is FS = input(?du, ?r, ?p), where ?du ∈ Datauser, ?r = read ∈ Action, and ?p ∈ Purpose, and the output dataset with associated obligations is output(?d, ?o), where ?d ∈ Data and ?o ∈ Obligation. The feasible dataset, shown as ABox instances, is discovered by using SQWRL datalog rules. Further permissible actions are activated when the following data protection policies are satisfied.
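The input and output of a data request can be sketched in Python as a filter over permitted attribute combinations. The sketch assumes each permission is a flat record of the five attributes; the record type, function name, and sample values are hypothetical, and in the model itself this lookup is carried out by SQWRL datalog rules over ABox instances.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Permission(NamedTuple):
    """One permitted combination of the five attributes."""
    data: str
    purpose: str
    datauser: str
    action: str
    obligation: str


def request(
    permissions: Iterable[Permission], datauser: str, action: str, purpose: str
) -> Set[Tuple[str, str]]:
    """Map input(?du, ?r, ?p) to the set of output(?d, ?o) pairs."""
    return {
        (perm.data, perm.obligation)
        for perm in permissions
        if perm.datauser == datauser
        and perm.action == action
        and perm.purpose == purpose
    }


# Hypothetical permissions collected from data owners.
PERMISSIONS = [
    Permission("email", "marketing", "partner", "read", "retain-30-days"),
    Permission("address", "delivery", "courier", "read", "delete-after-delivery"),
]

result = request(PERMISSIONS, datauser="partner", action="read", purpose="marketing")
```

A request whose datauser, action, or purpose does not match any permission returns an empty set, so no data are released.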

Figure 5: The five major FIPs’ attributes, such as data, purpose, etc., are shown as owl:Class and chained by associated owl:Property, such as hasOptInDatauser, hasOptInPurpose, etc.

3.3 FPPC at the

A data user could still collect shareable data by asking each server individually, without using a formal privacy protection policy combination (FPPC). But the high complexity of using query services for all of the data sources hinders people from using this data sharing approach. The other
