5 Conclusion and Future Work

We present a three-layered formal policy framework to demonstrate how data usage crosses multiple judicial domains in the cloud. We focus on the design and modeling of the CLD layer with numerous TLD built-ins for the formal policy framework. In this cloud framework, a TLD specifies its legal virtual boundary to accept a data request. When a data user asks for information disclosure using a role with a purpose from a location, a data usage context is created to determine which TLD, with its various policies, is eligible to constrain data usage. A domain-policy is applied to select which data policies are applicable for a real information disclosure within a TLD. A meta-policy sets the data-policy’s priority for policy management when policy conflicts exist.

Semantics-enabled policies are shown as a combination of ontologies and rules, where ontologies describe the concepts of policies, including domain-policy, meta-policy and data-meta-policy. Rules further enforce these different types of policies. The semantics-enabled policies are applied to a scenario where the national security policies for information sharing and the privacy protection policies for data usage are both satisfied. Finally, a proof-of-concept prototype of the CLD layer, based on the OpenTC architecture, has been implemented to justify our approach.

Acknowledgements

This research was partially supported by the NSC Taiwan under Grant No. NSC 99-2221-E-004-010 and NSC 100-2221-E-004-011-MY2.

14 Yuh-Jong Hu, Win-Nan Wu, and Jiun-Jan Yang

References

1. Bruening, J.P., Treacy, B.C.: Cloud computing: privacy, security challenges. Privacy & Security Law Report (2009)

2. Takabi, H., et al.: Security and privacy challenges in cloud computing environments. IEEE Security & Privacy 8 (2010) 24–31

3. Antón, I.A., et al.: A roadmap for comprehensive online privacy policy management. Comm. of the ACM 50 (2007) 109–116

4. Vimercati, S.D.C.d., et al.: Second research report on next generation policies, project deliverable D5.2.2. Technical report, PrimeLife (2010)

5. Ardagna, A.C., et al.: A privacy-aware access control system. Journal of Computer Security 16 (2008) 369–397

6. Karjoth, G., et al.: Translating privacy practices into privacy promises - how to promise what you can keep. In: POLICY’03, IEEE (2003)

7. Hu, Y.J., Yang, J.J.: A semantic privacy-preserving model for data sharing and integration. In: International Conference on Web Intelligence, Mining and Semantics (WIMS’11), Norway, ACM (2011)

8. Cabuk, S., et al.: Towards automated security policy enforcement in multi-tenant virtual data centers. Journal of Computer Security 18 (2010) 89–121

9. Popp, R., Poindexter, J.: Countering terrorism through information and privacy protection technologies. IEEE Security & Privacy 4 (2006) 24–33

10. Kettler, B., et al.: Facilitating information sharing across intelligence community boundaries using knowledge management and semantic web technologies. In Popp, L.R., Yen, J., eds.: Emergent Information Technologies and Enabling Policies for Counter-Terrorism. Wiley (2005) 175–195

11. Buchanan, W., et al.: Interagency data exchange protocols as computational data protection law. In: Legal Knowledge and Information Systems - JURIX, IOS Press (2010) 143–146

12. Bonatti, P., Olmedilla, D.: Policy language specification, enforcement, and integration. Project deliverable D2, working group I2. Technical report, REWERSE (2005)

13. Gruber, T.R.: A translation approach to portable ontology specifications. Knowledge Acquisition 5 (1993)

14. Kagal, L., et al.: Using semantic web technologies for policy management on the web. In: 21st National Conference on Artificial Intelligence (AAAI), AAAI (2006)

15. Tonti, G., et al.: Semantic web languages for policy representation and reasoning: A comparison of KAoS, Rei, and Ponder. In: 2nd International Semantic Web Conference (ISWC) 2003. LNCS 2870 (2003) 419–437

16. Hu, Y.J., Boley, H.: SemPIF: A semantic meta-policy interchange format for multiple web policies. In: 2010 IEEE/WIC/ACM Int. Conference on Web Intelligence and Intelligent Agent Technology, IEEE (2010) 302–307

17. Hosmer, H.H.: Metapolicies I. ACM SIGSAC Review 10 (1992) 18–43

18. Berger, S., et al.: Security for the cloud infrastructure: Trusted virtual data center implementation. IBM Journal of Research and Development (2009) 6:1–6:12

19. Clifton, C., et al.: Privacy-preserving data integration and sharing. In: Data Mining and Knowledge Discovery, ACM (2004) 19–26

20. Calvanese, D., Giacomo, G.D.: Data integration: A logic-based perspective. AI Magazine 26 (2005) 59–70

International Journal of Computer Science and Applications, © Technomathematics Research Foundation, Vol. XX No. XX, pp. XXX - XXX, 20XX

Semantics-enabled Policies for Super-Peer Data Integration and Protection

Yuh-Jong Hu, Win-Nan Wu, and Jiun-Jan Yang

ENT Lab., Dept. of Computer Science, National Chengchi University, Taipei, Taiwan, 11605

[email protected], {99753505,98753036}@nccu.edu.tw

Extending our previous semantic privacy-preserving model, we propose a wide-scale peer data integration and protection architecture. Any user from a super-peer domain can contribute new data, schemas, or even mappings for other super-peer domains to integrate the information. Each super-peer domain is essentially a mediator-based data integration system, where an agent at the super-peer performs semantic local mappings to manage a set of its local peers endowed with shareable relational data sources. Semantic global mappings are also possible from the current super-peer to interlink with other super-peers located in their own super-peer domains. A super-peer is the only place, at the virtual platform (VP), where an agent can provide the data integration and access control services for a super-peer domain. Through the semantics-enabled privacy protection policies, authorized view-based queries posed to a super-peer enable data integration without losing a user’s privacy. An ontology mapping and merging algorithm with a local-as-view (LAV) source description creates a global ontology schema in a super-peer by integrating multiple local ontology schemas for data integration. A perfect rule integration of datalog rules enforces the data query and protection services. Finally, using global-local-as-view (GLAV) source descriptions for global semantic mappings among super-peers, we gain greater flexibility of data integration and protection in the super-peer architecture.

Keywords: Semantics-enabled policies, super-peer data integration, privacy protection, ontology and rule

1. Introduction

Large enterprises spend a great deal of time and money on data (or information) integration [Bernstein and Haas, 2008]. Data integration is the problem of combining data from autonomous and heterogeneous sources and providing users with a unified view of these data through a so-called global (or mediated) schema. The global schema is a reconciled view of the information that provides query services to end users. The design of a data integration system is a very complex task and includes several different issues: the heterogeneity of the data sources, the relation between the global schema and the data sources, limitations on the mechanisms for accessing the sources, how to process queries expressed on the global schema, etc. [Calvanese et al., 2002].

We face data requests over a tremendous number of heterogeneous and scalable data sources on the web. A peer data management system (PDMS) inherits the spirit of the pay-as-you-go (PAYGO) approach, which enables wide-scale data integration [Franklin et al., 2005] [Madhavan et al., 2007]. In a PDMS, each peer exports data in terms of its own schema, and information integration is achieved by establishing mappings among the various peer schemas. In the super-peer network architecture, we group a set of peers into a super-peer domain and organize them into a two-level architecture. The lower level consists of the peers, and the upper level consists of the super-peer [Beneventano et al., 2007]. More precisely, a peer integrates data sources into a local ontology. A super-peer contains a data integration system, which integrates these local peers’ ontologies into a global ontology through ontology mapping, alignment, and merging. Therefore, a traditional data integration system can be viewed as a special case of a PDMS.

Three approaches have been proposed to model a set of source descriptions that specify the semantic mapping between the source schema and the global schema.

The first one, called global-as-view (GAV), requires that each concept in the global schema be expressed in terms of a query over the data sources. GAV deals with the case in which a stable data source contains details not present in the global schema, so it is not suited to dynamically adding or deleting data sources.

The second one, called local-as-view (LAV), requires the global schema to be specified independently from the sources. The source descriptions between the stable global schema, such as an ontology, and the dynamic data sources are established by defining each concept in the data sources as a view over the global schema [Calvanese and Giacomo, 2005] [Lenzerini, 2002]. LAV descriptions handle the case in which the global schema contains details that are not present in every data source.

The third one, called global-local-as-view (GLAV), is a source description that combines the expressive power of both GAV and LAV and allows flexible schema definitions independent of the particular details of the data sources [Friedman et al., 1999] [Nash and Deutsch, 2007]. The data integration system uses these different source descriptions to reformulate a user query into a query over the source schemas.
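The difference between the three kinds of source description can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the relations `source_a`, `source_b`, and the global concept `Patient` are hypothetical and do not come from the paper, and plain sets of tuples stand in for relational sources and ontology concepts. GAV defines the global concept as a query over the sources, LAV describes each source as a view over the global schema, and GLAV relates a query over the sources to a query over the global schema.

```python
# Toy illustration of GAV, LAV, and GLAV source descriptions.
# All relation names and tuples below are hypothetical.

# Two autonomous sources, each with its own schema.
source_a = {("alice", "flu"), ("bob", "cold")}         # s_a(name, diagnosis)
source_b = {("alice", "taipei"), ("carol", "tainan")}  # s_b(name, city)


def gav_patient(sa, sb):
    """GAV: the global concept Patient(name, diagnosis, city) is defined
    as a query over the sources (here, a join on name)."""
    return {(n, d, c) for (n, d) in sa for (m, c) in sb if n == m}


def view_a(global_patient):
    """LAV: source s_a is described as a view over the global schema
    (the projection of Patient onto name and diagnosis)."""
    return {(n, d) for (n, d, _city) in global_patient}


def view_b(global_patient):
    """LAV: source s_b is the projection of Patient onto name and city."""
    return {(n, c) for (n, _diag, c) in global_patient}


def lav_legal(global_patient, sa, sb):
    """A global instance is legal under sound LAV descriptions when every
    source tuple appears in the corresponding view."""
    return sa <= view_a(global_patient) and sb <= view_b(global_patient)


def glav_legal(global_patient, sa, sb):
    """GLAV: a query over the sources (the join) must be contained in a
    query over the global schema (here, Patient itself)."""
    return gav_patient(sa, sb) <= global_patient


gav_instance = gav_patient(source_a, source_b)
candidate = {
    ("alice", "flu", "taipei"),
    ("bob", "cold", None),      # city unknown to the sources
    ("carol", None, "tainan"),  # diagnosis unknown to the sources
}
```

In this toy setting the GAV instance contains only the joined tuple for `alice`, whereas the LAV descriptions admit any global instance whose views cover the sources, such as `candidate`. This reflects why LAV tolerates sources being added or removed more easily than GAV.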

However, data integration is hampered by legitimate and widespread privacy concerns, so it is critical to develop a technique that enables the integration and sharing of data without losing a user’s privacy [Clifton et al., 2004].

Privacy protection policies represent a long-term promise made by an enterprise to its users and are determined by business practices and legal concerns. It is undesirable to change an enterprise’s promises to customers every time an internal access control rule changes. If possible, we should allow the integration of Platform for Privacy Preferences (P3P) and Enterprise Privacy Authorization Language (EPAL) policies to provide accountable and transparent information processing for data owners to revise their data usage permissions [Antón et al., 2007].

Although many organizations post online privacy policies, they must realize that simply posting a privacy policy on their Web sites does not guarantee true compliance with existing legislation. Following the OECD’s Fair Information Principles (FIPs)^a, an organization should provide the norms of personal information processing for its data collection, retention, use, disclosure, and destruction. An organization must also be accountable for its information possession and should declare the purposes of information usage before collection. Moreover, an organization should collect personal information with an individual’s consent and disclose personal information only for previously identified purposes.

Fig. 1. A semantic privacy protection model extended from the P3P and EPAL integration for data integration and protection in a super-peer domain

Each enterprise, as a peer, declares its P3P privacy protection policies, which follow the FIPs’ criteria (see Figure 1). EPAL policies corresponding to the P3P policies are then established at each site [Karjoth and Schunter, 2002]. For each data request, the data handling and usage controls are based on the EPAL policies. However, P3P and EPAL lack formal and unambiguous semantics to specify privacy protection policies, so they are limited in the policy enforcement and auditing support they offer to software agents. One of the research challenges for the online privacy protection problem is to develop a privacy management framework and a formal semantics language to empower agents to enforce privacy protection policies. Agents must avoid any policy violation for each data request. We attempt to establish a semantic privacy protection model for a super-peer domain to address this issue. In a super-peer domain, each peer shares its collected data with other peers without breaking the original data usage commitment to its clients [Karjoth et al., 2003].

^a See http://www.privacyrights.org/ar/fairinfo.htm

1.1. Research Issues and Contributions

In this paper, we address the following research issues:

• We aim at providing data integration and protection services for various data sources to perform effective data sharing for different purposes in a super-peer domain. The incentive for using a super-peer model is that it avoids solving the complex pair-wise ontology matching and rule integration problems between peers. In addition, various complex ontology evolution and compatibility issues among peers can be hidden in a super-peer domain.

• Privacy protection policy representation and enforcement issues are also addressed. Policies are expressed as a combination of ontologies and rules, i.e. O + R, where ontology O includes TBox schemas and ABox instances, and rule R includes deductive rule sets (RS) and facts (F). Data integration and protection are achieved at the super-peer for multiple peers through a combination of semantics-enabled formal protection policies (FPP).

• In a super-peer domain, the challenge of designing a semantic privacy protection model is to ensure the soundness and completeness of data integration and protection within a super-peer domain. For the soundness criterion, we do not allow unintended data to be released to the data users through the global policy schema (GPS) at the super-peer. Otherwise, the privacy protection policies are violated. As for the completeness criterion, we do not miss any eligible shared data when a user asks for a data request service at the super-peer. Therefore, shareable data obtained at the super-peer should equal data obtained directly from each peer.

• In the multiple super-peer domains environment, we focus on using an emergent semantic mapping technique from a super-peer domain to interconnect with another one when additional information is requested on demand. This wide-scale data integration and protection problem faces the challenge of effective data sharing without causing any semantic ambiguity of ontology mappings among super-peers. In addition, we avoid the undecidable computation of query answering posed to the super-peer by using acyclic schema mappings in a tree-based information query.
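The soundness and completeness criteria stated in the list above can be read as two set-containment checks. The following Python sketch is a simplified illustration, not the paper's formal definition: it assumes that each peer holds records tagged with the purposes their owners consented to, and the record identifiers, the field `allowed_purposes`, and the function names are all hypothetical.

```python
def peer_answer(records, purpose):
    """Records that a single peer may release for the given purpose."""
    return {r["id"] for r in records if purpose in r["allowed_purposes"]}


def expected_answer(peers, purpose):
    """Union of what every peer would release when queried directly."""
    return set().union(*(peer_answer(records, purpose) for records in peers))


def is_sound(super_answer, peers, purpose):
    """Soundness: the super-peer releases nothing that no peer would release."""
    return super_answer <= expected_answer(peers, purpose)


def is_complete(super_answer, peers, purpose):
    """Completeness: the super-peer misses nothing that some peer would release."""
    return expected_answer(peers, purpose) <= super_answer


# Hypothetical peers in one super-peer domain.
peer_1 = [
    {"id": "r1", "allowed_purposes": {"treatment", "research"}},
    {"id": "r2", "allowed_purposes": {"treatment"}},
]
peer_2 = [
    {"id": "r3", "allowed_purposes": {"research"}},
]
```

An answer that is both sound and complete equals the union of the per-peer answers, which matches the requirement that shareable data obtained at the super-peer equal the data obtained directly from each peer.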

Our contributions. Our main contributions are: (i) We offer a three-layer semantic privacy-preserving model for a super-peer domain. This extends our previous work on data integration for privacy protection policies [?]. We define a formal policy using ontologies for privacy protection concepts and rules for data query and access control services. (ii) We focus on solving the soundness and completeness of query rewriting problems for a super-peer domain by using a perfect ontology merging and rule integration. For each possible data query at the super-peer, we briefly demonstrate how the soundness and completeness criteria of a privacy protection data integration can be achieved. (iii) In the multiple super-peer domains environment, we propose a tree-based information query technique by using the GLAV semantic ontology mappings among super-peers to achieve a wide-scale data integration. This avoids possible cyclic schema mappings as shown in [Calvanese et al., 2006]. We also adopt the top-down query answering strategy to pose authorized view-based queries over the super-peer to provide data integration and protection services. By incrementally collecting global information from each additional super-peer domain, we use the GLAV schema mappings among super-peers to collect information from their peers by using the LAV mappings between a super-peer and its peers.
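One way to picture a tree-based query that sidesteps cyclic mappings is a traversal that follows each inter-super-peer mapping at most once. The sketch below is an assumption-laden illustration rather than the authors' algorithm: the super-peer names, the `mappings` adjacency dictionary, and the `local_answers` dictionary are hypothetical, and GLAV mappings are reduced to plain directed edges.

```python
from collections import deque


def tree_query(root, mappings, local_answers):
    """Collect answers top-down from `root`, following every mapping edge
    that leads to a super-peer not yet visited. The edges actually used
    form a tree, so cyclic mappings are never followed."""
    visited = {root}
    tree_edges = []
    answers = set(local_answers.get(root, ()))
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour in mappings.get(current, ()):
            if neighbour in visited:
                continue  # skipping this edge keeps the traversal acyclic
            visited.add(neighbour)
            tree_edges.append((current, neighbour))
            answers |= set(local_answers.get(neighbour, ()))
            queue.append(neighbour)
    return answers, tree_edges


# Hypothetical mappings among three super-peers, including cycles.
mappings = {
    "SP1": ["SP2", "SP3"],
    "SP2": ["SP3", "SP1"],
    "SP3": ["SP1"],
}
local_answers = {"SP1": {"a"}, "SP2": {"b"}, "SP3": {"c"}}

answers, tree_edges = tree_query("SP1", mappings, local_answers)
```

Because each super-peer is visited once, the traversal terminates even when the mapping graph contains cycles, and information is collected incrementally as each additional super-peer domain is reached.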

Outline. The paper is organized as follows. In Section 2, we present a semantic privacy-preserving model as a framework for data integration services. In Section 3, we define a formal policy combination as an integration of formal policies from autonomous data sources. Each formal policy is composed of ontologies and rules for each independent data source. A privacy protection policy is a type of formal policy used for specifying a data usage constraint from a data owner. In Section 4, we formally define a formal policy combination in terms of ontology mapping, alignment, and merging. Then, in Section 5, we demonstrate how a perfect rule integration is used for query rewriting at the super-peer corresponding to its local peers’ schemas. In Section 6, we briefly prove the soundness and completeness of privacy-preserving data integration for a super-peer domain. In Section 7, the semantics of a super-peer data integration system is specified and demonstrated with an example. Finally, we point out the related work and draw our conclusion.

2. A Privacy-Preserving Model

A semantic privacy protection model is proposed with three layers for a super-peer domain, where the bottom layer provides data sources from the relational databases and the middle layer provides a semantics-enabled local schema for each peer’s independent service domain. The top layer is served at the super-peer, which provides a unified global view of privacy-preserving data integration services (see Figure 2).

We have a merged global ontology schema created by mapping and aligning local ontology schemas with a LAV source description from multiple local schemas in the middle layer. The idea of using description logic (DL) to model the local and global schemas is to empower the ontology’s abstract concept representation and reasoning capabilities. A query is defined as an SQWRL datalog rule in the SWRL-based policy to access a global ontology [O’Connor and Das, 2009]. Each SQWRL data service query posed to a global ontology at the super-peer is mapped to multiple queries as SQWRL datalog rules for local schemas. This is a LAV query rewriting service which has been investigated in databases but has largely been unexplored in the context of DL-based ontologies [Friedman et al., 1999].
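The step in which one query over the global ontology becomes several queries over local schemas can be sketched as follows. This is a deliberately reduced illustration: it assumes one-to-one correspondences between global and local concept names, which is far simpler than general LAV view-based rewriting, and it uses plain Python dictionaries rather than SQWRL or SWRL. The peer names, concept names, and data are hypothetical.

```python
def rewrite_query(global_concept, local_mappings):
    """Map a query on a global concept to one local query per peer.
    Peers whose schema has no counterpart for the concept are skipped."""
    return {
        peer: mapping[global_concept]
        for peer, mapping in local_mappings.items()
        if global_concept in mapping
    }


def answer_global_query(global_concept, local_mappings, peer_data):
    """Evaluate each rewritten local query and merge the results."""
    results = set()
    rewritten = rewrite_query(global_concept, local_mappings)
    for peer, local_concept in rewritten.items():
        results |= peer_data[peer].get(local_concept, set())
    return results


# Hypothetical correspondences: global concept -> local concept, per peer.
local_mappings = {
    "peer_1": {"Patient": "Client", "Diagnosis": "Illness"},
    "peer_2": {"Patient": "Person"},
}

# Hypothetical local instance data, keyed by local concept name.
peer_data = {
    "peer_1": {"Client": {"alice", "bob"}, "Illness": {"flu"}},
    "peer_2": {"Person": {"carol"}},
}
```

For example, a query on the global concept `Patient` is rewritten into a query on `Client` at `peer_1` and on `Person` at `peer_2`, and the merged result contains the individuals from both peers.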

Fig. 2. A semantic privacy protection model in a super-peer domain

2.1. Formal Privacy Protection Policy

A policy’s explicit representation in terms of ontologies or rules depends on the underlying logic foundation of the policy language. If policies are created from a DL-based policy language, such as Rein or KAoS, then ordinary policies are shown as TBox schemas and ABox instances. If policies are instead created from an LP-based policy language, such as EPAL or Protune, then ordinary policies are a set of rules, with unary, binary, or ternary predicates, and facts [Bonatti and Olmedilla, 2005].

A formal policy (FP) is a declarative expression corresponding to a human legal norm that can be executed in a computer system without causing any semantic ambiguity. An FP is created from a policy language (PL), and this PL is shown as a combination of ontology language and rule language. Therefore, an FP is composed of ontologies O and rules R, where ontologies are created from an ontology language and rules are created from a rule language.

A formal protection policy (FPP) is an FP that aims at representing and enforcing resource protection principles, where the structure of resources is modeled as ontologies O but the resource protection is shown as rules R.

A privacy protection policy shown as an FPP is a combination of ontologies and rules, i.e., O + R, where DL-based ontologies, such as OWL-DL ontologies, provide a well-defined structured data model for data integration, while Logic Program (LP)-based rules, such as datalog rules, provide further expressive power for data query and protection. There are numerous O + R combinations available for designing privacy protection policies, such as SWRL [Horrocks et al., 2005] and OWL2 RL [Grau et al., 2008b]. Each O + R combination determines how much expressive power the rules can draw from the ontologies and vice versa.
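A very small sketch may help to show how the O and R parts of such a policy interact. The Python fragment below is an illustration under stated assumptions and is not SWRL or OWL: the class names (`Staff`, `Doctor`, `PersonalData`, `HealthRecord`), the individuals, the `consent` fact, and the single access rule are all hypothetical, and the subclass hierarchy is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O: a toy ontology with a TBox (subclass axioms) and an ABox
    (class membership assertions). Assumes an acyclic hierarchy."""
    subclass_of: dict = field(default_factory=dict)  # TBox: class -> parent
    instance_of: dict = field(default_factory=dict)  # ABox: individual -> class

    def is_a(self, individual, cls):
        """True if the individual belongs to cls directly or by inheritance."""
        current = self.instance_of.get(individual)
        while current is not None:
            if current == cls:
                return True
            current = self.subclass_of.get(current)
        return False


def permitted(onto, facts, user, item, purpose):
    """R: one datalog-style rule over ontology concepts and plain facts:
    permit(user, item, purpose) <- Staff(user), PersonalData(item),
                                   consent(item, purpose)."""
    return (
        onto.is_a(user, "Staff")
        and onto.is_a(item, "PersonalData")
        and ("consent", item, purpose) in facts
    )


onto = Ontology(
    subclass_of={"Doctor": "Staff", "HealthRecord": "PersonalData"},
    instance_of={"dr_lin": "Doctor", "rec42": "HealthRecord"},
)
facts = {("consent", "rec42", "treatment")}
```

Here the ontology supplies the class hierarchy that lets the rule apply to `dr_lin` and `rec42` without naming them, while the rule and facts decide whether a particular purpose is allowed. This is the division of labour between O and R that the text describes.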

SWRL is one of the O + R semantic web languages suitable for policy representation in the privacy protection model. However, this is not an exclusive selection. Other O + R combinations, such as CARIN and OWL2 RL, are also possible for modeling formal privacy protection policies whenever their underlying theoretical foundations and development tools are available. We fully utilize the SWRLTab development tools and the SQWRL OWL-DL query language [O’Connor and Das, 2009] in Protégé to model and enforce semantic privacy protection policies.

We face a research challenge of combining SWRL-based privacy protection policies from multiple peers to ensure the soundness and completeness of data integration and protection criteria in a super-peer domain. Another challenge is to solve
