CHAPTER 2 RELATED WORKS
2.4 OWL AND OWL-QL
The Web Ontology Language (OWL) [26] [28] is the most popular language for implementing semantic web applications. It is a markup language for publishing and sharing data using ontologies on the World Wide Web (WWW), and it can readily express the ontology needed in a particular domain. OWL has three sub-languages: OWL Lite, OWL DL and OWL Full [28]. Figure 2-3 shows the inclusion relation among them.
[Figure content: three nested regions, with OWL Lite inside OWL DL inside OWL Full]
Figure 2-3: Level relation of OWL language
In Figure 2-3, OWL Lite is the smallest of the three. It supports users who primarily need a classification hierarchy and simple constraints.
OWL DL offers the maximum expressiveness while retaining computational completeness. Finally, OWL Full offers the maximum expressiveness together with the syntactic freedom of RDF, but it provides no computational guarantees.
OWL was developed from DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer) [22], and is now a W3C [27] recommendation. OWL provides constructs for classes, properties, individuals (objects) and logical relations, among others. For example, a class is declared with owl:Class and a subclass relation is stated with rdfs:subClassOf.
In this thesis, we adopt the OWL Query Language (OWL-QL) as the query language [20]. OWL-QL is a formal language and protocol for querying knowledge represented in OWL. It is an updated version of the DAML Query Language (DQL).
A query request is parsed into three parts: subject, property and object. These components are then filled into OWL-QL query patterns, each of which is a set of triples of the form (<property> <subject> <object>), where any item in a triple can be a variable. The resulting patterns are then used to retrieve appropriate results.
CHAPTER 3
The Framework of the Integrated OWL Data Mining and Query System
In this chapter, we describe the framework of the proposed integrated OWL data mining and query system. It consists of five sub-systems: the query parser, the rule inference system, the ontology management system, the knowledge generation system and the knowledge management system. The proposed system architecture is shown in Figure 3-1.
Figure 3-1: The proposed system framework
Figure 3-1 is examined from the viewpoints of two types of actors: the end user and the system administrator. From the viewpoint of end users, the system handles users' queries and generates answers. From the viewpoint of system administrators, the system acts as a back end that supports query processing. Both viewpoints are described below.
3.1 The Viewpoint of End Users
As mentioned above, from the viewpoint of end users, the system mainly handles users' queries and generates answers. This part consists of two sub-systems, the query parser and the rule inference system. The rule inference system in turn includes two modules, the OWL query patterns and the inference engine.
When a query arrives, the system processes it as follows.
On receiving a query from an end user, the query parser first identifies the subject, predicate and object in the input query. It then sends the extracted items to the OWL query patterns module, where the relationship among the subject, predicate and object is determined and expressed as a query pattern. The query pattern is then forwarded to the inference engine, which searches the integrated knowledge base, performs reasoning if necessary, and outputs the answers to the user. The ingredients of the framework under this scenario are shown in Figure 3-2.
Figure 3-2: The ingredients of the framework in light of user viewpoint
3.2 The Viewpoint of System Administrators
From the viewpoint of system administrators, the system acts as a back end that supports query processing. It mainly consists of three sub-systems: the ontology management system, the knowledge generation system and the knowledge management system. The ontology management system is responsible for building and managing the domain ontology, which is used to expand the semantic meaning of the data stored in the transaction database and of the rules stored in the rule base. It includes two modules, the ontology editor and the domain ontology. The system administrator uses the ontology editor to edit the ontology stored in the domain ontology module.
The knowledge generation system is in charge of the actual mining of the transaction data. It includes four modules: the data manager, the object-oriented transaction database, the rule mining engine and the association rules. The administrator or the user can update the data stored in the object-oriented transaction database through the data manager. The object-oriented transaction database stores the data in the form of objects, where the objects in a class may consist of the same set of attributes. The rule mining engine then finds inter- and intra-class association rules among the data items in the object-oriented transaction database. The generated rules are collected in the association rules module.
The knowledge management system converts the mined rules into the OWL format and maintains them. It includes five modules: the rule conversion engine, the rule editor, the OWL rule base, knowledge integration and the integrated knowledge base. The rule conversion engine automatically transforms the rules into the OWL format according to a set of transformation rules and constraints.
In addition to the rules produced by the rule conversion engine, an administrator can directly create new OWL rules and edit existing ones through the rule editor. The system framework also allows an administrator to use ontology editing tools, such as Protégé, to manually transform the rules into the OWL format. The rules from the rule conversion engine and from the rule editor are kept in the OWL rule base. Finally, the knowledge integration module combines the OWL association rules with the relevant part of the OWL ontology to form an integrated knowledge base, which is used to answer users' queries in a semantic way. The ingredients of the framework under this scenario are shown in Figure 3-3.
CHAPTER 4
Module Design in the Integrated OWL Data Mining and Query System
In this chapter, the functions of each sub-system and its components are described.
4.1 Query Parser
Keywords usually have to be extracted to capture the actual request expressed by a sentence or a query. The query parser is designed for this purpose. To match the OWL patterns, it recognizes the subjects, properties and objects in a sentence or a query as keywords. Two kinds of queries commonly appear. The first is the data query, which is simple and has been the most commonly used in past applications.
For example, assume the query “Who owns the car?” is given. It is first parsed to obtain the unknown subject (who), the exact property (owns) and the known object (car) according to the format of the OWL-QL patterns [6][20]. The second kind is the rule query, which is still seldom seen in current applications. For example, assume the query “If beverage is an antecedent and snack is a consequent, what confidence is the rule?”, which concerns an association rule, is to be parsed. In this example, the words beverage, snack and rule are set as subjects; antecedent, consequent and confidence are set as objects; and “is” is set as a property (type). The extracted items are then stored as the OWL query pattern and used for inference.
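The data-query case above can be sketched in a few lines. This is only a toy illustration, assuming a three-keyword query of the form "<subject> <property> <object>"; the function name parse_data_query and the word lists WH_WORDS and STOP_WORDS are hypothetical and are not part of the parser described in this thesis.

```python
import re

# Hypothetical interrogatives treated as unknown items (an assumption of this sketch).
WH_WORDS = {"who", "what", "which"}
# Hypothetical stop words dropped before keyword extraction.
STOP_WORDS = {"the", "a", "an"}


def parse_data_query(query: str) -> dict:
    """Split a simple '<subject> <property> <object>?' query into keywords."""
    words = [w for w in re.findall(r"[A-Za-z]+", query.lower())
             if w not in STOP_WORDS]
    if len(words) != 3:
        raise ValueError("this sketch only handles three-keyword queries")
    subject, prop, obj = words
    # Record which roles are unknown so that a variable can replace them later.
    unknown = [role for role, word in (("subject", subject), ("object", obj))
               if word in WH_WORDS]
    return {"subject": subject, "property": prop, "object": obj,
            "unknown": unknown}


result = parse_data_query("Who owns the car?")
print(result)
```

A rule query such as the beverage and snack example would need several triples and is outside the scope of this sketch.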
4.2 The Rule Inference System
This sub-system infers appropriate answers to user queries. It is based on the OWL Query Language and consists of two components: the OWL query patterns and the inference engine. The functions of each component are described as follows.
4.2.1 OWL Query Pattern
This component is implemented with the OWL Query Language (OWL-QL) [20], which is a formal language and protocol for querying knowledge represented in OWL. An OWL-QL query pattern is a set of triples of the form:
(<property> <subject> <object>)
Items in the triple may be constants or variables (whose names begin with the character “?”). For the above example, the OWL query pattern module receives the three keywords owns (property), who (subject) and car (object) from the query parser. It then determines that the word “who” represents an unknown subject and replaces it with the variable “?”. The following triple is then formed:
(<owns> <?> <car>)
The triple can be represented as an RDF graph [25], as shown in Figure 4-1.
[Figure content: the node “?” (subject) is linked to the node “car” (object) by an edge labeled “owns” (property)]
Figure 4-1: The RDF Graph representing the triple in the example
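The construction of the triple can be sketched as follows. This is a minimal illustration under stated assumptions: the helper to_query_pattern, the word list WH_WORDS and the variable name "?p" are hypothetical, and the sketch reuses one variable name for every unknown item, so it only suits queries with a single unknown.

```python
# Hypothetical list of interrogatives that mark an unknown item.
WH_WORDS = {"who", "what", "which"}


def to_query_pattern(prop, subject, obj, var_name="?p"):
    """Return a (property, subject, object) triple with wh-words replaced by a variable."""
    def slot(word):
        return var_name if word.lower() in WH_WORDS else word
    return (prop, slot(subject), slot(obj))


pattern = to_query_pattern("owns", "who", "car")
# Items beginning with "?" are the variables that must be bound in an answer.
must_bind = [item for item in pattern if item.startswith("?")]
print(pattern, must_bind)
```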
4.2.2 Inference Engine
The main function of the inference engine is to infer appropriate answers from the integrated knowledge base according to the OWL query pattern. Continuing the above example, assume that the fact “Tom owns a car” is stored in the integrated knowledge base in the following OWL form:
<rdf:RDF>
  <rdf:Description rdf:about="#Tom">
    <owns rdf:resource="#car"/>
  </rdf:Description>
</rdf:RDF>
The inference engine derives the answer Tom as follows:
Query: (“Who owns the car?”)
Query Pattern: (owns ?p car)
Must-Bind Variables List: (?p)
Answer: Tom
Note that the symbol “?p” represents a variable; the inference engine must find a binding for it and output the binding to the user. Figure 4-2 visualizes the matching process of the query as an RDF graph.
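[Figure content: the query graph, in which “?p” is linked to “car” by an edge labeled “own”, is matched against the stored graph, in which “Tom” is linked to “car” by an edge labeled “own”]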
Figure 4-2: The match process of the query in visualization of the RDF Graph
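The matching step can be illustrated with a small sketch. It performs plain pattern matching over an in-memory list of triples and does no ontology reasoning, so it is not a model of the OWL-QL Server; the functions match and answer and the second fact in the list are hypothetical and serve only as illustration.

```python
def is_variable(term):
    """Variables are written with a leading '?', as in the query pattern."""
    return term.startswith("?")


def match(pattern, fact):
    """Return the variable bindings if the fact matches the pattern, else None."""
    bindings = {}
    for p_term, f_term in zip(pattern, fact):
        if is_variable(p_term):
            # A variable that occurs twice must bind to the same value.
            if bindings.get(p_term, f_term) != f_term:
                return None
            bindings[p_term] = f_term
        elif p_term != f_term:
            return None
    return bindings


def answer(pattern, knowledge_base):
    """Collect the bindings of every fact that matches the pattern."""
    results = []
    for fact in knowledge_base:
        bindings = match(pattern, fact)
        if bindings is not None:
            results.append(bindings)
    return results


# Facts are stored as (property, subject, object) triples; the second one is made up.
knowledge_base = [("owns", "Tom", "car"), ("owns", "Mary", "bike")]
answers = answer(("owns", "?p", "car"), knowledge_base)
print(answers)
```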
Several products on the market can be used to provide the inference function. In our system, we use the OWL-QL Server, which was developed by the Knowledge Systems Laboratory at Stanford University.
4.3 The Ontology Management System
The Ontology Management System is responsible for building and managing the domain ontology that is related to the transaction data stored in the object-oriented transaction database. It includes two modules, the ontology editor and the domain ontology. They are described as follows.
4.3.1 The Domain Ontology
The term ontology [11][12][13][17][19] originated in philosophy and is now commonly used in semantic web research. An ontology describes the entities in the world and the relationships among them. It also includes descriptions of classes, properties and their instances, among others, and can be thought of as a representation of knowledge. In this system, we use an ontology to describe the items in the transactions, their classes (concepts), properties, class relationships and association rules, among others. It is built and maintained by the system administrator through the ontology editor. An example of an ontology regarding food is shown in Figure 4-3, which describes the concept hierarchy of food and the relationships among the concepts (classes).
Figure 4-3: An example of ontology regarding food
In this thesis, the domain ontology is represented in the OWL format. The domain ontology in Figure 4-3 is thus transformed into the representation depicted in Figure 4-4 through the ontology editor.
Figure 4-4: The OWL representation of the ontology in Figure 4-3
In Figure 4-4, each node is a concept and each link is a property. Some original attributes of an item are also represented as concepts in the OWL representation. For example, the class wine has an attribute age, which is also treated as a class in the OWL representation. In OWL syntax, a class is defined by owl:Class. The semantic relationships among classes are represented by properties, which are specified with owl:ObjectProperty. Classes may also have subclasses. In the above example, beverage, snack and fruit are subclasses of the class food, and the property hasCost connects the two classes beverage and cost. In the RDF representation, beverage is the subject, cost is the object, and hasCost is the property. Each property has a domain, which restricts its subjects, and a range, which restricts its objects. The link hasCost thus connects the domain beverage and the range cost. That is, it relates instances of the class beverage to instances of the class cost.
4.3.2 The Ontology Editor
The ontology editor allows the system administrator to edit the domain ontology in the OWL format. When the system administrator knows that some classes or relationships about the data items need to be added, deleted or updated, he or she can edit the domain ontology directly through the ontology editor. The module can rely on an existing ontology editing tool; in this thesis, Protégé [24] is used. It allows users to build ontologies for the semantic web, in particular in the W3C's Web Ontology Language (OWL).
The interface of the Protégé software is illustrated in Figure 4-5.
Figure 4-5: A snapshot of the interface of Protégé
The ontology in Figure 4-4 is represented by OWL as follows:
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns="http://www.owl-ontologies.com/unnamed.owl#"
xml:base="http://www.owl-ontologies.com/unnamed.owl">
<owl:Ontology rdf:about=""/>
<owl:Class rdf:ID="food"/>
<owl:Class rdf:ID="beverage">
<rdfs:subClassOf>
<owl:Class rdf:about="#food"/>
</rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:about="#wine">
<rdfs:subClassOf>
<owl:Class rdf:about="#beverage"/>
</rdfs:subClassOf>
<owl:disjointWith rdf:resource="#milk"/>
<owl:disjointWith>
<owl:Class rdf:about="#tea"/>
</owl:disjointWith>
</owl:Class>
<owl:Class rdf:about="#tea">
<rdfs:subClassOf>
<owl:Class rdf:about="#beverage"/>
</rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:ID="milk">
<rdfs:subClassOf>
<owl:Class rdf:about="#beverage"/>
</rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:ID="cost"/>
<owl:Class rdf:ID="expiration"/>
<owl:Class rdf:ID="alcohol"/>
<owl:Class rdf:ID="age"/>
<owl:Class rdf:ID="category"/>
<owl:Class rdf:ID="color"/>
<owl:Class rdf:ID="fat"/>
<owl:Class rdf:ID="flavor"/>
<owl:ObjectProperty rdf:ID="hasCost">
<rdfs:domain>
<owl:Class rdf:about="#beverage"/>
</rdfs:domain>
<rdfs:range rdf:resource="#cost"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasExpiration">
<rdfs:range rdf:resource="#expiration"/>
<rdfs:domain rdf:resource="#beverage"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasAge">
<rdfs:domain rdf:resource="#wine"/>
<rdfs:range rdf:resource="#age"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasAlcohol">
<rdfs:domain rdf:resource="#wine"/>
<rdfs:range rdf:resource="#alcohol"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasCategory">
<rdfs:domain rdf:resource="#tea"/>
<rdfs:range rdf:resource="#category"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasColor">
<rdfs:domain rdf:resource="#tea"/>
<rdfs:range rdf:resource="#color"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasFat">
<rdfs:range rdf:resource="#fat"/>
<rdfs:domain rdf:resource="#milk"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasFlavor">
<rdfs:domain rdf:resource="#milk"/>
<rdfs:range rdf:resource="#flavor"/>
</owl:ObjectProperty>
</rdf:RDF>
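To make the structure of such a listing easier to inspect, the following sketch reads an abbreviated fragment of it with Python's standard xml.etree.ElementTree module and extracts the subclass relations and the domain and range of each object property. The fragment OWL_SNIPPET is shortened and uses the rdf:resource shorthand throughout, and the helper names local_name, referenced_name and summarize are hypothetical; a full RDF parser would be the more robust choice in practice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Abbreviated fragment of the food ontology, shortened for this sketch.
OWL_SNIPPET = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="food"/>
  <owl:Class rdf:ID="beverage">
    <rdfs:subClassOf rdf:resource="#food"/>
  </owl:Class>
  <owl:Class rdf:ID="cost"/>
  <owl:ObjectProperty rdf:ID="hasCost">
    <rdfs:domain rdf:resource="#beverage"/>
    <rdfs:range rdf:resource="#cost"/>
  </owl:ObjectProperty>
</rdf:RDF>
"""


def local_name(node):
    """Name taken from rdf:ID, rdf:about or rdf:resource, without the leading '#'."""
    for attr in ("ID", "about", "resource"):
        value = node.get(f"{{{RDF}}}{attr}")
        if value is not None:
            return value.lstrip("#")
    return None


def referenced_name(node):
    """Handle both <x rdf:resource="#c"/> and <x><owl:Class rdf:about="#c"/></x>."""
    if node is None:
        return None
    name = local_name(node)
    if name is None and len(node):
        name = local_name(node[0])
    return name


def summarize(xml_text):
    """Return (subclass -> superclass, property -> (domain, range))."""
    root = ET.fromstring(xml_text)
    subclass_of = {}
    for cls in root.findall(f"{{{OWL}}}Class"):
        for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
            subclass_of[local_name(cls)] = referenced_name(parent)
    properties = {}
    for prop in root.findall(f"{{{OWL}}}ObjectProperty"):
        domain = referenced_name(prop.find(f"{{{RDFS}}}domain"))
        range_ = referenced_name(prop.find(f"{{{RDFS}}}range"))
        properties[local_name(prop)] = (domain, range_)
    return subclass_of, properties


subclass_of, properties = summarize(OWL_SNIPPET)
print(subclass_of, properties)
```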
4.4 The Knowledge Generation System
The function of this sub-system is to generate association rules from an object-oriented transaction database. It includes four modules: the data manager, the object-oriented transaction database, the rule mining engine and the association rules. The rules it produces are subsequently managed and transformed into the OWL format by the knowledge management system, which includes the rule conversion engine, the rule editor, the OWL rule base, knowledge integration and the integrated knowledge base. The four modules of the knowledge generation system are introduced below.
4.4.1 The Data Manager
The data manager stores the data of a particular domain in the object-oriented transaction database. It handles the data through operations such as insertion, deletion and update. Any object-oriented database management system (DBMS) can play this role.
4.4.2 The Object-Oriented Transaction Database
An object-oriented transaction includes one or more purchased items, each of which is represented as an object, or instance. Each instance inherits its characteristics from a superior object, called a class, which defines the basic structure of objects with common properties, including attributes, default values and methods.
The roles of classes and instances in an object-oriented transaction database are like those that schemas and tuples play in a relational database [15].
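The relation between classes and instances can be sketched with two small Python data classes. This is only an illustration of the idea, not the storage format of the system: the names ItemClass and Instance, the attribute values and the sample transaction are all hypothetical, and methods attached to classes are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ItemClass:
    """Plays the role of a schema: a name plus attribute names with default values."""
    name: str
    defaults: dict = field(default_factory=dict)

    def new_instance(self, **values):
        # Only attributes declared by the class may be set on an instance.
        unknown = set(values) - set(self.defaults)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        return Instance(self, {**self.defaults, **values})


@dataclass
class Instance:
    """Plays the role of a tuple: one purchased item with concrete attribute values."""
    item_class: ItemClass
    attributes: dict


wine = ItemClass("wine", {"age": 0, "alcohol": 0})
cookie = ItemClass("cookie", {"cost": 0, "category": 0})

# One object-oriented transaction: a list of purchased instances.
transaction = [wine.new_instance(age=1), cookie.new_instance(cost=1, category=1)]
print([(item.item_class.name, item.attributes) for item in transaction])
```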
4.4.3 The Rule Mining Engine
This component aims to find relationships among data items. It reads the data stored in the object-oriented transaction database and generates association rules. Our previously proposed approach for mining rules from object-oriented transactions [15] is used here. Three kinds of knowledge are discovered: inter-class association rules, intra-class association rules and inter-intra class association rules. Objects and attributes are assumed to be binary, with the value 1 indicating that an object or a desired attribute appears. If they are not binary, they can be preprocessed by transforming each attribute with n values into n new binary attributes.
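The preprocessing step mentioned above, which turns an attribute with n values into n binary attributes, can be sketched as follows. The function name binarize, the "attribute=value" naming of the new attributes and the sample records are assumptions made for illustration only.

```python
def binarize(records, attribute):
    """Replace one n-valued attribute by n binary attributes named 'attribute=value'."""
    values = sorted({record[attribute] for record in records})
    result = []
    for record in records:
        # Keep every other attribute unchanged.
        new_record = {k: v for k, v in record.items() if k != attribute}
        for value in values:
            new_record[f"{attribute}={value}"] = 1 if record[attribute] == value else 0
        result.append(new_record)
    return result


# Made-up records with a three-valued attribute "grade".
records = [{"grade": 1}, {"grade": 2}, {"grade": 3}, {"grade": 2}]
binary = binarize(records, "grade")
print(binary[0])
```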
The mining proceeds in a top-down way and is divided into three main phases. The first phase mines inter-class itemsets; that is, it discovers the associations among the classes. The second phase mines the intra-class itemsets within individual classes. The third phase uses the results of the first two phases to find the inter-intra itemsets. The results of each phase can be used to prune candidates in the next phase. Besides, the approach can easily be modified to stop at an intermediate phase when only one kind of knowledge is desired. That is, the algorithm can stop at Phase 1 to obtain only inter-class association rules, or at Phase 2 to obtain intra-class association rules. It thus adapts flexibly to users' needs.
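As a rough illustration of what is measured for inter-class itemsets, the sketch below computes support and confidence with the standard association-rule definitions. It does not reproduce the three-phase procedure or the pruning of [15]; the functions count, support and confidence and the sample transactions are made up for this example.

```python
def count(transactions, itemset):
    """Number of transactions that contain every class in the itemset."""
    itemset = set(itemset)
    return sum(1 for transaction in transactions if itemset <= transaction)


def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Each transaction is reduced to the set of classes that appear in it (made-up data).
transactions = [
    {"watermelon", "apple"},
    {"watermelon", "apple", "cookie"},
    {"watermelon", "apple"},
    {"watermelon"},
    {"cookie"},
]

rule_support = support(transactions, {"watermelon", "apple"})
rule_confidence = confidence(transactions, {"watermelon"}, {"apple"})
print(rule_support, rule_confidence)
```

With this data, three of the five transactions contain both classes and four contain watermelon, so the rule from watermelon to apple has a support of 3/5 and a confidence of 3/4.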
4.4.4 The Association Rules
The association rules module stores the knowledge mined from the data by the rule mining engine. As mentioned before, three kinds of knowledge are discovered: inter-class association rules, intra-class association rules and inter-intra class association rules. Some examples of the three kinds of rules are shown below.
The inter-class association rules:
1. If the subclass = watermelon, then the subclass = apple with a confidence factor of 0.8;
2. If the subclass = watermelon, then the subclass = apple with a support factor of 0.6;
The intra-class association rules:
1. If the cookie (cost = 1), then cookie (category = 1) with a confidence factor of 1;
2. If the cookie (cost = 1), then cookie (category = 1) with a support factor of 0.7;
The inter-intra association rules:
1. If the watermelon (ripe = 1), then apple (grade = 2) with a confidence factor of