Our goal in first step is to classify tables and columns into groups as Table 4.1. Classifying these entities is necessary to reach the more accurate mapping results when applying the following intuitions.
• Intuitively, two entities are hard to be mapped if their external structures are dif-ferent and therefore we group tables of the relational database schema into three kinds of table: Independent Tables, Associated Tables, and G/S Table. We map foreign keys to object properties and non-foreign keys to object or datatype prop-erties. Because object properties in OWL can link individuals of domain classes to individuals of range classes and foreign keys in the relational database are their counterparts since for foreign keys, they can link the instances of table they store in to the instances of table they refer to. But as for non-foreign keys, their counter-parts in OWL ontologies are datatype properties because both of them can not link
two individuals or instances to each other. Nevertheless, non-foreign keys can also be mapped to object properties because there are some columns in the relational database that can independently represent entities, i.e., these columns can be the range classes of object properties. Furthermore, foreign keys and non-foreign keys can be further classified into various groups and this will discussed in the following sections.
• Intuitively, two entities have different internal structure, especially for the different datatype, will be difficult to match each other and hence when mapping between non-foreign keys and datatype properties, the datatype between them will be taken into consideration.
Since the useful semantic data is not explicitly defined in the relational database schema, in other words, the information such as which table belongs to which group is invisible and hence we will first engage in the semantic enrichment of the relational database to extract some implicit information which can be used to find some classifica-tions of tables and columns such as G/S tables. Therefore, semantic enrichment is the task of gathering additional semantic information which is not explicitly available from the relational database system [6]. In the research of transforming the relational database schemas to other data models such as object-oriented databases schemas, semantic enrich-ment is considered an important process and hence many researches in this area had made some helpful contributions. In general, these results of research are presented as rules.
Furthermore, in the topic of extracting OWL ontologies from the relational database, the processes of extracting OWL ontologies also primarily focus on finding rules through analyzing the semantic of the relational database schema. Therefore, we will greatly take advantage of the contributions in these field to classify tables and columns to get their semantic.
4.3.1 Classifying Tables
In relational data model, there are some tables representing the real world entities whose instances can be identified exclusively through its own attributes and have no any rela-tionship with other tables. In this thesis, they are called independent tables since the
semantic of this kind of table usually explicitly reflects in the relational database schema and the following is its definition.
Definition 4.3.1. For table ti , ti is an independent table if F K(ti) = Φ and no other foreign keys of tables refers to ti
Associated tables are the main components in the relational data model represent-ing the various relationship between tables such as one-to-one relationship, many-to-one relationship and many-to-many relationship. Among them, a relationship table will be created in order to express the many-to-many relationship between tables. A relation-ship table in the relational data model denotes a set of tuples, each one is composed of instances of the entities that participate in the relationship. The number of the partic-ipation of these entities in a relationship constructs a n-arity relation. For example, for tables “package(packageID, numOfDays, departed,...)” and “spot(name, location)” in the relational database schema 5.3, package is a table used to store the related information of the journey. Therefore, a package can have many spots and a spot can be included by many packages. In order to represent many-to-many relationship between “package” and
“spot”, a relationship table “spotDetail(packageID, sname)” , which combining primary keys “packageID” and “sname” as foreign keys, need to be created. The followings are the definitions of associated tables and relationship tables.
Definition 4.3.2. For table ti , ti is an associated table if there exist a foreign key ci which is not an IS-A FK or some other foreign keys of tables refer to ti
Definition 4.3.3 (Relationship Table). For tables t1, t2,..., ti and tj , tj is a relationship table if P K(tj) = F K(tj) = P K(t1) P K(t2) . . . P K(t1)
To represent inclusions between the sets of instances of two entities, so called Gen-eralization/Specialization(G/S) tables are used. G/S tables states the inheritance of properties from a more general entity to a more specific one. In other words, two tables have a G/S relationship if every instance of one table is also an instance of the other ta-ble. In general, many studies [24] had considered that finding the tables have inheritance relationships is equal to finding the tables have the subsumed relationship between their own primary keys and hence the definition for G/S tables is as follow.
Definition 4.3.4 (G/S table). For tables ti, tj , ti and tj are G/S tables if P K(ti) ⊆ P K(tj) and exist Ref (ci) = tj, where ci ∈ F K(ti)
For example, for tables “accom(hotelCode, location)” and “resortHotel”(hotelCode, resort) in our example of relational database schema 5.3, where hotelCode of resortHo-tel refers to horesortHo-telCode of accom. These two tables are G/S tables since there exist P K(resortHotel)⊆ P K(accom) and Ref(hotelcode) = accom.
By using the definitions, independent tables and associated tables can be automat-ically detected by the system, but as for some G/S tables user interaction is required because the relational data model is known to be a semantically poor model and the con-cept of inheritance relationship cannot be directly represented in the relational database.
Besides, in the designing of the relational database schema, designers rarely consider the inheritance relationships between tables because the main goal of designing a relational database is to create the binary relationship between two tables. Therefore, as opposed to OWL ontologies, inheritance relationships are less and not explicit in the relational database schema. In order to find the tables ti and tj satisfying the definition of G/S table, we need to verify every instance of P K(ti) is also an instance of P K(tj). But since we have no instance in hand can proof it, some cases need querying users to find the G/S tables.
Now discussing these cases, for tables ti, tj and foreign key ci , ti and tj can at least have “is-part-of” and “has-part” relationship [24] if ci ∈ F K(ti), ci ∈ P K(ti) and Ref (ci) = tj. If the condition|P K(tj)| = 1 pluses to this case, we can further make sure ti and tj exist a G/S relationship since we know every instance of ti is also an instance of tj. But if the conditions |P K(ti)| > 1, |P K(tj)| > 1 and |P K(ti)| ≥ |P K(ti)| are satisfied, whether ti and tj have a G/S relationship can only be decided by interacting with users. For instance, if P K(ti) ={ca, cb}, P K(tj) = {cc, cd} and ca is also a foreign key of ti referring to cc of tj, ti and tj can not be made sure if they have a G/S relation since whether instances of cb are subsume by instances of cd or not can not be proofed.
4.3.2 Classifying Foreign Keys
After the process of classifying tables, the semantic of different kind of foreign keys can also be found concurrently [24]. These kinds of foreign keys are defined as follows.
Definition 4.3.5 (Base FK). For table ti and its foreign key ci, ci is a base FK if ci P K(ti)
Definition 4.3.6 (IS-A FK). For super-table tj, its sub-table ti and foreign key ci of ti, ci is an IS-A FK if ci ⊆ P K(ti) and ci ⊆ P K(tj)
Definition 4.3.7 (Part-Of FK). For table ti and its foreign key ci, ci is a Part-Of FK if ci is not an IS-A FK and ci ⊆ P K(ti)