Chapter 7 Knowledge Fusion
7.2 Knowledge Fusion Framework
Our proposed approach consists of three phases: the preprocessing phase, the partitioning phase, and the ontology construction phase. The whole process is illustrated in Figure 7.4. First, the preprocessing phase deals with syntactic problems such as format transformation and rule base cleaning, and constructs the relationship graph from the cleaned, transformed flat rule base. Second, in the partitioning phase, the relationship graph is partitioned according to Criterion 1 and Criterion 2. Finally, in the ontology construction phase, the new ontology of the flat rule base is constructed from the partitioned relationship graph. The three phases are described in detail in the rest of this section.
Figure 7.4: The knowledge fusion framework
7.2.1 Preprocessing Phase
The preprocessing phase consists of format transformation, rule base cleaning, and relationship graph construction. The format transformation of the source rule bases consists of two steps: one transforms the rules into first-order logic, and the other removes the ontologies of the rule bases. In this paper, we assume that syntactic heterogeneity is resolved by ODBC, HTML, XML, and other related technologies [VSV+01]. After preprocessing, the rules from all rule bases are stored in a flat rule base. At this point the flat rule base has no ontology; the ontology will be built in the relationship graph partitioning phase.
After all rules of the original rule bases are logically preprocessed, we should put all rules together and perform knowledge cleaning, such as validation and verification.
Problems with rules include redundancy, contradiction/conflict, circularity, and incompleteness [GR89][RN95]. The Directed Hypergraph Adjacency Matrix Representation [RSC97] is used to validate and verify the rules for completeness, correctness, and consistency. This cleaning step provides a basis for the relationship graph construction.
After cleaning the flat rule base, the construction of the relationship graph can be performed. The algorithm we proposed is as follows:
Algorithm 7.1: Relationship Graph Construction Algorithm
Input: A rule base B = (VB, RB, CB)
Output: An un-partitioned relationship graph G = (VG, RG, P, L, I, O)
Step 1. Set VG = VB, RG = RB.
Step 2. For each two rules r1, r2 ∈ RG, let S be the intersection of the variables of RHS1 and the variables of LHS2, and add the link (S, r1, r2) to L.
Step 3. Set I as the variables of all LHS sentences of all rules.
Step 4. Set O as the variables of all RHS sentences of all rules.
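As a rough illustration, the four steps of Algorithm 7.1 can be sketched in Python as follows. The Rule record, the function name build_relationship_graph, and the dictionary layout of the result are hypothetical and not part of the original framework. The sketch also assumes that a link is recorded only when the shared variable set S is non-empty and that a rule is not linked to itself; Step 2 does not state either point. The partition set P is left empty because the graph is still un-partitioned.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: a name plus the variables on each side."""
    name: str
    lhs: frozenset  # variables of the LHS sentences
    rhs: frozenset  # variables of the RHS sentences


def build_relationship_graph(variables, rules):
    """Sketch of Algorithm 7.1 for a cleaned, flat rule base."""
    # Step 1: copy the variables and rules of the rule base.
    graph_vars = set(variables)
    graph_rules = list(rules)

    # Step 2: link every ordered pair (r1, r2) whose RHS1 and LHS2 share
    # variables. Assumption: empty intersections produce no link.
    links = []
    for r1, r2 in permutations(graph_rules, 2):
        shared = r1.rhs & r2.lhs
        if shared:
            links.append((shared, r1.name, r2.name))

    # Steps 3 and 4: collect all LHS variables into I and all RHS variables into O.
    incoming = frozenset().union(*(r.lhs for r in graph_rules))
    outgoing = frozenset().union(*(r.rhs for r in graph_rules))

    return {"V": graph_vars, "R": graph_rules, "P": [],
            "L": links, "I": incoming, "O": outgoing}
```

For two made-up rules a: x -> y and b: y -> z, the sketch records the single link ({y}, a, b), because y is produced by a and consumed by b.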
7.2.2 Partitioning Phase
Before introducing our proposed algorithm, we briefly discuss the shared vocabulary ontology, the semantic distance function, and pseudo rules in the following sections.
7.2.2.1 Shared Vocabulary Ontology and Semantic Distance Function
The shared vocabulary ontology can be constructed either by domain experts or from a general lexical reference system such as WordNet [MBF+90]. If the knowledge sources to be fused are in the same or related domains, a shared vocabulary ontology customized for those domains is more appropriate than a general one.
The semantic distance function we use is based on Hirst and St-Onge's measure of semantic relatedness [HS98], and is defined as follows:
Sv(v1, v2) = path_length + c * d, ∀ v1, v2 ∈ V, (4)
where path_length is the length of the path from v1 to v2 in the shared vocabulary ontology, d is the number of changes of direction in the path, and c is a constant. If no such path exists, the function returns “infinity”. Sv(v1, v2) = 0 if and only if v1 = v2.
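A minimal sketch of Equation (4) is given below, under the assumption that the shared vocabulary ontology is a tree stored as a child-to-parent dictionary. In that setting the path between two terms climbs to their lowest common ancestor and then descends, so d is 1 when the path goes up and then down and 0 when one term is an ancestor of the other. The function names and the dictionary representation are illustrative only; a general ontology graph would need a proper path search.

```python
import math


def ancestors(parent, term):
    """Return [term, parent(term), ...] up to the root of the tree."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def semantic_distance(parent, v1, v2, c=1.0):
    """Sv(v1, v2) = path_length + c * d on a tree-shaped vocabulary."""
    if v1 == v2:
        return 0.0
    up1 = ancestors(parent, v1)
    depth2 = {node: j for j, node in enumerate(ancestors(parent, v2))}
    for i, node in enumerate(up1):
        if node in depth2:
            j = depth2[node]
            path_length = i + j               # edges up from v1 plus edges down to v2
            d = 1 if i > 0 and j > 0 else 0   # one turn at the common ancestor
            return path_length + c * d
    return math.inf                           # no path: the terms share no ancestor
```

With c = 1, two sibling terms under the same parent get a distance of 2 + 1 = 3, while a term and its direct parent get a distance of 1.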
7.2.2.2 Pseudo Rules
Before partitioning the relationship graph, we first transform the incoming variables and outgoing variables of the relationship graph into two sets of pseudo rules, the Pseudo Incoming Rule Set and the Pseudo Outgoing Rule Set, respectively. These pseudo rules add connections among rules and help in dealing with shallow knowledge, for which the connected rules may be too few to generate partitions. Each incoming variable is transformed into a Pseudo Incoming Rule of the following format:
If TRUE Then <An_Incoming_Variable>
Similarly, each of the outgoing variables is transformed to a Pseudo Outgoing Rule by the following format:
If <An_Outgoing_Variable> Then EMPTY
The pseudo rules should be eliminated after partitioning the relationship graph. The removal of the pseudo rules is simply to discard all pseudo rules of all rule classes. If a rule class is empty after the removal, remove the rule class too.
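The creation and removal of pseudo rules can be sketched as follows. This is only an illustration: the Rule tuple, the pseudo flag, the s1, s2, ... naming scheme, and the use of an empty variable set to stand for TRUE and EMPTY are assumptions made for the sketch.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    lhs: frozenset
    rhs: frozenset
    pseudo: bool = False  # hypothetical marker for pseudo rules


def make_pseudo_rules(incoming, outgoing):
    """Create one pseudo rule per incoming and per outgoing variable."""
    pseudo = []
    for var in incoming:
        # "If TRUE Then <var>": an empty LHS stands for TRUE (assumption).
        pseudo.append(Rule(f"s{len(pseudo) + 1}", frozenset(), frozenset({var}), True))
    for var in outgoing:
        # "If <var> Then EMPTY": an empty RHS stands for EMPTY (assumption).
        pseudo.append(Rule(f"s{len(pseudo) + 1}", frozenset({var}), frozenset(), True))
    return pseudo


def remove_pseudo_rules(rule_classes):
    """Discard pseudo rules from every rule class and drop emptied classes."""
    cleaned = ([r for r in rule_class if not r.pseudo] for rule_class in rule_classes)
    return [rule_class for rule_class in cleaned if rule_class]
```

Calling make_pseudo_rules(["pps", "spps"], ["alert"]) yields three pseudo rules named s1, s2, and s3, in that order.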
Example 6
In this example, we start with the un-partitioned relationship graph from G1 and G2, as illustrated in Figure 7.5. The rules are the same as those in Example 3. After the transformation, three pseudo rules are generated:
s1: If TRUE Then pps
s2: If TRUE Then spps
s3: If alert Then EMPTY
The un-partitioned, pseudo-rules-added relationship graph for G1 and G2 is illustrated in Figure 7.6.
Figure 7.5: The un-partitioned relationship graph
Figure 7.6: The un-partitioned, pseudo-rules-added relationship graph
7.2.2.3 The Partitioning Algorithm
After the un-partitioned relationship graph (including pseudo rules) is constructed, the partitioning process can be performed. Combining Criterion 1, Criterion 2 and Criterion 3, the following function for a partition pi is used for the partitioning process:
Fp(pi) = Lp(pi) + k * Sp(pi) – l * Ip(pi), (5)
where k and l are weights, set before the algorithm runs, that represent the relative importance of the three criteria. The following algorithm is proposed based on the greedy growth concept.
Algorithm 7.2: Relationship Graph Partitioning Algorithm
Input: An un-partitioned relationship graph G, pseudo rules added.
Output: A partitioned relationship graph G’, pseudo rules not removed yet
Step 1. Randomly select a rule from the rules of G, and add it to a new partition p.
Step 2. Among the rules of G that are connected to p, select the rule r for which p’ = p + {r} has the minimal Fp(p’).
Step 3. If Fp(p’) ≤ Fp(p), set p = p’.
Step 4. If there is any rule that is connected to p, go to Step 2.
Step 5. Add p to G’.
Step 6. If there is any rule in G, go to Step 1.
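The greedy growth loop can be sketched in Python as follows. This is a reading of Algorithm 7.2 under stated assumptions rather than a reference implementation. The score argument stands in for Fp, and neighbors is a hypothetical symmetric adjacency dictionary derived from the links of the relationship graph. Rules are removed from the pool once assigned. A partition stops growing as soon as its best candidate extension is rejected in Step 3.

```python
import random


def greedy_partition(rules, neighbors, score, rng=None):
    """Greedy-growth sketch of Algorithm 7.2.

    rules: iterable of rule names.
    neighbors: dict mapping a rule name to the names of rules linked to it.
    score: callable on a set of rule names, standing in for Fp.
    """
    rng = rng or random.Random()
    remaining = set(rules)
    partitions = []
    while remaining:                            # Step 6: repeat while rules are left
        seed = rng.choice(sorted(remaining))    # Step 1: random seed rule
        remaining.discard(seed)
        partition = {seed}
        while True:                             # Step 4: keep growing if possible
            # Unassigned rules connected to the current partition.
            frontier = sorted(r for r in remaining
                              if any(r in neighbors.get(m, ()) for m in partition))
            if not frontier:
                break
            # Step 2: candidate whose addition gives the minimal score.
            best = min(frontier, key=lambda r: score(partition | {r}))
            # Step 3: accept only if the score does not get worse.
            if score(partition | {best}) <= score(partition):
                partition.add(best)
                remaining.discard(best)
            else:
                break                           # assumption: stop on rejection
        partitions.append(frozenset(partition))  # Step 5
    return partitions
```

Because the seed rule is chosen at random, different runs may produce different partitions; passing a seeded random.Random instance as rng makes a run repeatable.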
Example 7
In this example, we continue with the un-partitioned, pseudo-rules-added relationship graph from G1 and G2, as illustrated in Figure 7.6. Let k=0.5 and l=0.5 in the semantic distance function and c=1 in the algorithm. For the shared vocabulary ontology illustrated in Figure 7.3 (in which the path between any two nodes contains only one “change of direction”), the values of the semantic distance function are the same as in Table 1.
Figure 7.7: The relationship graph G3, before removing the pseudo rules
Figure 7.8: The relationship graph G3
First, we randomly select r2 and add it to a new partition p4: Lp(p4) = 2 + 1 = 3, Sp(p4) = (2.5 + 4.5 + 4.5) / 3 = 3.83, Ip(p4) = 0, and Fp(p4) = Lp(p4) + Sp(p4) - Ip(p4) = 6.83. Consider the three candidate partitions p5 = {r2, r1}, p6 = {r2, r3}, and p7 = {r2, s2}: Fp(p5) = 6.5, Fp(p6) = 6.83, and Fp(p7) = 4.83. Since Fp(p7) is minimal, p7 becomes the current partition. Next, consider p8 = {r2, s2, r1} and p9 = {r2, s2, r3}: Fp(p8) = 4.17 and Fp(p9) = 4.5, so p8 is chosen. Now consider p10 = {r2, s2, r1, s1} and p11 = {r2, s2, r1, r3}: Fp(p10) = 3.75 and Fp(p11) = 4.35, so p10 is chosen. Then consider p12 = {r2, s2, r1, s1, r3}: Fp(p12) = 3.9 > Fp(p10), so p10 is retained. Since all rules connected to p10 have been checked, p10 is the final partition.
Now we pick s3 as p11. Fp(p11) = Lp(p11) + Sp(p11) - Ip(p11) = 1 + 0 - 4 = -3. Consider the partition p12 = {s3, r3}: Fp(p12) = 1 + 2.5 - 3.69 = -0.19 > Fp(p11), so p11 is confirmed.
Since only one rule, r3, is left, it forms a partition by itself, p13. Therefore, three partitions are generated, as illustrated in Figure 7.7. After the pseudo rules are removed, p11 = {s3} is empty and is discarded. The final result of the algorithm, G3, is illustrated in Figure 7.8: G3 = (V, R, L, P3, I, O), where P3 = {p10, p13}, p10 = {r1, r2}, and p13 = {r3}.
The evaluations of G3 by the three criteria are as follows. G3 is at least as good as G1 and G2 on every criterion.
• Criterion 1: LG(G3) = (2 + 3) / 2 = 2.5, which is the same as G1 but smaller than G2.
• Criterion 2: SG(G3) = ((21 / 6) + (2.5 / 1)) / 2 = 3, which is smaller than G1 and G2.
• Criterion 3: IG(G3) = ((29.5 / 8) + (29.5 / 8)) / 2 = 3.69, which is larger than G1 and G2.
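The arithmetic of the three evaluations above can be re-derived with a few lines of Python. The function and variable names are ad hoc, and the per-partition values are copied directly from the evaluations; nothing here recomputes them from the rules themselves.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def evaluate_g3():
    """Average the per-partition values of G3 for the three criteria."""
    l_g = mean([2, 3])                  # Criterion 1
    s_g = mean([21 / 6, 2.5 / 1])       # Criterion 2
    i_g = mean([29.5 / 8, 29.5 / 8])    # Criterion 3
    return l_g, s_g, i_g


l_g, s_g, i_g = evaluate_g3()
print(l_g, s_g, round(i_g, 2))  # prints: 2.5 3.0 3.69
```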
7.2.3 Ontology Construction Phase
The final phase of our proposed framework constructs an ontology from the partitioned relationship graph. An ontology includes many aspects of conceptualization [HPH01][RN95][SOW00]. Among them, two important aspects are discussed in our work: classes and relationships.
Three classes are generated from the relationship graph: Variable, Rule, and RuleClass, which map to the variables, rules, and partitions, respectively. The name of a RuleClass is assigned arbitrarily but must be unique. Table 7.2 shows the classes and relationships of the generated ontology.
Table 7.2: The classes and relationships of the generated ontology
Class | Property (Relationship) | Type | Description
Variable | Name | Unique Text | The name
Rule | Name | Unique Text | The name
Rule | | | The rule class belonged
Rule | Ante. Var. | Set of Variable | The LHS variables
Rule | Cons. Var. | Set of Variable | The RHS variables
RuleClass | Name | Unique Text | The name
RuleClass | Rules | Set of Rule | The rules contained
RuleClass | Key. Var. | Set of Variable | The key variable
RuleClass | In. Var. | Set of Variable | The incoming variables
RuleClass | Out. Var. | Set of Variable | The outgoing variables
Three kinds of relationships, represented by properties, are generated from the relationship graph: the Rule-RuleClass relationships, the Rule-Variable relationships, and the RuleClass-Variable relationships. The Rule-RuleClass relationships map to the membership of the partitions, and are represented by the property belongTo of Rule and the property hasRule of RuleClass. The Rule-Variable relationships, hasLHSVariables and hasRHSVariables, are properties of Rule and are obtained from the variables involved in the LHS and RHS of each rule, respectively. The RuleClass-Variable relationships, hasIncomingVariables and hasOutgoingVariables, are properties of RuleClass and are obtained from the incoming and outgoing variables of each partition, respectively. In addition to the name and the incoming and outgoing variables, a RuleClass has another semantically relevant property, hasKeyVariables. The hasKeyVariables property is the set of names of the lowest super-ordinates (most specific common subsumers) of all terms involved in the rule class in the shared vocabulary ontology. The key variables thus briefly summarize the semantic meaning of a RuleClass.
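The mapping from a partition to a RuleClass can be sketched as follows. This is only an illustration under several assumptions: the shared vocabulary ontology is a tree stored as a child-to-parent dictionary; the incoming variables of a partition are those consumed but never produced inside it, and the outgoing variables are those produced but never consumed inside it; and a single lowest common subsumer is returned as the key variable. The Python names are hypothetical, with field names chosen to mirror the properties described above.

```python
from dataclasses import dataclass


@dataclass
class RuleClass:
    name: str
    has_rule: list
    has_incoming_variables: set
    has_outgoing_variables: set
    has_key_variables: set


def chain_to_root(parent, term):
    """Return [term, parent(term), ...] up to the root of the vocabulary tree."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lowest_common_subsumer(parent, terms):
    """Most specific term that subsumes every given term, or None."""
    terms = list(terms)
    if not terms:
        return None
    common = chain_to_root(parent, terms[0])   # ordered from specific to general
    for term in terms[1:]:
        other = set(chain_to_root(parent, term))
        common = [node for node in common if node in other]
    return common[0] if common else None


def build_rule_class(name, rules, parent):
    """rules maps a rule name to a (LHS variables, RHS variables) pair."""
    lhs_all = set().union(*(lhs for lhs, _ in rules.values()))
    rhs_all = set().union(*(rhs for _, rhs in rules.values()))
    key = lowest_common_subsumer(parent, sorted(lhs_all | rhs_all))
    return RuleClass(
        name=name,
        has_rule=sorted(rules),
        # Assumption: incoming = consumed but never produced inside the class.
        has_incoming_variables=lhs_all - rhs_all,
        # Assumption: outgoing = produced but never consumed inside the class.
        has_outgoing_variables=rhs_all - lhs_all,
        has_key_variables={key} if key is not None else set(),
    )
```

The belongTo property of each Rule is the inverse of has_rule and can be filled in by iterating over the generated rule classes.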
Example 8
For the relationship graph G3 in Example 7 (Figure 7.8), the RuleClass c1, represented in DAML+OIL [HPH01], is shown in Figure 7.9.
Figure 7.9: Ontology of RuleClass c1