Ideally, the user’s mining intention is reflected in the mining model setting. However, without a complete understanding of the schema and domain-related knowledge, end users may develop mining models based on their experience or intuition. A mining model formulated in this way may be semantically invalid, leading to an incorrect or redundant search space or mining results and wasting the mining effort invested. A typical mining model formulation interface provides as many syntactic error-checking mechanisms as possible; for example, it offers a pop-up list or list box for attribute selection to prevent typos. However, less effort has been devoted to semantic checking, owing to the lack of semantic relationship information beyond the data warehouse.
Below, we elaborate on how intelligent assistance can be built into the proposed data warehouse mining system with the support of ontologies.
Specifically, we show that, with the assistance of the ontologies introduced earlier, the mining model formulation interface can provide semantic error detection and mining model element recommendation to improve the effectiveness and efficiency of the mining process. Users can express their mining intentions more precisely, and even clarify or revise their original intentions. The intelligent checking mechanisms proposed in our system framework are shown in Figure 11.
[Figure 11. Intelligent assistance in mining model setting (components: User Interface, Semantic Check, Element Recommendation, Ontologies)]
Intelligent assistance in semantic checking
With the support of the schema ontology and the schema constraint ontology, the system can perform semantic checking of mining model elements. As shown in Figure 12, one of four results is displayed to inform the user of the appropriateness of the mining model setting, and a rationale is provided if errors occur to help the user reformulate the model.
In conformance with the definition of the mining model, our system checks the main elements: the data granularity tG, the mining attributes tM, and the filtering conditions wc and hc.
[Figure 12. Semantic checking mechanism: the mining model setting undergoes semantic checking against the schema ontology and the schema constraint ontology, with four resulting cases: Case 1, pass checking; Case 2, warning message; Case 3, reject with error message; Case 4, automatic correction with message.]
(a) Semantic checking of tG
This checks the semantic legality of the transaction ID used for data grouping. The transaction ID set, tG, represents the data granularity and is the key of the mining transactions. If the question is “What product associations are there in customers’ daily purchases?”, then tG will be {CustID, Date}; if the question is “From daily product category purchases, are there any associations between customers’ education and gender?”, then tG will be {Category, Date}. A user can select tG according to his or her needs or interests, but may also set it incorrectly owing to a lack of semantic understanding.
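To make the role of tG as the transaction key concrete, the grouping step can be sketched as below. This is a minimal illustration, not the system's implementation: the flattened rows, their values, and the function name `group_transactions` are hypothetical. It also shows why grouping by Gender alone (Example 2 below) produces only two transactions.

```python
from collections import defaultdict

# Hypothetical flattened fact rows; column names follow the text's examples,
# but all values are invented for illustration.
ROWS = [
    {"CustID": "C1", "Date": "2002-01-05", "ProdName": "IBM TP", "Gender": "Female"},
    {"CustID": "C1", "Date": "2002-01-05", "ProdName": "Mouse", "Gender": "Female"},
    {"CustID": "C1", "Date": "2002-01-09", "ProdName": "Printer", "Gender": "Female"},
    {"CustID": "C2", "Date": "2002-01-05", "ProdName": "Mouse", "Gender": "Male"},
]


def group_transactions(rows, t_g, t_m):
    """Group rows into mining transactions keyed by the values of t_g.

    Each transaction is the set of (attribute, value) items taken from t_m.
    """
    transactions = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in t_g)  # transaction ID built from tG
        transactions[key].update((a, row[a]) for a in t_m)
    return dict(transactions)


daily = group_transactions(ROWS, ["CustID", "Date"], ["ProdName"])
by_gender = group_transactions(ROWS, ["Gender"], ["ProdName"])
print(len(daily), len(by_gender))  # 3 transactions versus 2
```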
Below are some scenarios of incorrect settings.
Example 1: tG = {Size}
The system will reject this setting through semantic checking against the constraint ItemOnly(Size) in the schema constraint ontology.
Example 2: tG = {Gender}
Grouping by Gender yields only two transactions, which is too few to generate any rules. This setting will be rejected according to the constraint NoSingleGroup(Gender).
Example 3: tG = {CustID, Gender}
tG1 = {CustID, Gender} is actually a redundant form of tG2 = {CustID} according to the constraint Decide(CustID, Gender): since CustID determines Gender, the mining spaces for tG1 and tG2 are exactly the same.
The system will automatically correct the setting to tG2 with a warning message.
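The three tG checks in Examples 1 to 3 can be sketched as a single function that returns one of the result cases of Figure 12. This is a hedged sketch: the set-based encoding of ItemOnly, NoSingleGroup, and Decide and the function `check_tg` are hypothetical stand-ins for the schema constraint ontology, and the redundancy step assumes the Decide constraints contain no cycles.

```python
# Hypothetical encoding of the constraints cited in Examples 1-3; the real
# representation in the schema constraint ontology is not given in the text.
ITEM_ONLY = {"Size"}             # ItemOnly(Size)
NO_SINGLE_GROUP = {"Gender"}     # NoSingleGroup(Gender)
DECIDE = {("CustID", "Gender")}  # Decide(CustID, Gender): CustID determines Gender


def check_tg(t_g):
    """Check a grouping set and return (result, checked tG, messages).

    result is "pass", "reject", or "corrected" (Cases 1, 3, and 4 of
    Figure 12). Assumes DECIDE contains no cyclic pairs.
    """
    t_g = set(t_g)
    # Case 3: attributes restricted to mining items cannot group the data.
    item_only = sorted(t_g & ITEM_ONLY)
    if item_only:
        return "reject", t_g, [f"ItemOnly({a}) forbids grouping by {a}" for a in item_only]
    # Case 3: this attribute on its own yields too few transactions.
    if len(t_g) == 1 and t_g <= NO_SINGLE_GROUP:
        (a,) = t_g
        return "reject", t_g, [f"NoSingleGroup({a}) forbids tG = {{{a}}}"]
    # Case 4: drop attributes already determined by another attribute in tG.
    redundant = sorted((d, e) for d, e in DECIDE if d in t_g and e in t_g)
    if redundant:
        corrected = t_g - {e for _, e in redundant}
        return "corrected", corrected, [f"Decide({d}, {e}): {e} is redundant" for d, e in redundant]
    return "pass", t_g, []


print(check_tg({"CustID", "Gender"}))
# ('corrected', {'CustID'}, ['Decide(CustID, Gender): Gender is redundant'])
```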
(b) Semantic checking of tM
This is the semantic legality check of the mining items the user is interested in. Specifically, it verifies whether tM violates any of the constraints in the schema constraint ontology.
Example 4: tM = {Year} or tM = {Year, Education}
Tedious rules such as
Year = “1999” → Year = “2000” or
Year = “1999”, Education = “High School” → Year = “2002”, Education = “Elementary”
will be generated. According to the constraint GroupOnly(Year), these model settings will be rejected.
Example 5: tM = {ProdID, Size}
According to the constraint Decide(ProdID, Size), the mining item ProdID determines the value of Size. This setting will generate already-known rules, since it mines the associations between the product name and its size. The following rule is an example:
ProdName = “IBM TP” → Size = “17 inch × 15 inch × 1 inch”.
Example 6: tM = {Gender}
Tedious rules such as
Gender = “Female” → Gender = “Male”
will be generated. The system will reject this senseless setting according to the constraint NoIntraMining(Gender).
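The tM checks in Examples 4 to 6 can be sketched in the same style. In this hedged sketch, GroupOnly is read as "usable only in tG", NoIntraMining as "not minable as the sole item attribute", and the constraint sets and `check_tm` are hypothetical. The text does not state the system's response for Example 5, so returning a warning there is an assumption.

```python
# Hypothetical encoding of the constraints cited in Examples 4-6.
GROUP_ONLY = {"Year"}           # GroupOnly(Year)
NO_INTRA_MINING = {"Gender"}    # NoIntraMining(Gender)
DECIDE = {("ProdID", "Size")}   # Decide(ProdID, Size): ProdID determines Size


def check_tm(t_m):
    """Check a set of mining attributes and return (result, messages)."""
    t_m = set(t_m)
    # Attributes meant only for grouping yield tedious rules as items.
    group_only = sorted(t_m & GROUP_ONLY)
    if group_only:
        return "reject", [f"GroupOnly({a}): {a} may be used only for grouping" for a in group_only]
    # Mining the values of a single such attribute against each other is senseless.
    if len(t_m) == 1 and t_m <= NO_INTRA_MINING:
        (a,) = t_m
        return "reject", [f"NoIntraMining({a}): rules among values of {a} alone are senseless"]
    # Assumption: a functional dependency inside tM only triggers a warning.
    known = sorted((d, e) for d, e in DECIDE if d in t_m and e in t_m)
    if known:
        return "warning", [f"Decide({d}, {e}): rules between {d} and {e} are already known" for d, e in known]
    return "pass", []


for t_m in ({"Year", "Education"}, {"ProdID", "Size"}, {"Gender"}, {"ProdName", "Education"}):
    print(sorted(t_m), check_tm(t_m)[0])
```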
(c) Semantic checking of wc
In the mining model setting, wc filtering is applied before the data are grouped into transactions by transaction ID. The checking of wc includes type consistency checking and domain checking.
Example 8: wc(City = ‘Japan’)
This example passes type consistency checking, but ‘Japan’ is not a city; therefore, the system will respond to the user with a domain checking warning.
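The two wc checks can be sketched as a type test followed by a domain test. This is a minimal sketch under stated assumptions: the attribute type table, the City domain values, and `check_wc` are hypothetical, and treating a type mismatch as a rejection is our assumption, since the text only describes the domain-warning case.

```python
# Hypothetical attribute types and City domain; in the proposed system these
# would come from the schema ontology and the warehouse data.
ATTRIBUTE_TYPES = {"City": str, "Age": int}
DOMAINS = {"City": {"Taipei", "Tokyo", "Osaka"}}


def check_wc(attribute, value):
    """Check an equality condition attribute = value; return (result, message)."""
    expected = ATTRIBUTE_TYPES.get(attribute)
    if expected is None:
        return "reject", f"unknown attribute {attribute}"
    # Type consistency: the constant must match the attribute's declared type.
    if not isinstance(value, expected):
        return "reject", f"{attribute} expects a {expected.__name__} value"
    # Domain checking: the constant should be a legal value of the attribute.
    domain = DOMAINS.get(attribute)
    if domain is not None and value not in domain:
        return "warning", f"{value!r} is not a known {attribute} value"
    return "pass", ""


print(check_wc("City", "Japan"))  # ('warning', "'Japan' is not a known City value")
```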
Example 9: wc(ProdName in 3C_DomainOntology(‘All-in-one’, Classification, var_All))
The 3C domain ontology can be used in filtering conditions. If a user is interested only in “All-in-one” related products in market basket analysis, all the objects with a ‘Classification’ relationship to All-in-one in the 3C domain ontology should be retrieved.
The values retrieved from the domain ontology are then used to select the related transactions from the data warehouse.
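Example 9 can be sketched as an ontology lookup followed by a row filter. In this hedged sketch the ontology is a hypothetical list of (concept, relation, concept) triples with invented product names, and we assume var_All means "follow Classification links transitively"; the text does not spell out that semantics.

```python
from collections import deque

# Hypothetical fragment of a 3C domain ontology as (subject, relation, object)
# triples; all concept names below are invented for illustration.
TRIPLES = [
    ("All-in-one", "Classification", "Multifunction Printer"),
    ("Multifunction Printer", "Classification", "MFP-A"),
    ("Multifunction Printer", "Classification", "MFP-B"),
    ("MFP-A", "CompatibleWith", "Toner-X"),
    ("Notebook", "Classification", "IBM TP"),
]


def retrieve(triples, root, relation):
    """Breadth-first collection of all concepts reachable from root via relation."""
    found, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for subject, rel, obj in triples:
            if subject == node and rel == relation and obj not in found:
                found.add(obj)
                queue.append(obj)
    return found


def filter_rows(rows, attribute, values):
    """Apply the wc condition: keep rows whose attribute value was retrieved."""
    return [row for row in rows if row[attribute] in values]


values = retrieve(TRIPLES, "All-in-one", "Classification")
rows = [{"ProdName": "MFP-A"}, {"ProdName": "IBM TP"}, {"ProdName": "MFP-B"}]
print(filter_rows(rows, "ProdName", values))  # keeps the MFP-A and MFP-B rows
```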
(e) Semantic checking of (tG, hc)
This function checks the semantic legality of aggregation used in the filtering condition hc.
Note that in the star schema model there are three types of measures: additive, semi-additive, and non-additive. Semi-additive measures can be aggregated only along some dimensions. For this reason, hc should be checked against the grouping ID to avoid invalid aggregation along the wrong dimensions.
Example 10: tG = {ProdName, Date}, hc(sum(SaleAmount) > 1000)
In the schema ontology, ‘SaleAmount’ is an additive measure; therefore, the setting will pass the checking.
Example 11: tG = {CustID, Date}, hc(sum(Cost) < 100)
The system will reject the setting because ‘Cost’ is a semi-additive measure that, as shown in Figure 6, should be aggregated only along dimensions including Product, whereas tG = {CustID, Date} does not include the Product dimension.
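The (tG, hc) check in Examples 10 and 11 can be sketched as a lookup of each measure's additivity plus a test of which dimensions tG covers. This is a hedged sketch: the measure table, the attribute-to-dimension map, and `check_hc` are hypothetical stand-ins for the schema ontology and Figure 6, only sum is handled, and we interpret "aggregated along dimensions including Product" as "tG must contain a Product attribute".

```python
# Hypothetical measure metadata standing in for the schema ontology (Figure 6).
MEASURES = {
    "SaleAmount": ("additive", set()),
    "Cost": ("semi-additive", {"Product"}),  # summable only when grouped by Product
}
# Hypothetical mapping from grouping attributes to star-schema dimensions.
ATTRIBUTE_DIMENSION = {"ProdName": "Product", "ProdID": "Product",
                       "CustID": "Customer", "Date": "Time"}


def check_hc(t_g, measure):
    """Check whether sum(measure) is a legal aggregation under grouping t_g."""
    kind, required = MEASURES[measure]
    if kind == "additive":
        return "pass", ""
    if kind == "semi-additive":
        grouped = {ATTRIBUTE_DIMENSION[a] for a in t_g}
        missing = sorted(required - grouped)
        if missing:
            return "reject", f"{measure} is semi-additive; tG must include {missing}"
        return "pass", ""
    return "reject", f"{measure} is non-additive and cannot be summed"


print(check_hc({"ProdName", "Date"}, "SaleAmount"))  # Example 10: passes
print(check_hc({"CustID", "Date"}, "Cost"))          # Example 11: rejected
```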
Intelligent assistance in element recommendation
In addition to semantic checking, the system offers recommendations to guide users, especially inexperienced ones, toward a more efficient mining model formulation process.
Taking as input the partially completed mining model element entered by the user, these functions automatically list candidate successive mining model constituents, drawn from the user preference ontology, for the user’s reference. Based on the example user preference ontology in Figure 10, we present some examples as follows:
(a) Recommendation of tG given a partial tG
A partial tG is used as the key to search the user preference ontology. It can be empty if the user does not know how to start a model setting. The system will respond with the list of available tG candidates.
Example 11: Given partial tG = {CustID}
The system will prompt with the list {Date}, …, {Category} as succeeding tG candidates for the user’s reference.
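The tG recommendation can be sketched as a superset search over stored grouping sets. This is a minimal sketch: `PREFERRED_TG` is a hypothetical stand-in for the user preference ontology (the actual contents of Figure 10 are not reproduced), and `recommend_tg` is an illustrative name. An empty partial tG returns every stored tG, matching the "how to start" case.

```python
# Hypothetical grouping sets recorded in the user preference ontology.
PREFERRED_TG = [
    frozenset({"CustID", "Date"}),
    frozenset({"CustID", "Category"}),
    frozenset({"Category", "Date"}),
]


def recommend_tg(partial_tg):
    """List the attribute sets that would complete partial_tg into a stored tG."""
    partial = frozenset(partial_tg)
    suggestions = []
    for t_g in PREFERRED_TG:
        rest = t_g - partial
        # Keep stored sets that extend the partial input, without duplicates.
        if partial <= t_g and rest and rest not in suggestions:
            suggestions.append(rest)
    return suggestions


print(recommend_tg({"CustID"}))  # [frozenset({'Date'}), frozenset({'Category'})]
print(len(recommend_tg(set())))  # 3: every stored tG when nothing is entered
```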
(b) Recommendation of tM given tG and a partial tM
Given tG and a partial tM as input, the system searches the available sets of mining attributes in the user preference ontology for recommendations.
Example 12: Given tG = {CustID, Date} and partial tM = {ProdName}
The system will prompt the user with the following list as suggestions: {Education}, …, {Salesman}.
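The tM recommendation extends the same idea with tG as an additional key. In this hedged sketch, `PREFERRED_MODELS` is a hypothetical list of stored (tG, tM) pairs, and requiring an exact tG match is our assumption; the text does not say how closely the stored tG must match.

```python
# Hypothetical (tG, tM) pairs recorded in the user preference ontology.
PREFERRED_MODELS = [
    (frozenset({"CustID", "Date"}), frozenset({"ProdName", "Education"})),
    (frozenset({"CustID", "Date"}), frozenset({"ProdName", "Salesman"})),
    (frozenset({"Category", "Date"}), frozenset({"Education", "Gender"})),
]


def recommend_tm(t_g, partial_tm):
    """Suggest attribute sets that complete partial_tm for a stored model with this tG."""
    t_g, partial = frozenset(t_g), frozenset(partial_tm)
    suggestions = []
    for stored_tg, stored_tm in PREFERRED_MODELS:
        rest = stored_tm - partial
        # Assumption: tG must match exactly; tM must extend the partial input.
        if stored_tg == t_g and partial <= stored_tm and rest and rest not in suggestions:
            suggestions.append(rest)
    return suggestions


print(recommend_tm({"CustID", "Date"}, {"ProdName"}))
# [frozenset({'Education'}), frozenset({'Salesman'})]
```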
(c) Recommendation of ms and mc given tG and tM
Example 13: tG = {CustID, Date}, tM = {ProdName, Education}
The ms and mc corresponding to the given tG and tM in the user preference ontology will be listed. In this case, ms = 60% and mc = 85% will be suggested.
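Once tG and tM are both fixed, the ms and mc recommendation reduces to a keyed lookup. In this minimal sketch the table `PREFERRED_THRESHOLDS` and the function `recommend_thresholds` are hypothetical; only the Example 13 entry (ms = 60%, mc = 85%) comes from the text. Frozensets serve as keys so that the order in which attributes are entered does not matter.

```python
# Hypothetical threshold table; only the Example 13 entry is taken from the text.
PREFERRED_THRESHOLDS = {
    (frozenset({"CustID", "Date"}), frozenset({"ProdName", "Education"})): (0.60, 0.85),
}


def recommend_thresholds(t_g, t_m):
    """Return the stored (ms, mc) pair for this tG and tM, or None if absent."""
    return PREFERRED_THRESHOLDS.get((frozenset(t_g), frozenset(t_m)))


print(recommend_thresholds(["CustID", "Date"], ["Education", "ProdName"]))  # (0.6, 0.85)
```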