
Intelligent Assistance for Mining Model Formulation with Ontologies

Ideally, the user’s mining intention is reflected in the mining model setting. However, without a complete understanding of the schema and the related domain knowledge, end users may develop mining models based on experience or intuition. A mining model formulated in this way may be semantically invalid, leading to an incorrect or redundant search space or mining results and wasting the mining effort. A typical mining model formulation interface provides as many syntactic error checking mechanisms as possible. For example, it provides a popup list or list box for attribute selection to prevent user typos. However, less effort has been devoted to semantic checking, because semantic relationship information beyond the data warehouse is lacking.

Below, we elaborate on how intelligent assistance can be built into the proposed data warehouse mining system with the support of ontologies.

Specifically, we will show that, with the assistance of the ontologies we have introduced, the mining model formulation interface can provide semantic error detection and mining model element recommendation to improve the effectiveness and efficiency of the mining process. Users can express their mining intentions more precisely and even clarify or renew their original mining intentions. The intelligent checking mechanisms proposed in our system framework are shown in Figure 11.

[Figure 11 diagram; labels: User Interface, Semantic Check, Element Recommendation, Ontologies]

Figure 11. Intelligent assistance in mining model setting

Intelligent assistance in semantic checking

With the support of the schema ontology and the schema constraint ontology, the system can provide semantic checking of mining model elements. As Figure 12 shows, one of four different results is displayed to inform the user of the appropriateness of the mining model setting, together with a rationale if errors occur, to help the user reformulate the model.

Conforming to the definition of the mining model, our system checks the main elements: the data granularity tG, the mining attributes tM, and the filtering conditions wc and hc.

[Figure 12 diagram; labels: Mining Model Setting, Semantic Checking, Schema Ontology, Schema Constraint Ontology. Resulting cases: Case 1. Pass checking; Case 2. Warning message; Case 3. Reject with error message; Case 4. Automatic correction with message]

Figure 12. Semantic checking mechanism

(a) Semantic checking of tG

This checks the semantic legality of the transaction ID for data grouping. The transaction ID set, tG, represents the data granularity and is the key for the mining transactions. If this question is asked: “What product associations are there among customers’ daily purchases?”, then tG will be {CustID, Date}; if another question is asked: “From daily purchases by product category, are there any associations between customers’ education and gender?”, then tG will be {Category, Date}. Users can select tG based on their needs or interests, but may also set it incorrectly because they lack semantic understanding.

Below are some scenarios of incorrect settings.

Example 1: tG{Size}

The system will reject this setting by semantic checking against the constraint ItemOnly(Size) in the schema ontology.

Example 2: tG{Gender}

The result of the grouping will be only two transactions, which is too few to generate any rules. This will be rejected according to the constraint NoSingleGroup(Gender).

Example 3: tG = {CustID, Gender}

tG1 = {CustID, Gender} is actually a redundant form of tG2 = {CustID} according to the constraint Decide(CustID, Gender). The mining space for tG1 and tG2 is exactly the same.

The system will automatically correct the setting to tG2 and display a warning message.
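The checks in Examples 1 to 3 could be sketched as follows. This is a minimal illustration only, not the system's implementation: the constraint sets, the `Result` enumeration, the function name `check_tg`, and the reading of NoSingleGroup as "may not be the sole grouping attribute" are all assumptions made for this sketch.

```python
from enum import Enum


class Result(Enum):
    """The four resulting cases of Figure 12 (WARNING is unused in this sketch)."""
    PASS = "pass"
    WARNING = "warning"
    REJECT = "reject"
    AUTO_CORRECT = "auto-correct"


# Hypothetical encoding of the constraints named in Examples 1-3.
ITEM_ONLY = {"Size"}              # usable as mining items only, never for grouping
NO_SINGLE_GROUP = {"Gender"}      # assumed: may not be the sole grouping attribute
DECIDE = {("CustID", "Gender")}   # (determinant, dependent): CustID decides Gender


def check_tg(tg):
    """Return (result, checked_or_corrected_tg, message) for a transaction ID set."""
    tg = set(tg)
    item_only = sorted(tg & ITEM_ONLY)
    if item_only:
        return Result.REJECT, None, f"ItemOnly({item_only[0]}): not usable for grouping"
    if len(tg) == 1 and tg <= NO_SINGLE_GROUP:
        (attr,) = tg
        return Result.REJECT, None, f"NoSingleGroup({attr}): too few transactions"
    # A dependent attribute adds nothing when its determinant is already in tG.
    redundant = {dep for det, dep in DECIDE if det in tg and dep in tg}
    if redundant:
        return Result.AUTO_CORRECT, tg - redundant, f"Decide: dropped {sorted(redundant)}"
    return Result.PASS, tg, "ok"
```

For instance, `check_tg({"CustID", "Gender"})` yields the automatic-correction case with the corrected set `{"CustID"}`, mirroring Example 3.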

(b) Semantic checking of tM

This checks the semantic legality of the mining items the user is interested in. Specifically, it verifies whether tM violates any of the constraints in the schema constraint ontology.

Example 4: tM = {Year} or tM = {Year, Education}

Tedious rules such as

Year = “1999” ⇒ Year = “2000” or

Year = “1999”, Education = “High School” ⇒ Year = “2002”, Education = “Elementary”

will be generated. According to the constraint GroupOnly(Year), the model settings will be rejected.

Example 5: tM = {ProdID, Size}

According to the constraint Decide(ProdID, Size), the mining item ProdID determines the value of Size. This setting will generate only already-known rules, since it mines the associations between a product and its size. The following rule is an example:

ProdName = “IBM TP” ⇒ Size = “17 inch * 15 inch * 1 inch”.

Example 6: tM = {Gender}

Tedious rules will be generated such as

Gender = “Female” ⇒ Gender = “Male”.

The system will reject this meaningless setting according to the constraint NoIntraMining(Gender).
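As an illustration of Examples 4 to 6, the tM check could be sketched as a function that lists the violated constraints. The constraint sets, the function name `check_tm`, and the reading of NoIntraMining as "may not be the sole mining attribute" are assumptions of this sketch; how each violation maps to one of the four resulting cases is left open, since the text does not state it for every constraint.

```python
# Hypothetical encoding of the constraints named in Examples 4-6.
GROUP_ONLY = {"Year"}             # usable for grouping only, never as a mining item
NO_INTRA_MINING = {"Gender"}      # assumed: may not be the sole mining attribute
DECIDE = {("ProdID", "Size")}     # (determinant, dependent): ProdID decides Size


def check_tm(tm):
    """Return the constraints violated by a set of mining attributes."""
    tm = set(tm)
    violations = [f"GroupOnly({attr})" for attr in sorted(tm & GROUP_ONLY)]
    if len(tm) == 1 and tm <= NO_INTRA_MINING:
        (attr,) = tm
        violations.append(f"NoIntraMining({attr})")
    for det, dep in sorted(DECIDE):
        # Mining a determinant together with its dependent yields known rules.
        if det in tm and dep in tm:
            violations.append(f"Decide({det}, {dep})")
    return violations
```

An empty list means that tM passed every check modeled here.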

(c) Semantic checking of wc

In the mining model setting, the wc filter is applied before the data are grouped into transactions by transaction ID. The checking of wc includes type consistency checking and domain checking.

Example 8: wc(City = ‘Japan’)

This example passes type consistency checking, but ‘Japan’ is not a city; the system therefore responds to the user with a domain checking warning.
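The two-stage check of Example 8 could be sketched as below. The attribute metadata, the sample city values, and the function name `check_wc` are hypothetical; treating a type mismatch as a rejection is likewise an assumption, since the text only states that a domain mismatch produces a warning.

```python
# Hypothetical attribute metadata; the sample domain values are illustrative only.
ATTRIBUTE_TYPES = {"City": str, "SaleAmount": float}
ATTRIBUTE_DOMAINS = {"City": {"Taipei", "Tokyo", "Osaka"}}


def check_wc(attr, value):
    """Check a wc condition `attr = value` for type consistency, then domain."""
    expected = ATTRIBUTE_TYPES.get(attr)
    if expected is None:
        return "reject", f"unknown attribute {attr}"
    if not isinstance(value, expected):
        return "reject", f"{attr} expects {expected.__name__}"
    domain = ATTRIBUTE_DOMAINS.get(attr)
    if domain is not None and value not in domain:
        return "warning", f"{value!r} is not a known value of {attr}"
    return "pass", "ok"
```

With this metadata, `check_wc("City", "Japan")` passes the type check (a string was supplied) but fails the domain check, so a warning is returned.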

Example 9: wc(ProdName in 3C_DomainOntology (‘All-in-one’, Classification, var_All))

The 3C domain ontology can be used for filtering conditions. If a user is interested only in “All-in-one” related products in a market basket analysis, all the objects with a ‘classification’ relationship to All-in-one in the 3C domain ontology should be retrieved.

The values retrieved from the domain ontology are then used to select the related transactions from the data warehouse.
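A minimal sketch of this retrieval step follows, assuming the domain ontology is available as (subject, relation, object) triples. The triple representation, the product names, and both function names are hypothetical and serve only to illustrate the idea of Example 9; only direct relationships are followed.

```python
# Hypothetical fragment of a 3C domain ontology as (subject, relation, object) triples.
TRIPLES = [
    ("Printer-A", "Classification", "All-in-one"),
    ("Scanner-B", "Classification", "All-in-one"),
    ("Laptop-C", "Classification", "Notebook"),
]


def related_objects(triples, target, relation):
    """Collect every subject directly linked to `target` by `relation`."""
    return {s for s, r, o in triples if r == relation and o == target}


def filter_rows(rows, attr, allowed):
    """Keep only the warehouse rows whose `attr` value was retrieved from the ontology."""
    return [row for row in rows if row[attr] in allowed]


rows = [{"ProdName": "Printer-A"}, {"ProdName": "Laptop-C"}]
all_in_one = related_objects(TRIPLES, "All-in-one", "Classification")
selected = filter_rows(rows, "ProdName", all_in_one)
```

Here `selected` contains only the row for the hypothetical product classified under All-in-one.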

(e) Semantic checking of (tG, hc)

This function checks the semantic legality of aggregation used in the filtering condition hc.

Note that, in the star schema model, there are three different types of measures: additive, semi-additive and non-additive. Of these, the semi-additive measures are defined only along some dimensions. For this reason, the checking of hc should be considered in accordance with the grouping ID to avoid invalid aggregation along the wrong dimensions.

Example 10: tG{ProdName, Date}, hc(sum(SaleAmount) > 1000)

In the schema ontology ‘SaleAmount’is an additive measure, therefore the system will pass the checking.

Example 11: tG{CustID, Date}, hc (sum(Cost) < 100)

The system will reject the setting because ‘Cost’is a semi-additive fact and should be, as shown in Figure 6, aggregated along with dimensions including Product.
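The additivity check of Examples 10 and 11 could be sketched as follows. The measure metadata, the attribute-to-dimension mapping (including the dimension names Customer and Time), and the function name `check_hc` are assumptions for illustration; in the described system this information would come from the schema ontology.

```python
# Hypothetical measure metadata standing in for the schema ontology.
MEASURES = {
    "SaleAmount": {"kind": "additive"},
    "Cost": {"kind": "semi-additive", "required_dimensions": {"Product"}},
}
# Hypothetical mapping from grouping attributes to their dimensions.
ATTRIBUTE_DIMENSION = {"ProdName": "Product", "CustID": "Customer", "Date": "Time"}


def check_hc(tg, measure):
    """Return True if aggregating `measure` under the grouping `tg` is valid."""
    info = MEASURES[measure]
    if info["kind"] == "additive":
        return True          # additive measures may be summed along any dimension
    if info["kind"] == "non-additive":
        return False         # non-additive measures may not be summed at all
    grouped_dims = {ATTRIBUTE_DIMENSION[a] for a in tg if a in ATTRIBUTE_DIMENSION}
    # Semi-additive: every required dimension must be covered by the grouping.
    return info["required_dimensions"] <= grouped_dims
```

Under these assumptions, `check_hc({"CustID", "Date"}, "Cost")` is `False`, matching the rejection in Example 11.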

Intelligent assistance in element recommendation

In addition to semantic checking, the system offers recommendations that guide users, especially inexperienced ones, toward a more efficient mining model formulation process.

These functions take the partial input of an unfinished mining model element created by the user and automatically list recommended candidates for the next mining model constituent, drawn from the user preference ontology, for the user to refer to. Based on the example of the user preference ontology in Figure 10, we present some examples as follows:

(a) Recommendation of tG by giving a partial tG

A partial tG is taken as the key to search the user preference ontology. It can be empty if the user does not know how to start a model setting. The system will respond with the list of available tG.

Example 11: Given partial tG = {CustID}

The system will prompt with the list {Date},…, {Category} as succeeding tG candidates for the user to refer to.
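This lookup could be sketched as below, assuming the user preference ontology can be flattened into a list of previously used settings. The `PREFERENCES` entries and the function name `recommend_tg` are hypothetical.

```python
# Hypothetical flattened view of the tG settings stored in the user preference ontology.
PREFERENCES = [
    {"tG": {"CustID", "Date"}},
    {"tG": {"CustID", "Category"}},
    {"tG": {"Category", "Date"}},
]


def recommend_tg(partial_tg, preferences):
    """List the attribute sets that would complete `partial_tg` to a stored tG."""
    partial = set(partial_tg)
    suggestions = []
    for pref in preferences:
        if partial <= pref["tG"]:
            remainder = tuple(sorted(pref["tG"] - partial))
            if remainder and remainder not in suggestions:
                suggestions.append(remainder)
    return suggestions
```

With an empty partial tG, every stored tG is returned in full, which corresponds to the case where the user does not know how to start.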

(b) Recommendation of tM by giving tG and partial tM

Given tG and a partial tM, the system will search the available sets of mining attributes in the user preference ontology for recommendations.

Example 12: Given tG = {CustID, Date} and partial tM = {ProdName}

The system will prompt the user with the following list: {Education},…, {Salesman} as suggestions to refer to.

(c) Recommendation of ms and mc by giving tG and tM

Example 13: tG = {CustID, Date}, tM = {ProdName, Education}

The corresponding ms and mc of the given tG and tM in the user preference ontology will be listed. In this case, ms = 60% and mc = 85% will be suggested.
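Recommendations (b) and (c) could be sketched in the same spirit. The list representation of the user preference ontology and both function names are assumptions; the first entry reuses the values of Example 13, while the second entry's thresholds are invented purely for illustration.

```python
# Hypothetical flattened view of the user preference ontology.
PREFERENCES = [
    {"tG": {"CustID", "Date"}, "tM": {"ProdName", "Education"}, "ms": 0.60, "mc": 0.85},
    {"tG": {"CustID", "Date"}, "tM": {"ProdName", "Salesman"}, "ms": 0.50, "mc": 0.80},
]


def recommend_tm(tg, partial_tm, preferences):
    """List the attribute sets that complete `partial_tm` for the given tG."""
    tg, partial = set(tg), set(partial_tm)
    suggestions = []
    for pref in preferences:
        if pref["tG"] == tg and partial <= pref["tM"]:
            remainder = tuple(sorted(pref["tM"] - partial))
            if remainder and remainder not in suggestions:
                suggestions.append(remainder)
    return suggestions


def recommend_thresholds(tg, tm, preferences):
    """Return the stored (ms, mc) pair for an exact (tG, tM) match, else None."""
    for pref in preferences:
        if pref["tG"] == set(tg) and pref["tM"] == set(tm):
            return pref["ms"], pref["mc"]
    return None
```

For the setting of Example 13, `recommend_thresholds` returns the stored pair `(0.60, 0.85)`.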
