The Dynamic EMCUD Algorithm - A Study of Dynamic Knowledge Acquisition Methods

Output: The rules with embedded meaning about variants.

Stage I: Collect all facts of the weak embedded rules as inference log of the RB.

Stage II: Generate the new variants acquisition table AT’.

Step 1: Discover large itemsets L using the inference log.

Step 2: Generate AT’ using L and additional attributes provided by experts.

Step 3: Update the AOT’ according to AT’.

Stage III: Use EMCUD to generate rules of new variants.

Step 1: Generate rules according to AT’ and AOT’.

Step 2: Merge AT’ into original main acquisition table AT.

Step 3: Merge AOT’ into original main AOT.

3.2 Inference Log Collecting Based upon Meta Rule

Without loss of generality, assume there are k attributes to classify m objects in the main acquisition table. Thus, the total number of embedded rules used in Dynamic EMCUD is limited. To help domain experts notice and analyze the occurrence of candidate variant objects, Dynamic EMCUD uses the following four meta rules to collect the inference log (facts, i.e. raw data) of frequently fired weak embedded rules.

MR1: IF R_i,j is fired THEN Increase C_i,j by one.

MR2: IF CF(R_i,j) ≤ TH_CF THEN Log R_i,j.

MR3: IF C_i,j ≥ TH_cnt AND CF(R_i,j) ≤ TH_CF THEN Run VODKA Algorithm to acquire the variants acquisition table increment AND Reset TimeOut.

MR4: IF TimeOut = TH_Period THEN Run VODKA Algorithm AND Reset TimeOut.

The meta rule MR1 counts the fired frequency (C_i,j) of each embedded rule. The meta rule MR2 means that all facts (attribute-value pairs) of embedded rules whose marginally acceptable CF does not exceed the strong CF bound threshold (TH_CF) are logged as a record (R_i,j, A1, A2, ..., Ak, CF(R_i,j)). The meta rule MR3 means that if a weak embedded rule has a fired frequency that reaches the warning line threshold (TH_cnt), new variants may be discovered iteratively using VODKA. The meta rule MR4 means that VODKA is executed periodically to refresh the new variants acquisition table. TimeOut is reset whenever MR3 or MR4 is triggered.
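The four meta rules can be sketched in Python as follows. This is only an illustrative sketch, not the thesis implementation: the class and method names (MetaRuleMonitor, on_fire, tick, run_vodka) are hypothetical, TimeOut is assumed to be a counter advanced by an external clock tick, and run_vodka is a placeholder that only counts invocations. The source does not say whether C_i,j is reset after MR3 fires, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class MetaRuleMonitor:
    th_cf: float     # TH_CF: strong CF bound threshold
    th_cnt: int      # TH_cnt: warning line threshold on fired frequency
    th_period: int   # TH_Period: ticks between periodic VODKA runs (assumed unit)
    counts: dict = field(default_factory=dict)  # C_i,j for each rule R_i,j
    log: list = field(default_factory=list)     # inference log of weak rules
    timeout: int = 0
    vodka_runs: int = 0

    def run_vodka(self):
        # Placeholder for the VODKA algorithm; here it only counts the call
        # and resets TimeOut, as MR3 and MR4 both require.
        self.vodka_runs += 1
        self.timeout = 0

    def on_fire(self, rule_id, facts, cf):
        # MR1: count every firing of the embedded rule.
        self.counts[rule_id] = self.counts.get(rule_id, 0) + 1
        weak = cf <= self.th_cf
        # MR2: log the facts of weak embedded rules.
        if weak:
            self.log.append((rule_id, facts, cf))
        # MR3: a frequently fired weak rule triggers VODKA.
        if weak and self.counts[rule_id] >= self.th_cnt:
            self.run_vodka()

    def tick(self):
        # MR4: periodic VODKA run when TimeOut reaches TH_Period.
        self.timeout += 1
        if self.timeout >= self.th_period:
            self.run_vodka()
```

Because MR3 is written with the condition C_i,j ≥ TH_cnt, every further firing of the same weak rule triggers VODKA again in this sketch; a real system would probably reset or rescale the counter, but that is a design choice the source leaves open.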

3.3 The NEO-Learning Module

As we know, a KBS is designed to help experts solve difficult problems in a specific domain based upon a pre-constructed static knowledge base. However, new objects will be developed or discovered as time goes on, which might make the KBS inefficient. Based upon the embedded rules generated by EMCUD, some new evolved objects may be classified into a well-known object class by a weak embedded rule whose weak CF is not strongly suggested by experts. Through monitoring the frequency of these weak embedded rules, the candidates of new objects can be noticed, and the characteristics of these candidates could be extracted from the collected inference logs. The evidence of the new objects can be confirmed by experts, and some attributes could be modified or added when the dynamic knowledge needs to be singled out. Moreover, the relationships between these inference logs might be represented as the significance of each attribute to each new object. Hence, analyzing the evolving trends of all attributes should be useful in capturing the realistic significance of each attribute to the object.

The NEO-learning module helps experts analyze the interesting inference logs of weak embedded rules. It uses VODKA to learn the evidence of new evolved objects and to notify experts of their occurrence. Based upon the confirmed new objects, it uses TEA to analyze the relationships of all attributes of each object and to set the significance of each attribute over time, which helps experts decide the CF values of the embedded rules of new objects. These rules can be generated using EMCUD according to the discovered objects stored in an AT increment and an AOT increment. Finally, the AT increment and the AOT increment are integrated with the main AT and the main AOT, respectively.

3.3.1 Frequent Events Analysis

EMCUD lacks a grid evolution capability for singling new evolved objects out of well-known objects, since experts may be unaware of the occurrence of new evolved objects without sufficient information. Hence, we propose VODKA to monitor the frequent behaviors of the interesting inference logs of weak embedded rules with lower CF values, helping experts notice the occurrence of new objects.

Figure 3.2 The Flow of VODKA

The novelty of VODKA, shown in Figure 3.2, is to collect the inference logs of weak embedded rules from each KBS and learn the candidates of new evolved objects for experts to confirm. The minor attribute-value pairs shared between inference logs of weak embedded rules help experts discover new knowledge and determine, based upon fired frequency, whether a new object has evolved. For each object, if its inference logs of weak embedded rules are frequent, the frequent minor attribute-value pairs could be treated as candidates of new evolved objects.

Furthermore, new attributes or attribute-values of the new object could be defined and used to generate a small AT increment. Hence, these candidates will be used to help experts single the new objects out of the extended object class using the new object acquisition module based upon the AT increment.

Therefore, if the new objects are confirmed by experts, the related ambiguous attributes (minor attributes), which might result in the marginally acceptable CF values of weak embedded rules, could be refined or new attributes could be added to improve the classification ability. If the initial data type of a minor attribute is too rough to describe the object, a superior data type is recommended and the values of the attribute in both original object and new evolved object should be modified.

For example, the BOOLEAN data type may be refined to the SINGLE VALUE data type (Hwang and Tseng, 1990). If changing the data type still cannot discriminate the new variants from the original objects, acquiring new attributes from domain experts will be suggested in the new objects acquisition module. Given the complexity of the relations between objects and attributes, or even between different tables, it is hard for experts to cooperate with each other in building every column and every row of each table. Finally, the resulting new objects and corresponding attributes can be used to construct the AT increment.
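The discovery of frequent attribute-value pairs described in this subsection can be illustrated with a small Python sketch. It is a brute-force counter over the logged records, not the VODKA algorithm itself and not a pruned Apriori implementation. The function name frequent_itemsets, the parameters min_support and max_size, and the attribute names and values in the usage note are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(logs, min_support, max_size=2):
    """Count attribute-value itemsets across inference-log records.

    logs: list of dicts mapping attribute name -> logged value
          (one dict per logged firing of a weak embedded rule).
    Returns {frozenset of (attribute, value) pairs: count} for every
    itemset of size 1..max_size seen at least min_support times.
    """
    counts = Counter()
    for record in logs:
        items = list(record.items())
        for size in range(1, max_size + 1):
            # Enumerate every combination of attribute-value pairs
            # in this record (no candidate pruning; fine for small k).
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}
```

For instance, with three logged records over the hypothetical attributes A1, A2, and A3, calling frequent_itemsets(logs, min_support=2) keeps only the pairs and pair combinations that recur in at least two records; these are the kind of candidates an expert would then be asked to confirm or reject.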

3.3.2 Trend Evolution Analysis

Although the original idea of constructing the AOT makes EMCUD more adaptive in eliciting embedded meanings, the relative importance of all attributes to each object may need to be adjusted, since dynamic knowledge may change or evolve over time.

This means that some embedded rules recommended by experts now may become uncertain in the near future. Each object in the AOT is decomposed to record the relative importance of each attribute to the object over time. The traditional Repertory Grid-based KA methods do not record the evolving trend of each new object, and with EMCUD it is difficult for experts to decide the ordering of all attributes of an object. Therefore TEA, which can discover how the relative importance of each attribute to each object evolves over time, is proposed to help experts monitor significant changes in the importance of all attributes to each object within a time interval.

As shown in Figure 3.3, a new object can be singled out of the old object according to the viewpoints of experts or the learning results of the frequent events analysis.

Each attribute can simply be assigned “0” or “1” at each time point to indicate whether it is important to each object, where “0” means the attribute is considered unimportant to the object and “1” means the attribute is important to the object. The domain expert can then decide which attributes need to be traced over time if some ordering values of the attributes are hard to decide immediately.

Figure 3.3 The Flow of TEA

The “0” or “1” is called the attribute event e_t of each object at time point t, and the attribute event sequence of “0” and “1” is recorded in a table to capture the evolving behavior of each object. Hence, the AOT increment can be generated to evolve the relative importance of each attribute to each object (the ordering values) according to the sequence of “0” and “1” events over time, using a time series analysis approach. Since “1” means an attribute is important to an object, consecutive “1” events recorded at consecutive time points indicate that the relative importance of the attribute to the object should become higher. Conversely, consecutive “0” events indicate that the relative importance should become lower. Hence, a simplified time series analysis is proposed to capture the meaning of the trend and incrementally adjust the CF value of each rule. Let the initial value of each signal sequence be the original AOT value of the attribute to the object.
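The text does not give the exact update formula, so the following Python sketch shows only one plausible reading of the trend rule: starting from the original AOT value, each pair of consecutive “1” events raises the ordering value by a fixed step and each pair of consecutive “0” events lowers it. The function name evolve_importance, the step size, and the bounds lo and hi are assumptions made for illustration, as is the convention that a larger ordering value means a more important attribute.

```python
def evolve_importance(initial, events, step=1, lo=1, hi=5):
    """Adjust an AOT ordering value from a 0/1 attribute event sequence.

    initial: original AOT value of the attribute to the object.
    events:  list of attribute events e_t (0 or 1), oldest first.
    step, lo, hi: assumed step size and bounds (not from the source).
    """
    value = initial
    # Compare each event with its predecessor; only a repeated event
    # (two consecutive 1s or two consecutive 0s) moves the value.
    for prev, cur in zip(events, events[1:]):
        if prev == cur == 1:
            value = min(hi, value + step)
        elif prev == cur == 0:
            value = max(lo, value - step)
    return value
```

Under these assumptions an alternating sequence such as 0, 1, 0, 1 leaves the value unchanged, which matches the intent that only a sustained trend should change the relative importance.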

3.4 Grid Merging

To maintain the newly discovered objects, we propose the grid merging algorithm shown in Algorithm 3.2 to integrate the AT increment and the AOT increment into the main AT and the main AOT, respectively. Therefore, only the small AT increment and the small AOT increment, instead of the whole large main AT and main AOT, are used to update the embedded rule base using EMCUD.
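Algorithm 3.2 is not reproduced in this section, so the following Python sketch only illustrates the general idea of merging a small grid increment into a main grid. It assumes, hypothetically, that a grid is stored as a dictionary mapping each object to a dictionary of attribute values; the function name merge_grid and the missing placeholder are likewise assumptions.

```python
def merge_grid(main, increment, missing=None):
    """Merge a grid increment (AT' or AOT') into the main grid.

    Both grids map object name -> {attribute name: value}.
    New attributes become new columns; cells that neither grid
    defines are filled with `missing` for experts to complete.
    """
    # Collect the union of attributes, keeping first-seen order.
    attributes = []
    for grid in (main, increment):
        for row in grid.values():
            for attr in row:
                if attr not in attributes:
                    attributes.append(attr)

    merged = {}
    objects = list(main) + [o for o in increment if o not in main]
    for obj in objects:
        # Values in the increment override the main grid for that object.
        row = {**main.get(obj, {}), **increment.get(obj, {})}
        merged[obj] = {a: row.get(a, missing) for a in attributes}
    return merged
```

The same function can be applied once to the AT pair and once to the AOT pair. Leaving undefined cells as a placeholder, rather than guessing a value, keeps the decision with the domain experts.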
