CHAPTER 4. AUTOMATIC META-RULES CONSTRUCTION

Cluster Similarity Matrix can be generated again, i.e.,

similarity threshold st. The process then terminates and outputs the resulting set of rule clusters, {{r₁, r₂}, {r₃, r₄}, {r₅}}. ■

4.3 Meta-rule Construction Process

The second process of the Automatic Meta-rule Constructor is meta-rule construction, in which the Meta-rule Extractor extracts meta-rules from the partitioned rule clusters. The Meta-rule Extractor consists of two subcomponents, the Meta Apriori Algorithm and the Confidence Calculator. The Meta Apriori algorithm is a modification of the Apriori algorithm, which is widely used in data mining to generate frequent (large) itemsets [1]. The Meta Apriori algorithm generates the meta-rules, and the Confidence Calculator computes the confidence value of each meta-rule. The meta-rules generated by the Meta-rule Extractor are then stored in the Meta-rule Base for later use. The whole process is illustrated in Figure 7.

[Figure 7: Rule Clusters → Meta-rule Extractor (Meta Apriori Algorithm, Confidence Calculator) → Meta-rule Base]

Figure 7. Components of Meta-rule Extractor.

In the following subsections, the Meta Apriori algorithm is introduced first, and then the meta-rule generation process is examined.

4.3.1 Meta Apriori Algorithm

The Meta Apriori algorithm discovers the most frequent combinations of expressions that describe a rule cluster. The basic idea is that the most frequent combinations of expressions appear in many rules of the rule cluster, so once such a combination is satisfied, those rules are likely to be relevant to the result. Unlike in the Apriori algorithm, the transactions in the Meta Apriori algorithm are rule conditions and the itemsets are sets of expressions. The notation used in the Meta Apriori algorithm is given in Definition 4.11.

Definition 4.11 Transaction and itemset used in Meta-Apriori.

Given a rule cluster g_i = {r_i1, r_i2, …, r_iN}, where N is the number of rules in g_i, the transaction and itemset are defined as follows:

$$t_{ij} = \mathrm{CONDITIONS}_{ij}, \quad \text{where } \mathrm{CONDITIONS}_{ij} \in r_{ij},\ j \in [1, N],$$

is the set of expressions used as one transaction.

d: an itemset of the Meta Apriori algorithm is a set of expressions.

Example 4.9 In Example 4.8, the two rules r₁ and r₂ are grouped into the same rule cluster g₁, i.e., g₁ = {r₁, r₂}. The corresponding transactions are

t₁₁ = {(protocol = TCP), (protected_network_direction = A), (source_port > 8080), (string = NetBus)}

t₁₂ = {(protocol = TCP), (protected_network_direction = A), (source_port > 1023), (string = NetBus2)} ■
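As a minimal sketch of Definition 4.11, assume each expression is an (attribute, operator, value) tuple and each rule is an object holding its condition expressions. The names `Expr`, `Rule`, and `transactions_of` are hypothetical illustrations, not taken from the thesis. Under these assumptions, the transactions of a rule cluster are the rules' condition sets:

```python
from typing import FrozenSet, List, NamedTuple


class Expr(NamedTuple):
    """Hypothetical encoding of an expression (attribute operator value)."""
    attribute: str
    operator: str
    value: object


class Rule(NamedTuple):
    """Hypothetical rule: only the condition part is used by Meta Apriori."""
    name: str
    conditions: FrozenSet[Expr]


def transactions_of(cluster: List[Rule]) -> List[FrozenSet[Expr]]:
    # t_ij = CONDITIONS_ij: one transaction per rule r_ij in cluster g_i
    return [rule.conditions for rule in cluster]


# Rules r1 and r2 of Example 4.9 (conclusion parts omitted)
r1 = Rule("r1", frozenset({
    Expr("protocol", "=", "TCP"),
    Expr("protected_network_direction", "=", "A"),
    Expr("source_port", ">", 8080),
    Expr("string", "=", "NetBus"),
}))
r2 = Rule("r2", frozenset({
    Expr("protocol", "=", "TCP"),
    Expr("protected_network_direction", "=", "A"),
    Expr("source_port", ">", 1023),
    Expr("string", "=", "NetBus2"),
}))

T1 = transactions_of([r1, r2])
print(len(T1))  # 2
```

Frozen sets are used so that transactions and itemsets can be compared and hashed directly.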

In the Meta Apriori algorithm, the support count of an itemset is defined as the number of transactions that the itemset subsumes; that is, the expressions of the itemset subsume those of the transaction. Expression subsumption must therefore be defined. For two expressions e_m and e_n, e_m subsumes e_n if sub(e_m, e_n) = 1, where sub() is the expression subsumption function given in Definition 4.12.

Definition 4.12 Expression subsumption function.

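Given two expressions e_m = (attribute_m operator_m value_m) and e_n = (attribute_n operator_n value_n),

$$\mathrm{sub}(e_m, e_n) = \begin{cases} 1, & \text{if } attribute_m = attribute_n \text{ and } \ldots \\ 0, & \text{otherwise} \end{cases}$$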

Definition 4.13 Itemset subsumption function.

Given an itemset d and a transaction t, the itemset subsumption function is defined as

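$$\mathrm{sub}(d, t) = \begin{cases} 1, & \text{if } \forall e_m \in d,\ \exists e_n \in t \text{ such that } \mathrm{sub}(e_m, e_n) = 1 \\ 0, & \text{otherwise} \end{cases}$$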
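The two subsumption functions and the resulting support count can be sketched as follows. This sketch rests on several assumptions:

- e_m subsumes e_n when both use the same attribute and every value satisfying e_n also satisfies e_m.
- Only the `=` and `>` operators that appear in the examples are handled.
- An itemset d subsumes a transaction t when every expression of d subsumes some expression of t.

These assumptions reproduce Table 2, where (source_port > 1023) subsumes (source_port > 8080). The names `Expr`, `sub_expr`, `sub_itemset`, and `support_count` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Expr(NamedTuple):
    attribute: str
    operator: str  # only "=" and ">" are handled in this sketch
    value: object


def sub_expr(em: Expr, en: Expr) -> int:
    """Expression subsumption (assumed semantics): 1 if em subsumes en."""
    if em.attribute != en.attribute:
        return 0
    if em.operator == "=" and en.operator == "=":
        return int(em.value == en.value)
    if em.operator == ">" and en.operator == ">":
        # x > 8080 implies x > 1023, so (> 1023) subsumes (> 8080)
        return int(em.value <= en.value)
    if em.operator == ">" and en.operator == "=":
        # a fixed value satisfies em only if it lies above em's bound
        return int(en.value > em.value)
    return 0  # "=" never subsumes an open range ">"


def sub_itemset(d: Iterable[Expr], t: Iterable[Expr]) -> int:
    """Itemset subsumption (assumed): every em in d subsumes some en in t."""
    t = list(t)
    return int(all(any(sub_expr(em, en) for en in t) for em in d))


def support_count(d, transactions) -> int:
    # number of transactions that itemset d subsumes
    return sum(sub_itemset(d, t) for t in transactions)


e3 = Expr("source_port", ">", 8080)
e4 = Expr("source_port", ">", 1023)
print(sub_expr(e4, e3), sub_expr(e3, e4))  # 1 0
```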

Algorithm: Meta Apriori Algorithm

Input: a set of transactions, T; a minimum support threshold, min_sup.

Output: a set of frequent itemsets, D.

Step 1. Generate the set of frequent 1-itemsets, D_i1, by scanning T.

Step 2. Set the initial value of k to 2.

Step 3. Generate the candidate k-itemsets, C_ik, from D_i(k-1).

Step 4. For each k-itemset d_k ∈ C_ik, compute the support count, that is, d_k.support

Step 7. Output D_ik.
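The level-wise loop above can be sketched in Python. Steps 5 and 6 are not shown above, so this sketch assumes they follow standard Apriori:

- Keep the candidates whose support count reaches ⌈|T| · min_sup⌉.
- Repeat with k + 1 until no frequent k-itemset remains.

Step 7 is read as returning the last non-empty level, which matches the output D₁₃ of Example 4.10. The function name `meta_apriori` and the caller-supplied `subsumes` predicate over encoded expressions are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Set

Item = str  # an encoded expression such as "e1", as in Table 1
Itemset = FrozenSet[Item]


def meta_apriori(
    transactions: Sequence[Set[Item]],
    min_sup: float,
    subsumes: Callable[[Item, Item], bool],
) -> List[Itemset]:
    """Sketch of Meta Apriori; steps 5-6 are assumed to follow plain Apriori."""
    # exact decimal arithmetic avoids float rounding inside the ceiling
    min_count = math.ceil(len(transactions) * Fraction(str(min_sup)))

    def count(d: Itemset) -> int:
        # d subsumes t if every expression in d subsumes one in t
        return sum(
            all(any(subsumes(em, en) for en in t) for em in d)
            for t in transactions
        )

    # Step 1: frequent 1-itemsets built from every expression in T
    items = {e for t in transactions for e in t}
    level = [frozenset({e}) for e in items if count(frozenset({e})) >= min_count]
    result = level
    k = 2  # Step 2
    while level:
        # Step 3: union pairs of frequent (k-1)-itemsets into k-itemsets,
        # then prune candidates that have an infrequent (k-1)-subset
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Step 4 (and assumed step 5): keep candidates meeting min support
        level = [c for c in candidates if count(c) >= min_count]
        if level:
            result = level
        k += 1
    return result  # Step 7: the last non-empty level D_ik


# Example 4.10: among the encoded items, e4 (> 1023) subsumes e3 (> 8080)
def sub(em: Item, en: Item) -> bool:
    return em == en or (em, en) == ("e4", "e3")


T1 = [{"e1", "e2", "e3", "e5"}, {"e1", "e2", "e4", "e6"}]
print([sorted(d) for d in meta_apriori(T1, 0.9, sub)])  # [['e1', 'e2', 'e4']]
```

Generating candidates by unioning pairs and then pruning yields the same candidate set as the classic prefix join, at some extra cost. That trade-off is acceptable for the small rule clusters shown here.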

Example 4.10 In Example 4.8, the rule base RB = {r₁, r₂, r₃, r₄, r₅} is partitioned into three rule clusters, g₁ = {r₁, r₂}, g₂ = {r₃, r₄}, and g₃ = {r₅}. The following steps illustrate how the Meta Apriori algorithm is applied to g₁ to generate frequent itemsets.

The minimum support threshold, min_sup, is set to 0.9. For simplicity, every expression occurring in RB is encoded as shown in Table 1.

Encoding   Expression
e₁         (protocol = TCP)
e₂         (protected_network_direction = A)
e₃         (source_port > 8080)
e₄         (source_port > 1023)
e₅         (string = NetBus)
e₆         (string = NetBus2)

Table 1. Encodings of expressions in RB.

1. Since g₁ contains two rules, T₁ consists of two transactions, t₁₁ and t₁₂:

t₁₁ = {e₁, e₂, e₃, e₅};

t₁₂ = {e₁, e₂, e₄, e₆}.

2. In the first iteration, every expression that appears in the transactions forms a candidate 1-itemset in C₁₁. The algorithm scans all transactions, counts the number of transactions each itemset subsumes, and summarizes the results in Table 2.

Itemset    Count
{e₁}       2
{e₂}       2
{e₃}       1
{e₄}       2
{e₅}       1
{e₆}       1

Table 2. The support count of each candidate in C₁₁.

3. The minimum support count is 2, since |T₁| · min_sup = 2 · 0.9 = 1.8 and support counts are integers. The frequent 1-itemsets, D₁₁, are the candidates that satisfy the minimum support count; thus D₁₁ = {{e₁}, {e₂}, {e₄}}.

4. The candidate 2-itemsets, C₁₂, are generated from D₁₁. To discover the frequent 2-itemsets, D₁₂, the transactions in T₁ are scanned and the support count of each candidate is accumulated, as shown in Table 3. Since the support count of every candidate is greater than or equal to the minimum support count, D₁₂ = {{e₁, e₂}, {e₁, e₄}, {e₂, e₄}}.

Itemset      Count
{e₁, e₂}     2
{e₁, e₄}     2
{e₂, e₄}     2

Table 3. Support count of each candidate in C₁₂.

5. The candidate 3-itemsets, C₁₃, are generated in the same way and are listed in Table 4. The only candidate, {e₁, e₂, e₄}, has a support count of 2 and is therefore frequent. Since |D₁₃| = 1, no candidate 4-itemsets C₁₄ can be generated, and the process terminates.

Itemset          Count
{e₁, e₂, e₄}     2

Table 4. Support count of the candidate in C₁₃.

6. Therefore, the final output is D₁₃ = {{e₁, e₂, e₄}}. According to Table 1, the frequent combination of expressions is {(protocol = TCP), (protected_network_direction = A), (source_port > 1023)}. ■
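The counts in Tables 2 to 4 can be checked by summing the itemset subsumption values over T₁ = {t₁₁, t₁₂}. This check assumes, as Table 2 implies, that e₄ = (source_port > 1023) subsumes e₃ = (source_port > 8080) but not the reverse:

$$
\begin{aligned}
\mathrm{count}(\{e_4\}) &= \mathrm{sub}(\{e_4\}, t_{11}) + \mathrm{sub}(\{e_4\}, t_{12}) = 1 + 1 = 2 \\
\mathrm{count}(\{e_3\}) &= \mathrm{sub}(\{e_3\}, t_{11}) + \mathrm{sub}(\{e_3\}, t_{12}) = 1 + 0 = 1 \\
\mathrm{count}(\{e_1, e_2, e_4\}) &= 1 + 1 = 2 \;\ge\; \lceil |T_1| \cdot \mathit{min\_sup} \rceil = \lceil 1.8 \rceil = 2
\end{aligned}
$$

Hence {e₁, e₂, e₄} is frequent, while {e₃} falls below the minimum support count.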

4.3.2 Meta-rule Generation

Meta-rule generation is based on constraint-based association mining [29][31]; that is, the form of a meta-rule is specified in advance, as given in Definition 4.14. A meta-rule mr_j = (CONDITIONS_j, (RULE_CLUSTER = g_i), conf) means that if all expressions in CONDITIONS_j are satisfied, rule cluster g_i is selected with confidence value conf.

Definition 4.14 Meta-rule representation.

For a given rule cluster g_i, each meta-rule mr_j generated from g_i has the form:

mr_j = (CONDITIONS_j, (RULE_CLUSTER = g_i), conf), where CONDITIONS_j is the set of expressions in the condition part of mr_j, and conf is the confidence value of the meta-rule.
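The following is a minimal sketch of Definition 4.14 and its intended reading. It assumes that facts are supplied as a dict from attribute name to value and that only the `=` and `>` operators occur. The class `MetaRule`, its method `fires`, and the confidence value used below are hypothetical illustrations:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Expr = Tuple[str, str, object]  # (attribute, operator, value)


@dataclass(frozen=True)
class MetaRule:
    conditions: FrozenSet[Expr]  # CONDITIONS_j
    rule_cluster: str            # g_i in (RULE_CLUSTER = g_i)
    conf: float                  # confidence value

    def fires(self, facts: Dict[str, object]) -> bool:
        """True if every expression of CONDITIONS_j is satisfied by the facts."""
        for attribute, operator, value in self.conditions:
            if attribute not in facts:
                return False
            actual = facts[attribute]
            if operator == "=":
                ok = actual == value
            elif operator == ">":
                ok = actual > value
            else:
                raise ValueError(f"unsupported operator: {operator}")
            if not ok:
                return False
        return True


mr = MetaRule(
    frozenset({("protocol", "=", "TCP"),
               ("protected_network_direction", "=", "A"),
               ("source_port", ">", 1023)}),
    rule_cluster="g1",
    conf=1.0,  # placeholder; the Confidence Calculator sets the real value
)
packet = {"protocol": "TCP", "protected_network_direction": "A", "source_port": 8080}
print(mr.fires(packet))  # True, so rule cluster g1 would be selected
```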

The frequent sets of expressions generated from one rule cluster may be the same as those generated from other rule clusters. Therefore, once the Meta Apriori algorithm has generated the frequent itemsets of all rule clusters, the next step is to determine the confidence value of each corresponding meta-rule. Within the Meta-rule Extractor, the Confidence Calculator computes the confidence value of each meta-rule generated from the frequent itemsets by applying the following Confidence Calculation Algorithm.

Algorithm: Confidence Calculation Algorithm

Input: a set of meta-rules, MRB, whose confidence values are not yet defined

Output: a set of meta-rules, MRB’, in which the confidence value of every meta-rule is specified

Step 1. Initialize MRB’ to the empty set.

Step 2. Select from MRB all meta-rules whose condition parts have the same set of expressions as the first meta-rule of MRB; the first meta-rule itself is included in the selection.

Step 3. Accumulate the total support count of all selected meta-rules.

Step 4. Set the confidence value of each selected meta-rule to its support count divided by the total support count.

Step 5. Move the selected meta-rules from MRB to MRB’.

Step 6. If MRB is empty, terminate and output MRB’.

Step 7. Go to step 2.
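The Confidence Calculation Algorithm can be sketched as below. Steps 3 and 4 use each meta-rule's support count, but the section does not name a field for it. This sketch therefore assumes each pending meta-rule carries the support count of the frequent itemset it was generated from. The names `PendingRule` and `calculate_confidence` and the numbers in the usage are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class PendingRule(NamedTuple):
    conditions: FrozenSet[str]  # CONDITIONS_j, as encoded expressions
    rule_cluster: str           # g_i
    support: int                # support count of the itemset in g_i (assumed)


def calculate_confidence(mrb: List[PendingRule]) -> List[Tuple[PendingRule, float]]:
    mrb = list(mrb)  # work on a copy of MRB
    mrb_out = []     # Step 1: MRB' starts empty
    while mrb:       # Steps 6-7: repeat until MRB is empty
        first = mrb[0]
        # Step 2: meta-rules with the same condition set as the first one
        selected = [mr for mr in mrb if mr.conditions == first.conditions]
        # Step 3: frequent itemsets have support >= 1, so total > 0
        total = sum(mr.support for mr in selected)
        for mr in selected:  # Step 4: confidence = support / total
            mrb_out.append((mr, mr.support / total))
        # Step 5: move the selected meta-rules out of MRB
        mrb = [mr for mr in mrb if mr.conditions != first.conditions]
    return mrb_out


# Hypothetical input: {e1, e2} is frequent in g1 (count 2) and in g2 (count 1)
c = frozenset({"e1", "e2"})
rules = [PendingRule(c, "g1", 2), PendingRule(c, "g2", 1),
         PendingRule(frozenset({"e1", "e4"}), "g1", 2)]
for mr, conf in calculate_confidence(rules):
    print(sorted(mr.conditions), mr.rule_cluster, round(conf, 3))
```

In this usage, the two meta-rules that share {e₁, e₂} receive confidences 2/3 and 1/3. The meta-rule for {e₁, e₄} occurs only in g₁ and receives confidence 1.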

From the document: A Meta-knowledge Construction Method Based on Rule Base Partitioning (pp. 39-47)