Chapter 3 Rough Set Based ADR Detection
3.4 The Detection Method
3.4.1 Algorithm Description
Given an SRS dataset with missing values, we assume that the rule representing the ADR signal to be discovered is provided by the user. Our algorithm, shown in Figure 3.3, computes the strength of this rule according to four parameters: the attribute covering (global or local), the characteristic set (tolerance or similarity), the approximation (singleton, subset, or concept), and the signal measure (PRR or ROR).
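Because the algorithm takes the four parameters as independent inputs, the framework covers 2 × 2 × 3 × 2 = 24 variants of the detection method. The short Python snippet below enumerates them; the option strings are illustrative spellings of the values named above, not identifiers taken from an implementation.

```python
from itertools import product

# Option values named in the text (spellings are illustrative).
AC_TYPES = ("global", "local")                 # attribute covering
CS_TYPES = ("similarity", "tolerance")         # characteristic set
AP_TYPES = ("singleton", "subset", "concept")  # approximation
MS_TYPES = ("PRR", "ROR")                      # signal measure

# Every combination of the four parameters is a distinct configuration.
configs = list(product(AC_TYPES, CS_TYPES, AP_TYPES, MS_TYPES))
print(len(configs))  # 24
print(configs[0])    # ('global', 'similarity', 'singleton', 'PRR')
```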
Input:
STab: the SRS data table;
RTemp: the rule template;
ACtype: the type of attribute coverings;
CStype: the type of characteristic sets;
APtype: the type of approximations;
MStype: the type of measures.
Output: The rule and the strength.
Steps:
1. Compute the characteristic sets of all records in STab according to ACtype and CStype;
2. Generate the four contingency sets, Xa, Xb, Xc, and Xd, according to the rule template RTemp;
3. Generate the lower and upper approximations of Xa, Xb, Xc, and Xd, according to the chosen approximation type APtype;
4. Compute the rule strength from the approximate contingency sets X*a, X*b, X*c, and X*d, according to the measure MStype;
5. Return the rule with the computed strength;
Figure 3.3 Algorithmic framework of the proposed ADR detection method
The execution process is divided into four main phases:
(1) Compute the characteristic sets of each case in the SRS data;
(2) Generate the initial four contingency sets, Xa, Xb, Xc, and Xd;
(3) Generate the lower and upper approximations of Xa, Xb, Xc, and Xd;
(4) Compute the strength of the rule with the PRR or ROR measure.
In the following, we will describe each of the phases.
Phase 1: Compute the characteristic sets
The main task of this phase is to find, for each case in the input data table, the cases similar to it while taking missing values into account. Following the definition of the similarity characteristic set, when missing values are interpreted as lost (?), the process compares all attribute fields of the two cases except those in which the first case has a null value. When missing values are interpreted as "don't care" (*), the tolerance characteristic set is used instead: the process compares the relevant attribute fields of the two cases, and a null value on either side is regarded as matching every possible value of the corresponding attribute. The algorithm for this phase is described in Figure 3.4.
Input: The data table STab, the rule template RTemp, the attribute covering type ACtype, and the characteristic set type CStype.
Output: The characteristic sets of all cases, denoted by KS.
Steps:
1. for each case r1 ∈ STab do
2.   for each case r2 ∈ STab with r2 ≠ r1 do
3.     if ACtype = ‘global’
4.       then P ← A; // A denotes the set of all attributes in STab
5.       else P ← B; // B denotes the set of attributes in RTemp
6.     if CStype = ‘similarity’ then
7.       if (for all fields f ∈ P, r1.f = r2.f or r1.f = null)
8.         then store r2 into KS(r1); // r2 is a similar case to r1
9.     else if CStype = ‘tolerance’ then
10.      if (for all fields f ∈ P, r1.f = r2.f or r1.f = null or r2.f = null)
11.        then store r2 into KS(r1); // r2 is a similar case to r1
12.    endif
13.  endfor
14. endfor
Figure 3.4 Algorithm for computing characteristic set
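A minimal Python sketch of Phase 1 follows. It assumes that each case is a dict from attribute name to value, with None marking a missing value, and that cases are identified by their list index; both representation choices are ours, not the thesis's. The sketch also compares each case with itself; since that comparison always succeeds, every case belongs to its own characteristic set, as in the usual rough-set definition (an assumption on our part).

```python
from typing import Any, Dict, List, Mapping, Sequence, Set

Record = Mapping[str, Any]  # a case: attribute name -> value, None when missing


def is_similar(r1: Record, r2: Record, attrs: Sequence[str], cs_type: str) -> bool:
    """Test whether r2 belongs to the characteristic set of r1 on attributes attrs."""
    for f in attrs:
        v1, v2 = r1.get(f), r2.get(f)
        if cs_type == "similarity":
            # Lost values (?): only fields that are known in r1 must agree with r2.
            if v1 is not None and v1 != v2:
                return False
        elif cs_type == "tolerance":
            # Don't-care values (*): a missing value on either side matches anything.
            if v1 is not None and v2 is not None and v1 != v2:
                return False
        else:
            raise ValueError(f"unknown characteristic set type: {cs_type!r}")
    return True


def characteristic_sets(table: Sequence[Record], rule_attrs: Sequence[str],
                        ac_type: str, cs_type: str) -> Dict[int, Set[int]]:
    """Return KS as a map from case index to the indices of its similar cases."""
    if ac_type == "global":
        # Global covering: compare on every attribute that occurs in the table.
        attrs: List[str] = sorted({f for r in table for f in r})
    elif ac_type == "local":
        # Local covering: compare only on the attributes used by the rule.
        attrs = list(rule_attrs)
    else:
        raise ValueError(f"unknown attribute covering type: {ac_type!r}")
    # Self-comparison always succeeds, so each case lands in its own set.
    return {i: {j for j, r2 in enumerate(table) if is_similar(r1, r2, attrs, cs_type)}
            for i, r1 in enumerate(table)}


# Hypothetical cases; None marks a missing value.
table = [
    {"drug": "D1", "sex": "F", "age": None},
    {"drug": "D1", "sex": None, "age": "60s"},
    {"drug": "D1", "sex": None, "age": None},
]
print(characteristic_sets(table, ["drug"], "global", "similarity"))
print(characteristic_sets(table, ["drug"], "global", "tolerance"))
```

In the usage example, under the similarity relation case 2 (whose only known value is drug = D1) is similar to every case, while cases 0 and 1 are similar only to themselves, which shows that this relation is not symmetric. Under the tolerance relation all three cases fall into one characteristic set.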
Phase 2: Generate the contingency sets
The purpose of this phase is to obtain the initial four contingency sets Xa, Xb, Xc, and Xd, which correspond to the cells of the contingency table used to evaluate the strength of the rule. This phase is straightforward to implement: each case in the data table STab is checked against the conditions implicitly specified by the rule template RTemp and assigned to the corresponding contingency set.
Note that cases with a missing value in any attribute of the rule are ignored in this phase. We omit the algorithmic description of this procedure because of its simplicity.
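For illustration, here is a Python sketch of Phase 2 for the simplest rule template, a single drug implying a single reaction. The template is represented by the hypothetical parameters drug_attr, drug_value, adr_attr, and adr_value, and the cells follow the usual 2×2 layout for disproportionality measures (a: drug and reaction, b: drug without the reaction, c: reaction without the drug, d: neither). Both choices are assumptions, since RTemp may express richer conditions.

```python
from typing import Any, Dict, Mapping, Sequence, Set


def contingency_sets(table: Sequence[Mapping[str, Any]],
                     drug_attr: str, drug_value: Any,
                     adr_attr: str, adr_value: Any) -> Dict[str, Set[int]]:
    """Assign each case (by index) to one cell of the 2x2 contingency table."""
    sets: Dict[str, Set[int]] = {"a": set(), "b": set(), "c": set(), "d": set()}
    for i, r in enumerate(table):
        drug, adr = r.get(drug_attr), r.get(adr_attr)
        if drug is None or adr is None:
            continue  # a rule attribute is missing: ignore the case in this phase
        has_drug, has_adr = drug == drug_value, adr == adr_value
        if has_drug and has_adr:
            sets["a"].add(i)  # drug and reaction
        elif has_drug:
            sets["b"].add(i)  # drug without the reaction
        elif has_adr:
            sets["c"].add(i)  # reaction without the drug
        else:
            sets["d"].add(i)  # neither
    return sets


# Hypothetical cases; None marks a missing value.
table = [
    {"drug": "D1", "adr": "E1"},
    {"drug": "D1", "adr": "E2"},
    {"drug": "D2", "adr": "E1"},
    {"drug": "D2", "adr": "E2"},
    {"drug": None, "adr": "E1"},
]
print(contingency_sets(table, "drug", "D1", "adr", "E1"))
```

The last case has no drug value, so it is left out of all four sets; it can still influence the result through the approximations computed in Phase 3.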
Phase 3: Compute the lower and upper approximations
Once the characteristic sets of all cases and the four contingency sets are available, this phase computes, according to the approximation type APtype, the lower and upper approximations of the four contingency sets. The resulting approximate contingency sets are denoted X*a, X*b, X*c, and X*d. For clarity, we separate this phase into two procedures: one for the lower approximations (see Figure 3.5) and one for the upper approximations (see Figure 3.6).
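Restated in set notation, with U the set of cases in STab, K(x) the characteristic set KS(x), and X any of Xa, Xb, Xc, and Xd, the conditions tested in Figures 3.5 and 3.6 give the following approximations:

```latex
\begin{aligned}
&\text{singleton:} && \underline{X} = \{x \in U : K(x) \subseteq X\},
  && \overline{X} = \{x \in U : K(x) \cap X \neq \emptyset\},\\
&\text{subset:} && \underline{X} = \bigcup\{K(x) : x \in U,\ K(x) \subseteq X\},
  && \overline{X} = \bigcup\{K(x) : x \in U,\ K(x) \cap X \neq \emptyset\},\\
&\text{concept:} && \underline{X} = \bigcup\{K(x) : x \in X,\ K(x) \subseteq X\},
  && \overline{X} = \bigcup\{K(x) : x \in X,\ K(x) \cap X \neq \emptyset\}.
\end{aligned}
```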
Phase 4: Calculate the rule strength
The final phase calculates the value of the chosen measure, PRR or ROR, for the rule.
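First, we compute the approximate contingency values a*, b*, c*, and d*. Each is an interval whose endpoints are the cardinalities of the lower approximation $\underline{X}$ and the upper approximation $\overline{X}$ of the corresponding contingency set:
$$a^* = \left[\,|\underline{X}_a|,\ |\overline{X}_a|\,\right],\quad b^* = \left[\,|\underline{X}_b|,\ |\overline{X}_b|\,\right],\quad c^* = \left[\,|\underline{X}_c|,\ |\overline{X}_c|\,\right],\quad d^* = \left[\,|\underline{X}_d|,\ |\overline{X}_d|\,\right].$$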
Input: The characteristic set KS, the contingency sets Xa, Xb, Xc, and Xd, and the approximation type APtype.
Output: The lower approximations of Xa, Xb, Xc, and Xd, denoted by lower(Xa), lower(Xb), lower(Xc), and lower(Xd).
Steps:
1. for each case r in STab do
2.   switch APtype of
3.   case ‘singleton’:
4.     if (KS(r) ⊆ Xa) store r into lower(Xa);
5.     else if (KS(r) ⊆ Xb) store r into lower(Xb);
6.     else if (KS(r) ⊆ Xc) store r into lower(Xc);
7.     else if (KS(r) ⊆ Xd) store r into lower(Xd);
8.   case ‘subset’:
9.     if (KS(r) ⊆ Xa) lower(Xa) ← lower(Xa) ∪ KS(r);
10.    else if (KS(r) ⊆ Xb) lower(Xb) ← lower(Xb) ∪ KS(r);
11.    else if (KS(r) ⊆ Xc) lower(Xc) ← lower(Xc) ∪ KS(r);
12.    else if (KS(r) ⊆ Xd) lower(Xd) ← lower(Xd) ∪ KS(r);
13.  case ‘concept’:
14.    if (r ∈ Xa and KS(r) ⊆ Xa) lower(Xa) ← lower(Xa) ∪ KS(r);
15.    else if (r ∈ Xb and KS(r) ⊆ Xb) lower(Xb) ← lower(Xb) ∪ KS(r);
16.    else if (r ∈ Xc and KS(r) ⊆ Xc) lower(Xc) ← lower(Xc) ∪ KS(r);
17.    else if (r ∈ Xd and KS(r) ⊆ Xd) lower(Xd) ← lower(Xd) ∪ KS(r);
18.  end switch
19. endfor
Figure 3.5. Algorithm for computing lower approximations.
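The lower-approximation procedure of Figure 3.5 can be sketched in Python as follows, assuming KS is a dict from case index to a set of case indices and the four contingency sets are given as a dict keyed by the hypothetical labels "a" to "d". Because the contingency sets are disjoint, a non-empty KS(r) can be a subset of at most one of them; assuming every characteristic set is non-empty (each case belongs to its own set), testing all four sets gives the same result as an else-if chain.

```python
from typing import Dict, Set

CELLS = ("a", "b", "c", "d")  # hypothetical keys for Xa, Xb, Xc, Xd


def lower_approximations(ks: Dict[int, Set[int]], xs: Dict[str, Set[int]],
                         ap_type: str) -> Dict[str, Set[int]]:
    """Lower approximations of the contingency sets, following Figure 3.5."""
    if ap_type not in ("singleton", "subset", "concept"):
        raise ValueError(f"unknown approximation type: {ap_type!r}")
    low: Dict[str, Set[int]] = {cell: set() for cell in CELLS}
    for r, k in ks.items():
        for cell in CELLS:
            x = xs[cell]
            if not k <= x:        # all three types require KS(r) ⊆ X
                continue
            if ap_type == "singleton":
                low[cell].add(r)  # keep the case itself
            elif ap_type == "subset":
                low[cell] |= k    # keep its whole characteristic set
            elif r in x:          # concept: only cases that are themselves in X
                low[cell] |= k
    return low


# Hypothetical characteristic sets and contingency sets (case 4 is in no cell).
ks = {0: {0, 1}, 1: {0, 1, 4}, 2: {2}, 3: {3, 4}, 4: {1, 3, 4}}
xs = {"a": {0, 1}, "b": {2}, "c": {3}, "d": set()}
for ap in ("singleton", "subset", "concept"):
    print(ap, lower_approximations(ks, xs, ap))
```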
Input: The characteristic set KS, the contingency sets Xa, Xb, Xc, and Xd, and the approximation type APtype.
Output: The upper approximations of Xa, Xb, Xc, and Xd, denoted by upper(Xa), upper(Xb), upper(Xc), and upper(Xd).
Steps:
1. for each case r in STab do
2.   switch APtype of // each set is tested independently: KS(r) may intersect several of them
3.   case ‘singleton’:
4.     if (KS(r) ∩ Xa ≠ ∅) store r into upper(Xa);
5.     if (KS(r) ∩ Xb ≠ ∅) store r into upper(Xb);
6.     if (KS(r) ∩ Xc ≠ ∅) store r into upper(Xc);
7.     if (KS(r) ∩ Xd ≠ ∅) store r into upper(Xd);
8.   case ‘subset’:
9.     if (KS(r) ∩ Xa ≠ ∅) upper(Xa) ← upper(Xa) ∪ KS(r);
10.    if (KS(r) ∩ Xb ≠ ∅) upper(Xb) ← upper(Xb) ∪ KS(r);
11.    if (KS(r) ∩ Xc ≠ ∅) upper(Xc) ← upper(Xc) ∪ KS(r);
12.    if (KS(r) ∩ Xd ≠ ∅) upper(Xd) ← upper(Xd) ∪ KS(r);
13.  case ‘concept’:
14.    if (r ∈ Xa and KS(r) ∩ Xa ≠ ∅) upper(Xa) ← upper(Xa) ∪ KS(r);
15.    if (r ∈ Xb and KS(r) ∩ Xb ≠ ∅) upper(Xb) ← upper(Xb) ∪ KS(r);
16.    if (r ∈ Xc and KS(r) ∩ Xc ≠ ∅) upper(Xc) ← upper(Xc) ∪ KS(r);
17.    if (r ∈ Xd and KS(r) ∩ Xd ≠ ∅) upper(Xd) ← upper(Xd) ∪ KS(r);
18.  end switch
19. endfor
Figure 3.6. Algorithm for computing upper approximations.
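A matching Python sketch of the upper-approximation procedure of Figure 3.6 follows, under the same assumed representation (KS as a dict of index sets, contingency sets keyed by the hypothetical labels "a" to "d"). Unlike the lower case, a single characteristic set can intersect more than one contingency set, so the sketch tests each set independently and a case may enter several upper approximations.

```python
from typing import Dict, Set

CELLS = ("a", "b", "c", "d")  # hypothetical keys for Xa, Xb, Xc, Xd


def upper_approximations(ks: Dict[int, Set[int]], xs: Dict[str, Set[int]],
                         ap_type: str) -> Dict[str, Set[int]]:
    """Upper approximations of the contingency sets, following Figure 3.6."""
    if ap_type not in ("singleton", "subset", "concept"):
        raise ValueError(f"unknown approximation type: {ap_type!r}")
    up: Dict[str, Set[int]] = {cell: set() for cell in CELLS}
    for r, k in ks.items():
        for cell in CELLS:       # independent tests, one per contingency set
            x = xs[cell]
            if not k & x:        # all three types require KS(r) ∩ X ≠ ∅
                continue
            if ap_type == "singleton":
                up[cell].add(r)  # keep the case itself
            elif ap_type == "subset":
                up[cell] |= k    # keep its whole characteristic set
            elif r in x:         # concept: only cases that are themselves in X
                up[cell] |= k
    return up


# Hypothetical characteristic sets and contingency sets (case 4 is in no cell).
ks = {0: {0, 1}, 1: {0, 1, 4}, 2: {2}, 3: {3, 4}, 4: {1, 3, 4}}
xs = {"a": {0, 1}, "b": {2}, "c": {3}, "d": set()}
for ap in ("singleton", "subset", "concept"):
    print(ap, upper_approximations(ks, xs, ap))
```

In this example, case 4, which belongs to no contingency set because of a missing rule attribute, enters the singleton upper approximations of both Xa and Xc.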
Then we compute the strength of the rule, which is now a range rather than a single value, by performing a simple range calculation on the PRR and ROR formulas. The resulting range formulas for PRR and ROR are as follows:
)
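The following is an illustration under stated assumptions, not the thesis's own formulas. It assumes the standard point estimates over the 2×2 table (a: drug and reaction, b: drug without the reaction, c: reaction without the drug, d: neither), PRR = [a/(a+b)]/[c/(c+d)] and ROR = ad/(bc), and obtains a range by evaluating each measure at the corners of the intervals a*, b*, c*, and d*. Both measures increase with a and d and decrease with b and c, so the lowest value combines the lower bounds of a and d with the upper bounds of b and c, and the highest value does the reverse. Treating the four intervals as independent gives the widest possible range.

```python
import math
from typing import Callable, Dict, Tuple

Interval = Tuple[int, int]  # (lower-approximation count, upper-approximation count)


def _div(num: float, den: float) -> float:
    """Division that maps x/0 to infinity for x > 0 and 0/0 to NaN."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def prr(a: float, b: float, c: float, d: float) -> float:
    # PRR = [a / (a + b)] / [c / (c + d)] = a(c + d) / (c(a + b))
    return _div(a * (c + d), c * (a + b))


def ror(a: float, b: float, c: float, d: float) -> float:
    # ROR = (a / c) / (b / d) = ad / (bc)
    return _div(a * d, b * c)


MEASURES: Dict[str, Callable[[float, float, float, float], float]] = {"PRR": prr, "ROR": ror}


def range_strength(counts: Dict[str, Interval], ms_type: str) -> Tuple[float, float]:
    """Range of the chosen measure over the interval counts a*, b*, c*, d*."""
    f = MEASURES[ms_type]
    (a_lo, a_hi), (b_lo, b_hi), (c_lo, c_hi), (d_lo, d_hi) = (counts[k] for k in "abcd")
    # Both measures rise with a and d and fall with b and c, so the extremes
    # sit at these two corners (the intervals are treated as independent).
    return f(a_lo, b_hi, c_hi, d_lo), f(a_hi, b_lo, c_lo, d_hi)


counts = {"a": (2, 4), "b": (10, 12), "c": (5, 6), "d": (100, 103)}  # hypothetical
print(range_strength(counts, "PRR"))
print(range_strength(counts, "ROR"))
```

With these hypothetical counts, the PRR range is about [2.52, 6.17] and the ROR range about [2.78, 8.24]. When every interval collapses to a single value, both endpoints coincide with the ordinary point estimate.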