Multiple Drugs and Multiple Symptoms Detection

Chapter 4 Methods for Signal Detection of Adverse Drug Reactions

4.2 Cube Based Methods

4.2.4 Multiple Drugs and Multiple Symptoms Detection

In this section, we describe the third contingency cube-based algorithm, CBM-MM (Cube Based Mining for Signal Detection of Multiple Drugs and Multiple Symptoms). This algorithm extends CBM-SM to detect ADR signals involving multiple drugs and multiple symptoms. Rules of this kind are analogous to hybrid association rules in multi-dimensional association rule mining.

The contingency cubes used for CBM-MM are the same as those used for CBM-SM, and they still need to be stored in advance.

The CBM-MM algorithm consists of five main phases: (1) initial candidate generation; (2) frequent itemset generation; (3) new candidate generation; (4) rule generation; and (5) signal generation. As in CBM-SM, phases 2 and 3 are repeated until no new itemset is generated. In the following, we describe each of the five phases in detail and then give an example to illustrate the CBM-MM algorithm.

(1) Initial candidate generation phase

The initial candidate generation phase is the same as that of CBM-SM introduced in section 4.2.3.

(2) Frequent itemset generation phase

This phase is similar to that of CBM-SM introduced in Section 4.2.3, with one difference: each candidate itemset has the form I = {α₁, α₂, ..., α_n, d₁, d₂, ..., d_p, s₁, s₂, ..., s_m}, and the count accumulation is performed over all cells indexed by {α₁, α₂, ..., α_n, d_i, s_j} for 1 ≤ i ≤ p, 1 ≤ j ≤ m on the contingency cube <Attr(α₁), Attr(α₂), ..., Attr(α_n), Drug, PT>.
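The count accumulation over the p × m cells can be sketched as follows. This is only an illustration under stated assumptions: the cube layout (a dictionary from a cell index to the set of report keys falling in that cell), the sample data, and the name `support` are all hypothetical and are not taken from the CBM-MM implementation.

```python
from itertools import product

# Hypothetical cube <Age, Drug, PT>: cell index -> set of report keys in that cell.
# Both the layout and the data are assumptions made for this sketch.
CUBE = {
    ("a1", "d1", "s1"): {1, 2},
    ("a1", "d1", "s2"): {1},
    ("a1", "d2", "s1"): {1, 2},
    ("a1", "d2", "s2"): {1},
    ("a2", "d1", "s1"): {3},
}

def support(cube, demo, drugs, symptoms):
    """Count the reports found in every cell (demo..., d_i, s_j).

    Assumes at least one drug and one symptom; otherwise returns 0.
    """
    common = None
    for drug, symptom in product(drugs, symptoms):
        reports = cube.get((*demo, drug, symptom), set())
        # A report supports the itemset only if it occurs in all p x m cells.
        common = set(reports) if common is None else common & reports
    return len(common) if common is not None else 0

both_symptoms = support(CUBE, ("a1",), ["d1", "d2"], ["s1", "s2"])
one_symptom = support(CUBE, ("a1",), ["d1", "d2"], ["s1"])
```

Intersecting the report sets is one way to make sure that every drug co-occurs with every symptom in the same report; a cube that stores only counts per cell would need a different strategy.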

(3) New candidate generation phase

In this phase, the new set of candidate (k+1)-itemsets Ck+1 is generated from the set of frequent k-itemsets Fk. This phase extends the new candidate generation phase of CBM-SM. The algorithm is described in Figure 4.13.

Input: Frequent k-itemsets Fk.

Output: Candidate (k+1)-itemsets Ck+1.

Steps:

    Ck+1 = ∅;
    for each itemset I1 ∈ Fk do
        for each itemset I2 ∈ Fk do
            if (I1[1] = I2[1] & I1[2] = I2[2] & ... & I1[k-1] = I2[k-1] & I1[k] ≠ I2[k], where I1[k] and I2[k] belong to different attributes unless the attribute is Drug or PT) then
                c = I1 ∪ I2;
                if ((∃ k-subset s of c such that s ∉ Fk) or (∃ two items of c that belong to the same attribute other than Drug and PT)) then
                    Delete c;
                else
                    Add c to Ck+1;

Figure 4.13. Algorithm of candidate generation phase.
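The join and prune steps of Figure 4.13 can be sketched in Python as below. This is a minimal sketch, not the original implementation: items are modelled as hypothetical (attribute, value) pairs, itemsets as sorted tuples, and the names `generate_candidates`, `has_repeated_attribute`, and `items` are introduced here for illustration only.

```python
from itertools import combinations

MULTI_VALUED = {"Drug", "PT"}  # attributes that may contribute several items

def has_repeated_attribute(itemset):
    """True if two items share an attribute other than Drug or PT."""
    seen = set()
    for attr, _ in itemset:
        if attr in seen and attr not in MULTI_VALUED:
            return True
        seen.add(attr)
    return False

def generate_candidates(frequent_k):
    """Join frequent k-itemsets on their first k-1 items, then prune."""
    frequent = set(frequent_k)
    candidates = set()
    for i1 in frequent:
        for i2 in frequent:
            # Join condition: same prefix, different last item.
            if i1[:-1] != i2[:-1] or i1[-1] == i2[-1]:
                continue
            attr1, attr2 = i1[-1][0], i2[-1][0]
            if attr1 == attr2 and attr1 not in MULTI_VALUED:
                continue
            c = tuple(sorted(set(i1) | set(i2)))
            # Prune: every k-subset of c must itself be frequent.
            if any(s not in frequent for s in combinations(c, len(i1))):
                continue
            if has_repeated_attribute(c):
                continue
            candidates.add(c)
    return candidates

def items(*values):
    """Helper for this sketch: 'd*' values are drugs, 's*' values are PTs."""
    return tuple(sorted(("Drug" if v.startswith("d") else "PT", v) for v in values))

# Frequent 3-itemsets of Example 4.4 (Table 4.24).
F3 = [items("d1", "d2", "s1"), items("d1", "d2", "s2"),
      items("d1", "s1", "s2"), items("d2", "s1", "s2")]
C4 = generate_candidates(F3)
```

Storing the candidates in a set removes the duplicate produced when the same pair is joined in both orders, which the nested loops over Fk would otherwise generate.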

(4) Rule generation phase

The task of this phase is to generate interesting rules from frequent k-itemsets (k ≥ 4). An itemset can be transformed into a rule if it satisfies four conditions: (1) its attribute set covers all selected demographic attributes as well as the attributes Drug and PT; (2) no two of its items belong to the same attribute, except for Drug and PT; (3) at least two of its items belong to attribute PT; (4) at least two of its items belong to attribute Drug. For example, the itemset {a1, d1, d2, s1, s2} can be transformed into the following rule:

Age = a1, Drug1 = d1, Drug2 = d2 → Symptom1 = s1, Symptom2 = s2
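The four conditions can be checked mechanically. The sketch below assumes items are represented as (attribute, value) pairs; this representation and the function name `to_rule` are illustrative choices, not part of the original system.

```python
from collections import Counter

MULTI_VALUED = ("Drug", "PT")

def to_rule(itemset, selected_demographics):
    """Return the rule text for an itemset, or None if a condition fails."""
    counts = Counter(attr for attr, _ in itemset)
    # (1) All selected demographic attributes, Drug, and PT are covered.
    covered = all(counts[a] >= 1 for a in (*selected_demographics, *MULTI_VALUED))
    # (2) No attribute other than Drug and PT appears twice.
    unique = all(n == 1 for a, n in counts.items() if a not in MULTI_VALUED)
    # (3) and (4) At least two PT items and at least two Drug items.
    if not (covered and unique and counts["PT"] >= 2 and counts["Drug"] >= 2):
        return None
    demo = [f"{a} = {v}" for a, v in itemset if a not in MULTI_VALUED]
    drugs = [v for a, v in itemset if a == "Drug"]
    pts = [v for a, v in itemset if a == "PT"]
    left = demo + [f"Drug{i} = {v}" for i, v in enumerate(drugs, 1)]
    right = [f"Symptom{i} = {v}" for i, v in enumerate(pts, 1)]
    return ", ".join(left) + " → " + ", ".join(right)

itemset = [("Age", "a1"), ("Drug", "d1"), ("Drug", "d2"), ("PT", "s1"), ("PT", "s2")]
rule = to_rule(itemset, ["Age"])
```

An itemset with a single drug or a single symptom is rejected here, since such rules fall outside the multiple-drugs, multiple-symptoms form that this phase produces.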

(5) Signal generation phase

This phase is the same as that of CBM-SM, which has been introduced in section 4.2.3.

Example 4.4 Consider the base cube BC in Table 4.17. Assume that a query Q does not include any mining attribute and that PRR has been specified as the selected measure. In other words, the antecedent of each generated rule consists of drug information only and the consequent consists of symptoms. Let the frequency threshold be 3.

First, the initial set of candidate 1-itemsets C1 is generated from Drug and PT. These itemsets are shown in Table 4.18. For each itemset in C1, we find the corresponding contingency cube and accumulate its support count. Figure 4.14 shows the corresponding contingency cube <Demo_key, Drug> used for accumulating the count of itemset {d1}. Each time {d1} appears in a cell of this cube, its count is increased by one, so count(d1) = 4. The counts of the other itemsets in C1 are obtained similarly. The result is shown in Table 4.19. If the count of a candidate itemset is greater than or equal to the frequency threshold, the itemset is frequent. The set of frequent 1-itemsets F1 is shown in Table 4.20.

Then we generate the set of candidate 2-itemsets C2, which is shown in Table 4.21. For itemset {d1, d2} in C2, <Demo_key, Drug> is the corresponding contingency cube. The count accumulation is shown in Figure 4.15. Each time {d1, d2} appears in the corresponding contingency cube, its count is increased by one, so count(d1, d2) = 3. The counts of the other itemsets in C2 are obtained in the same way. The set of frequent 2-itemsets F2 is shown in Table 4.22.

Table 4.17. The base cube BC of Example 4.4.

Demo_key  Year  Age  Gender  Weight  Country  Drug  PT
1         y3    a2   g1      w2      c3       d1    s1
1         y3    a2   g1      w2      c3       d2    s1
1         y3    a2   g1      w2      c3       d1    s2
1         y3    a2   g1      w2      c3       d2    s2
2         y1    a1   g2      w2      c1       d2    s2
2         y1    a1   g2      w2      c1       d3    s2
3         y2    a2   g2      w2      c2       d1    s1
3         y2    a2   g2      w2      c2       d2    s1
3         y2    a2   g2      w2      c2       d1    s2
3         y2    a2   g2      w2      c2       d2    s2
4         y1    a2   g1      w1      c3       d2    s1
4         y1    a2   g1      w1      c3       d2    s2
5         y1    a2   g2      w1      c2       d1    s1
5         y1    a2   g2      w2      c2       d2    s1
5         y1    a2   g2      w1      c2       d1    s2
5         y1    a2   g2      w2      c2       d2    s2
6         y1    a1   g1      w1      c2       d1    s3

Table 4.18. The initial candidate 1-itemsets C1.

Itemset  Itemset
d1       s1
d2       s2
d3       s3

Figure 4.14. The example for accumulating the count of itemset {d1}.

Table 4.19. The set of candidate 1-itemsets C1.

Itemset  Count  Itemset  Count
d1       4      s1       4
d2       5      s2       5
d3       1      s3       1

Table 4.20. The set of frequent 1-itemsets F1.

Itemset  Count
d1       4
d2       5
s1       4
s2       5

Table 4.21. The candidate 2-itemsets C2.

Figure 4.15. The example for accumulating the count of itemset {d1, d2}.

Table 4.22. The set of frequent 2-itemsets F2.

Itemset  Count  Itemset  Count
d1, d2   3      d2, s1   4
d1, s1   3      d2, s2   5
d1, s2   3      s1, s2   4
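The counts in Tables 4.19, 4.20, and 4.22 can be re-derived from the Drug and PT columns of Table 4.17. The sketch below assumes that an itemset is counted once per Demo_key in which all of its drugs and symptoms co-occur; this reading reproduces the tabulated counts, but the function and variable names are illustrative only.

```python
from itertools import combinations

# (Demo_key, Drug, PT) cells projected from the base cube in Table 4.17.
CELLS = [
    (1, "d1", "s1"), (1, "d2", "s1"), (1, "d1", "s2"), (1, "d2", "s2"),
    (2, "d2", "s2"), (2, "d3", "s2"),
    (3, "d1", "s1"), (3, "d2", "s1"), (3, "d1", "s2"), (3, "d2", "s2"),
    (4, "d2", "s1"), (4, "d2", "s2"),
    (5, "d1", "s1"), (5, "d2", "s1"), (5, "d1", "s2"), (5, "d2", "s2"),
    (6, "d1", "s3"),
]
THRESHOLD = 3  # frequency threshold of Example 4.4

def count(itemset):
    """Number of Demo_keys in which every drug and symptom of the itemset co-occur."""
    drugs = {i for i in itemset if i.startswith("d")}
    symptoms = {i for i in itemset if i.startswith("s")}
    reports = {}
    for key, drug, pt in CELLS:
        reports.setdefault(key, set()).add((drug, pt))
    total = 0
    for pairs in reports.values():
        seen_drugs = {d for d, _ in pairs}
        seen_pts = {s for _, s in pairs}
        if (drugs <= seen_drugs and symptoms <= seen_pts
                and all((d, s) in pairs for d in drugs for s in symptoms)):
            total += 1
    return total

c1 = {(i,): count((i,)) for i in ["d1", "d2", "d3", "s1", "s2", "s3"]}
f1 = {k: v for k, v in c1.items() if v >= THRESHOLD}
c2 = {pair: count(pair) for pair in combinations(sorted(i for (i,) in f1), 2)}
f2 = {k: v for k, v in c2.items() if v >= THRESHOLD}
```

Here `c1` corresponds to Table 4.19, `f1` to Table 4.20, and `f2` to Table 4.22.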

We continue by joining the frequent 2-itemsets to generate the set of candidate 3-itemsets C3, which is shown in Table 4.23. For itemset {d1, d2, s1} in C3, <Demo_key, Drug, PT> is the corresponding contingency cube. The count accumulation is shown in Figure 4.16. Each time {d1, d2, s1} appears in the corresponding contingency cube, its count is increased by one, so count(d1, d2, s1) = 3. The counts of the other itemsets in C3 are obtained in the same way. Table 4.24 shows the set of frequent 3-itemsets F3.

Table 4.23. The candidate 3-itemsets C3.

Itemset      Itemset
d1, d2, s1   d1, s1, s2
d1, d2, s2   d2, s1, s2

Figure 4.16. The example for accumulating the count of itemset {d1, d2, s1}.

Table 4.24. The set of frequent 3-itemsets F3.

Itemset      Count
d1, d2, s1   3
d1, d2, s2   3
d1, s1, s2   3
d2, s1, s2   4

The set of candidate 4-itemsets C4 is shown in Table 4.25. For itemset {d1, d2, s1, s2}, we obtain its count from the corresponding contingency cube <Demo_key, Drug, PT>, as shown in Figure 4.17 and Table 4.26. After this step, no further candidate itemset can be generated, so the process stops and proceeds to the rule generation phase.

Table 4.25. The set of candidate 4-itemsets C4.

Itemset
d1, d2, s1, s2

Figure 4.17. The example for accumulating the count of itemset {d1, d2, s1, s2}.

Table 4.26. The set of frequent 4-itemsets F4.

Itemset          Count
d1, d2, s1, s2   3

In the rule generation phase, each frequent k-itemset (k ≥ 4) is issued as a rule. In this example, the generated rule is:

Drug1 = d1, Drug2 = d2 → Symptom1 = s1, Symptom2 = s2

Next, in the signal generation phase, the measure of each generated rule is calculated and checked against the threshold. The result is shown in Table 4.27. Finally, all signals are sorted by their measure values and output to the user. The results are shown in Table 4.28.

Table 4.27. The measure values of the generated rules.

Rule                                                    a  b  c  d  PRR  PRR - 1.96SE > 1
Drug1 = d1, Drug2 = d2 → Symptom1 = s1, Symptom2 = s2   3  0  1  2  3    Yes

Table 4.28. The result of signals.

No.  Signal                                                  PRR
1    Drug1 = d1, Drug2 = d2 → Symptom1 = s1, Symptom2 = s2   3
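The values a, b, c, d and the PRR in Table 4.27 can be re-derived from Table 4.17. The sketch below makes two assumptions: a report counts as exposed when it contains every drug of the antecedent and as an event when it contains every symptom of the consequent, and PRR is taken in its usual form (a / (a + b)) / (c / (c + d)). The standard error test is left out because this section does not restate how SE is computed. All names are illustrative.

```python
# Drug-symptom pairs per Demo_key, taken from Table 4.17.
FULL = {("d1", "s1"), ("d2", "s1"), ("d1", "s2"), ("d2", "s2")}
REPORTS = {
    1: FULL,
    2: {("d2", "s2"), ("d3", "s2")},
    3: FULL,
    4: {("d2", "s1"), ("d2", "s2")},
    5: FULL,
    6: {("d1", "s3")},
}

def contingency(drugs, symptoms):
    """Return (a, b, c, d) for the rule drugs -> symptoms, counted per report."""
    a = b = c = d = 0
    for pairs in REPORTS.values():
        exposed = drugs <= {drug for drug, _ in pairs}
        event = symptoms <= {pt for _, pt in pairs}
        if exposed and event:
            a += 1
        elif exposed:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return a, b, c, d

def prr(a, b, c, d):
    """(a / (a + b)) / (c / (c + d)), rearranged; undefined if c or a + b is 0."""
    return (a * (c + d)) / (c * (a + b))

table = contingency({"d1", "d2"}, {"s1", "s2"})
value = prr(*table)
```

With these assumptions, `table` is (3, 0, 1, 2) and `value` is 3.0, in agreement with Table 4.27.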

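Source document: 藥物不良反應成因的分析與偵測之知識發掘平台 (A Knowledge Discovery Platform for Analyzing and Detecting the Causes of Adverse Drug Reactions), pp. 64-73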