
Chapter 4 Methods for Signal Detection of Adverse Drug Reactions

4.3 Associative Classification Based Method

In this section, we describe the last algorithm, ABCM-MS, which detects ADRs involving drug-drug interactions. Unlike the two types of ADR rules discussed previously, this type of ADR rule cannot be handled well by the cube-based approach.

Instead, we regard this type of ADR signal, a single symptom caused by drug interactions, as a classification rule of the form “Conditions → Class”. The Conditions are the mining attributes, including drugs, and the Class is the suffered symptom. Many efficient classification algorithms exist, such as neural networks, case-based reasoning, decision trees, Bayesian networks, and support vector machines (SVM) [9]. However, these algorithms are not suitable for an interactive mining environment, and they cannot handle rules involving multiple values of the same attribute (intra-attribute values), e.g., drug-drug interactions. In this thesis, we therefore develop an algorithm called ABCM-MS (Association Based Classification Mining for Signal Detection of Multiple Drugs and Single Symptom), which is modified from the CMAR (Classification Based on Multiple Class-Association Rules) algorithm [15].
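To make the “Conditions → Class” form concrete, the Python sketch below stores a rule's conditions as a set of (attribute, value) pairs, so the Drug attribute may appear more than once in a single rule, which a fixed one-value-per-attribute record cannot express. The class name ADRRule and the attribute display order are illustrative assumptions; the string form merely mimics the Drug1/Drug2 notation used in Tables 4.34 to 4.37.

```python
from dataclasses import dataclass

# Assumed display order of the mining attributes; "Drug" may occur several times.
ATTR_ORDER = ["Year", "Age", "Gender", "Weight", "Country", "Drug"]

@dataclass(frozen=True)
class ADRRule:
    """A classification rule "Conditions -> Class" (hypothetical helper type)."""
    conditions: frozenset  # set of (attribute, value) pairs
    symptom: str           # the class label, i.e., the suffered symptom (PT)

    def drugs(self):
        return {v for a, v in self.conditions if a == "Drug"}

    def __str__(self):
        conds = sorted(self.conditions, key=lambda av: (ATTR_ORDER.index(av[0]), av[1]))
        numbered = len(self.drugs()) > 1  # number the drugs only in interaction rules
        parts, k = [], 0
        for attr, value in conds:
            if attr == "Drug" and numbered:
                k += 1
                parts.append(f"Drug{k} = {value}")
            else:
                parts.append(f"{attr} = {value}")
        return ", ".join(parts) + f" → Symptom = {self.symptom}"

rule = ADRRule(frozenset({("Age", "a2"), ("Drug", "d1"), ("Drug", "d3")}), "s1")
print(rule)  # Age = a2, Drug1 = d1, Drug2 = d3 → Symptom = s1
```

Because the conditions form a set rather than one column per attribute, a rule with two Drug values is just another rule, which is the property the text says conventional classifiers lack.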

The basic idea of CMAR is that a classification rule can be considered a special case of an association rule, so existing association rule mining algorithms can be adopted to mine classification rules. The core of CMAR is a data structure called the CR-tree, which stores possible classification rules and is analogous to the FP-tree used in FP-growth, the well-known efficient frequent itemset mining algorithm. The primary advantage of CMAR is that its mining process does not have to generate candidate itemsets. Furthermore, to avoid unnecessary computation spent on counting items that are uninteresting to users, we adopt the concept of constraint-based pattern mining: the mining attributes and drugs are treated as item constraints of association rules [18], and items not conforming to the constraints are pruned.
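The CR-tree can be pictured as a prefix tree over transactions whose items are sorted in F1 order, so transactions sharing a prefix share a path. The following Python sketch is a simplified illustration under stated assumptions: roughly following CMAR, it records each path's class distribution at the path's last node, and it omits the header table and node-links that make the real structure efficient to traverse. The names CRNode and insert_transaction are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class CRNode:
    item: Optional[str]                  # None marks the root
    count: int = 0                       # transactions whose path passes through this node
    children: Dict[str, "CRNode"] = field(default_factory=dict)
    class_counts: Dict[str, int] = field(default_factory=dict)  # class -> count of paths ending here

def insert_transaction(root: CRNode, items, class_label: str) -> None:
    """Insert one transaction whose items are already sorted in F1 order.
    Shared prefixes are merged, as in an FP-tree."""
    node = root
    for it in items:
        child = node.children.get(it)
        if child is None:
            child = CRNode(it)
            node.children[it] = child
        child.count += 1
        node = child
    # Simplification: the class distribution is kept at the last node of the path.
    node.class_counts[class_label] = node.class_counts.get(class_label, 0) + 1

# Usage: build the tree for the three transactions of Table 4.33.
root = CRNode(None)
for items in (["a2", "d2", "d3", "d1"], ["a2", "d3", "d1"], ["a2", "d3", "d1"]):
    insert_transaction(root, items, "s1")
```

Here the two identical transactions a2, d3, d1 collapse into one path with count 2, which is the compression that lets the tree replace repeated scans of the dataset.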

Figure 4.18 shows the workflow of ABCM-MS. It consists of two stages, (1) an off-line process and (2) an on-line process, which are further divided into five phases: (1) Data transformation phase; (2) Dataset reduction phase; (3) CR-tree construction phase; (4) Rule generation phase; and (5) Signal generation and sorting phase. Phases 1 and 2 form the off-line process, and phases 3 to 5 form the on-line process.

During the off-line process, ABCM-MS first prunes infrequent items and classes from the transaction dataset and stores the result on disk. In the on-line process, it then removes the items whose attributes do not belong to the user-specified risk pattern, constructs a CR-tree from the reduced transactions, and derives the ADR rules from the tree. Each of the five phases is described in detail below.

(1) Data transformation phase

In data warehouses, data is usually stored in relational format, whereas most association mining algorithms require the input data to be in horizontal transaction format. The relational data must therefore be transformed into transactional format before the mining process. To this end, we extract each report in the data warehouse as a transaction composed of all demographic attributes, Drug, and PT through the following simple SQL statement:

SELECT Demo_key AS TID, Year, Age, Gender, Weight, Country, Drug, PT FROM DW

where DW denotes the ADR data warehouse.

Figure 4.18. The workflow of ABCM-MS.
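The extraction query returns relational rows, which still have to be grouped into one transaction per report. The Python sketch below runs the same SELECT statement against an in-memory SQLite table standing in for the SQL Server warehouse; the DW schema (column types) and the assumption that a report yields one row per drug/PT pair are illustrative, and to_transactions is a hypothetical helper. Items are tagged as (attribute, value) pairs so values from different attributes cannot collide.

```python
import sqlite3
from collections import defaultdict

ATTRS = ("Year", "Age", "Gender", "Weight", "Country", "Drug")
QUERY = "SELECT Demo_key AS TID, Year, Age, Gender, Weight, Country, Drug, PT FROM DW"

def to_transactions(conn):
    """Group the query's rows by TID: attribute values become items, PTs become classes."""
    items, classes = defaultdict(set), defaultdict(set)
    for tid, *values, pt in conn.execute(QUERY):
        items[tid].update(zip(ATTRS, values))  # (attribute, value) pairs
        classes[tid].add(pt)
    return {tid: (items[tid], classes[tid]) for tid in items}

# Toy warehouse holding report 3 of Table 4.29 (assumed one row per drug/PT pair).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DW (Demo_key INTEGER, Year TEXT, Age TEXT, Gender TEXT,"
             " Weight TEXT, Country TEXT, Drug TEXT, PT TEXT)")
conn.executemany("INSERT INTO DW VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 [(3, "y2", "a2", "g2", "w2", "c2", drug, pt)
                  for drug in ("d1", "d3") for pt in ("s1", "s2")])
tx = to_transactions(conn)
print(tx[3])
```

The four rows of report 3 collapse into a single transaction with seven items (five demographic values and two drugs) and the class set {s1, s2}.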

(2) Dataset reduction phase

One design goal of the system is to provide an interactive environment, so efficiency is the most important issue. Because the spontaneous reporting database is very large, pruning unrelated items and transactions before the mining process starts can significantly improve the efficiency of the system.

For these reasons, the first step in this phase is to scan the transaction dataset T once and count each item and class to find all frequent items and classes.

The set of frequent classes is denoted C1, and the frequent items, sorted in descending order of count, are denoted F1. The dataset is then scanned again: each transaction is sorted according to F1, and all infrequent items and classes are pruned. At the same time, a transaction is pruned if it does not contain both a frequent drug item and a frequent symptom class. The process for generating the reduced transaction dataset T’ is shown in Figure 4.19. Finally, the frequent classes C1, the frequent items F1, and the reduced transaction dataset T’ are stored on disk for the next phase.

Input: A transaction dataset T; a set of frequent items F1; a set of frequent classes C1;

Output: The reduced transaction dataset T’.

Steps:

Figure 4.19. Generation of the reduced transaction dataset.
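The two scans described above can be sketched in Python as follows. This is an illustrative reading of the textual description, not the exact steps of Figure 4.19: items are the toy codes of Table 4.29, a drug item is assumed to be any code starting with "d", and ties in F1 are broken alphabetically because the text does not specify a tie rule. The function name reduce_dataset is hypothetical.

```python
from collections import Counter

def reduce_dataset(T, xi, is_drug):
    """First scan: count items and classes. Second scan: sort and prune each transaction.
    T maps TID -> (items, classes); xi is the frequency threshold."""
    item_cnt = Counter(i for items, _ in T.values() for i in set(items))
    class_cnt = Counter(c for _, classes in T.values() for c in set(classes))
    # F1: frequent items in descending count order (ties broken alphabetically, an assumption).
    F1 = sorted((i for i, n in item_cnt.items() if n >= xi), key=lambda i: (-item_cnt[i], i))
    C1 = {c for c, n in class_cnt.items() if n >= xi}
    rank = {item: r for r, item in enumerate(F1)}

    T_reduced = {}
    for tid, (items, classes) in T.items():
        kept = sorted({i for i in items if i in rank}, key=rank.get)
        kept_classes = set(classes) & C1
        # Drop the transaction unless it keeps a frequent drug item and a frequent class.
        if kept_classes and any(is_drug(i) for i in kept):
            T_reduced[tid] = (kept, kept_classes)
    return F1, C1, T_reduced

# Table 4.29 with the toy codes; drug items are assumed to start with "d".
T = {
    1: (["y3", "a2", "g1", "w2", "c3", "d1", "d2", "d3"], {"s1"}),
    2: (["y1", "a1", "g2", "w2", "c1", "d2", "d3"], {"s2"}),
    3: (["y2", "a2", "g2", "w2", "c2", "d1", "d3"], {"s1", "s2"}),
    4: (["y1", "a2", "g1", "w1", "c3", "d1", "d3"], {"s1"}),
    5: (["y1", "a1", "g2", "w1", "c2", "d2"], {"s1"}),
    6: (["y3", "a2", "g1", "w2", "c1", "d2"], {"s3"}),
}
F1, C1, T_red = reduce_dataset(T, xi=3, is_drug=lambda i: i.startswith("d"))
print(F1, C1, sorted(T_red))
```

With ξ = 3 the sketch keeps the same items, class, and transactions as Tables 4.30 to 4.32; only the order among equally frequent items may differ.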

(3) CR-tree construction phase

The on-line process starts at this phase. The task of this phase is to construct the CR-tree by scanning the stored transaction dataset T’. At the same time, it prunes the items unrelated to the user-specified mining attributes and eliminates any transaction that does not contain any related items. We build the CR-tree with the same approach as the CMAR algorithm.
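The item-constraint pruning of this phase can be sketched as a filter over T’. The Python sketch below implements the rule as stated above (an item is kept only if its attribute was selected, and a transaction is dropped only when no related item remains). The prefix-letter mapping from toy codes to attributes is a hypothetical convenience for Example 4.5, and prune_unrelated is a hypothetical name.

```python
# Hypothetical mapping from the toy item codes of Example 4.5 to their attributes.
ATTRIBUTE_OF = {"y": "Year", "a": "Age", "g": "Gender", "w": "Weight", "c": "Country", "d": "Drug"}

def prune_unrelated(T_reduced, selected):
    """Keep only items whose attribute the user selected;
    drop transactions left with no related item."""
    result = {}
    for tid, (items, classes) in T_reduced.items():
        related = [i for i in items if ATTRIBUTE_OF[i[0]] in selected]  # preserves F1 order
        if related:
            result[tid] = (related, classes)
    return result

# Transactions 1, 3 and 4 of Table 4.32, with Age and Drug selected.
T_prime = {
    1: (["a2", "w2", "d2", "d3", "g1", "d1"], {"s1"}),
    3: (["a2", "w2", "d3", "g2", "d1"], {"s1"}),
    4: (["a2", "d3", "y1", "g1", "d1"], {"s1"}),
}
T2 = prune_unrelated(T_prime, {"Age", "Drug"})
print(T2)
```

Because the filter keeps the surviving items in their F1 order, its output can be inserted into the CR-tree directly without re-sorting.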

(4) Rule generation phase

In this phase, ABCM-MS traverses the CR-tree to find all frequent rules, i.e., rules whose count is greater than or equal to the frequency threshold ξ. The approach of CMAR is adopted here as well. First, it constructs the conditional pattern base of each frequent item and class. Then, it builds a conditional CR-tree on that conditional pattern base. Finally, it mines the conditional CR-tree recursively to generate all frequent rules. Rules that do not contain any related items are pruned.
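As a hedged illustration of the recursion, the sketch below mines frequent class-association rules by pattern growth over conditional pattern bases, in the spirit of FP-growth and CMAR. For brevity it keeps each conditional pattern base as a plain list of prefix paths with class counts instead of a compressed conditional CR-tree, so it yields the same set of rules but not the same memory behavior; mine_rules is a hypothetical name.

```python
from collections import Counter

def mine_rules(base, xi, suffix=()):
    """Pattern-growth mining of frequent class-association rules.
    base: list of (items, class_counts) pairs, items in F1 order.
    Returns (antecedent, class_label, count) triples with count >= xi."""
    # Class distribution of every item in this (conditional) pattern base.
    item_class = {}
    for items, cc in base:
        for it in items:
            item_class.setdefault(it, Counter()).update(cc)
    rules = []
    for it, cc in item_class.items():
        if max(cc.values()) < xi:
            continue  # no rule extending suffix + it can reach xi for any class
        antecedent = (it,) + suffix
        rules += [(frozenset(antecedent), cls, n) for cls, n in cc.items() if n >= xi]
        # Conditional pattern base of `it`: the part of each path that precedes `it`.
        cond = [(items[:items.index(it)], cc2) for items, cc2 in base
                if it in items and items.index(it) > 0]
        rules += mine_rules(cond, xi, antecedent)
    return rules

# The reduced dataset T'' of Table 4.33, each transaction labelled s1.
T2 = [(["a2", "d2", "d3", "d1"], Counter({"s1": 1})),
      (["a2", "d3", "d1"], Counter({"s1": 1})),
      (["a2", "d3", "d1"], Counter({"s1": 1}))]
for antecedent, cls, n in mine_rules(T2, xi=3):
    print(sorted(antecedent), "->", cls, n)
```

Each itemset is reached exactly once, through its last item in F1 order, and the recursion stops as soon as no class can reach ξ, because adding items can only lower a rule's count. On T’’ with ξ = 3 it returns the seven rules of Table 4.34.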

(5) Signal generation and sorting phase

The task of this phase is to generate interesting signals. For each rule, the four cell values of the contingency table shown in Table 4.2 are obtained from the corresponding CR-tree nodes and used to calculate the selected measure. Rules whose measure values are greater than or equal to the threshold become signals, which are then sorted by their measure values and by the number of items in the antecedent.
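The measure step can be checked by hand on the running example. The Python sketch below assumes the standard proportional reporting ratio, PRR = [a/(a+b)] / [c/(c+d)], with a = reports containing the antecedent and the symptom, b = the antecedent without the symptom, c = the symptom without the antecedent, and d = neither, which agrees with the cells in Table 4.36. Unlike ABCM-MS, it counts the cells by scanning the transactions instead of reading CR-tree node counts; contingency and prr are hypothetical names.

```python
def contingency(T, antecedent, symptom):
    """Count the 2x2 cells for the rule antecedent -> symptom over all transactions in T."""
    a = b = c = d = 0
    for items, classes in T.values():
        has_x = antecedent <= set(items)
        has_y = symptom in classes
        if has_x and has_y:
            a += 1
        elif has_x:
            b += 1
        elif has_y:
            c += 1
        else:
            d += 1
    return a, b, c, d

def prr(a, b, c, d):
    """[a/(a+b)] / [c/(c+d)]; raises ZeroDivisionError when c == 0."""
    return (a * (c + d)) / ((a + b) * c)

# Table 4.29 with the toy codes.
T = {
    1: (["y3", "a2", "g1", "w2", "c3", "d1", "d2", "d3"], {"s1"}),
    2: (["y1", "a1", "g2", "w2", "c1", "d2", "d3"], {"s2"}),
    3: (["y2", "a2", "g2", "w2", "c2", "d1", "d3"], {"s1", "s2"}),
    4: (["y1", "a2", "g1", "w1", "c3", "d1", "d3"], {"s1"}),
    5: (["y1", "a1", "g2", "w1", "c2", "d2"], {"s1"}),
    6: (["y3", "a2", "g1", "w2", "c1", "d2"], {"s3"}),
}
cells = contingency(T, {"a2", "d1", "d3"}, "s1")
print(cells, prr(*cells))
```

For Age = a2, Drug1 = d1, Drug2 = d3 → Symptom = s1 this gives a = 3, b = 0, c = 1, d = 2, so PRR = (3/3) / (1/3) = 3, the values listed in Table 4.36.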

Finally, the top-k signals are output. If the user is not satisfied with the results, he or she can go back to phase 3 and re-select the parameters of interest.
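The sorting rule can be sketched in a few lines of Python. Following the order of Table 4.37, the sketch assumes that among rules with equal measure values the longer antecedent ranks first; the threshold 2.0, the extra low-scoring rule, and the name top_k_signals are hypothetical.

```python
def top_k_signals(scored_rules, threshold, k):
    """scored_rules: list of (antecedent_items, symptom, measure_value).
    Keep rules meeting the threshold, then sort by measure value and,
    among ties, by antecedent length, both in descending order."""
    signals = [r for r in scored_rules if r[2] >= threshold]
    signals.sort(key=lambda r: (-r[2], -len(r[0])))  # stable sort keeps input order on full ties
    return signals[:k]

# PRR values of Table 4.36 plus one hypothetical rule below the hypothetical threshold.
scored = [(("a2", "d1"), "s1", 3.0),
          (("a2", "d3"), "s1", 3.0),
          (("a2", "d1", "d3"), "s1", 3.0),
          (("d2",), "s1", 1.5)]
for rank, (antecedent, symptom, value) in enumerate(top_k_signals(scored, 2.0, 3), 1):
    print(rank, antecedent, symptom, value)
```

Since all three example rules share the same PRR, the antecedent length alone decides that the drug-interaction rule is ranked first.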

Example 4.5. Continuing Example 4.1, we illustrate the process of ABCM-MS below.

After the data transformation phase, the transformed transaction dataset is shown in Table 4.29.

Table 4.29. The transaction dataset T.

TID Year Age Gender Weight Country Drug PT

1 y3 a2 g1 w2 c3 d1, d2, d3 s1

2 y1 a1 g2 w2 c1 d2, d3 s2

3 y2 a2 g2 w2 c2 d1, d3 s1, s2

4 y1 a2 g1 w1 c3 d1, d3 s1

5 y1 a1 g2 w1 c2 d2 s1

6 y3 a2 g1 w2 c1 d2 s3

In the dataset reduction phase, ABCM-MS scans the transaction dataset T once to find all frequent items F1 and frequent classes C1 (an item or class is frequent here if its count is at least 3), which are shown in Tables 4.30 and 4.31, respectively.

Table 4.30. The frequent items F1.

Item Count

a2 4

w2 4

d2 4

d3 4

y1 3

g1 3

g2 3

d1 3

Table 4.31. The frequent classes C1.

Class Count

s1 4

Then, ABCM-MS scans the dataset again, sorts the items within each transaction according to F1, prunes all infrequent items and classes, and removes transactions 2 and 6, which no longer contain a frequent class. The resulting reduced transaction dataset T’ is shown in Table 4.32. Finally, F1, C1, and T’ are stored on disk.

Table 4.32. The reduced transaction dataset T’.

TID Transaction Class label

1 a2, w2, d2, d3, g1, d1 s1

3 a2, w2, d3, g2, d1 s1

4 a2, d3, y1, g1, d1 s1

5 d2, y1, g2 s1

In the CR-tree construction phase, ABCM-MS scans the stored transaction dataset T’, prunes unrelated items and transactions, and constructs the CR-tree. The resulting reduced transaction dataset T’’ and the CR-tree are shown in Table 4.33 and Figure 4.20, respectively.

Table 4.33. The reduced transaction dataset T’’.

TID Transaction Class label

1 a2, d2, d3, d1 s1

3 a2, d3, d1 s1

4 a2, d3, d1 s1

Figure 4.20. The initial CR-tree for Example 4.5.

In the rule generation phase, ABCM-MS constructs the conditional CR-tree of each frequent 1-itemset to generate all frequent rules. When rule generation for an item is completed, the item is eliminated from the header table, and the nodes indicating class labels are merged into their parent nodes. The rule generation process is depicted in Figure 4.21, and the resulting frequent rules are shown in Table 4.34.

Finally, the antecedent of every frequent rule is checked against the user-specified mining attributes, which in this example are Age and Drug. The rules that satisfy them, i.e., whose antecedents contain both an Age item and a Drug item, are shown in Table 4.35.

In the last phase, the measure value of each remaining frequent rule is computed; if it is greater than or equal to the threshold of the selected measure, the rule is output as a signal. Table 4.36 shows the measure values of the rules, with PRR as the selected measure. Finally, the signals are sorted as shown in Table 4.37, and the top-k signals are output.

Figure 4.21. All conditional CR-trees.

Table 4.34. The frequent rules.

Rule Count

Drug = d1 → Symptom = s1 3

Drug1 = d1, Drug2 = d3 → Symptom = s1 3

Age = a2, Drug = d1 → Symptom = s1 3

Age = a2, Drug1 = d1, Drug2 = d3 → Symptom = s1 3

Drug = d3 → Symptom = s1 3

Age = a2, Drug = d3 → Symptom = s1 3

Age = a2 → Symptom = s1 3

Table 4.35. The frequent rules that satisfy user specified mining attributes.

Rule Count

Age = a2, Drug = d1 → Symptom = s1 3

Age = a2, Drug1 = d1, Drug2 = d3 → Symptom = s1 3

Age = a2, Drug = d3 → Symptom = s1 3

Table 4.36. The measure values of rules.

Rule a b c d PRR PRR - 1.96SE > 1

Age = a2, Drug1 = d1, Drug2 = d3 → Symptom = s1 3 0 1 2 3 Yes

Age = a2, Drug = d1 → Symptom = s1 3 0 1 2 3 Yes

Age = a2, Drug = d3 → Symptom = s1 3 0 1 2 3 Yes

Table 4.37. The sorted signals.

No. Signal PRR

1 Age = a2, Drug1 = d1, Drug2 = d3 → Symptom = s1 3

2 Age = a2, Drug = d1 → Symptom = s1 3

3 Age = a2, Drug = d3 → Symptom = s1 3

Chapter 5 Implementation and Experiments

In this chapter, we describe our implementation of the proposed platform, focusing on the user interface, and study the performance of the proposed algorithms CBM-SS and ABCM-MS. All implementations and experiments were performed on a personal computer with an Intel Core 2 Duo 2.33 GHz CPU, 3 GB of main memory, and a 320 GB hard disk, running Windows XP. The database system is Microsoft SQL Server 2005. Our system uses the data files released by the FDA for 2004, 2005, 2006, and 2007, four years in total. Each quarter contains about 60,000 to 110,000 reports, and the total numbers of reports in the four years are 272400, 326626, 324077, and 378736, respectively. Table 5.1 and Table 5.2 show the statistics of the data for each year and each quarter, respectively.

Table 5.1. Statistics of data in each year.

Data set  Number of reports  Number of drugs  Number of mining attributes  Number of symptoms

2004 272295 13300 97 9673

2005 325674 14273 212 10238

2006 323791 14019 231 10354

2007 378176 14437 230 10436

Table 5.2. Statistics of data in each quarter.

The implemented algorithms, CBM-SS and ABCM-MS, need additional storage space to improve their efficiency. Table 5.3 and Table 5.4 show the storage requirements for each year and each quarter, respectively.

Table 5.3. Storage requirement in each year.

Data set Storage requirement of CBM-SS Storage requirement of ABCM-MS

2004 133.444 14.25

2005 228.584 18.119

2006 313.068 18.745

2007 241.864 19.475

Unit: Megabytes (MB)

Section 5.1 presents the user interface and operational processes of our system. In Section 5.2, we compare the performance of CBM-SS and ABCM-MS with the pre-stored process against that without the pre-stored process. In Section 5.3, we demonstrate example ADRs of a single symptom caused by a single drug and by drug interactions detected by our system, and we analyze the validity of these signals.

5.1 Implementation

This section first presents the web-based interface of our system, which we implemented with Microsoft ASP.NET 2.0. Figure 5.1 shows the homepage of the website.

The system menu, shown in the red frame, contains four functions: Home, Drug interaction, Single drug, and Manage. The function “Home” returns to the main page of the system. The function “Drug interaction” invokes algorithm ABCM-MS to detect suspected ADRs of a single symptom caused by drug interactions, with or without demographic attributes. Figures 5.2 and 5.3 show the corresponding user interface for drug interaction detection and the resulting page of discovered ADRs, respectively. Similarly, through the function “Single drug”, users can invoke CBM-SS to detect suspected ADRs of a single symptom caused by a single drug, with or without demographic attributes.

Table 5.4. Storage requirement in each quarter.

Data set Storage requirement of CBM-SS Storage requirement of ABCM-MS

2004Q1 29.064 3.187

2004Q2 31.912 3.037

2004Q3 29.568 3.601

2004Q4 32.132 3.58

2005Q1 31.364 3.739

2005Q2 33.796 4.124

2005Q3 72.7 4.394

2005Q4 65.876 4.739

2006Q1 85.508 5.068

2006Q2 84.196 4.659

2006Q3 51.972 3.788

2006Q4 49.54 4.179

2007Q1 53.412 4.403

2007Q2 47.908 4.203

2007Q3 49.692 4.594

2007Q4 57.7 5.321

Unit: Megabytes (MB)

Figure 5.1. Homepage of the website.

Figure 5.2. The user interface for detecting ADRs caused by drugs interactions.

Figure 5.3. The resulting page of discovered ADRs caused by drug interactions.

Figure 5.4 shows the user interface for detecting suspected ADRs of a single symptom caused by a single drug, with or without demographic attributes. The resulting page of discovered ADRs is shown in Figure 5.5. The last function, “Manage”, is used by the administrator to perform data preprocessing.

5.2 Performance Evaluations

First, we conducted experiments to study the performance of the two proposed algorithms, CBM-SS and ABCM-MS, with respect to dataset size. Efficiency was evaluated on five subsets of the 2007 FDA data files containing different numbers of transactions, namely T10K, T50K, T100K, T150K, and T200K. Detailed parameter settings of these five datasets are shown in Table 5.5.

Figure 5.4. The user interface for detecting ADRs of single symptom caused by single drug.

Figure 5.5. The resulting page of discovered ADRs caused by single drug.

Table 5.5. Parameter settings for data sets.

Parameters T10K T50K T100K T150K T200K

Number of transactions 10K 50K 100K 150K 200K

Number of drugs 14437

Number of mining attributes 230

Number of symptoms 10436

We consider two query conditions:

(1) No demographic mining attributes selected: no demographic information is considered.

(2) All demographic mining attributes selected: all demographic information is considered.

The results of CBM-SS and ABCM-MS with these two conditions are described in subsections 5.2.1 and 5.2.2, respectively.

5.2.1 Performance of CBM-SS

In this experiment, the frequency threshold is set at ξ = 3. The results of running algorithm CBM-SS with the two conditions are shown in Figures 5.6 and 5.7.

From Figures 5.6 and 5.7, we observe that the response times of queries under both conditions are within a few seconds, and that CBM-SS with the pre-stored process is significantly faster than without it. These results demonstrate that CBM-SS is efficient and suitable for on-line ADR detection and analysis.


Figure 5.6. Performance of CBM-SS with no demographic mining attribute selected.


Figure 5.7. Performance of CBM-SS with all demographic mining attributes selected.

5.2.2 Performance of ABCM-MS

In this experiment, the frequency threshold is set at ξ = 50. The performance of ABCM-MS under the two conditions is shown in Figures 5.8 and 5.9, respectively.


Figure 5.8. Performance of ABCM-MS with no demographic attributes selected.


Figure 5.9. Performance of ABCM-MS with all demographic attributes selected.

From Figures 5.8 and 5.9, we observe that the response time increases with the data size but remains relatively small.

Similarly, ABCM-MS with the pre-stored process is significantly faster than its counterpart without the pre-stored process, so users do not have to wait long for results.

5.3 Experimental Results for Signal Detection

We next studied the effectiveness of the proposed algorithms by comparing the resulting ADR signals with those reported in the medical literature. The four quarterly data sets of 2007 in the FDA AERS database are used in these experiments. The suspected ADR signals generated by CBM-SS and ABCM-MS are described in subsections 5.3.1 and 5.3.2, respectively.

5.3.1 Single Drug and Single Symptom Signals

Two examples are given to demonstrate the evaluation results.

Example 1.

Drug = “CAPTOPRIL”

Mining attributes: Null

Measure: ROR

There are 263 signals containing distinct symptoms related to CAPTOPRIL. We ranked these signals by ROR value and list the top 10 signals associated with CAPTOPRIL in Table 5.6. Each signal is represented by three attributes: “Symptom”, “ROR value”, and “Count”. For example, among the ten signals, BASAL GANGLION DEGENERATION has a frequency of 6 and an ROR value of 1336.8434 in the whole dataset, OESOPHAGEAL INFECTION has a frequency of 3 and an ROR value of 398.9214, and so on.
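Table 5.6 pairs counts as small as 3 with ROR values in the hundreds. Assuming the standard reporting odds ratio, ROR = (a/b) / (c/d), with a = reports containing the drug and the symptom, b = the drug without the symptom, c = the symptom with other drugs, and d = neither, the value grows sharply as the symptom becomes rare among reports of other drugs, even when a stays small. The Python sketch below illustrates this with hypothetical counts that are not taken from the FDA AERS data; ror is a hypothetical name.

```python
def ror(a, b, c, d):
    """Reporting odds ratio (a/b)/(c/d), computed as a*d/(b*c).
    Undefined (ZeroDivisionError) when b or c is 0."""
    return (a * d) / (b * c)

# Hypothetical counts: the same a = 3 reports with the drug, while the symptom
# becomes rarer among about one million reports of other drugs.
for c in (300, 30, 3):
    print(c, round(ror(3, 997, c, 1_000_000 - c), 1))
```

Dividing c by 10 multiplies the ROR by roughly 10, which is why a few co-reports of a rare symptom can dominate a ranking by ROR and why such top-ranked signals still need the literature check performed below.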

Table 5.6. TOP-10 suspected symptoms associated with drug “CAPTOPRIL”.

No. Symptom ROR value Count

1 BASAL GANGLION DEGENERATION 1336.8434 6

2 OESOPHAGEAL INFECTION 398.9214 3

3 LARGE INTESTINAL OBSTRUCTION 222.4044 5

4 INJECTION SITE PHLEBITIS 181.325 3

5 NODAL ARRHYTHMIA 97.8079 6

6 VIRAL UPPER RESPIRATORY TRACT INFECTION 79.78 3

7 HYPERTROPHIC CARDIOMYOPATHY 78.3525 4

8 JAW FRACTURE 62.3452 12

9 PALMAR ERYTHEMA 62.327 3

10 PANCREATIC NEOPLASM 62.327 3

A document [27] describes the drug as follows: “CAPTOPRIL is an ACE inhibitor. This medicine is used to treat high blood pressure and heart failure. It is used to treat heart damage after a heart attack. It can also slow the progression of kidney disease in diabetic patients.”

From this report, we know that CAPTOPRIL is often used to treat high blood pressure and heart disease. In addition, another document [32] reports that BASAL GANGLION DEGENERATION and LARGE INTESTINAL OBSTRUCTION can be treated by combining CAPTOPRIL with other drugs.

Thus, BASAL GANGLION DEGENERATION (No. 1), LARGE INTESTINAL OBSTRUCTION (No. 3), and HYPERTROPHIC CARDIOMYOPATHY (No. 7) are in fact noise: they are conditions that CAPTOPRIL is used to treat rather than reactions it causes. In addition, we could not find any document reporting that INJECTION SITE PHLEBITIS (No. 4) or JAW FRACTURE (No. 8) is related to CAPTOPRIL; these two signals need further professional analysis and literature validation. The remaining signals are adverse drug reactions of CAPTOPRIL.

Example 2.

Drug = “RANITIDINE”

Mining attributes: Year, Age, Gender, Weight, Country

Measure: IC

There are 78 signals containing distinct symptoms, together with other mining attributes, related to RANITIDINE. We ranked these signals by IC value and list the top 10 signals associated with RANITIDINE in Table 5.7. Each signal is represented by eight attributes: “Year”, “Age”, “Gender”, “Weight”, “Country”, “Symptom”, “IC value”, and “Count”.

A document reported that “RANITIDINE is a type of antihistamine that blocks the release of stomach acid. It is used to treat stomach or intestinal ulcers. It can relieve ulcer pain and discomfort, and the heartburn from acid reflux.” [27]

From the above report, we know that RANITIDINE is often used to treat stomach or intestinal ulcers. Among these ten signals, the only noise is NEUTROPENIC COLITIS (No. 2), which is related to intestinal ulcers.

The other nine symptoms are recorded as adverse drug reactions of RANITIDINE. In particular, MENINGITIS BACTERIAL is an ADR that RANITIDINE can readily cause in young children [32].

Table 5.7. TOP-10 suspected symptoms with other mining attributes associated with “RANITIDINE”.

No. Year Age Gender Weight Country Symptom IC value Count

1 2007 14~20 M 54.0~ UNITED STATES FLUID IMBALANCE 13.2358 3

2 2007 4~7 F 10.0~15.0 UNITED STATES NEUTROPENIC COLITIS 12.0983 5

3 2007 7~14 M 30.0~40.0 UNITED STATES MENINGITIS BACTERIAL 11.0783 3

6 2007 20~60 M 54.0~ UNITED KINGDOM PO2 DECREASED 10.5905 6

7 2007 7~14 M 30.0~40.0 SPAIN CAPILLARY EAK

5.3.2 Multiple Drug and Single Symptom Signals

In this subsection, we focus on finding the causal relation between drugs interaction and single symptom using the proposed ABCM-MS algorithm. In the following experiments, the frequency threshold is set at 20. Below, two examples are given to demonstrate the ABCM-MS algorithm.

Example 1.

Drug set = {“AREDIA”, “ZOMETA”}

Mining attributes: Null

Measure: PRR

Algorithm ABCM-MS detected nine signals involving the interaction of AREDIA and ZOMETA. We ranked these signals by PRR value, as shown in Table 5.8. Each signal is represented by three attributes: “Symptom”, “PRR value”, and “Count”.

Table 5.8. Suspected symptoms associated with drug set = {“AREDIA”, “ZOMETA”}.

No. Symptom PRR value Count

1 WOUND TREATMENT 2711.547 22

2 TROPONIN I INCREASED 1182.094 22

3 TOOTHACHE 93.3415 30

4 ILEUS 66.2488 26

5 WHEEZING 38.2239 30

6 LUNG DISORDER 38.0937 32

7 VISUAL DISTURBANCE 19.4125 34

8 RESPIRATORY FAILURE 10.4387 22

9 HYPOTENSION 4.8558 20

A document reported that “Novartis and FDA notified dental healthcare professionals of revisions to the prescribing information to describe the occurrence of osteonecrosis of the jaw (ONJ) observed in cancer patients receiving treatment with intravenous bisphosphonates, Aredia (pamidronate disodium) and Zometa (zoledronic acid). The prescribing information recommends that cancer patients receive a dental examination prior to initiating therapy with intravenous bisphosphonates (Aredia and Zometa), and avoid invasive dental procedures while receiving bisphosphonate treatment. For patients who develop ONJ while on bisphosphonate therapy, dental surgery may exacerbate the condition.” [30]

From the above description, we know that when AREDIA and ZOMETA are used together, patients may suffer from jaw-related symptoms. However, the only related signal we found is TOOTHACHE (No. 3), a concomitant symptom of ONJ. The system may have missed other related signals because of the high frequency threshold or the selected measure. In fact, we found that there were five records related to this ADR

A Knowledge Discovery Platform for the Analysis and Detection of Causes of Adverse Drug Reactions