Chapter 1 Introduction
1.3 Thesis Organization
The remaining chapters of this thesis are organized as follows. In Chapter 2, we introduce background knowledge related to this work, including ADR detection methods and measures in Section 2.1, and the basic concepts of rough set theory, together with its extensions for handling data with missing values, in Section 2.2. In Section 2.3, we review related work on applying rough set theory to medical informatics applications, including medical image segmentation, medical classification and pattern recognition, and medical diagnosis.
Chapter 3 presents our proposed rough set based method for detecting ADR signals from incomplete SRS data with missing values. The ADR detection problem is first defined in Section 3.1. Then we describe the basic idea of applying rough set theory to this problem in Section 3.2. Based on the concepts of characteristic sets and approximations, we propose twelve different methods for measuring the strength of a rule and discuss their feasibility in Section 3.3. Finally, the details of our algorithm, which accommodates all of these measuring methods, are described in Section 3.4.
In Chapter 4, we present and discuss the results of the experiments conducted on the FAERS dataset. The experiments compare our proposed rough set based method against the traditional method, which ignores records with missing values. We compare the effects of both approaches on timeline surveillance and warning of marketed drugs, first for withdrawn drugs in Section 4.2 and then for non-withdrawn drugs in Section 4.3.
Finally, we describe conclusions and future work in Chapter 5.
Chapter 2
Background and Related Work
2.1 ADR Detection
As mentioned in Section 1.1, the main purpose of spontaneous reporting systems is to collect as many adverse drug events as possible, to facilitate the detection of suspected ADR signals via statistical or data mining methods.
Contemporary detection methods for ADR signals can be broadly divided into two categories: frequentist methods and Bayesian methods [6]. In the remainder of this section, we describe representative methods and measures of each category.
2.1.1 Frequentist methods
Frequentist methods are widely used in most real ADR monitoring systems because they are simple to calculate and interpret. This category is mainly based on the statistical 2×2 contingency table, shown in Table 2.1, which is used to estimate the proportion of suspected ADRs in spontaneous reporting systems caused by the drug of interest vs. other drugs. If the ratio is higher than a threshold, then disproportionality occurs, which means the drug of interest is regarded as having a significant association with the suspected reaction. In the past decade, various frequentist methods have been proposed, differing mainly in the metric used to measure the disproportionality [10].
The most representative metrics are the Proportional Reporting Ratio (PRR) and the Reporting Odds Ratio (ROR) [20]. In addition, hypothesis tests of independence, such as the chi-square test, are usually adopted as extra precautionary measures. The formula of the chi-square test can be defined as follows:
(2.1)
(2.2)
Table 2.1 The 2×2 contingency table for signal detection of ADR.
                     reaction of interest    other reactions    total
drug of interest     a                       b                  a + b
other drugs          c                       d                  c + d
total                a + c                   b + d              N = a + b + c + d
a: number of reports in which the suspected drug leads to the suspected reaction.
b: number of reports in which the suspected drug leads to all other reactions.
c: number of reports in which all other drugs in the database lead to the suspected reaction.
d: number of reports in which all other drugs lead to all other reactions.
The PRR measure proposed by Evans et al. [11] is used by the UK Medicines and Healthcare products Regulatory Agency (MHRA) [11]. PRR is defined as the ratio of ADR reports for the suspected drug that are related to a specific adverse reaction, divided by the corresponding ratio for all other drugs in the database:
$$\mathrm{PRR} = \frac{a/(a+b)}{c/(c+d)} \tag{2.3}$$
The ROR measure is used by the Netherlands Pharmacovigilance Centre Lareb [9]. ROR is defined as the proportion of a specific ADR caused by the suspected drug to all other drugs in the database, divided by the corresponding proportion of other adverse reactions:
$$\mathrm{ROR} = \frac{a/c}{b/d} = \frac{ad}{bc} \tag{2.4}$$
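The frequentist measures above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this thesis: the function name and the counts are hypothetical, and the chi-square statistic is the standard Pearson form for a 2×2 table without continuity correction, which is an assumption and may differ from the form given in Eqs. (2.1) and (2.2).

```python
def disproportionality(a, b, c, d):
    """Return (PRR, ROR, chi-square) for the 2x2 table of Table 2.1.

    a, b, c, d are the cell counts defined in Table 2.1.
    The chi-square value is the plain Pearson statistic (assumption:
    no continuity correction is applied).
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, ror, chi2


# Hypothetical counts, for illustration only.
prr, ror, chi2 = disproportionality(a=20, b=80, c=100, d=9800)
print(f"PRR={prr:.2f} ROR={ror:.2f} chi2={chi2:.2f}")
```

Both ratios are undefined when a denominator cell is zero, so a practical implementation would need to guard against empty cells.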
2.1.2 Bayesian methods
Bayesian methods form another, more complex category. They were developed based on Bayesian statistics to estimate the (posterior) probability that the suspected adverse reaction occurs given the use of the suspected drug. Representatives of this category are the Bayesian Confidence Propagation Neural Network (BCPNN) [2][3][17] and the Multi-item Gamma Poisson Shrinker (MGPS) [1]. BCPNN is used by the WHO Uppsala Monitoring Centre (UMC), while MGPS is applied by the US Food and Drug Administration (FDA). BCPNN mainly calculates an information component (IC) value, which measures the strength of association between a drug and an adverse reaction. The IC measure is defined as follows:
$$IC = \log_2 \frac{p(x,y)}{p(x)\,p(y)} \tag{2.5}$$
where p(x) is the probability that drug x appears in the reports, p(y) is the probability that reaction y appears in the reports, and p(x,y) is the probability that both drug x and reaction y appear in the reports. Since the distribution of IC is difficult to determine, a beta distribution is usually assigned as a prior to the following probabilities in computing IC:
$$p(x) \sim B(1,2), \quad p(y) \sim B(1,2), \quad p(x,y) \sim B(1,2) \tag{2.6}$$
The computation is complicated and beyond the scope of our discussion; readers can refer to [14] for a more detailed derivation.
MGPS assesses the strength of association between the suspected drug and the suspected reaction by comparing the number of reports in the database containing both the suspected drug and the suspected reaction with its expected value. This measure, called relative rate (RR) [29], follows from the definition of probabilistic independence:
$$RR = \frac{p(x,y)}{p(x)\,p(y)} \tag{2.7}$$
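To make the two Bayesian measures concrete, the sketch below evaluates the observed-to-expected ratio and its base-2 logarithm from raw report counts. It is only a point-estimate illustration: the probabilities are plain relative frequencies, so it omits the beta priors of BCPNN and the gamma-Poisson shrinkage of MGPS. The function name and the counts are hypothetical.

```python
import math


def ic_and_rr(n_xy, n_x, n_y, n):
    """Point estimates of IC and RR from report counts.

    n_xy: reports mentioning both drug x and reaction y
    n_x:  reports mentioning drug x
    n_y:  reports mentioning reaction y
    n:    total number of reports
    Assumption: probabilities are relative frequencies (no priors,
    no shrinkage), unlike the full BCPNN and MGPS procedures.
    """
    p_x, p_y, p_xy = n_x / n, n_y / n, n_xy / n
    rr = p_xy / (p_x * p_y)   # observed / expected under independence
    ic = math.log2(rr)        # information component
    return ic, rr


# Hypothetical counts, for illustration only.
ic, rr = ic_and_rr(n_xy=20, n_x=100, n_y=120, n=10000)
print(f"IC={ic:.3f} RR={rr:.3f}")
```

Under independence RR equals 1 and IC equals 0, so positive IC values indicate that the drug and the reaction are reported together more often than expected.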
In the field of adverse drug reactions, most of these detection methods are applicable, and each has its own advantages and disadvantages. Therefore, one can select one or more suitable detection methods according to the purpose of the analysis. A summary of the ADR measures mentioned above is shown in Table 2.2. Table 2.3 summarizes the most frequently used signal detection methods.
Table 2.2 A summary of ADR measurements.
Categories             Methods                                                    Formula
Frequentist methods    Proportional Reporting Ratio (PRR)                         (a/(a+b)) / (c/(c+d))
                       Reporting Odds Ratio (ROR)                                 (a/c) / (b/d)
Bayesian methods       Bayesian Confidence Propagation Neural Network (BCPNN)     IC, Eq. (2.5)
                       Multi-item Gamma Poisson Shrinker (MGPS)                   RR, Eq. (2.7)
Table 2.3 A summary of signal detection methods for identifying ADRs.
Categories Methods Threshold Application Reference
Frequentist
U.K. Yellow Card database
UK Medicines and Healthcare products Regulatory Agency (MHRA)
2.2 Rough Set Theory
Rough set theory, originally proposed by Z. Pawlak in 1982 [24], is a useful tool for the analysis of imprecise, uncertain, or incomplete data. The theory is based on the concept of a rough set, a formal approximation of a crisp set composed of objects represented by values of attributes. Classically, the set of objects concerned is represented as an information system or information table [28]. In the following, we introduce the basic concepts of rough set theory and its application to data with missing values.
2.2.1 Basic Concepts
Information System
An information system is a pair S = (U, A) [33], where U denotes a nonempty finite set of objects called the universe and A denotes a nonempty finite set of attributes.
For example, Table 2.4 shows a data set containing basic information about students. In the terminology of rough set theory, this data set is an information system in which the universe U is the set of cases, U = {1, 2, 3, 4}, and A is the set of variables, i.e., A = {Height, Weight, Gender}.
Table 2.4 An example of information system.
Case Height Weight Gender
1 170 60 Male
2 165 55 Female
3 155 45 Female
4 150 65 Male
Decision Table (Data Table)
A decision table is a special form of information system in which the attribute set A is divided into a set of condition attributes C and a decision attribute d, i.e., A = C ∪ {d}. For example, in Table 2.5 there are three condition attributes, C = {Height, Weight, Age}, and one decision attribute, d = Overweight.
Table 2.5 An example of decision table.
Case Height Weight Age Overweight
1 170 75 18 Yes
2 165 50 30 Yes
3 165 60 18 No
4 145 75 18 No
5 145 50 30 No
6 170 45 45 Yes
7 145 50 45 No
8 170 45 30 Yes
Lower and Upper Approximations
The following definition introduces two basic concepts for applying rough set theory to data analysis: the lower and upper approximations. Let X represent a subset of elements of the universe U. The lower approximation is the set of elements certainly belonging to X, while the upper approximation is the set of elements possibly belonging to X. Given an information system S = (U, A) and P ⊆ A, the lower approximation of X induced by P in S, denoted as $\underline{P}X$, and the upper approximation of X induced by P in S, denoted as $\overline{P}X$, are defined as follows:
$$\underline{P}X = \{ e \in U \mid [e]_P \subseteq X \}$$
$$\overline{P}X = \{ e \in U \mid [e]_P \cap X \neq \emptyset \}$$
where $[e]_P$ denotes the equivalence class of e induced by attribute set P.
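Figure 2.1 is a conceptual illustration of the lower and upper approximations of a subset X in U induced by P. For example, consider Table 2.5 and let X = {1, 2, 6, 8} and P = {Weight, Age}. The equivalence classes induced by P are {1, 4}, {2, 5}, {3}, {6}, {7}, and {8}, so the lower and upper approximations of X induced by P are:
$$\underline{P}X = \{6, 8\}, \qquad \overline{P}X = \{1, 2, 4, 5, 6, 8\}$$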
Figure 2.1 Illustration of the lower and upper approximations of rough set theory.
Accuracy of Approximations
The accuracy of an approximation of X induced by P, denoted as $\alpha_P(X)$, is calculated by dividing the cardinality of the lower approximation by the cardinality of the upper approximation:
$$\alpha_P(X) = \frac{|\underline{P}X|}{|\overline{P}X|}$$
If $\alpha_P(X) = 1$, the lower and upper approximations are identical, which indicates that the set X is definable in U. If $\alpha_P(X) < 1$, the set X can only be described by its lower and upper approximations and is roughly definable in U. For example, for X = {1, 2, 6, 8} and P = {Weight, Age} in Table 2.5, the accuracy of approximation is:
$$\alpha_P(X) = \frac{2}{6} = \frac{1}{3}$$
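The lower and upper approximations and the accuracy measure can be checked on Table 2.5 with a short Python sketch. The function and variable names are illustrative assumptions; the data are taken directly from Table 2.5, with X = {1, 2, 6, 8} and P = {Weight, Age}.

```python
from collections import defaultdict

# Table 2.5: case -> attribute values.
TABLE_2_5 = {
    1: {"Height": 170, "Weight": 75, "Age": 18},
    2: {"Height": 165, "Weight": 50, "Age": 30},
    3: {"Height": 165, "Weight": 60, "Age": 18},
    4: {"Height": 145, "Weight": 75, "Age": 18},
    5: {"Height": 145, "Weight": 50, "Age": 30},
    6: {"Height": 170, "Weight": 45, "Age": 45},
    7: {"Height": 145, "Weight": 50, "Age": 45},
    8: {"Height": 170, "Weight": 45, "Age": 30},
}


def equivalence_classes(table, attrs):
    """Group cases that share the same values on every attribute in attrs."""
    classes = defaultdict(set)
    for case, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(case)
    return list(classes.values())


def approximations(table, attrs, x):
    """Return the (lower, upper) approximations of the set x."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attrs):
        if cls <= x:      # class entirely inside x
            lower |= cls
        if cls & x:       # class overlaps x
            upper |= cls
    return lower, upper


X = {1, 2, 6, 8}
lower, upper = approximations(TABLE_2_5, ["Weight", "Age"], X)
accuracy = len(lower) / len(upper)
print(sorted(lower), sorted(upper), accuracy)
```

Cases 1 and 2 fall outside the lower approximation because each shares its Weight and Age values with a case outside X (cases 4 and 5, respectively).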
2.2.2 Rough Set Strategies for Data with Missing Values
In real-world applications, data collections usually contain missing values, making the data incomplete for analysis. Various researchers have extended rough set theory to deal with data with missing values [13][19][31].
In this section, we present the concepts that are useful in our research, mainly the characteristic relation, characteristic set, and the refined lower and upper approximations.
Classically, the data is presented in the form of a decision table, where missing values can be interpreted in two ways: lost and don’t care. A lost missing value, denoted as “?”, indicates that the value is important but has been erased (see Table 2.6). A don’t care missing value, denoted as “*”, indicates that the value is not important or is redundant (see Table 2.7). In Table 2.8, we present a decision table with missing values of both categories: lost and don’t care.
Table 2.6 An example of an incomplete decision table containing “lost” missing values.
Case Height Weight Gender Overweight
1 170 50 Male Yes
2 165 ? Female No
3 170 80 ? No
4 165 50 Female No
5 ? ? Male Yes
Table 2.7 An example of an incomplete decision table containing “don’t care” missing values.
Case Height Weight Gender Overweight
1 170 50 Male Yes
2 165 * Female No
3 170 80 * No
4 165 50 Female No
5 * * Male Yes
Table 2.8 An example of an incomplete decision table containing “lost” and “don’t care” missing values.
Case Height Weight Gender Overweight
1 170 50 Male Yes
2 165 ? Female No
3 170 80 ? No
4 165 50 Female No
5 * * Male Yes
Characteristic relation & Characteristic set
The conventional rough set theory is under the assumption that information systems are complete and relies on the indiscernibility relation to derive other kernel definitions such as lower and upper approximations. However, the indiscernibility relation is not applicable to data with missing values. Different extensions of the indiscernibility relation have been proposed, including the tolerance relation [19], similarity relation [31], and characteristic relation [13].
The tolerance relation was proposed by Kryszkiewicz to process data with “don’t care” missing values, the similarity relation was proposed by Stefanowski and Tsoukias to process data with “lost” missing values, while the characteristic relation, proposed by Grzymala-Busse, considers both “lost” and “don’t care” missing values. Since the characteristic relation is a general form of the tolerance and similarity relations, in this thesis we adopt this term (denoted as R), and use subscripts T and S to denote the tolerance (RT) and similarity versions (RS), respectively.
Definition 2.1 Let P ⊆ A be a subset of attributes. The similarity characteristic
Example 2.1 Consider Table 2.6. Let P be the set of all attributes. Then the similarity characteristic sets of all cases induced by P are:
} in attribute a.
Example 2.2 Consider Table 2.7. Let P be the set of all attributes. Then the tolerance characteristic sets of all cases induced by P are:
characteristic sets of all cases induced by P are:
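As an illustration of how characteristic sets can be computed, the following Python sketch applies the block-based construction described by Grzymala-Busse [13] to the condition attributes of Table 2.8. It rests on several assumptions that are not confirmed by the text above: a case with a lost value ("?") is excluded from every block of that attribute, a case with a don’t care value ("*") is included in every block of that attribute, an attribute on which the case itself is missing imposes no restriction, and P is taken to be {Height, Weight, Gender}. All function names are hypothetical.

```python
LOST, DONT_CARE = "?", "*"

# Condition attributes of Table 2.8 (mixed "lost" and "don't care" values).
TABLE_2_8 = {
    1: {"Height": 170, "Weight": 50, "Gender": "Male"},
    2: {"Height": 165, "Weight": LOST, "Gender": "Female"},
    3: {"Height": 170, "Weight": 80, "Gender": LOST},
    4: {"Height": 165, "Weight": 50, "Gender": "Female"},
    5: {"Height": DONT_CARE, "Weight": DONT_CARE, "Gender": "Male"},
}


def block(table, attr, value):
    """Cases whose value on attr equals `value` or is "*" (assumed rule).

    Cases with "?" on attr never enter any block of attr.
    """
    return {c for c, row in table.items() if row[attr] in (value, DONT_CARE)}


def characteristic_set(table, attrs, case):
    """Intersect the blocks of the specified values of `case` (assumed rule)."""
    result = set(table)  # start from the whole universe U
    for attr in attrs:
        value = table[case][attr]
        if value not in (LOST, DONT_CARE):
            result &= block(table, attr, value)
    return result


ATTRS = ["Height", "Weight", "Gender"]
K = {c: characteristic_set(TABLE_2_8, ATTRS, c) for c in TABLE_2_8}
for case, cset in K.items():
    print(case, sorted(cset))
```

Under these assumptions, case 5 appears in the characteristic sets of cases 1 and 3 because its don’t care values match any Height and Weight.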
Based on the concept of characteristic relation and characteristic set, Grzymala-Busse [13] proposed three different extensions of the lower and upper approximations for processing data with missing values: singleton, subset, and concept approximations.
The first extension is called singleton approximation, which considers all cases in U and is similar to the original definitions of lower and upper approximations.
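Definition 2.4 The singleton lower approximation of X induced by P, denoted by $\underline{PK}X$, is the set of all cases whose characteristic set is contained in X, i.e.,
$$\underline{PK}X = \{ x \in U \mid K_P(x) \subseteq X \}$$
The singleton upper approximation of X induced by P, denoted by $\overline{PK}X$, is the set of all cases whose characteristic set has a non-empty intersection with X, i.e.,
$$\overline{PK}X = \{ x \in U \mid K_P(x) \cap X \neq \emptyset \}$$
where $K_P(x)$ denotes the characteristic set of case x induced by P.
Note that the characteristic sets in the above definition can be of any type.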
Example 2.4 Consider Table 2.6. There are two equivalence classes induced by attribute Overweight, i.e., {1, 5} and {2, 3, 4}. Below are the singleton approximations of these two elementary sets:
The second extension, called subset approximation, uses characteristic sets to define approximation.
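Definition 2.5 The subset lower approximation of X induced by P, denoted by $\underline{PK_s}X$, is the union of the characteristic sets that are contained in X, i.e.,
$$\underline{PK_s}X = \bigcup \{ K_P(x) \mid x \in U,\ K_P(x) \subseteq X \}$$
The subset upper approximation of X induced by P, denoted by $\overline{PK_s}X$, is the union of the characteristic sets which have a non-empty intersection with X, i.e.,
$$\overline{PK_s}X = \bigcup \{ K_P(x) \mid x \in U,\ K_P(x) \cap X \neq \emptyset \}$$
elementary sets induced by Overweight are: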
The third definition called concept approximation is stricter than the subset version in that it only considers those cases in X.
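Definition 2.6 The concept lower and upper approximations of X induced by P are defined as follows:
$$\text{concept lower approximation of } X = \bigcup \{ K_P(x) \mid x \in X,\ K_P(x) \subseteq X \}$$
$$\text{concept upper approximation of } X = \bigcup \{ K_P(x) \mid x \in X,\ K_P(x) \cap X \neq \emptyset \}$$
elementary sets induced by Overweight are: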
Note that for complete decision tables, all three approximations (singleton, subset, and concept) reduce to the same definition. However, this does not hold for incomplete decision tables.
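The difference between the three kinds of approximation can be seen in a small Python sketch. The characteristic sets K below are an assumption: they are the sets obtained for the condition attributes of Table 2.8 under the block-based construction of Grzymala-Busse [13], and they are hard-coded here only for illustration. The function names are hypothetical, and the definitions follow the verbal descriptions of Definitions 2.4 to 2.6.

```python
# Assumed characteristic sets for Table 2.8 (illustration only).
K = {1: {1, 5}, 2: {2, 4}, 3: {3, 5}, 4: {4}, 5: {1, 5}}


def singleton(K, X):
    """Cases whose characteristic set is inside X / overlaps X."""
    lower = {x for x in K if K[x] <= X}
    upper = {x for x in K if K[x] & X}
    return lower, upper


def subset(K, X, cases=None):
    """Unions of characteristic sets inside X / overlapping X."""
    cases = K.keys() if cases is None else cases
    lower = set().union(*(K[x] for x in cases if K[x] <= X))
    upper = set().union(*(K[x] for x in cases if K[x] & X))
    return lower, upper


def concept(K, X):
    """Same as subset, but only cases that belong to X are considered."""
    return subset(K, X, cases=X)


# Elementary sets induced by Overweight in Table 2.8.
for X in ({1, 5}, {2, 3, 4}):
    print(sorted(X), singleton(K, X), subset(K, X), concept(K, X))
```

For X = {2, 3, 4}, the singleton upper approximation is {2, 3, 4}, whereas the subset and concept upper approximations also contain case 5, because the characteristic set of case 3 is {3, 5}. For X = {1, 5}, the concept upper approximation {1, 5} is smaller than the subset one {1, 3, 5}.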
2.3 Rough Set in Medical Informatics Application
Medical databases usually contain a lot of incomplete or ambiguous data, and for some applications the amount of data is very large [8]. Therefore, it is natural to develop rough set based intelligent methods for analyzing medical data [4][5][15][27][31]. Representative techniques include rough set based medical image segmentation [21], rough set based medical classification and pattern recognition [18][32], and rough set based medical diagnosis [22]. Since the ADR signal detection problem bears a resemblance to medical classification and diagnosis, below we only highlight related work on these topics.
Mandal et al. [22] used the original concepts of rough set theory for the automated diagnosis of lung adenocarcinoma and for predicting the genes that cause lung adenocarcinoma in patients. They used a publicly available microarray dataset obtained from the NCBI website. Experimental results via cross validation exhibit 100% accuracy for the discovered rules.
Hassanien and Ali [16] used a rough set method for generating classification rules from breast cancer data. They used the attribute reduction technique of rough set theory to select the necessary attributes. The purpose is to identify every decision class by using a minimal set of conditions, which can increase the efficiency of decision making. The generated rules and classification accuracy were compared with those of a decision tree classifier, which showed that the rough set based approach produces stricter rules and achieves higher classification accuracy than decision trees.
Stepaniuk [30] used rough set theory to identify the most important condition attributes in a diabetes mellitus dataset and, from these condition attributes and the decision attribute, produced decision rules. The proposed method can be applied to different kinds of medical datasets.
The above literature highlights the usefulness and efficiency of rough set theory in the medical domain. In summary, rough set theory can aid in building medical information systems or expert systems, and can help medical experts analyze problems effectively.
Chapter 3
Rough Set Based ADR Detection
3.1 Problem Description
As mentioned in Chapter 1, spontaneous reporting systems are used to monitor and discover suspicious ADR signals. When a patient experiences an uncomfortable or harmful adverse reaction under normal drug usage, hospitals, pharmaceutical companies, and the patients themselves can submit reports to, or query, the SRS. The reported data may contain missing values due to omission or privacy concerns. The purpose of this study is to take records with missing values into account and to use rough set strategies to process the reporting data with missing values.
In this study, the reporting data is obtained from the FDA Adverse Event Reporting System (FAERS) database [12]. The FAERS database is composed of seven data files, as shown in Figure 3.1: DEMO, DRUG, REAC, OUTC, RPSR, THER, and INDI. We selected the three data files that are essential for ADR signal detection, i.e., DEMO, DRUG, and REAC. From the DEMO data file, we chose four attributes about the personal information of patients: ISR (primary report id), EVENT_DT, AGE, and GNDR_COD. All of these attributes except ISR may contain null values. From the DRUG and REAC files we chose the DRUGNAME and PT attributes, which do not contain null values. Details of the chosen attributes are presented in Table 3.1. Table 3.2 is a snapshot of the reporting data extracted from data collected in the first quarter of 2008.
Figure 3.1 The schema of FAERS database.
Table 3.1. Description of the attributes selected from the FAERS dataset
File name    Selected attribute name    Containing null values    Null probability (%, 07Q2)    Attribute name descriptions
DEMO         ISR                        No                        0                             number of patients (unique)
DEMO         EVENT_DT                   Yes                       31.3                          adverse event happen date
DEMO         AGE                        Yes                       38.8                          age of patients
DEMO         GNDR_COD                   Yes                       5.8                           gender of patients
DRUG         DRUGNAME                   No                        0                             Name of drug (trade name)
REAC         PT                         No                        0                             Name of ADR (using PT level from MedDRA)
DEMO: to record personal information for each patient.
DRUG: to record the medicines taken by each patient.
REAC: to record the observed adverse reactions for each report.
Table 3.2. Input data sample for ADR signal detection (08Q1)
To facilitate the discussion, the reporting data is presented as an information system S = (U, A) containing missing values. Further, we assume that the reporting data is given in the form of an incomplete data table, and that each missing value belongs to one of two categories: lost (?) or don’t care (*). Our purpose is to examine the feasibility of applying rough set theory to ADR detection, focusing on whether including records with missing data through rough set based approximations improves the predictive capability of the generated signals. The problem can therefore be described as follows: given an SRS dataset extracted from the FAERS database that contains missing values and is represented in the form of a data table, we would like to compute the strength (using the PRR or ROR measure) of any given suspected ADR rule of the following form:
$$Pred_c,\ drug \rightarrow symptom \tag{3.1}$$
where Pred_c denotes extra conditions associated with the signal, e.g., Sex = “female”, Age = “>18”, and to examine whether the strength of this rule exceeds a specified threshold, making it a noteworthy ADR signal.
3.2 Rough Set Based Method: Basic Idea
Since all contemporary measures rely on the 2×2 contingency table, our basic idea is to apply rough set theory to the calculation of the 2×2 contingency table.
Consider the rule in (3.1) and the following corresponding contingency table.
Pred_c         symptom    Other symptoms
drug           a          b
Other drugs    c          d
If the information system is complete, then each of the cell values a, b, c, and d in the contingency table is deterministic. Unfortunately, as we have shown previously, the attributes involved in the predicate Pred_c may contain missing values, which makes the cell values imprecise. We thus adopt the concept of lower and upper approximations to obtain an approximate range for each cell value and compute the strength of the corresponding rule accordingly.
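For complete data, the cell counts of the contingency table can be obtained by a single pass over the reports, as the following Python sketch shows. It is an illustration under two assumptions that the text does not spell out: Pred_c is treated as a filter that restricts which reports are counted, and each report is counted once according to whether it mentions the drug and the symptom. The report layout, the field names, and the values D1, D2, S1, and S2 are hypothetical.

```python
def contingency(reports, pred, drug, symptom):
    """Count the cells a, b, c, d for the rule (pred, drug -> symptom)."""
    a = b = c = d = 0
    for r in reports:
        if not pred(r):
            continue  # assumption: Pred_c restricts the reports considered
        has_drug = drug in r["drugs"]
        has_symptom = symptom in r["symptoms"]
        if has_drug and has_symptom:
            a += 1
        elif has_drug:
            b += 1
        elif has_symptom:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical complete reports, for illustration only.
reports = [
    {"sex": "F", "age": 34, "drugs": {"D1"}, "symptoms": {"S1"}},
    {"sex": "F", "age": 51, "drugs": {"D1"}, "symptoms": {"S2"}},
    {"sex": "F", "age": 29, "drugs": {"D2"}, "symptoms": {"S1"}},
    {"sex": "F", "age": 40, "drugs": {"D2"}, "symptoms": {"S2"}},
    {"sex": "M", "age": 45, "drugs": {"D1"}, "symptoms": {"S1"}},
    {"sex": "F", "age": 15, "drugs": {"D1"}, "symptoms": {"S1"}},
]

cells = contingency(
    reports,
    pred=lambda r: r["sex"] == "F" and r["age"] > 18,
    drug="D1",
    symptom="S1",
)
print(cells)
```

When sex or age is missing, the predicate can no longer be evaluated for a report, which is exactly the situation in which the cell counts become imprecise.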
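For simplicity, let $X_a$, $X_b$, $X_c$, and $X_d$ denote the sets of cases satisfying the corresponding cell conditions in the contingency table. Clearly, for complete data we have $a = |X_a|$, $b = |X_b|$, $c = |X_c|$, and $d = |X_d|$. For incomplete data, we instead compute the lower and upper approximations of $X_a$, $X_b$, $X_c$, and $X_d$. Let P denote the set of attributes used for the approximation computation. Each cell value can then be denoted by a range, i.e.,
$$a: [\underline{a}, \overline{a}], \quad b: [\underline{b}, \overline{b}], \quad c: [\underline{c}, \overline{c}], \quad d: [\underline{d}, \overline{d}], \tag{3.2}$$
and accordingly, we have
$$\begin{aligned}
\underline{a} &= |\underline{PK}X_a|, & \overline{a} &= |\overline{PK}X_a|, \\
\underline{b} &= |\underline{PK}X_b|, & \overline{b} &= |\overline{PK}X_b|, \\
\underline{c} &= |\underline{PK}X_c|, & \overline{c} &= |\overline{PK}X_c|, \\
\underline{d} &= |\underline{PK}X_d|, & \overline{d} &= |\overline{PK}X_d|.
\end{aligned} \tag{3.3}$$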
We consider two different options for defining the set P: global and local. The global approach assigns all attributes in the data to P, i.e., P = A. The local approach