Vol. 29 no. 6 2013, pages 758–764
BIOINFORMATICS
ORIGINAL PAPER
doi:10.1093/bioinformatics/btt037
Data and text mining
Advance Access publication February 15, 2013
Drug-SNPing: an integrated drug-based, protein interaction-based tagSNP-based pharmacogenomics platform for SNP genotyping
Cheng-Hong Yang1, Yu-Huei Cheng2,*, Li-Yeh Chuang3,* and Hsueh-Wei Chang4,5,6,*
1Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung,
2Department of Digital Content Design and Management, Toko University, Chiayi,
3Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung,
4Department of Biomedical Science and Environmental Biology,
5Graduate Institute of Natural Products and
6Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Associate Editor: Martin Bishop
ABSTRACT
Summary: Many drug or single nucleotide polymorphism (SNP)-related resources and tools have been developed, but connecting and integrating them is still a challenge. Here, we describe a user-friendly web-based software package, named Drug-SNPing, which provides a platform for the integration of drug information (DrugBank and PharmGKB), protein–protein interactions (STRING), tagSNP selection (HapMap) and genotyping information (dbSNP, REBASE and SNP500Cancer). DrugBank-based inputs include the following: (i) common name of the drug, (ii) synonym or drug brand name, (iii) gene name (HUGO) and (iv) keywords. PharmGKB-based inputs include the following: (i) gene name (HUGO), (ii) drug name and (iii) disease-related keywords. The output provides drug-related information, metabolizing enzymes and drug targets, as well as protein–protein interaction data. Importantly, tagSNPs of the selected genes are retrieved for genotyping analyses. All drug-based and protein–protein interaction-based SNP genotyping information is provided with PCR-RFLP (PCR-restriction fragment length polymorphism) and TaqMan probes. Thus, users can enter any drug keywords or brand names to obtain immediate information that is highly relevant to genotyping for pharmacogenomics research.
Availability and implementation: Drug-SNPing and its user manual are freely available at http://bio.kuas.edu.tw/drug-snping/.
Contact: [email protected]; [email protected]; [email protected]
Received on November 10, 2012; revised on December 20, 2012; accepted on January 19, 2013
1 INTRODUCTION
Pharmacogenomics is a type of personalized medicine that
studies drug therapy in terms of efficacy or adverse events with
respect to each patient's genetic variation, such as single
nucleotide polymorphisms (SNPs) (Daly, 2010). Profiling
significantly informative SNPs is an important factor in
drug selection, dosage assessment and the choice of a
therapeutic approach. Associating drug responses with a
patient's SNP genotype for genes related to drug metabolism
and targeting is straightforward if the relevant
pharmacogenomics information is readily available.
DrugBank (Knox et al., 2011; Wishart, 2008; Wishart et al.,
2008) and PharmGKB (Gong et al., 2008; Owen et al., 2008;
Sangkuhl et al., 2008; Thorn et al., 2010) are two notable
databases for pharmacogenomics-related information. Although
many SNP technologies associated with drug discovery have
been widely discussed (Beckstead et al., 2008; Chuang et al.,
2008; Shen et al., 2009; Voisey and Morris, 2008), experimental
information that allows widespread SNP genotyping is
unavailable in DrugBank and PharmGKB.
Recently, many SNP-related tools, such as SNP500Cancer
(Packer et al., 2006), SNP-RFLPing (Chang et al., 2006, 2010),
Seq4SNPs (Field et al., 2009), SNP ID-info (Yang et al., 2008),
PineSAP (Wegrzyn et al., 2009), Seq-SNPing (Chang et al.,
2009a), MapNext (Bao et al., 2009) and CandiSNPer (Schmitt
et al., 2010) have been developed. However, these tools cannot
link SNP genotyping information to drug targets, metabolizing
enzymes, transporters and carriers. Moreover,
SNP–SNP and protein–protein interactions are not considered
in these SNP-related tools. In contrast, STRING (Jensen et al.,
2009), a powerful mining tool for protein–protein interactions,
has been developed, but it includes no SNP-related information.
Therefore, it is still difficult to integrate
cheminformatics with SNP interactivity and genotyping for
pharmacogenomics purposes (Chang et al., 2012).
To overcome these problems, we developed an integrated
pharmacogenomics-based and protein interaction-based
platform for SNP genotyping information. TagSNPs retrieved
from HapMap (Deloukas and Bentley, 2004; Thorisson et al.,
2005), which provide a reduced set of informative SNP
genotypes, have been integrated in the proposed Drug-SNPing
system. This study presents a novel and user-friendly web-based
pharmacogenomics tool that allows the study of drug-related
SNP interactions and related genotyping.
2 MATERIALS AND METHODS
2.1 Implementation
The flow chart for the six modules used in Drug-SNPing is shown in Figure 1: (i) input module; (ii) drug info query module; (iii) tagSNPs analysis module; (iv) SNP-RFLP analysis module; (v) STRING
*To whom correspondence should be addressed.
© The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected]
at National Kaohsiung University of Applied Sciences on July 14, 2014
http://bioinformatics.oxfordjournals.org/
these genes cannot be found, e.g.
http://www.drugbank.ca/search/search?query=FABP7,
http://www.drugbank.ca/search/search?query=PADI4 and
http://www.drugbank.ca/search/search?query=PTGS2.
In contrast, Drug-SNPing allows input
based on the HUGO gene name, which improves the
gene-centric search for drugs with input drug targets. This is
demonstrated in the user manual for Drug-SNPing. In
Drug-SNPing, we also integrated PharmGKB (Gong et al.,
2008; Owen et al., 2008; Sangkuhl et al., 2008; Thorn et al.,
2010) for annotation of important genes related to drug
responses and pathways as a complement to DrugBank. The
functions provided by Drug-SNPing, PharmGKB and DrugBank are
compared in Table 1.
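The gene-centric search described above can be pictured as an inverted index from gene symbol to drug records. The following is only a minimal sketch under stated assumptions: the record layout, the function names `build_gene_index` and `drugs_for_gene`, and the placeholder drug names are hypothetical and are not taken from Drug-SNPing or DrugBank.

```python
from collections import defaultdict

# Hypothetical toy records: (drug name, role of the gene product, HUGO gene symbol).
# The drug names are placeholders for illustration, not DrugBank data.
RECORDS = [
    ("drug_A", "target", "PTGS2"),
    ("drug_B", "target", "PTGS2"),
    ("drug_B", "enzyme", "CYP2C9"),
    ("drug_C", "target", "PADI4"),
]


def build_gene_index(records):
    """Map each upper-cased gene symbol to the set of (drug, role) pairs that mention it."""
    index = defaultdict(set)
    for drug, role, gene in records:
        index[gene.strip().upper()].add((drug, role))
    return index


def drugs_for_gene(index, gene_symbol):
    """Return a sorted list of (drug, role) pairs for a gene symbol, or [] if it is unknown."""
    return sorted(index.get(gene_symbol.strip().upper(), set()))


if __name__ == "__main__":
    idx = build_gene_index(RECORDS)
    # Lookup is case-insensitive because symbols are normalized to upper case.
    print(drugs_for_gene(idx, "ptgs2"))
```

Normalizing the symbol before lookup is a design choice made here so that inputs such as `ptgs2` and `PTGS2` resolve to the same entry; a gene with no record simply yields an empty list.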
The importance of SNP–SNP interactions in association
studies is increasing, but most studies focus their data analysis
only on a limited number of known SNPs (Lin et al., 2008, 2009;
Yang et al., 2009, 2012; Yen et al., 2008; Zheng et al., 2008).
To fill this gap, we incorporated a widely used protein–protein
interaction tool, STRING (Jensen et al., 2009), in Drug-SNPing.
We perform SNP–SNP interaction analysis based on the
protein–protein interaction data retrieved online from STRING.
However, many other protein–protein interaction tools are not
yet included in the current version of Drug-SNPing, such as
GWIDD (Kundrotas et al., 2010), a genome-wide protein docking
database; STITCH 2 (Kuhn et al., 2010), an interaction network
database for small molecules and proteins; Gene Interaction
Miner (Ikin et al., 2010), for data mining contextual information
for protein–protein interaction analysis; Path (Zamar et al.,
2009), a tool to facilitate pathway-based genetic association
analysis; Polymorphism Interaction Analysis (PIA) (Mechanic
et al., 2008), a method for investigating complex gene–gene
interactions; and other protein–protein interaction databases
(Lehne and Schlitt, 2009). Similarly, several drug-related
chemoinformatics resources are not yet included in the current
version of Drug-SNPing, such as DCDB (Liu et al., 2010), a drug
combination database; drug-binding databases (Timmers et al.,
2008); SuperSite (Andre Bauer et al., 2008), a dictionary of
metabolite and drug-binding sites in proteins; and the Drug
Adverse Reaction Target database (DART) (Ji et al., 2003), for
proteins related to adverse drug reactions. In future, we intend
to integrate the data that these tools release via FTP in order
to add more value to Drug-SNPing.
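One way to read the PPI-based SNP–SNP analysis above is that each interacting protein pair nominates candidate SNP pairs across the two corresponding genes. The sketch below illustrates only that pairing step, under assumptions: the gene symbols, SNP identifiers and the function name `candidate_snp_pairs` are hypothetical placeholders, and no call to STRING or HapMap is made.

```python
from itertools import product

# Hypothetical inputs for illustration only.
# PPI_EDGES: interacting gene pairs, as might be derived from a protein-protein interaction resource.
# TAG_SNPS: tagSNP identifiers per gene (placeholders, not real rs numbers).
PPI_EDGES = [("GENE1", "GENE2"), ("GENE2", "GENE3")]
TAG_SNPS = {
    "GENE1": ["rs_a1", "rs_a2"],
    "GENE2": ["rs_b1"],
    "GENE3": [],
}


def candidate_snp_pairs(ppi_edges, snps_by_gene):
    """Return sorted, de-duplicated SNP pairs spanning each interacting gene pair."""
    pairs = set()
    for gene_a, gene_b in ppi_edges:
        snps_a = snps_by_gene.get(gene_a, [])
        snps_b = snps_by_gene.get(gene_b, [])
        for snp_a, snp_b in product(snps_a, snps_b):
            if snp_a != snp_b:
                # Store each pair in a canonical order so (x, y) and (y, x) collapse.
                pairs.add(tuple(sorted((snp_a, snp_b))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in candidate_snp_pairs(PPI_EDGES, TAG_SNPS):
        print(pair)
```

Genes without tagSNPs contribute no pairs, so an interaction such as GENE2 with GENE3 in the toy data is silently skipped rather than treated as an error.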
The performance of Drug-SNPing was assessed with at least
50 different inputs (see the section 'Assessment of
Drug-SNPing' at
http://bio.kuas.edu.tw/drug-snping/user_manual.jsp),
and all of them returned output from
Drug-SNPing. Moreover, the integration in Drug-SNPing
saves operating time compared with querying the resources separately.
5 CONCLUSION
In this article, we developed a novel integrated web-based
interface, Drug-SNPing, that provides user-friendly connections
between DrugBank for chemoinformatics, PharmGKB
for pharmacogenomics, STRING for protein–protein interactions,
and genotyping information for TaqMan probes and PCR-RFLP.
Moreover, gene-centric inputs can also be used as a means of
finding corresponding drugs for drug targeting and metabolism.
This tool thus accepts both drug-oriented and gene-centric
inputs to mine the metabolizing enzymes and targets recorded
for these drugs, and it provides benchmark information
that allows SNP genotyping to be performed. This platform will
be helpful for the development of pharmacogenomics as part of
personalized medicine.
Funding: National Science Council in Taiwan (grants
NSC101-2622-E-151-027-CC3, NSC100-2221-E-214-071,
NSC98-2622-E-151-001-CC2 and NSC101-2221-E-464-001); the funds of the
Department of Health, Executive Yuan, R.O.C. (TAIWAN)
(DOH102-TD-C-111-002); the Kaohsiung Medical University
Research Foundation (KMUER001); the NSYSU-KMU
JOINT RESEARCH PROJECT (#NSYSUKMU 102-034); and
the I-Shou University plan (ISU100-02-05).
Conflict of Interest: none declared.
REFERENCES
Andre Bauer,R. et al. (2008) SuperSite: dictionary of metabolite and drug binding sites in proteins. Nucleic Acids Res., 37, D195–D200.
Bao,H. et al. (2009) MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads. BMC Genomics, 10 (Suppl. 3), S13.
Beckstead,W.A. et al. (2008) SNP2RFLP: a computational tool to facilitate genetic mapping using benchtop analysis of SNPs. Mamm. Genome, 19, 687–690.
Burgarella,S. et al. (2005) MicroGen: a MIAME compliant web system for microarray experiment information and workflow management. BMC Bioinformatics, 6 (Suppl. 4), S6.
Burgoon,L.D. et al. (2006) dbZach: a MIAME-compliant toxicogenomic supportive relational database. Toxicol. Sci., 90, 558–568.
Chang,H.W. et al. (2006) SNP-RFLPing: restriction enzyme mining for SNPs in genomes. BMC Genomics, 7, 30.
Chang,H.W. et al. (2009a) Seq-SNPing: multiple-alignment tool for SNP discovery, SNP ID identification, and RFLP genotyping. OMICS, 13, 253–260.
Chang,H.W. et al. (2009b) Prim-SNPing: a primer designer for cost-effective SNP genotyping. Biotechniques, 46, 421–431.
Chang,H.W. et al. (2010) SNP-RFLPing 2: an updated and integrated PCR-RFLP database tool for SNP genotyping. BMC Bioinformatics, 11, 173.
Chang,H.W. et al. (2012) The importance of integrating SNP and cheminformatics resources to pharmacogenomics. Curr. Drug Metab., 13, 991–999.
Chen,X. and Sullivan,P.F. (2003) Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics J., 3, 77–96.
Chen,J. et al. (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21, 4133–4139.
Table 1. The contributions in Drug-SNPing, PharmGKB and DrugBank

Function                                System
                                        Drug-SNPing   PharmGKB   DrugBank
Disease information                     ✓             ✓          ✗
TaqMan assays information               ✓             ✗          ✗
PCR-RFLP primers                        ✓             ✗          ✗
Protein–protein interaction             ✓             ✗          ✗
Clinical interpretations                ✗             ✓          ✗
Pathways                                ✗             ✓          ✓
Rich resources and database download    ✗             ✓          ✓
Drug target information                 ✓             ✗          ✓
Chemical formula query                  ✓             ✗          ✓

Note: Symbol '✓' means that the function is provided, and symbol '✗' means that the function is not provided.
Chuang,L.Y. et al. (2008) Restriction enzyme mining for SNPs in genomes. Anticancer Res., 28, 2001–2007.
Daly,A.K. (2010) Genome-wide association studies in pharmacogenomics. Nat. Rev., 11, 241–246.
Deloukas,P. and Bentley,D. (2004) The HapMap project and its application to genetic studies of drug response. Pharmacogenomics J., 4, 88–90.
Demirci,F.Y. et al. (2007) Association study of Toll-like receptor 5 (TLR5) and Toll-like receptor 9 (TLR9) polymorphisms in systemic lupus erythematosus. J. Rheumatol., 34, 1708–1711.
Field,H.I. et al. (2009) Seq4SNPs: new software for retrieval of multiple, accurately annotated DNA sequences, ready formatted for SNP assay design. BMC Bioinformatics, 10, 180.
Freimuth,R.R. et al. (2005) PolyMAPr: programs for polymorphism database mining, annotation, and functional analysis. Hum. Mutat., 25, 110–117.
Gong,L. et al. (2008) PharmGKB: an integrated resource of pharmacogenomic data and knowledge. Curr. Protoc. Bioinformatics, Chapter 14, Unit 14.17.
Hayes,K.R. et al. (2005) EDGE: a centralized resource for the comparison, analysis, and distribution of toxicogenomic information. Mol. Pharmacol., 67, 1360–1368.
Hug,H. et al. (2003) ADRIS—the adverse drug reactions information scheme. Pharmacogenetics, 13, 767–772.
Ikin,A. et al. (2010) The Gene Interaction Miner: a new tool for data mining contextual information for protein-protein interaction analysis. Bioinformatics, 26, 283–284.
Jensen,L.J. et al. (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416.
Ji,Z.L. et al. (2003) Drug Adverse Reaction Target Database (DART): proteins related to adverse drug reactions. Drug Saf., 26, 685–690.
Knox,C. et al. (2011) DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res., 39, D1035–D1041.
Kocsis,A.K. et al. (2008) Association of beta-defensin 1 single nucleotide polymorphisms with Crohn's disease. Scand. J. Gastroenterol., 43, 299–307.
Kuhn,M. et al. (2010) STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res., 38, D552–D556.
Kundrotas,P.J. et al. (2010) GWIDD: genome-wide protein docking database. Nucleic Acids Res., 38, D513–D517.
Lehne,B. and Schlitt,T. (2009) Protein-protein interaction databases: keeping up with growing interactomes. Hum. Genomics, 3, 291–297.
Lin,G.T. et al. (2008) SNP combinations in chromosome-wide genes are associated with bone mineral density in Taiwanese women. Chin. J. Physiol., 91, 1–10.
Lin,G.T. et al. (2009) Combinational polymorphisms of seven CXCL12-related genes are protective against breast cancer in Taiwan. OMICS, 13, 165–172.
Liu,Y. et al. (2010) DCDB: drug combination database. Bioinformatics, 26, 587–588.
Masciocchi,J. et al. (2009) MMsINC: a large-scale chemoinformatics database. Nucleic Acids Res., 37, D284–D290.
Mattingly,C.J. et al. (2003) The Comparative Toxicogenomics Database (CTD). Environ. Health Perspect., 111, 793–795.
Mattingly,C.J. et al. (2006) The Comparative Toxicogenomics Database (CTD): a resource for comparative toxicological studies. J. Exp. Zool. A Comp. Exp. Biol., 305, 689–692.
Mechanic,L.E. et al. (2008) Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions. BMC Bioinformatics, 9, 146.
Miteva,M.A. et al. (2006) FAF-Drugs: free ADME/tox filtering of compound collections. Nucleic Acids Res., 34, W738–W744.
Neugebauer,A. et al. (2007) Prediction of protein-protein interaction inhibitors by chemoinformatics and machine learning methods. J. Med. Chem., 50, 4665–4668.
Owen,R.P. et al. (2008) PharmGKB and the International Warfarin Pharmacogenetics Consortium: the changing role for pharmacogenomic databases and single-drug pharmacogenetics. Hum. Mutat., 29, 456–460.
Packer,B.R. et al. (2006) SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes. Nucleic Acids Res., 34, D617–D621.
Peters,E.J. and McLeod,H.L. (2008) Ability of whole-genome SNP arrays to capture 'must have' pharmacogenomic variants. Pharmacogenomics, 9, 1573–1577.
Roberts,R.J. et al. (2010) REBASE–a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res., 38, D234–D236.
Royo,J.L. and Galan,J.J. (2009) Pyrosequencing for SNP genotyping. Methods Mol. Biol., 578, 123–133.
Salter,A.H. (2005) Large-scale databases in toxicogenomics. Pharmacogenomics, 6, 749–754.
Sangkuhl,K. et al. (2008) PharmGKB: understanding the effects of individual genetic variants. Drug Metab. Rev., 40, 539–551.
Schmitt,A.O. et al. (2010) CandiSNPer: a web tool for the identification of candidate SNPs for causal variants. Bioinformatics, 26, 969–970.
Shen,G.Q. et al. (2009) The TaqMan method for SNP genotyping. Methods Mol. Biol., 578, 293–306.
Sherry,S.T. et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
Sun,L.Z. et al. (2002) ADME-AP: a database of ADME associated proteins. Bioinformatics, 18, 1699–1700.
Thorisson,G.A. et al. (2005) The International HapMap Project Web site. Genome Res., 15, 1592–1593.
Thorn,C.F. et al. (2010) Pharmacogenomics and bioinformatics: PharmGKB. Pharmacogenomics, 11, 501–505.
Timmers,L.F. et al. (2008) Drug-binding databases. Curr. Drug Targets, 9, 1092–1099.
Tong,W. et al. (2003) ArrayTrack–supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect., 111, 1819–1826.
Voisey,J. and Morris,C.P. (2008) SNP technologies for drug discovery: a current review. Curr. Drug Discov. Technol., 5, 230–235.
Wang,H. et al. (2007) Chemical data mining of the NCI human tumor cell line database. J. Chem. Inf. Model., 47, 2063–2076.
Wegrzyn,J.L. et al. (2009) PineSAP–sequence alignment and SNP identification pipeline. Bioinformatics, 25, 2609–2610.
Wishart,D.S. (2007) In silico drug exploration and discovery using DrugBank. Curr. Protoc. Bioinformatics, Chapter 14, Unit 14.14.
Wishart,D.S. (2008) DrugBank and its relevance to pharmacogenomics. Pharmacogenomics, 9, 1155–1162.
Wishart,D.S. et al. (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res., 36, D901–D906.
Xirasagar,S. et al. (2006) Chemical effects in biological systems (CEBS) object model for toxicology data, SysTox-OM: design and application. Bioinformatics, 22, 874–882.
Yang,C.H. et al. (2008) SNP ID-info: SNP ID searching and visualization platform. OMICS, 12, 217–226.
Yang,C.H. et al. (2009) Novel generating protective single nucleotide polymorphism barcode for breast cancer using particle swarm optimization. Cancer Epidemiol., 33, 147–154.
Yang,C.H. et al. (2012) Single nucleotide polymorphism barcoding to evaluate oral cancer risk using odds ratio-based genetic algorithms. Kaohsiung J. Med. Sci., 28, 362–368.
Yen,C.Y. et al. (2008) Combinational polymorphisms of four DNA repair genes XRCC1, XRCC2, XRCC3, and XRCC4 and their association with oral cancer in Taiwan. J. Oral Pathol. Med., 37, 271–277.
Yoshiya,G. et al. (2008) Influence of cancer-related gene polymorphisms on clinicopathological features in colorectal cancer. J. Gastroenterol. Hepatol., 23, 948–953.
Zamar,D. et al. (2009) Path: a tool to facilitate pathway-based genetic association analysis. Bioinformatics, 25, 2444–2446.
Zheng,C.J. et al. (2007) PharmGED: pharmacogenetic effect database. Nucleic Acids Res., 35, D794–D799.
Zheng,S.L. et al. (2008) Cumulative association of five genetic variants with prostate cancer. N. Engl. J. Med., 358, 910–919.