Drug-SNPing: an integrated drug-based, protein interaction-based tagSNP-based pharmacogenomics platform for SNP genotyping


Vol. 29 no. 6 2013, pages 758–764

BIOINFORMATICS

ORIGINAL PAPER

doi:10.1093/bioinformatics/btt037

Data and text mining

Advance Access publication February 15, 2013


Cheng-Hong Yang¹, Yu-Huei Cheng²,*, Li-Yeh Chuang³,* and Hsueh-Wei Chang⁴,⁵,⁶,*

¹Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, ²Department of Digital Content Design and Management, Toko University, Chiayi, ³Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, ⁴Department of Biomedical Science and Environmental Biology, ⁵Graduate Institute of Natural Products and ⁶Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan

Associate Editor: Martin Bishop

ABSTRACT

Summary: Many drug or single nucleotide polymorphism (SNP)-related resources and tools have been developed, but connecting and integrating them is still a challenge. Here, we describe a user-friendly web-based software package, named Drug-SNPing, which provides a platform for the integration of drug information (DrugBank and PharmGKB), protein–protein interactions (STRING), tagSNP selection (HapMap) and genotyping information (dbSNP, REBASE and SNP500Cancer). DrugBank-based inputs include the following: (i) common name of the drug, (ii) synonym or drug brand name, (iii) gene name (HUGO) and (iv) keywords. PharmGKB-based inputs include the following: (i) gene name (HUGO), (ii) drug name and (iii) disease-related keywords. The output provides drug-related information, metabolizing enzymes and drug targets, as well as protein–protein interaction data. Importantly, tagSNPs of the selected genes are retrieved for genotyping analyses. All drug-based and protein–protein interaction-based SNP genotyping information is provided with PCR-RFLP (PCR-restriction fragment length polymorphism) and TaqMan probes. Thus, users can enter any drug keywords/brand names to obtain immediate information that is highly relevant to genotyping for pharmacogenomics research.

Availability and implementation: Drug-SNPing and its user manual are freely available at http://bio.kuas.edu.tw/drug-snping/.

Contact: [email protected]; [email protected]; [email protected]

Received on November 10, 2012; revised on December 20, 2012; accepted on January 19, 2013

1 INTRODUCTION

Pharmacogenomics is a type of personalized medicine that studies drug therapy in terms of efficacy or adverse events with respect to each patient’s genetic variation, such as single nucleotide polymorphisms (SNPs) (Daly, 2010). SNP profiling in terms of significantly informative SNPs is an important factor when carrying out drug selection, assessing dosages and deciding on a therapeutic approach. Associating drug responses with a patient’s SNP genotype for genes related to drug metabolism and targeting is straightforward if the relevant pharmacogenomics information is readily available.

DrugBank (Knox et al., 2011; Wishart, 2008; Wishart et al., 2008) and PharmGKB (Gong et al., 2008; Owen et al., 2008; Sangkuhl et al., 2008; Thorn et al., 2010) are two notable databases for pharmacogenomics-related information. Although many SNP technologies associated with drug discovery have been widely discussed (Beckstead et al., 2008; Chuang et al., 2008; Shen et al., 2009; Voisey and Morris, 2008), experimental information that allows widespread SNP genotyping is unavailable in DrugBank and PharmGKB.

Recently, many SNP-related tools have been developed, such as SNP500Cancer (Packer et al., 2006), SNP-RFLPing (Chang et al., 2006, 2010), Seq4SNPs (Field et al., 2009), SNP ID-info (Yang et al., 2008), PineSAP (Wegrzyn et al., 2009), Seq-SNPing (Chang et al., 2009a), MapNext (Bao et al., 2009) and CandiSNPer (Schmitt et al., 2010). However, these tools do not link SNP genotyping information to drug targets, metabolizing enzymes, transporters or carriers. Moreover, they do not consider SNP–SNP or protein–protein interactions. Conversely, STRING (Jensen et al., 2009), a powerful mining tool for protein–protein interactions, includes no SNP-related information. Therefore, it is still difficult to integrate cheminformatics with SNP interactivity and genotyping for pharmacogenomics purposes (Chang et al., 2012).

To overcome these problems, we developed an integrated pharmacogenomics-based and protein interaction-based platform for SNP genotyping information. TagSNPs retrieved from HapMap (Deloukas and Bentley, 2004; Thorisson et al., 2005), which narrow the candidates down to informative SNP genotypes, have been integrated into the proposed Drug-SNPing system. This study presents a novel and user-friendly web-based pharmacogenomics tool that allows the study of drug-related SNP interactions and related genotyping.

2 MATERIALS AND METHODS

2.1 Implementation

The flow chart for the six modules used in Drug-SNPing is shown in Figure 1: (i) input module; (ii) drug info query module; (iii) tagSNPs analysis module; (iv) SNP-RFLP analysis module; (v) STRING

*To whom correspondence should be addressed.

758 at National Kaohsiung University of Applied Sciences on July 14, 2014

http://bioinformatics.oxfordjournals.org/


these genes cannot be found, e.g. http://www.drugbank.ca/search/search?query=FABP7, http://www.drugbank.ca/search/search?query=PADI4 and http://www.drugbank.ca/search/search?query=PTGS2. In contrast, Drug-SNPing accepts HUGO gene names as input, which improves the gene-centric search for drugs whose targets are given as input. This is demonstrated in the user manual for Drug-SNPing. In Drug-SNPing, we also integrated PharmGKB (Gong et al., 2008; Owen et al., 2008; Sangkuhl et al., 2008; Thorn et al., 2010) to annotate important genes related to drug responses and pathways, as a complement to DrugBank. The contributions of Drug-SNPing, PharmGKB and DrugBank are shown in Table 1.
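The DrugBank search links quoted above all follow one pattern: a fixed search path plus a `query` parameter carrying the gene name. A minimal Python sketch of building such links for a list of HUGO gene symbols is given here. The function name `drugbank_search_url` is hypothetical, the base URL is copied from the examples in the text, and nothing here checks whether that endpoint still responds.

```python
from urllib.parse import urlencode

# Base URL copied from the example links in the text (assumption: unchanged).
DRUGBANK_SEARCH = "http://www.drugbank.ca/search/search"


def drugbank_search_url(gene_symbol: str) -> str:
    """Return a DrugBank search URL for one HUGO gene symbol.

    Hypothetical helper for illustration; it only builds the string and
    performs no network request.
    """
    # urlencode percent-escapes the value, so unusual symbols stay URL-safe.
    return f"{DRUGBANK_SEARCH}?{urlencode({'query': gene_symbol.strip()})}"


if __name__ == "__main__":
    # The three gene names used as examples in the text.
    for gene in ("FABP7", "PADI4", "PTGS2"):
        print(drugbank_search_url(gene))
```

Building the query string with `urlencode` rather than by string concatenation keeps the links valid if a gene symbol contains characters that need escaping.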

The importance of SNP–SNP interactions in association studies is increasing, but most studies focus their data analysis only on a limited number of known SNPs (Lin et al., 2008, 2009; Yang et al., 2009, 2012; Yen et al., 2008; Zheng et al., 2008). To fill this gap, we incorporated a widely used protein–protein interaction tool, STRING (Jensen et al., 2009), into Drug-SNPing. SNP–SNP interaction analysis is performed on the basis of protein–protein interaction data retrieved online from STRING. The current version of Drug-SNPing, however, does not yet include many other protein–protein interaction tools, such as GWIDD (Kundrotas et al., 2010), a genome-wide protein docking database; STITCH 2 (Kuhn et al., 2010), an interaction network database for small molecules and proteins; Gene Interaction Miner (Ikin et al., 2010), for data mining contextual information for protein–protein interaction analysis; Path (Zamar et al., 2009), which facilitates pathway-based genetic association analysis; Polymorphism Interaction Analysis (PIA) (Mechanic et al., 2008), a method for investigating complex gene–gene interactions; and protein–protein interaction databases (Lehne and Schlitt, 2009). Similarly, several drug-related chemoinformatics resources are not yet included in the current version of Drug-SNPing, such as DCDB (Liu et al., 2010), a drug combination database; drug-binding databases (Timmers et al., 2008); SuperSite (Andre Bauer et al., 2008), a dictionary of metabolite and drug-binding sites in proteins; and the Drug Adverse Reaction Target database (DART) (Ji et al., 2003), which covers proteins related to adverse drug reactions. In future, we intend to integrate the data that these tools release via FTP in order to add more value to Drug-SNPing.
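One way to picture how protein–protein interaction data can seed SNP–SNP interaction analysis is to pair the tagSNPs of each gene with the tagSNPs of its interaction partners. The Python sketch below illustrates that idea only; it is not the Drug-SNPing implementation. The function name `candidate_snp_pairs`, the input shapes and all identifiers in the example are hypothetical placeholders, not real genes, interactions or rs numbers.

```python
from itertools import product


def candidate_snp_pairs(interactions, tag_snps):
    """Pair every tagSNP of one gene with every tagSNP of its partner gene.

    interactions: iterable of (gene_a, gene_b) tuples, e.g. exported from a
                  protein-protein interaction resource (assumed input shape).
    tag_snps:     dict mapping gene symbol -> list of tagSNP identifiers.
    Genes without tagSNPs contribute no pairs; duplicate pairs are skipped.
    """
    pairs = []
    seen = set()
    for gene_a, gene_b in interactions:
        snps_a = tag_snps.get(gene_a, [])
        snps_b = tag_snps.get(gene_b, [])
        for snp_a, snp_b in product(snps_a, snps_b):
            key = frozenset((snp_a, snp_b))
            # Skip a SNP paired with itself and pairs already emitted.
            if snp_a == snp_b or key in seen:
                continue
            seen.add(key)
            pairs.append((snp_a, snp_b))
    return pairs


if __name__ == "__main__":
    # Placeholder data for illustration only.
    interactions = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
    tag_snps = {"GENE_A": ["snpA1", "snpA2"], "GENE_B": ["snpB1"], "GENE_C": []}
    for pair in candidate_snp_pairs(interactions, tag_snps):
        print(pair)
```

Restricting the candidate pairs to interacting genes keeps the number of SNP–SNP combinations far below the all-against-all count, which is the practical appeal of using interaction data as a filter.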

The performance of Drug-SNPing was assessed with at least 50 different inputs (see the section ‘Assessment of Drug-SNPing’ at http://bio.kuas.edu.tw/drug-snping/user_manual.jsp), and all of them returned output. Moreover, because the resources are integrated, a query through Drug-SNPing takes less time than consulting each resource separately.

5 CONCLUSION

In this article, we developed a novel integrated web-based interface, Drug-SNPing, that provides user-friendly connections between DrugBank for chemoinformatics, PharmGKB for pharmacogenomics, STRING for protein–protein interactions, and genotyping information for TaqMan probes and PCR-RFLP. Moreover, gene-centric inputs can also be used to find the corresponding drugs for drug targeting and metabolism. The tool thus accepts both drug-oriented and gene-centric inputs to mine the possible metabolic enzymes and targets for these drugs, and it provides the benchmark information that allows SNP genotyping to be performed. This platform will be helpful for the development of pharmacogenomics as part of personalized medicine.

Funding: National Science Council in Taiwan (grants NSC101-2622-E-151-027-CC3, NSC100-2221-E-214-071, NSC98-2622-E-151-001-CC2 and NSC101-2221-E-464-001); the funds of the Department of Health, Executive Yuan, R.O.C. (TAIWAN) (DOH102-TD-C-111-002); the Kaohsiung Medical University Research Foundation (KMUER001); the NSYSU-KMU JOINT RESEARCH PROJECT (#NSYSUKMU 102-034); and the I-Shou University plan (ISU100-02-05).

Conflict of Interest: none declared.

REFERENCES

Andre Bauer,R. et al. (2008) SuperSite: dictionary of metabolite and drug binding sites in proteins. Nucleic Acids Res., 37, D195–D200.

Bao,H. et al. (2009) MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads. BMC Genomics, 10 (Suppl. 3), S13.

Beckstead,W.A. et al. (2008) SNP2RFLP: a computational tool to facilitate genetic mapping using benchtop analysis of SNPs. Mamm. Genome, 19, 687–690.

Burgarella,S. et al. (2005) MicroGen: a MIAME compliant web system for microarray experiment information and workflow management. BMC Bioinformatics, 6 (Suppl. 4), S6.

Burgoon,L.D. et al. (2006) dbZach: a MIAME-compliant toxicogenomic supportive relational database. Toxicol. Sci., 90, 558–568.

Chang,H.W. et al. (2006) SNP-RFLPing: restriction enzyme mining for SNPs in genomes. BMC Genomics, 7, 30.

Chang,H.W. et al. (2009a) Seq-SNPing: multiple-alignment tool for SNP discovery, SNP ID identification, and RFLP genotyping. OMICS, 13, 253–260.

Chang,H.W. et al. (2009b) Prim-SNPing: a primer designer for cost-effective SNP genotyping. Biotechniques, 46, 421–431.

Chang,H.W. et al. (2010) SNP-RFLPing 2: an updated and integrated PCR-RFLP database tool for SNP genotyping. BMC Bioinformatics, 11, 173.

Chang,H.W. et al. (2012) The importance of integrating SNP and cheminformatics resources to pharmacogenomics. Curr. Drug Metab., 13, 991–999.

Chen,X. and Sullivan,P.F. (2003) Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics J., 3, 77–96.

Chen,J. et al. (2005) ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21, 4133–4139.

Table 1. The contributions of Drug-SNPing, PharmGKB and DrugBank

Function \ System                       Drug-SNPing   PharmGKB   DrugBank
Disease information                     ✓             ✓          ✗
TaqMan assays information               ✓             ✗          ✗
PCR-RFLP primers                        ✓             ✗          ✗
Protein–protein interaction             ✓             ✗          ✗
Clinical interpretations                ✗             ✓          ✗
Pathways                                ✗             ✓          ✓
Rich resources and database download    ✗             ✓          ✓
Drug target information                 ✓             ✗          ✓
Chemical formula query                  ✓             ✗          ✓

Note: Symbol ‘✓’ means that the function is provided, and symbol ‘✗’ means that the function is not provided.


Chuang,L.Y. et al. (2008) Restriction enzyme mining for SNPs in genomes. Anticancer Res., 28, 2001–2007.

Daly,A.K. (2010) Genome-wide association studies in pharmacogenomics. Nat. Rev., 11, 241–246.

Deloukas,P. and Bentley,D. (2004) The HapMap project and its application to genetic studies of drug response. Pharmacogenomics J., 4, 88–90.

Demirci,F.Y. et al. (2007) Association study of Toll-like receptor 5 (TLR5) and Toll-like receptor 9 (TLR9) polymorphisms in systemic lupus erythematosus. J. Rheumatol., 34, 1708–1711.

Field,H.I. et al. (2009) Seq4SNPs: new software for retrieval of multiple, accurately annotated DNA sequences, ready formatted for SNP assay design. BMC Bioinformatics, 10, 180.

Freimuth,R.R. et al. (2005) PolyMAPr: programs for polymorphism database mining, annotation, and functional analysis. Hum. Mutat., 25, 110–117.

Gong,L. et al. (2008) PharmGKB: an integrated resource of pharmacogenomic data and knowledge. Curr. Protoc. Bioinformatics, Chapter 14, Unit14.17.

Hayes,K.R. et al. (2005) EDGE: a centralized resource for the comparison, analysis, and distribution of toxicogenomic information. Mol. Pharmacol., 67, 1360–1368.

Hug,H. et al. (2003) ADRIS—the adverse drug reactions information scheme. Pharmacogenetics, 13, 767–772.

Ikin,A. et al. (2010) The Gene Interaction Miner: a new tool for data mining contextual information for protein-protein interaction analysis. Bioinformatics, 26, 283–284.

Jensen,L.J. et al. (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416.

Ji,Z.L. et al. (2003) Drug Adverse Reaction Target Database (DART): proteins related to adverse drug reactions. Drug Saf., 26, 685–690.

Knox,C. et al. (2011) DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res., 39, D1035–D1041.

Kocsis,A.K. et al. (2008) Association of beta-defensin 1 single nucleotide polymorphisms with Crohn’s disease. Scand. J. Gastroenterol., 43, 299–307.

Kuhn,M. et al. (2010) STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res., 38, D552–D556.

Kundrotas,P.J. et al. (2010) GWIDD: genome-wide protein docking database. Nucleic Acids Res., 38, D513–D517.

Lehne,B. and Schlitt,T. (2009) Protein-protein interaction databases: keeping up with growing interactomes. Hum. Genomics, 3, 291–297.

Lin,G.T. et al. (2008) SNP combinations in chromosome-wide genes are associated with bone mineral density in Taiwanese women. Chin. J. Physiol., 91, 1–10.

Lin,G.T. et al. (2009) Combinational polymorphisms of seven CXCL12-related genes are protective against breast cancer in Taiwan. OMICS, 13, 165–172.

Liu,Y. et al. (2010) DCDB: drug combination database. Bioinformatics, 26, 587–588.

Masciocchi,J. et al. (2009) MMsINC: a large-scale chemoinformatics database. Nucleic Acids Res., 37, D284–D290.

Mattingly,C.J. et al. (2003) The Comparative Toxicogenomics Database (CTD). Environ. Health Perspect., 111, 793–795.

Mattingly,C.J. et al. (2006) The Comparative Toxicogenomics Database (CTD): a resource for comparative toxicological studies. J. Exp. Zool. A Comp. Exp. Biol., 305, 689–692.

Mechanic,L.E. et al. (2008) Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions. BMC Bioinformatics, 9, 146.

Miteva,M.A. et al. (2006) FAF-Drugs: free ADME/tox filtering of compound collections. Nucleic Acids Res., 34, W738–W744.

Neugebauer,A. et al. (2007) Prediction of protein-protein interaction inhibitors by chemoinformatics and machine learning methods. J. Med. Chem., 50, 4665–4668.

Owen,R.P. et al. (2008) PharmGKB and the International Warfarin Pharmacogenetics Consortium: the changing role for pharmacogenomic databases and single-drug pharmacogenetics. Hum. Mutat., 29, 456–460.

Packer,B.R. et al. (2006) SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes. Nucleic Acids Res., 34, D617–D621.

Peters,E.J. and McLeod,H.L. (2008) Ability of whole-genome SNP arrays to capture ‘must have’ pharmacogenomic variants. Pharmacogenomics, 9, 1573–1577.

Roberts,R.J. et al. (2010) REBASE–a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res., 38, D234–D236.

Royo,J.L. and Galan,J.J. (2009) Pyrosequencing for SNP genotyping. Methods Mol. Biol., 578, 123–133.

Salter,A.H. (2005) Large-scale databases in toxicogenomics. Pharmacogenomics, 6, 749–754.

Sangkuhl,K. et al. (2008) PharmGKB: understanding the effects of individual genetic variants. Drug Metab. Rev., 40, 539–551.

Schmitt,A.O. et al. (2010) CandiSNPer: a web tool for the identification of candidate SNPs for causal variants. Bioinformatics, 26, 969–970.

Shen,G.Q. et al. (2009) The TaqMan method for SNP genotyping. Methods Mol. Biol., 578, 293–306.

Sherry,S.T. et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.

Sun,L.Z. et al. (2002) ADME-AP: a database of ADME associated proteins. Bioinformatics, 18, 1699–1700.

Thorisson,G.A. et al. (2005) The International HapMap Project Web site. Genome Res., 15, 1592–1593.

Thorn,C.F. et al. (2010) Pharmacogenomics and bioinformatics: PharmGKB. Pharmacogenomics, 11, 501–505.

Timmers,L.F. et al. (2008) Drug-binding databases. Curr. Drug Targets, 9, 1092–1099.

Tong,W. et al. (2003) ArrayTrack–supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect., 111, 1819–1826.

Voisey,J. and Morris,C.P. (2008) SNP technologies for drug discovery: a current review. Curr. Drug Discov. Technol., 5, 230–235.

Wang,H. et al. (2007) Chemical data mining of the NCI human tumor cell line database. J. Chem. Inf. Model., 47, 2063–2076.

Wegrzyn,J.L. et al. (2009) PineSAP–sequence alignment and SNP identification pipeline. Bioinformatics, 25, 2609–2610.

Wishart,D.S. (2007) In silico drug exploration and discovery using DrugBank. Curr. Protoc. Bioinformatics, Chapter 14, Unit 14.14.

Wishart,D.S. (2008) DrugBank and its relevance to pharmacogenomics. Pharmacogenomics, 9, 1155–1162.

Wishart,D.S. et al. (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res., 36, D901–D906.

Xirasagar,S. et al. (2006) Chemical effects in biological systems (CEBS) object model for toxicology data, SysTox-OM: design and application. Bioinformatics, 22, 874–882.

Yang,C.H. et al. (2008) SNP ID-info: SNP ID searching and visualization platform. OMICS, 12, 217–226.

Yang,C.H. et al. (2009) Novel generating protective single nucleotide polymorphism barcode for breast cancer using particle swarm optimization. Cancer Epidemiol., 33, 147–154.

Yang,C.H. et al. (2012) Single nucleotide polymorphism barcoding to evaluate oral cancer risk using odds ratio-based genetic algorithms. Kaohsiung J. Med. Sci., 28, 362–368.

Yen,C.Y. et al. (2008) Combinational polymorphisms of four DNA repair genes XRCC1, XRCC2, XRCC3, and XRCC4 and their association with oral cancer in Taiwan. J. Oral Pathol. Med., 37, 271–277.

Yoshiya,G. et al. (2008) Influence of cancer-related gene polymorphisms on clinicopathological features in colorectal cancer. J. Gastroenterol. Hepatol., 23, 948–953.

Zamar,D. et al. (2009) Path: a tool to facilitate pathway-based genetic association analysis. Bioinformatics, 25, 2444–2446.

Zheng,C.J. et al. (2007) PharmGED: pharmacogenetic effect database. Nucleic Acids Res., 35, D794–D799.

Zheng,S.L. et al. (2008) Cumulative association of five genetic variants with prostate cancer. N. Engl. J. Med., 358, 910–919.
