iScreen: world’s first cloud-computing web server
for virtual screening and de novo drug design based
on TCM database@Taiwan
Tsung-Ying Tsai · Kai-Wei Chang · Calvin Yu-Chian Chen
Received: 9 February 2011 / Accepted: 16 May 2011 / Published online: 7 June 2011 © Springer Science+Business Media B.V. 2011
Abstract Rapidly advancing research on traditional Chinese medicine (TCM) has greatly intrigued pharmaceutical industries worldwide. To take the initiative in the next generation of drug development, we constructed a cloud-computing system for intelligent TCM screening (iScreen) based on TCM Database@Taiwan. iScreen is a compact web server for TCM docking followed by customized de novo drug design. We further implemented a protein preparation tool that both extracts the protein of interest from a raw input file and estimates the size of the ligand binding site. In addition, iScreen has a user-friendly graphical interface for users who have less experience with command-line systems. For customized docking, multiple docking services, including standard, in-water, pH environment, and flexible docking modes, are implemented. Users can download the 200 TCM compounds with the best docking results. For TCM de novo drug design, iScreen provides multiple molecular descriptors to suit a user’s interest. iScreen is the world’s first web server that employs the world’s largest TCM database for virtual screening and de novo drug design. We believe our web server can lead TCM research into a new era of drug development. The TCM docking and screening server is available at http://iScreen.cmu.edu.tw/.
Keywords Traditional Chinese medicine (TCM) · Cloud-computing · Docking · Screening · De novo
Introduction
Traditional Chinese medicine (TCM) is a popular medical practice in East Asia. Over thousands of years, TCM has become a vast knowledge bank of medicine [1–7]. As more bioactive compounds are isolated from TCM, a comprehensive drug design platform is required for systematically analyzing the therapeutic value of TCM components. We were therefore interested in developing a user-friendly online TCM-based computer-aided drug design (CADD) web server for both TCM docking and de novo drug design. Hence, we introduce a cloud-computing system for intelligent TCM screening (iScreen), which is the world’s first TCM docking and de novo drug design web server using TCM Database@Taiwan [8].
At present, several tools are available for virtual screening, such as PLANTS [9], GOLD [10, 11], DOCK Blaster [12], LEA3D [13], 3DLigandSite [14], and PharmMapper [15], as well as for de novo drug design, such as GANDI [16], SPROUT [17], HISTE [18], LEGEND [19], LEA3D [13], 3DLigandSite [14], PRO_LIGAND [20], GENSTAR [21], LUDI [22], BUILDER v.2 [23], CONCERTS [24], SYNOPSIS [25], and CoG [26]. However, iScreen is the first web server that introduces TCM into a web-based CADD service platform.
The schematic dataflow of iScreen is illustrated in Fig. 1. The web server uses the world’s largest TCM database [8] to provide a novel CADD service for investigating the therapeutic value of TCM compounds. iScreen is free and open to all users, and there is no login requirement.

T.-Y. Tsai · K.-W. Chang · C. Y.-C. Chen (✉)
Laboratory of Computational and Systems Biology, School of Chinese Medicine, China Medical University, Taichung 40402, Taiwan
e-mail: [email protected]; [email protected]

C. Y.-C. Chen
Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA

C. Y.-C. Chen
Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Web server components
iScreen was optimized for communication between the web-based graphical interface and the core system, which comprises the highly accurate PLANTS [9] and LEA3D [13] software packages. Our web server was further equipped with a tool for protein preparation and binding site estimation. In addition, iScreen provides customizable parameter definitions that better suit a user’s needs.
TCM database
The database used in iScreen was synchronized with the self-developed TCM Database@Taiwan [8]. TCM ingredients and the corresponding information were collected from multiple TCM texts, including “Ben cao gang mu”, “Shang han lun”, “Shen nong ben cao jing”, and others [27–31]. These texts characterize the medical properties and uses of the TCM ingredients. Most TCM ingredients come from herbs, and some come from animals and minerals. The bioactive components of each TCM ingredient were obtained from research reports listed in Medline [32] or ISI Web of Knowledge [33]. The 2D and 3D compound structures were generated automatically with ChemBioOffice 2008 [34]. In addition, the 3D compound structures were optimized with the MM2 force field [35]. The database was organized based on TCM classifications. All compounds have passed Lipinski’s Rule of Five [36] for basic drug-like properties. In addition, for better customized docking analysis, each TCM compound was further optimized in its acidic, neutral, and alkaline forms.
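To illustrate the drug-likeness filter mentioned above, the following minimal Python sketch applies the commonly cited Rule of Five thresholds to precomputed descriptors. It is not the code used to build TCM Database@Taiwan; the `Descriptors` container, the function names, and the `max_violations` parameter are hypothetical, and descriptor values are assumed to come from an external cheminformatics tool.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed compound descriptors."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """Strict by default (no violations); some workflows tolerate one."""
    return lipinski_violations(d) <= max_violations
```

A compound with `Descriptors(180.2, 1.2, 1, 4)` passes the strict filter, whereas one that exceeds all four thresholds is rejected.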
Protein preparation tool
Based on the SPORE module in the PLANTS docking package [9], the protein preparation tool can extract ligands from the input protein–ligand complex and protonate the purified protein. In addition, the tool includes a function that calculates the binding site radius based on user definition. The algorithm iteratively evaluates the spatial intersection between the binding site and the proposed radius until the user-defined maximum binding space allocation is reached.

Fig. 1 Schematic dataflow diagram of iScreen. Users perform virtual screening for drug-like TCM compounds. The resulting compounds can be further derived using the de novo function

Fig. 2 Protein preparation tool for ligand extraction and protein protonation. a Graphical display of the functions of the protein preparation tool. b Interface of the protein preparation tool. Based on the input protein, ligands are extracted and the binding site information is shown. The size of the binding site can be recalculated on user definition. The prepared proteins are downloadable
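The paper does not give the details of the iterative radius calculation, so the following Python sketch only illustrates the general idea under explicit assumptions: the binding site is centered on the centroid of the extracted ligand atoms, and the radius grows in fixed steps until the sphere encloses every ligand atom or a user-defined maximum is reached. The function names, the step size, and the stopping criterion are all hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def estimate_radius(ligand_atoms, start=2.0, step=0.5, max_radius=20.0):
    """Grow a sphere around the ligand centroid until it covers all atoms.

    Assumed stopping rule: stop as soon as every ligand atom lies inside
    the sphere, or when the user-defined max_radius is reached.
    """
    center = centroid(ligand_atoms)
    radius = start
    while radius < max_radius:
        if all(math.dist(center, atom) <= radius for atom in ligand_atoms):
            break
        radius += step
    return center, min(radius, max_radius)
```

Capping the result at `max_radius` mirrors the idea of a user-defined upper bound on the binding space, so an elongated ligand cannot produce an arbitrarily large docking sphere.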
Virtual screening
iScreen was optimized around the docking algorithm provided by the PLANTS package. The docking algorithm used by PLANTS is based on ant colony optimization [9] and has been reported to achieve higher prediction accuracy than GOLD [9]. iScreen further permits users to adjust the docking speed and to dock under various conditions, such as docking in water, docking at different pH, or flexible docking.
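For readers unfamiliar with ant colony optimization, the toy Python sketch below shows the basic loop of sampling solutions from pheromone weights, evaporating pheromone, and reinforcing the best solution found so far. It is a generic illustration on a discrete search space, not the PLANTS implementation; the function name, the parameters, and the update rule are assumptions chosen for brevity.

```python
import random


def ant_colony_minimize(score, n_vars, n_choices,
                        ants=20, iterations=50, rho=0.1, seed=0):
    """Minimize score(solution) over lists of n_vars discrete choices.

    Each variable could stand for, e.g., a discretized torsion angle.
    """
    rng = random.Random(seed)
    # One pheromone value per (variable, choice) pair, initially uniform.
    tau = [[1.0] * n_choices for _ in range(n_vars)]
    best, best_score = None, float("inf")
    for _ in range(iterations):
        for _ in range(ants):
            # Each ant picks a choice per variable, biased by pheromone.
            sol = [rng.choices(range(n_choices), weights=tau[i])[0]
                   for i in range(n_vars)]
            s = score(sol)
            if s < best_score:
                best, best_score = sol, s
        for i in range(n_vars):
            # Evaporation, then reinforcement of the best-so-far solution.
            for j in range(n_choices):
                tau[i][j] *= (1.0 - rho)
            tau[i][best[i]] += 1.0
    return best, best_score
```

In a docking setting the `score` callable would be replaced by an empirical scoring function evaluated on a ligand pose; here any function that maps a list of integers to a number can be used.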
De novo drug design
After virtual screening, iScreen further provides an optional de novo drug design service based on the genetic algorithm implemented in the LEA3D software package [13]. iScreen reads each customized molecular descriptor, such as XLogP, number of atoms, and polar solvent accessible surface area, and then passes the maximum, minimum, and significance (weight) of each descriptor to the de novo algorithm.
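The descriptor settings described above (minimum, maximum, and weight per descriptor) can be combined into a single fitness value in many ways. The Python sketch below shows one simple possibility, a weighted percentage of satisfied constraints. It is an assumption made for illustration and not necessarily the scoring used by LEA3D or iScreen; the descriptor names and the function are hypothetical.

```python
def descriptor_score(values, constraints):
    """Weighted percentage of descriptor constraints that are satisfied.

    values:      descriptor name -> computed value for one molecule
    constraints: descriptor name -> (minimum, maximum, weight)
    """
    total_weight = sum(weight for _, _, weight in constraints.values())
    if total_weight <= 0:
        raise ValueError("at least one descriptor needs a positive weight")
    satisfied = sum(
        weight
        for name, (lo, hi, weight) in constraints.items()
        if lo <= values[name] <= hi
    )
    return 100.0 * satisfied / total_weight


# Hypothetical settings: XLogP counts twice as much as the other two.
constraints = {
    "xlogp": (0.0, 5.0, 2.0),
    "n_atoms": (10, 40, 1.0),
    "polar_sasa": (20.0, 140.0, 1.0),
}
candidate = {"xlogp": 3.1, "n_atoms": 45, "polar_sasa": 90.0}
score = descriptor_score(candidate, constraints)  # n_atoms is out of range
```

A genetic algorithm would then favor candidates with higher scores when selecting parents for the next generation.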
Function and features
iScreen is an open web server for all users, and there is no login requirement. The server focuses on TCM-based virtual screening and de novo drug design. Since all available TCM compounds are stored in the system’s database, users only need to upload the proteins of interest and optional control ligands for virtual screening. The web system accepts .pdb or .mol2 as input formats and provides customized docking options. iScreen assigns a job ID to each screening or de novo service request. Users can check the job status in the server’s queuing system.
Protein preparation tool in iScreen
The protein preparation tool can be accessed from the “Tool” panel on the iScreen main screen. The tool only accepts files in pdb format. On submission, the tool recognizes and pulls out all ligands, including water molecules, from the protein of interest. The protein is then prepared by protonation once the ligands have been removed. The prepared protein and the ligands from the source file are made available for download through separate links. Additionally, the binding site data, including the coordinates and the sizes of the cavities, are measured and displayed. An option is provided to estimate the size of the binding site that best fits the user definition. The schematic diagram for the protein preparation tool is shown in Fig. 2. It is an independent iScreen feature that aims to reduce erroneous docking results due to wrong input data.

Fig. 3 iScreen provides customized docking modes for the virtual screening function: a Standard mode, which runs the basic docking algorithm; b Dock in water mode, which considers the involvement of water as an additional parameter; c Specific pH dock mode, which performs docking based on the pH conditions; and d Flexible dock, which considers the flexibility of the residues for docking
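The ligand extraction step can be pictured with the short Python sketch below, which separates protein atoms, ligands, and water in a PDB file using the fixed-column record layout (record name in columns 1–6, residue name in columns 18–20). This is a simplified illustration and not the SPORE module; the function name is hypothetical, and treating every non-water HETATM record as a ligand is an assumption that ignores cofactors, ions, and modified residues.

```python
def split_pdb(pdb_text):
    """Split PDB text into protein, ligand, and water atom records."""
    protein, ligands, waters = [], [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()          # columns 1-6: record name
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM":
            resname = line[17:20].strip()  # columns 18-20: residue name
            if resname == "HOH":
                waters.append(line)
            else:
                ligands.append(line)
    return protein, ligands, waters
```

The three lists can then be written to separate files, which corresponds to offering the prepared protein and the extracted ligands as separate downloads.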
Virtual screening
The virtual screening function identifies potential TCM compounds with a docking algorithm based on the protein structure and the binding site information. Both pdb and mol2 are acceptable input formats for proteins. The docking function gives users the option to upload a control ligand in either mol2 or sdf format. This ligand can be prepared independently or obtained from the iScreen preparation tool (Fig. 2). For binding site definition, users can define key residues using standard 3-letter amino acid annotations and the residue numbers with regard to the input protein (Fig. 3a). The binding site is defined by either key binding residues or coordinates and cavity sizes, which can be calculated with the protein preparation tool. The docking function operates with 1×, 2×, or 4× speed options, where the 1× speed gives the best docking results and the 4× speed runs faster with slightly reduced accuracy. The virtual screening function can be accessed from the “Protein Docking” panel. Five docking modes are available:
(a) Example Mode: Users can operate a sample protein to be familiarized with the iScreen docking system in the Example Mode. For the interactive demonstration, three prepared proteins are provided for sample docking. After selecting the protein of interest, a user can upload an optional control molecule, define binding site information, and choose running speed. Users are given three small ligand sets for the demonstration run.
(b) Standard Mode: The Standard Mode provides basic virtual screening options based on the user-defined protein of interest. Most input settings in this mode are similar to the Example Mode, including the optional control ligand, binding site information, and running speeds. In contrast to the Example Mode, the Standard Mode employs the built-in TCM database for virtual screening (Fig. 3a).
(c) Docking in Water: This mode mimics molecular docking in solution. Users can customize the size of the virtual water globe on top of the Standard Mode settings (Fig. 3b).
(d) Specific pH dock: Since a protein may function differently depending on the pH condition, this mode simulates the docking scenario according to the pH condition. iScreen offers ligands under acidic, neutral, or alkaline conditions (Fig. 3c). The web server provides a hyperlink to the H++ web server [37] for adjusting protein states for docking under the desired pH environment.
(e) Flexible dock: This mode is provided for experienced users who are familiar with the given protein structure. This mode allows users to define flexible residues by residue annotation, sequence number, or both. In addition, users can define fixed bonds to constrain the protein movement during flexible docking (Fig. 3d).

Fig. 4 Sample webpage for the queuing system and virtual screening results. For the queuing system, job information, masked user information, and job status are displayed. The virtual screening results show the protein sequence and ligands with multiple dock scores. Links for downloading ligands and for the de novo function are also displayed
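The options listed above can be summarized as a small request structure. The Python sketch below is a hypothetical client-side model, not the iScreen API: it encodes the five docking modes, restricts the speed to 1×, 2×, or 4×, and parses key residues given as a 3-letter amino acid code followed by a residue number. All class, field, and function names are assumptions.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class DockMode(Enum):
    EXAMPLE = "example"
    STANDARD = "standard"
    WATER = "water"
    PH = "ph"
    FLEXIBLE = "flexible"


ALLOWED_SPEEDS = (1, 2, 4)
THREE_LETTER = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def parse_residue(text):
    """Parse strings such as 'HIS 57' or 'ser195' into ('HIS', 57)."""
    match = re.fullmatch(r"\s*([A-Za-z]{3})\s*(\d+)\s*", text)
    if not match or match.group(1).upper() not in THREE_LETTER:
        raise ValueError(f"not a valid residue definition: {text!r}")
    return match.group(1).upper(), int(match.group(2))


@dataclass
class DockRequest:
    mode: DockMode
    speed: int = 1
    key_residues: list = field(default_factory=list)

    def __post_init__(self):
        if self.speed not in ALLOWED_SPEEDS:
            raise ValueError("speed must be 1, 2, or 4")
        # Normalize residue strings into (name, number) tuples.
        self.key_residues = [parse_residue(r) for r in self.key_residues]
```

For example, `DockRequest(DockMode.STANDARD, speed=4, key_residues=["HIS 57", "ser195"])` yields a request whose `key_residues` are `[("HIS", 57), ("SER", 195)]`, while an unsupported speed raises `ValueError`.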
On submission, the job enters the queuing system for screening, as shown in Fig. 4. A job ID and a hyperlink are provided to view the status after job submission. Virtual screening usually takes several hours to several days to complete. Users can record the job IDs and check the job status under the “Browse Result” panel. In addition, iScreen notifies users by email when their jobs are completed. The result page provides the sequence of the input protein, the top 200 ligands with docking scores, ligands for download, and the de novo drug design option (Fig. 4).
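Selecting the top 200 ligands amounts to ranking the screened compounds by docking score. A minimal Python sketch is shown below, assuming that scores are stored in a dictionary keyed by compound name and that a lower (more negative) score indicates a better pose; both the data layout and the function name are hypothetical.

```python
import heapq


def top_hits(scores, n=200):
    """Return the n best (compound, score) pairs, best first.

    Assumes that a lower docking score means a more favorable pose.
    """
    return heapq.nsmallest(n, scores.items(), key=lambda item: item[1])
```

For instance, `top_hits({"a": -80.5, "b": -95.1, "c": -60.0}, n=2)` returns `[("b", -95.1), ("a", -80.5)]`.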
De novo drug design
The de novo drug design option is a follow-up option that can be applied to one of the top 200 compounds after virtual screening. For customization, the de novo interface provides a list of molecular descriptors, including Molecular weight, XLogP, Number of atoms (H excluded), Number of H-donors, Number of H-acceptors, Polar solvent accessible surface area, Volume, Area, Molecular refractivity, Radius of gyration, Moment of inertia Ixx & Iyy, Number of rotatable bonds, Number of rings, and Number of aromatic rings. Users can define the minimum and maximum values of each descriptor as well as the corresponding significance by weight in the final score (Fig. 5). In addition, users are required to provide the number of generations (max = 50) and the population size per generation (max = 40) for the genetic algorithm used in the de novo drug design system. The de novo compounds are re-evaluated by docking with the proteins of interest (Fig. 5).

Fig. 5 The flowchart of de novo drug design: a follow-up option offers to one of the top 10 compounds after virtual screening. The de novo interface provides a list of molecular descriptors for customization. After the de novo evolution process, users can visually inspect the new compounds with the JMol visualization interface and download the results for further analysis
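The limits on the genetic algorithm settings stated above (at most 50 generations and at most 40 individuals per generation) can be checked before a job is submitted. The Python sketch below is a hypothetical client-side check rather than server code; the lower bound of 1 and the function name are assumptions.

```python
MAX_GENERATIONS = 50   # limit stated for the de novo interface
MAX_POPULATION = 40    # limit stated for the de novo interface


def validate_ga_settings(generations, population):
    """Validate the settings and return an upper bound on evaluations."""
    if not 1 <= generations <= MAX_GENERATIONS:
        raise ValueError(f"generations must be in 1..{MAX_GENERATIONS}")
    if not 1 <= population <= MAX_POPULATION:
        raise ValueError(f"population must be in 1..{MAX_POPULATION}")
    # Each generation scores at most `population` candidate molecules.
    return generations * population
```

With the maximum settings, a run evaluates at most 50 × 40 = 2000 candidate molecules, which gives a rough sense of the computational cost of a job.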
The de novo jobs are listed in an independent queuing system that can be accessed through the “Browse Results” panel. When a result is selected from the de novo job browser, a new window pops up and lists the resulting compounds with the corresponding evaluations. In addition, users can visually inspect the new compounds with the JMol visualization interface. The de novo results are downloadable for further analysis (Fig. 5).
Performance and discussion
The core components of iScreen, including SPORE [9], PLANTS [9], and LEA3D [13], have been thoroughly tested according to the relevant publications. To validate the success rate of the docking algorithm, we redocked 20 known disease-related proteins and measured the root mean square deviation (RMSD) between the ligand in the crystal structure and the docking result. As shown in Table 1, we obtained a success rate of 70% even at the least accurate 4× speed, and 75% at 1× speed. The outcome suggests that iScreen docking is reliable, since these results match the validation data from the original publication. Considering that iScreen provides TCM-based CADD services, the outcomes could provide scientific insight into the relevant medical uses of TCM.
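The RMSD used in the redocking test can be written as RMSD = sqrt((1/N) Σ |r_i − r′_i|²), where r_i and r′_i are the coordinates of atom i in the crystal pose and the docked pose. The Python sketch below computes this quantity directly, assuming that both poses list the same atoms in the same order and share the same coordinate frame, so no superposition is performed; the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have equal length")
    squared = sum(
        (a[i] - b[i]) ** 2
        for a, b in zip(coords_a, coords_b)
        for i in range(3)
    )
    return math.sqrt(squared / len(coords_a))
```

If every atom of a docked pose is shifted by exactly 1 Å along one axis relative to the crystal pose, the function returns 1.0.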
The TCM database used in iScreen is unique in that its compounds are derived from natural sources that have been recorded for medical use. These compounds are likely to have implicit therapeutic effects, a property that is usually not considered in other small-molecule databases. It is also possible that the safety profiles of TCM-derived compounds are more favorable, because human exposure spanning several thousand years may have allowed the human body to develop tolerance.
iScreen was built as a simple and comprehensive TCM-based CADD solution for users who are interested in TCM studies but lack experience in operating CADD software. Hence, the web server provides a user-friendly interface and freedom in the parameter settings. The 3D results of the de novo drug design function, however, are limited by the Jmol viewer used for website display, whose graphic options are less capable than those of commercial viewers. Nevertheless, all data are downloadable for use in the users’ own display software.
Conclusion
iScreen is the world’s first web server that employs the world’s largest TCM database for virtual screening and de novo drug design. This web server is designed with a user-friendly graphical interface, multiple docking modes, an independent protein preparation tool, and customized de novo drug design options. With its cloud-computing architecture, users are able to operate TCM-based CADD through the internet. iScreen is implemented with the highly accurate PLANTS and LEA3D software packages as the core system. On top of the core system, the attached tool provides both protein preparation and binding site estimation functions. iScreen is a sophisticated web server that not only offers a complete CADD service but also makes a contribution to scientific TCM research.
Acknowledgments The research was supported by grants from the National Science Council of Taiwan (NSC 99-2221-E-039-013-), Committee on Chinese Medicine and Pharmacy (CCMP100-RD-030) China Medical University and Asia University (CMU99-TCM, CMU99-S-02, 25, 26 CMU99-ASIA-27 CMU99-ASIA-28). This study is also supported in part by Taiwan Department of Health Clinical Trial and Research Center of Excellence (DOH100-TD-B-111-004) and Taiwan Department of Health Cancer Research Center of Excellence (DOH100-TD-C-111-005). We are grateful to the Asia University cloud-computing facilities.

Table 1 Redock of protein–ligand crystallized structures

Protein name (PDB ID)          Tested ligand   PLANTS RMSD (Å)
                                               Speed 1×   Speed 2×   Speed 4×
Glycosyltransferase A (1LZI)   BGH             1.4794     2.1000     1.4569
PDE5 (1UDT)                    VIA             2.0974     2.1061     2.1108
HSP 90-alpha (1UY8)            PU5             5.9181     6.3677     6.2270
NAGAT (1ZI3)                   NLC             4.1712     4.1426     4.1280
NR2A (2A5S)                    GLU             0.3429     0.3449     0.8892
IDE (2G56)                     DIO             1.0003     1.0136     1.0298
Src (2H8H)                     H8H             2.3955     2.3868     1.2887
N1 (2HU0)                      G39             1.7498     1.7389     1.6521
EGFR (2ITY)                    IRE             2.0682     3.1272     6.832
GTB (2RIY)                     BHE             5.2959     6.0614     6.1026
H1 (2WRG)                      SIA             5.0830     5.0809     5.0806
p53 (2X0W)                     X0W             4.6246     4.6288     4.1973
N1 (3CKZ)                      SRT             1.6892     2.2944     2.2209
IDE-inhibitor (3E4A)           QIX             1.2224     1.4916     1.2855
CRFR1 (3EHT)                   MAL             0.5642     0.6732     0.5714
IGF1R (3I81)                   EBI             1.5962     1.5925     1.5965
PPARg (3K8S)                   Z27             1.4043     1.4174     1.4131
HSP90 (3K97)                   4CD             0.6169     0.6220     0.6177
COX2 (3LN0)                    52B             0.6795     0.6507     0.6795
COX1 (3N8X)                    NIM             0.6995     0.6976     0.7191
Success rate                                   75%        70%        70%

Twenty disease-linked proteins were randomly selected for the validation test. RMSD was calculated between ligands in the crystal structure and after redock. Redock was run at 1×, 2×, and 4× speed. A test case is considered a success with RMSD lower than 2.5 Å
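The success rates in Table 1 follow directly from the 2.5 Å cutoff. The Python sketch below recomputes them from the RMSD values listed in the table; the dictionary layout and the function name are illustrative choices, not part of iScreen.

```python
# RMSD values (Å) from Table 1, keyed by PDB ID: (1x, 2x, 4x speed).
TABLE_1 = {
    "1LZI": (1.4794, 2.1000, 1.4569), "1UDT": (2.0974, 2.1061, 2.1108),
    "1UY8": (5.9181, 6.3677, 6.2270), "1ZI3": (4.1712, 4.1426, 4.1280),
    "2A5S": (0.3429, 0.3449, 0.8892), "2G56": (1.0003, 1.0136, 1.0298),
    "2H8H": (2.3955, 2.3868, 1.2887), "2HU0": (1.7498, 1.7389, 1.6521),
    "2ITY": (2.0682, 3.1272, 6.832),  "2RIY": (5.2959, 6.0614, 6.1026),
    "2WRG": (5.0830, 5.0809, 5.0806), "2X0W": (4.6246, 4.6288, 4.1973),
    "3CKZ": (1.6892, 2.2944, 2.2209), "3E4A": (1.2224, 1.4916, 1.2855),
    "3EHT": (0.5642, 0.6732, 0.5714), "3I81": (1.5962, 1.5925, 1.5965),
    "3K8S": (1.4043, 1.4174, 1.4131), "3K97": (0.6169, 0.6220, 0.6177),
    "3LN0": (0.6795, 0.6507, 0.6795), "3N8X": (0.6995, 0.6976, 0.7191),
}


def success_rate(rmsd_values, cutoff=2.5):
    """Percentage of test cases with RMSD strictly below the cutoff."""
    hits = sum(1 for value in rmsd_values if value < cutoff)
    return 100.0 * hits / len(rmsd_values)


# One success rate per speed setting (columns 1x, 2x, 4x).
rates = [success_rate([row[col] for row in TABLE_1.values()])
         for col in range(3)]
```

The same six structures (1UY8, 1ZI3, 2ITY, 2RIY, 2WRG, and 2X0W) fail at 2× and 4× speed, whereas 2ITY still falls below the cutoff at 1× speed, which accounts for the difference between 75% and 70%.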
References
1. Chen CY, Chen CYC (2010) Insights into designing the dual-targeted HER2/HSP90 inhibitors. J Mol Graphics Model 29(1):21–31
2. Chang TT, Huang HJ, Lee KJ, Yu HW, Chen HY, Tsai FJ, Sun MF, Chen CY (2010) Key features for designing phosphodiesterase-5 inhibitors. J Biomol Struct Dyn 28(3):309–321
3. Chen CY, Huang HJ, Tsai FJ, Chen CYC (2010) Drug design for Influenza A virus subtype H1N1. J Taiwan Inst Chem Eng 41(1):8–15
4. Chen CYC (2010) Virtual screening and drug design for PDE-5 receptor from traditional chinese medicine database. J Biomol Struct Dyn 27(5):627–640
5. Chen CYC (2009) Chemoinformatics and pharmacoinformatics approach for exploring the GABA-A agonist from Chinese herb suanzaoren. J Taiwan Inst Chem Eng 40(1):36–47
6. Chen CYC (2009) Computational screening and design of traditional Chinese medicine (TCM) to block phosphodiesterase-5. J Mol Graphics Model 28(4):261–269
7. Chen KC, Chen CYC (2011) Stroke Prevention by traditional Chinese medicine? A genetic algorithm, support vector machine and molecular dynamics approach. Soft Matter 7(8):4001–4008
8. Chen CYC (2011) TCM Database@Taiwan: the world’s largest traditional chinese medicine database for drug screening in silico. PLoS One 6(1):e15939
9. Korb O, Stutzle T, Exner TE (2009) Empirical scoring functions for advanced protein-ligand docking with PLANTS. J Chem Inf Model 49(1):84–96
10. Jones G, Willett P, Glen RC (1995) Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J Mol Biol 245(1):43–53
11. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267(3):727–748
12. Irwin JJ, Shoichet BK, Mysinger MM, Huang N, Colizzi F, Wassam P, Cao Y (2009) Automated docking screens: a feasibility study. J Med Chem 52(18):5712–5720
13. Douguet D, Munier-Lehmann H, Labesse G, Pochet S (2005) LEA3D: a computer-aided ligand design for structure-based drug design. J Med Chem 48(7):2457–2468
14. Wass MN, Kelley LA, Sternberg MJ (2010) 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 38(Web Server issue):W469–W473
15. Liu X, Ouyang S, Yu B, Liu Y, Huang K, Gong J, Zheng S, Li Z, Li H, Jiang H (2010) PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach. Nucleic Acids Res 38(Web Server issue):W609–W614
16. Dey F, Caflisch A (2008) Fragment-based de novo ligand design by multiobjective evolutionary optimization. J Chem Inf Model 48(3):679–690
17. Law JMS, Fung DYK, Zsoldos Z, Simon A, Szabo Z, Csizmadia IG, Johnson AP (2003) Validation of the SPROUT de novo design program. Theochem-J Mol Struct 666:651–657
18. Danziger DJ, Dean PM (1989) Automated site-directed drug design: the prediction and observation of ligand point positions at hydrogen-bonding regions on protein surfaces. Proc R Soc Lond B Biol Sci 236(1283):115–124
19. Nishibata Y, Itai A (1993) Confirmation of usefulness of a structure construction program based on three-dimensional receptor structure for rational lead generation. J Med Chem 36(20):2921–2928
20. Murray CW, Clark DE, Byrne DG (1995) PRO_LIGAND: an approach to de novo molecular design. 6. Flexible fitting in the design of peptides. J Comput-Aided Mol Des 9(5):381–395
21. Rotstein SH, Murcko MA (1993) GenStar: a method for de novo drug design. J Comput-Aided Mol Des 7(1):23–43
22. Bohm HJ (1992) The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J Comput-Aided Mol Des 6(1):61–78
23. Roe DC, Kuntz ID (1995) BUILDER v.2: improving the chemistry of a de novo design strategy. J Comput-Aided Mol Des 9(3):269–282
24. Pearlman DA, Murcko MA (1996) CONCERTS: dynamic connection of fragments as an approach to de novo ligand design. J Med Chem 39(8):1651–1663
25. Vinkers HM, de Jonge MR, Daeyaert FF, Heeres J, Koymans LM, van Lenthe JH, Lewi PJ, Timmerman H, Van Aken K, Janssen PA (2003) SYNOPSIS: synthesize and optimize system in silico. J Med Chem 46(13):2765–2773
26. Brown N, McKay B, Gilardoni F, Gasteiger J (2004) A graph-based genetic algorithm and its application to the multiobjective evolution of median molecules. J Chem Inf Comput Sci 44(3): 1079–1087
27. Chen G, Li S (1992) Ben cao gang mu tong shi = general explanation of compendium of materia medica. Xue yuan chu ban she, Beijing Shi
28. Fang Y, Zhang Z, Miao X (1991) Shang han lun tiao bian, Shanghai gu ji chu ban she : Xin hua shu dian Shanghai fa xing suo fa xing, Shanghai
29. Lü X (2002) Zhong yao jian bie da quan. Hunan ke xue ji shu chu ban she, Changsha Shi
30. Miao X, Zheng J (2002) Shennong ben cao jing shu. Zhong yi gu ji chu ban she, Beijing
31. Nanjing Zhong yi yao da xue, Zhao G, Dai S, Chen R (2006) Zhong yao da ci dian. Shanghai ke xue ji shu chu ban she, Shanghai
32. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Lu ZY, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Mizrachi I, Ostell J, Panchenko A, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang YL, Wilbur WJ, Yaschenko E, Ye J (2010) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 38:D5–D16
33. Clark M (1998) The ISI web of science. Abstracts of Papers of the American Chemical Society 216:U525
34. Gunda T (2007) ChemBioOffice. Chem World 4(11):70
35. Karzazi Y, Surpateanu G (1999) An empirical MM2 augmented force field for the cycloimmonium ylides. J Mol Struct 510(1–3):197–205
36. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Del Rev 46(1–3):3–26
37. Gordon JC, Myers JB, Folta T, Shoja V, Heath LS, Onufriev A (2005) H++: a server for estimating pKas and adding missing hydrogens to macromolecules. Nucleic Acids Res 33(Web Server issue):W368–W371