Chapter 2. Related Studies and Review of Related Methods
2.2 The Use of Protein-Ligand Interaction Profiles in the Discovery of Molecular
Since protein-ligand and protein-protein complexes are components of a great number of pharmaceutical [5, 41], nutritional [10] and industrial [29-31] compounds, it is reasonable to apply computer-aided lead compound design and discovery methods to applications other than pharmaceutics. Because of its significant impact on the quality of human life, drug design was the main focus in the early days of virtual screening (VS) and bioinformatics. However, since studies in drug design have shown that VS and post screening analysis are relatively inexpensive and efficient, we want to explore other fields (nutrition, agriculture and industry) that have received less attention. Protein-ligand complexes of various compounds interact through similar properties [40] and require similar methods for the screening, retrieval and analysis of their crystal structures (Figure 1), regardless of their final application.
Therefore, the first part of this research conducts comparative studies of the features and properties of protein-ligand interaction profiles to better understand their relevance to the mining of novel compounds. Additionally, we investigate the possibility of employing interaction profiles to mine compounds for applications other than drug design, such as cosmetics, skin care, nutrition, safe fertilizers and pesticides, scent compounds for perfumes and deodorants, and safe detergents. Furthermore, we employ interaction profiles to investigate the mechanisms of molecules that are significant for human health and nutrition (e.g. the uptake of vitamin D in the human body by β-lactoglobulin).
Although researchers currently show little interest in mining novel compounds for uses other than pharmaceutics, other industries (e.g. cosmetics, agriculture, nutrition) are looking to benefit from computer-aided methods as these continue to improve and spread. The approaches and techniques used in computer-aided drug design can therefore be of particular interest for a range of biotechnological applications. VS combined with post screening analysis appears to be efficient for investigating transporter proteins such as β-lactoglobulin (β-LG), together with their mechanisms and various functions in the human body. Many compounds with diverse functions and mechanisms in the body are protein-ligand complexes, which can be investigated on the basis of their protein-ligand interactions and physico-chemical features.
CHAPTER 3
The Relevance of Protein-Ligand Interaction Profiles in Computer-Aided Lead Compound Discovery, Functions and Applications
3.1 Introduction
Identification of protein-ligand interaction networks on a proteome scale is crucial for addressing a wide range of biological issues, such as correlating molecular functions with physiological processes and designing safe and efficient target compounds for therapeutics, nutrition, cosmetics, skin care products, agriculture and industry. To understand the role and significance of protein-ligand interactions (Fig. 4) in various applications across bioinformatics and biotechnology, the properties and functions of a ligand [42, 43] must be well understood. A ligand (such as vitamin D in Fig. 1) is a molecule, ion or atom that binds to a specific location on a protein, its binding site [39, 44].
Currently, antibodies are the most commonly used ligands in biotechnology and life-science investigations, although protein scaffolds (protein regulators), nucleic acids and peptides (short chains of amino acids) are also employed. Since protein-ligand complexes of various compounds are used in cosmetics, hair dyes, skin care products, fertilizers, detergents [29-31] and nutritional supplements [10], protein-ligand interaction profiles and physico-chemical features could be used to identify such lead compounds.
Figure 4. View of protein-ligand binding interactions in β-lactoglobulin (a transporter protein) complexed with vitamin D, visualized with Swiss-PdbViewer. a) Electrostatic potential and molecular surface. b) Hydrogen bond interactions between atoms (green dotted lines).
The ligand binding site of the primary target is extracted or predicted from a 3D experimental structure or a homology model of the protein [35, 45] and characterized by a geometric potential. Protein-ligand interactions occur when a ligand binds to its cognate protein, an event that is usually integral to the protein's function. In the binding of a ligand to a protein, the following interactions are significant: electrostatic forces (interactions between electrically charged particles, described by Coulomb's law), van der Waals forces (the sum of the attractive or repulsive forces between molecules or between parts of the same molecule) and hydrogen bonds (the attractive interaction of a hydrogen atom with an electronegative atom, which can occur inter- or intramolecularly) [39, 40]. These interactions are evaluated by two families of approaches. Ligand-based approaches, commonly employed in pharmacophore modeling, use the physical and chemical traits of known ligands to identify novel inhibitors. Receptor-based approaches instead use structural and other features of the target receptor to identify the best inhibitors.
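To make the electrostatic and van der Waals terms concrete, the sketch below evaluates them for a single protein-ligand atom pair. It is a minimal illustration under stated assumptions, not the energy function of any particular docking program: it assumes a 12-6 Lennard-Jones form for the van der Waals term and a Coulomb term with a constant dielectric of 4, with charges in elementary charges, distances in ångströms and energies in kcal/mol. Hydrogen bonding is not modeled separately, and the atom parameters are hypothetical.

```python
# Minimal sketch: pairwise electrostatic (Coulomb) and van der Waals (12-6 Lennard-Jones)
# energies for one protein-ligand atom pair. All parameters are hypothetical.

# Coulomb constant in kcal*Angstrom/(mol*e^2), a value commonly used in molecular mechanics.
COULOMB_K = 332.0637


def electrostatic_energy(q_i: float, q_j: float, r: float, dielectric: float = 4.0) -> float:
    """Coulomb energy (kcal/mol) of two point charges (e) at distance r (Angstrom).

    A constant dielectric is an assumption of this sketch.
    """
    return COULOMB_K * q_i * q_j / (dielectric * r)


def lennard_jones_energy(epsilon: float, r_min: float, r: float) -> float:
    """12-6 Lennard-Jones energy.

    The energy reaches its minimum, -epsilon, at r = r_min and rises steeply
    at shorter distances.
    """
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def pair_interaction(q_i: float, q_j: float, epsilon: float, r_min: float,
                     r: float, dielectric: float = 4.0) -> tuple[float, float]:
    """Return (electrostatic, van der Waals) energies for one atom pair."""
    return (electrostatic_energy(q_i, q_j, r, dielectric),
            lennard_jones_energy(epsilon, r_min, r))


# Hypothetical pair: a partially positive ligand atom near a partially negative protein atom.
elec, vdw = pair_interaction(q_i=0.4, q_j=-0.5, epsilon=0.2, r_min=3.5, r=3.5)
print(f"electrostatic = {elec:.3f} kcal/mol, van der Waals = {vdw:.3f} kcal/mol")
```

Summing such pair terms over all protein-ligand atom pairs gives the kind of interaction energy on which force-field-style docking scores build.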
Docking [18, 26, 32, 33, 46] is then used to assess the fit between a receptor and potential ligands by screening a database of ligands against one or more target receptors. Docking has two distinct components: searching (the scheme that explores suitable conformations or poses) and scoring (an estimate of the binding affinity of each pose). During docking, scoring methods must discriminate between non-native docked conformations and the correct binding states of compounds; during post-docking analysis, they must distinguish active compounds (usually a small number) from inactive compounds (an extremely large number). Although over 60 docking programs and tools are available [24], we present some of the most popular ones (Table 1). DOCK [18], incremental construction (FlexX) [32] and evolutionary algorithms (GEMDOCK, GOLD, AutoDock) [26, 33, 46] are used to screen and downsize compound collections in order to select suitable candidates for post screening analysis.
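The split between searching and scoring can be sketched as below. This is a toy illustration, not the algorithm of GEMDOCK, GOLD or AutoDock: the pose is assumed to be only a 3D translation of a rigid ligand, `toy_score` is a hypothetical scoring function with a known optimum, and the search is a simple (1 + λ) evolution strategy standing in for the evolutionary algorithms mentioned above.

```python
import random
from typing import Callable

# Hypothetical pose representation: only a rigid translation (x, y, z) of the ligand.
Pose = tuple[float, float, float]


def toy_score(pose: Pose) -> float:
    """Hypothetical scoring function (lower is better) with its optimum at (1, -2, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target)) - 10.0


def evolutionary_search(score: Callable[[Pose], float], generations: int = 200,
                        offspring: int = 20, step: float = 0.5,
                        seed: int = 0) -> tuple[Pose, float]:
    """(1 + lambda) evolution strategy: the search half of docking.

    The scoring function is passed in separately, mirroring the
    search/scoring split of docking programs.
    """
    rng = random.Random(seed)
    best: Pose = (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-5, 5))
    best_score = score(best)
    for _ in range(generations):
        parent = best
        for _ in range(offspring):
            # Each child is a Gaussian mutation of this generation's parent.
            child: Pose = (parent[0] + rng.gauss(0.0, step),
                           parent[1] + rng.gauss(0.0, step),
                           parent[2] + rng.gauss(0.0, step))
            child_score = score(child)
            if child_score < best_score:  # keep the best of parent and children
                best, best_score = child, child_score
        step *= 0.98  # shrink the mutation size to refine the pose
    return best, best_score


pose, pose_score = evolutionary_search(toy_score)
print("best pose:", tuple(round(c, 2) for c in pose), "score:", round(pose_score, 3))
```

Swapping `toy_score` for a different function changes what is optimized without touching the search code, which is the separation the paragraph describes.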
However, inconsistencies in the performance of scoring functions lead to inadequate prediction of the true binding affinity of a ligand for a receptor; combining several scoring methods in VS may therefore perform better than an average individual scoring function.
Similar inconsistencies have been noticed in information retrieval (IR). Charifson et al. [15] proposed an intersection-based consensus approach to combining scoring functions, which improved the discrimination between active and inactive enzyme inhibitors. Studies by Bissantz et al. [3], Stahl and Rarey [11] and Verdonk et al. [16] showed that consensus scoring further improved VS enrichment. However, the remaining question for VS users, rather than for method developers, is when and how these scoring functions should be combined in either drug design or industrial compound design.
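One simple way to combine scoring functions is rank-sum (mean-rank) consensus, sketched below. This is an assumed, illustrative scheme and not necessarily the one used in [15], [3], [11] or [16]; the compound names and scores are hypothetical, lower scores are taken to be better for every function, and ties are not handled.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank compounds under one scoring function (1 = best; lower score = better).

    Ties are broken arbitrarily; a real pipeline would handle them explicitly.
    """
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_sum_consensus(score_tables: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Combine scoring functions by averaging each compound's rank across them."""
    # Only compounds scored by every function can be compared.
    common = set(score_tables[0])
    for table in score_tables[1:]:
        common &= set(table)
    rank_tables = [ranks({c: table[c] for c in common}) for table in score_tables]
    mean_rank = {c: sum(r[c] for r in rank_tables) / len(rank_tables) for c in common}
    # Best consensus (lowest mean rank) first; the name breaks ties for a stable order.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores from three scoring functions (lower = better in each).
f1 = {"A": -9.1, "B": -7.4, "C": -8.2, "D": -5.0}
f2 = {"A": -55.0, "B": -61.0, "C": -58.0, "D": -40.0}
f3 = {"A": -6.3, "B": -5.9, "C": -7.0, "D": -4.1}

consensus = rank_sum_consensus([f1, f2, f3])
print(consensus)
```

Compound C is top-ranked by only one function but sits near the top of all three, so the consensus places it first; rewarding this kind of agreement is what consensus scoring aims for.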
Docking program   URL                                                       Reference
DOCK              http://dock.compbio.ucsf.edu/                             [18]
FlexX             http://biosolveit.de/flexx/index.html?ct=1                [32]
AutoDock          http://autodock.scripps.edu/                              [46]
GEMDOCK           http://gemdock.life.nctu.edu.tw/dock/igemdock.php         [26]
GOLD              http://www.ccdc.cam.ac.uk/products/life_sciences/gold/    [33]
Table 1. Popular docking tools and evolutionary algorithms currently used in VS
Furthermore, certain VS methods can identify important interactions or binding-site hot spots from known active ligands and target proteins [17]. However, because of biases towards higher-molecular-weight and charged, polar compounds [18], docking alone is not sufficient to analyse, identify and retrieve the most suitable lead compounds. Post screening analyses are therefore emerging as useful methods for further eliminating false-positive hits obtained from VS.
Post screening analysis methods that use clustering to identify key features of docked compounds and to elucidate binding mechanisms are of great use in bioinformatics. Computer-aided drug and industrial target design therefore requires VS as a primary step to generate interaction and structure profiles, followed by post screening analysis for adequate filtering, visualization and mining of the final candidates.
3.2 The Significance of Protein-Ligand Interaction Profiles in Methods of Compound Retrieval and Post Screening Analysis
Interactions between molecules (Fig. 4) are important for understanding many biological phenomena. From gene expression to enzyme reactions, activities are dictated by molecular interactions. Following the success of DNA microarrays, researchers are studying the protein counterpart in greater detail [47]. Protein microarrays can be used to study a variety of biological phenomena, such as protein-ligand, protein–protein, antibody–antigen and protein–DNA interactions, as well as for the analysis of subunits in protein complexes, screening of target proteins expressed from phage libraries, analysis of mutant proteins, quantitative assays, discovery of diagnostic markers, analysis of protein expression profiles, development of diagnostic microarrays and development of microarray-based lead screening systems. The interactions of significance in the analysis and retrieval of lead compounds for drug design are intermolecular interactions such as van der Waals forces, electrostatic forces and hydrogen bonds [39, 40]. Quantified as interaction energies, they can be calculated for docked compounds during virtual screening [13]. The calculated interaction energies are organized into data sets of interaction profiles (IPFs), which can serve as one of the criteria in a cluster analysis that further filters the compounds and selects more specific, final target compounds. Cluster analysis thus groups compounds with similar interaction energies into separate clusters, and a representative of each cluster, usually chosen on the basis of RMSD values, is selected in what is termed post screening analysis.
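The clustering and representative-selection step can be sketched as follows. It is a minimal illustration under assumptions: each compound's interaction profile is a short hypothetical vector of energies (kcal/mol), clustering uses average-linkage agglomeration with an arbitrary distance cut-off, and the representative is the cluster medoid in profile space, used here as a stand-in for the RMSD-based choice described above.

```python
import math
from itertools import combinations


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two interaction profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles: dict[str, list[float]],
                             threshold: float) -> list[list[str]]:
    """Agglomerative clustering of interaction profiles with average linkage.

    Starting from singletons, the two closest clusters are merged until the
    smallest mean inter-cluster distance reaches `threshold`.
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1: list[str], c2: list[str]) -> float:
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist >= threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def representative(cluster: list[str], profiles: dict[str, list[float]]) -> str:
    """Medoid: the member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda a: (sum(euclidean(profiles[a], profiles[b])
                                           for b in cluster), a))


# Hypothetical IPFs: (van der Waals, electrostatic, hydrogen-bond) energies in kcal/mol.
ipfs = {
    "lig1": [-8.0, -2.0, -3.0],
    "lig2": [-7.5, -2.2, -2.8],
    "lig3": [-7.8, -1.9, -3.1],
    "lig4": [-3.0, -6.0, -0.5],
    "lig5": [-3.2, -5.8, -0.7],
}

for group in average_linkage_clusters(ipfs, threshold=2.0):
    print(sorted(group), "->", representative(group, ipfs))
```

Picking one representative per cluster keeps the selection diverse while discarding near-duplicate binders.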
3.2.1 Post Screening Analysis
Methods of post screening analysis [21-23] are designed to facilitate the visualization (interpreting binding interactions), organization (clustering and organizing structures in a meaningful way), analysis (comparing and profiling the binding interactions of different structures) and data mining (searching for structures containing key interactions or specific features) of virtually screened compounds. As mentioned earlier, the binding interactions of protein-ligand complexes [39] (e.g. van der Waals forces, electrostatic forces and hydrogen bonds) are a critical part of mining and selecting target representatives in post screening analysis methods.
Descriptions of binding interactions and measures of interaction strength for protein-ligand complexes are very important for mining appropriate candidates from the selection lists generated by VS [48]. Through an in-depth study of protein-ligand interactions in various post screening analyses, we attempt to develop an integrated method of VS and post screening analysis that speeds up the screening and analysis of compounds, generates better interaction-specific information and yields suitable representatives. The overall details of this study are shown in Figure 5.
Figure 5. Previous methods investigated and the studies carried out in designing our TSCC method.
Below we investigate and compare a few pioneering methods of post screening analysis, all of which were originally designed to enrich virtual screening. Later in this work we perform comparative studies and inductive analyses that provide a foundation for expanding the use of virtual screening and post screening analysis into the mining and analysis of targets for applications beyond pharmaceutics.
3.2.2 Structural Interaction Fingerprint (SIFt)
SIFt [23] uses a simple, generic and robust approach to representing and analyzing 3D protein-ligand interactions. Its key feature is the generation of an interaction fingerprint that converts 3D structural binding information into a one-dimensional (1D) binary string (Fig. 6).
The fingerprint representation of the interaction patterns is compact and allows rapid clustering and analysis of large numbers of complexes. The SIFt is calculated on a set of input 3D protein–small molecule complexes. The protein structure may have been determined experimentally by NMR or crystallography, or generated through homology modeling. The SIFt is generated by first defining the union of the residues that are in contact with the small molecule across all complexes. The resulting panel of ligand binding site residues, which acts as a mask covering all of the interactions occurring between the protein and the ligands, is then used as the common reference frame for constructing the interaction fingerprints.
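The construction of the fingerprint from the union of contact residues can be sketched as below. The seven interaction types per residue, the residue names and the contact sets are assumptions made for illustration; the published SIFt method [23] derives contacts and interaction types from the 3D coordinates, which this sketch simply takes as given.

```python
# Interaction types encoded per residue; this seven-type list is an assumption for illustration.
INTERACTION_TYPES = ("contact", "backbone", "sidechain", "polar",
                     "hydrophobic", "hbond_acceptor", "hbond_donor")


def residue_number(residue: str) -> int:
    """Sequence number from an ID such as 'LEU83' (a three-letter code is assumed)."""
    return int(residue[3:])


def binding_site_panel(complexes: dict[str, dict[str, set[str]]]) -> list[str]:
    """Union of contact residues over all complexes: the common reference frame."""
    residues: set[str] = set()
    for contacts in complexes.values():
        residues |= set(contacts)
    return sorted(residues, key=residue_number)


def sift(contacts: dict[str, set[str]], panel: list[str]) -> str:
    """Concatenate one bit block per panel residue into a 1D bit string."""
    bits = []
    for residue in panel:
        present = contacts.get(residue, set())  # residues without contact give all-zero blocks
        bits.extend("1" if kind in present else "0" for kind in INTERACTION_TYPES)
    return "".join(bits)


# Hypothetical per-complex contacts: residue -> interaction types observed with the ligand.
complexes = {
    "inhibitor_A": {"LYS72": {"contact", "sidechain", "polar", "hbond_donor"},
                    "LEU83": {"contact", "backbone", "hbond_acceptor"}},
    "inhibitor_B": {"LEU83": {"contact", "backbone", "hbond_acceptor"},
                    "PHE146": {"contact", "sidechain", "hydrophobic"}},
}

panel = binding_site_panel(complexes)
fingerprints = {name: sift(contacts, panel) for name, contacts in complexes.items()}
print(panel)
for name, fp in fingerprints.items():
    print(name, fp)
```

Because every fingerprint is built against the same panel, bit k always refers to the same residue and interaction type, which is what makes the fingerprints directly comparable.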
Figure 6. The 3D binding site of a protein with an inhibitor (ligand), represented as the sequence of binding-site positions in contact with the ligand and their locations in the protein structure (loop and β). Each binding-site position is represented by a bit string. The bit strings of all binding-site residues are joined end to end; this is repeated for every ligand, and the resulting fingerprints are used in the selection process.
To analyse SIFts, the Tanimoto coefficient (Tc) [38] is used as the quantitative measure of bit-string similarity. The Tc between two bit strings A and B is defined as:

$$Tc(A,B) = \frac{|A \cap B|}{|A \cup B|}$$

where $|A \cap B|$ is the number of ON bits common to both A and B and $|A \cup B|$ is the number of ON bits present in either A or B. Tanimoto coefficients between random bit strings 400 bits long follow a near-Gaussian distribution centered at approximately 0.33, with a standard deviation of about 0.03. Representing interactions as fingerprints with the SIFt method enables clustering, filtering and profiling of large libraries of docking results, as well as of crystal structures of the protein kinase family in complex with various inhibitors.
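The Tanimoto coefficient on bit strings, together with a quick numerical check of the random-string behaviour described in this section, can be sketched as follows. Returning 0.0 when neither string has any ON bits is a convention assumed here, and random strings are assumed to have each bit ON with probability 0.5.

```python
import random
import statistics


def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two equal-length '0'/'1' strings."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")    # ON in A and in B
    either = sum(1 for x, y in zip(a, b) if x == "1" or y == "1")   # ON in A or in B
    return both / either if either else 0.0  # convention assumed for two all-zero strings


def random_bits(rng: random.Random, n: int) -> str:
    """Random bit string with each bit ON with probability 0.5."""
    return "".join(rng.choice("01") for _ in range(n))


rng = random.Random(42)
values = [tanimoto(random_bits(rng, 400), random_bits(rng, 400)) for _ in range(1000)]
print(f"mean = {statistics.mean(values):.3f}, sd = {statistics.stdev(values):.3f}")
```

The mean should land close to 0.33, since two random half-ON strings share about a quarter of their positions as ON while about three quarters are ON in at least one; the spread should be of the order of the 0.03 quoted above, with exact digits depending on the seed.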
3.2.3 VISCANA (Visualized Cluster Analysis of Protein-Ligand Interaction)
VISCANA [22] (Fig. 7) is a method for analysing virtual ligand screening results based on the ab initio fragment molecular orbital (FMO) method [24]. The ab initio FMO method at the Hartree-Fock level is outlined below, after Figure 7.
Figure 7. a) The overall approach of VISCANA (from VS to the selection of representatives).
b) The fragmentation of a polypeptide at different bonds. c) Division of biomolecules into a collection of small fragments in the molecular orbital calculations (FMO method).
First, biomolecules or molecular clusters are divided into small fragments, and ab initio MO calculations are performed on each fragment (monomer) in the electrostatic potential of the surrounding fragments (Fig. 7b and c). These calculations are repeated until all monomer densities become self-consistent, after which the fragment pairs (dimers) are calculated in the same environment. Finally, using the total energies of the monomers $E_I$ and of the dimers $E_{IJ}$, the total energy E of the system is calculated by the following equation:

$$E = \sum_{I>J} E_{IJ} - (N-2)\sum_{I} E_{I}$$

where N is the number of fragments.
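The assembly of the two-body FMO energy, $E = \sum_{I>J} E_{IJ} - (N-2)\sum_I E_I$, can be sketched as below, together with a simplified pair interaction energy $\Delta E_{IJ} = E_{IJ} - E_I - E_J$. This is arithmetic only: the monomer and dimer energies are hypothetical numbers (in hartree) standing in for the output of the ab initio calculations, and the simplified $\Delta E_{IJ}$ omits the correction terms of the full IFIE definition used by VISCANA.

```python
from itertools import combinations


def fmo2_total_energy(monomer: dict[int, float],
                      dimer: dict[tuple[int, int], float]) -> float:
    """Two-body FMO energy: E = sum_{I>J} E_IJ - (N - 2) * sum_I E_I.

    Dimer energies are keyed by (I, J) with I < J; every pair must be present.
    """
    n = len(monomer)
    pair_sum = sum(dimer[pair] for pair in combinations(sorted(monomer), 2))
    return pair_sum - (n - 2) * sum(monomer.values())


def pair_interaction_energies(monomer: dict[int, float],
                              dimer: dict[tuple[int, int], float]) -> dict[tuple[int, int], float]:
    """Simplified pair interaction energies dE_IJ = E_IJ - E_I - E_J."""
    return {(i, j): e_ij - monomer[i] - monomer[j] for (i, j), e_ij in dimer.items()}


# Hypothetical monomer and dimer energies (hartree) for three fragments.
monomer = {1: -100.0, 2: -150.0, 3: -80.0}
dimer = {(1, 2): -250.010, (1, 3): -180.002, (2, 3): -230.005}

total = fmo2_total_energy(monomer, dimer)
pair_terms = pair_interaction_energies(monomer, dimer)
# The same total follows from the monomer sum plus all pair interaction energies.
check = sum(monomer.values()) + sum(pair_terms.values())
print(f"E = {total:.3f} hartree, sum of monomers + pair terms = {check:.3f} hartree")
```

The agreement between `total` and `check` holds because each monomer energy appears in N - 1 dimers, so subtracting (N - 2) copies leaves exactly one; this rewriting of the total as monomer terms plus pair terms is what lets pair interaction energies serve as per-residue descriptors.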
The FMO method has the advantage of describing the charge-transfer between a receptor and a ligand in comparison to a conventional force field method using fixed atomic charges.
Based on this principle, Amari et al. developed a cluster analysis in which the dissimilarity between two ligands is defined as the squared Euclidean distance between their interfragment interaction energies (IFIEs). VISCANA combines this clustering method with a graphical representation of the IFIEs, in which each data point is shown in colors that quantitatively and qualitatively reflect the IFIEs.
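The dissimilarity used by Amari et al. can be sketched as a squared Euclidean distance between per-residue IFIE profiles. The residue names and energies (kcal/mol) are hypothetical, and treating a residue absent from one profile as having zero IFIE is an assumption of this sketch.

```python
def ifie_dissimilarity(ifie_a: dict[str, float], ifie_b: dict[str, float]) -> float:
    """Squared Euclidean distance between two ligands' per-residue IFIE profiles.

    A residue missing from one profile is treated as having zero IFIE (assumption).
    """
    residues = set(ifie_a) | set(ifie_b)
    return sum((ifie_a.get(r, 0.0) - ifie_b.get(r, 0.0)) ** 2 for r in residues)


# Hypothetical IFIEs (kcal/mol) between each ligand and receptor residues.
ligand_1 = {"ASP86": -12.4, "LYS89": 3.1, "LEU134": -1.8}
ligand_2 = {"ASP86": -10.9, "LYS89": 2.6, "PHE80": -0.9}
ligand_3 = {"ASP86": -12.0, "LYS89": 2.9, "LEU134": -1.5}

profiles = {"ligand_1": ligand_1, "ligand_2": ligand_2, "ligand_3": ligand_3}
for a in profiles:
    row = [round(ifie_dissimilarity(profiles[a], profiles[b]), 2) for b in profiles]
    print(a, row)
```

The resulting matrix is what a clustering step would consume; here ligand_1 and ligand_3 come out much closer to each other than either is to ligand_2.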
This method classifies structurally different ligands into functionally similar clusters according to the interaction patterns between a ligand and the amino acid residues of a receptor protein. VISCANA can also help estimate docking conformations by analyzing the receptor-ligand interaction patterns of the conformations generated in docking calculations. VISCANA could be applied not only to the FMO method but also to any molecular interaction system that provides interaction energies or other properties of interest, such as charge distribution.
3.2.4 iGEMDOCK: A Graphical Environment for Recognizing Pharmacological Interactions and Virtual Screening
iGEMDOCK (Fig. 8) is an extension of the docking tool GEMDOCK developed by Yang et al. [26] that adds a post screening analysis method to the original docking algorithm (http://gemdock.life.nctu.edu.tw/dock/igemdock.php). It uses GEMDOCK's two key components for VS: 1) the search algorithm [49] and 2) the scoring function [50], which is based on an empirical energy function:
$$E_{tot} = E_{bind} + E_{pharma} + E_{ligpre}$$
where $E_{bind}$ is the empirical binding energy, $E_{pharma}$ is the energy of binding-site pharmacophores (hot spots), and $E_{ligpre}$ is a penalty applied when a ligand does not satisfy the ligand preferences. $E_{pharma}$ and $E_{ligpre}$ are especially helpful in selecting active compounds from hundreds of thousands of inactive compounds, since they exclude ligands that violate the characteristics of known active ligands and thereby improve the selection of true positives.
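How the three terms combine, and how the pharmacophore and ligand-preference terms can reorder compounds relative to the binding energy alone, can be sketched as below. The per-hot-spot reward, the fixed penalty and the pose data are hypothetical illustrations; GEMDOCK's actual pharmacophore and ligand-preference terms [50] are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only to illustrate how the terms interact.
HOTSPOT_REWARD = -2.0      # energy bonus per satisfied hot-spot interaction
PREFERENCE_PENALTY = 5.0   # penalty if a ligand violates a ligand preference


@dataclass
class DockedPose:
    name: str
    e_bind: float               # empirical binding energy (lower is better)
    hotspots_hit: int           # binding-site pharmacophore (hot-spot) interactions satisfied
    violates_preference: bool   # True if the ligand breaks a known-actives preference


def e_pharma(pose: DockedPose) -> float:
    return HOTSPOT_REWARD * pose.hotspots_hit


def e_ligpre(pose: DockedPose) -> float:
    return PREFERENCE_PENALTY if pose.violates_preference else 0.0


def e_tot(pose: DockedPose) -> float:
    """E_tot = E_bind + E_pharma + E_ligpre."""
    return pose.e_bind + e_pharma(pose) + e_ligpre(pose)


poses = [
    DockedPose("A", e_bind=-95.0, hotspots_hit=1, violates_preference=True),
    DockedPose("B", e_bind=-90.0, hotspots_hit=3, violates_preference=False),
    DockedPose("C", e_bind=-85.0, hotspots_hit=0, violates_preference=False),
]

by_bind = [p.name for p in sorted(poses, key=lambda p: p.e_bind)]
by_total = [p.name for p in sorted(poses, key=e_tot)]
print("ranked by E_bind:", by_bind)
print("ranked by E_tot: ", by_total)
```

Compound A wins on binding energy alone, but B, which satisfies more hot spots and no preference violations, moves to the top once the extra terms are included.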
Figure 8. The virtual screening and post screening analysis processes in iGEMDOCK
iGEMDOCK emerged from integrating the programs for the different stages of VS into GEMDOCK, providing docking, virtual screening and post screening analysis of compound databases through a user-friendly interface. In post screening analysis, iGEMDOCK enriches the hit rate and derives pharmacological interactions from the screened compounds to provide biological insights. The pharmacological interactions represent conserved interacting residues that form binding pockets with specific physico-chemical properties and express the essential functions of the target protein.
This approach provides both virtual screening and post screening analysis, together with a more detailed and complete understanding of ligand binding mechanisms, which can make the study and discovery of lead compounds easier and less time consuming than with other, similar post screening analyses. iGEMDOCK builds on GEMDOCK, which has been used to identify various inhibitors, such as aurintricarboxylic acid tetracycline derivatives that inhibit flaviviruses [6] and influenza virus neuraminidase inhibitors [8].
3.3 Summary
Methods of post screening analysis that enhance virtual screening enrichment and retrieve target compounds more accurately are of great use and interest in current bioinformatics. In this review we summarized and compared methods of VS and post screening analysis of lead compounds, emphasizing the relevance of interaction profiles in mining suitable candidates.
SIFt (structural interaction fingerprint) is one of the pioneering post screening analysis methods to encode interaction-specific information as binary bit strings. This enables the visualization, organization, analysis and retrieval of structures containing key interactions or specific features. Combining SIFt with ChemScore (an empirical scoring function) gave a modest increase in the enrichment factor (EF), which was calculated from the ability to recover known inhibitors: the enrichment increased from an EFa of 37.0 (SIFt) to 42.3 (SIFt + ChemScore) [23].
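The enrichment factor can be computed as below, using the common definition EF = (actives in the selected top fraction / size of that fraction) / (total actives / library size). Whether this matches the EFa variant reported in [23] is an assumption, and the ranked list is synthetic.

```python
def enrichment_factor(ranked: list[str], actives: set[str], fraction: float) -> float:
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size)."""
    n_total = len(ranked)
    hits_total = sum(1 for c in ranked if c in actives)
    if hits_total == 0:
        raise ValueError("no known actives in the ranked list")
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(1 for c in ranked[:n_top] if c in actives)
    return (hits_top / n_top) / (hits_total / n_total)


# Synthetic library: 10 known actives hidden among 990 decoys.
actives = [f"act{i}" for i in range(10)]
decoys = [f"dec{i}" for i in range(990)]
# Hypothetical screening result: 5 actives land in the top 10 compounds (top 1%).
ranked = actives[:5] + decoys[:5] + actives[5:] + decoys[5:]

print(enrichment_factor(ranked, set(actives), fraction=0.01))
```

An EF of 50 here means the top 1% holds actives at 50 times the rate expected from random selection, while taking the whole list (fraction 1.0) always gives an EF of 1.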
VISCANA (Visualized Cluster Analysis of Protein-Ligand Interaction) takes a different approach, based on the FMO method, which has the advantage over conventional force field methods with fixed atomic charges of describing charge transfer between a receptor and a ligand. VISCANA differs from conventional screening methods in that most of those select compounds solely by their docking score rank. In VISCANA, a compound with a poor docking score may belong to the same cluster as active compounds and could therefore still be a suitable candidate. However, Amari et al. noted that VISCANA needs further development of quantum mechanical methods (second-order Møller-Plesset perturbation theory based on the FMO method) to obtain more reliable descriptions of van der Waals interactions and hydrogen bonds, which are important in determining receptor-ligand binding [22]. Other post screening studies reveal that unreliable or insufficient descriptions of important interactions lead to increased numbers of false positives [48].
iGEMDOCK, an integration of VS and post screening methods built on the original evolutionary docking algorithm GEMDOCK, is currently one of the pioneering methods for combining VS with the visualization, organization, analysis and data mining of lead compounds. Its main advantage over SIFt and VISCANA is that it attempts to resolve two key issues: 1) if a docking tool is used for VS, which post screening analysis complements it best, and 2) if a post screening analysis method has been chosen, which docking tool or VS method is most suitable. The post screening approach of iGEMDOCK differs from those of VISCANA and SIFt in its use of a module that clusters compounds based on interaction profiles and atomic compositions. Selecting representative compounds from each cluster maintains compound diversity and reduces the number of false positives. In addition, its pharmacological scoring function can reduce the bias of energy-based scoring functions, which often favor high-molecular-weight or highly polar compounds. This improves screening accuracy when the molecular weights of the active compounds are below 400 daltons (Da) [52]. Most notably, GEMDOCK, the earlier version of iGEMDOCK, was used successfully to screen for and identify inhibitors of influenza virus neuraminidases and flaviviruses [6, 8].
We also emphasize the use of VS and post screening analysis in mining novel compounds for various other applications (e.g. industry, agriculture, cosmetics and nutritional supplements). These areas have received little attention compared with drug design, even though certain protein-ligand complexes are key compounds in developing various biochemical products [29-31]. VS and post screening analysis as used in computer-aided drug design show great potential in such applications, since prospective candidates for cosmetics and other industries may be retrieved using interaction profiles.
Although the methods investigated in this study, SIFt, VISCANA and iGEMDOCK, employ different techniques (structural interaction fingerprints, the ab initio FMO method and interaction energy modules), they share one common feature: the use of protein-ligand interaction