Rigid-body protein-protein docking using GEMDOCK

Chapter 3 Results

3.3 Rigid-body protein-protein docking using GEMDOCK

Figure 3. Good test cases for GEMDOCK: (A) 1AVW (A:B), (B) 1FBI (LH:X), (C) 2BTF (A:P). Hits within 2.0 Å RMSD were found for all three complexes. The bound receptor surface is shown. The best-ranked hit is shown in blue and the original bound ligand in red.

We modified GEMDOCK for rigid-body protein-protein docking while retaining its original empirical scoring function, which works well in protein-ligand docking. The search procedure combines discrete and continuous global search strategies with local search strategies to speed up convergence, whereas the scoring function allows rapid recognition of possible protein-protein interacting conformations. We tested the method on the 52 bound protein complexes used in our training set; the results are listed in Table 6. Each complex was docked in three rounds, and the performance on enzyme-inhibitor complexes (50%) was better than on antibody-antigen complexes (11%) and other complexes (27%). Figure 3 shows that the modified GEMDOCK gives reliable binding conformations in some good test cases, with RMSD values smaller than 2 Å. However, the overall performance is not satisfactory, because the scoring function used here was designed for protein-ligand docking. Fortunately, the search strategies of GEMDOCK do work for protein-protein docking. In the future, we will improve the scoring function of GEMDOCK for protein-protein docking and develop soft-body protein-protein docking strategies for solving unbound-unbound protein docking problems.

Table 6. Results of protein-protein docking using GEMDOCK

Complexes      ∆ASA (Å²)    R2Å    R5Å    Best RMSD (Å)
1WEJ(LH:F)     1180         0      0      49.428

a ∆ASA: change in accessible surface area (ASA) on complex formation, calculated using the program NACCESS [53].

b R2Å: number of predictions with RMSD smaller than 2 Å among the 3 rounds.

c R5Å: number of predictions with RMSD smaller than 5 Å among the 3 rounds.

d Best RMSD: the smallest RMSD among the 3 rounds.
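The summary columns of Table 6 can be illustrated with a minimal sketch. It assumes that each docking round yields one predicted ligand pose whose atoms are listed in the same order as the bound reference ligand, and that both are already in the receptor frame, so no superposition is applied. The function names and the toy coordinates are hypothetical and are not taken from GEMDOCK; the source does not state which atoms enter the RMSD.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    Assumption: both poses are expressed in the receptor frame, so the
    coordinates are compared directly without any fitting step.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pred, ref):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(pred))


def summarize_rounds(rmsds):
    """Return (R2A, R5A, best RMSD) for the RMSDs of the docking rounds."""
    r2 = sum(1 for r in rmsds if r < 2.0)  # rounds with RMSD below 2 Å
    r5 = sum(1 for r in rmsds if r < 5.0)  # rounds with RMSD below 5 Å
    return r2, r5, min(rmsds)


# Toy data (hypothetical, not from Table 6): a two-atom ligand shifted by 1 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, reference))
print(summarize_rounds([1.2, 4.0, 30.5]))
```

With three rounds per complex, R2Å and R5Å can each take values from 0 to 3, and a complex such as 1WEJ with R2Å = R5Å = 0 has no round within 5 Å of the bound pose.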

4 Discussion

Figure 4 shows six examples of prediction outcomes for the training set (Figures 4a, 4b and 4c) and the testing set (Figures 4d, 4e and 4f). Predicted interface and non-interface residues, identified by GEM, are shown as color-coded patches as follows: red spheres = true positives (TP), actual interface residues that are predicted as such; blue strands = true negatives (TN), non-interface residues that are predicted as such; yellow spheres = false negatives (FN), interface residues that are misclassified as non-interface residues; green spheres = false positives (FP), non-interface residues that are misclassified as interface residues. Figure 4 shows that not all of each interface was predicted, but the predicted part fits the interface well.
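The four residue classes used in Figure 4 can be sketched as follows. This is only an illustration of the definitions given above, assuming that the surface residues, the predicted interface and the actual interface are available as collections of residue identifiers; the identifiers and the function name are hypothetical.

```python
def classify_residues(surface, predicted, actual):
    """Label every surface residue as TP, FP, FN or TN.

    surface   -- iterable of residue identifiers on the protein surface
    predicted -- set of residues predicted to be in the interface
    actual    -- set of residues observed in the interface
    """
    labels = {}
    for res in surface:
        in_pred = res in predicted
        in_true = res in actual
        if in_pred and in_true:
            labels[res] = "TP"  # interface residue predicted as interface
        elif in_pred:
            labels[res] = "FP"  # non-interface residue predicted as interface
        elif in_true:
            labels[res] = "FN"  # interface residue missed by the prediction
        else:
            labels[res] = "TN"  # non-interface residue predicted as such
    return labels


# Hypothetical residue identifiers, for illustration only.
surface = ["A10", "A11", "A12", "A13", "A14"]
labels = classify_residues(surface, predicted={"A10", "A11", "A14"},
                           actual={"A10", "A11", "A12"})
counts = {k: list(labels.values()).count(k) for k in ("TP", "FP", "FN", "TN")}
print(counts)
```

In this toy case the yellow (FN) residue A12 corresponds to the part of the interface that is not predicted, while the two red (TP) residues correspond to the predicted part that fits the interface.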

Figure 4. Prediction results for the training set and the testing set. The partner molecule(s) in the bound conformation, after superimposition of the corresponding molecule in the complex, are represented as ribbons. (a) Prediction on 1dqj_r of the HyHEL-63 Fab, (b) prediction on 1dfj_r of the ribonuclease inhibitor, (c) prediction on 1acb_r of α-chymotrypsin, (d) prediction on 2cpl of cyclophilin A, (e) prediction on 1ctm of cytochrome f, (f) prediction on 1a19A of barstar. The legend distinguishes true positives, false positives, false negatives and true negatives.

Figure 5. Case study of limitations. (a) 1ahw_l; green: prediction area, yellow: interface, blue: others, red block: fibronectin type III modules. (b) The target protein 1wej_l, shown as ribbons; green: prediction area, yellow: interface, blue: others, purple: heme. (c) Pink: 1noc A chain, green: 1noc B chain, grey: 1nos. (d) 1pco; green: prediction area, yellow: interface, blue: others; the 1eth A chain is shown as a pink ribbon.

The current implementation of the method has some limitations. Figure 5 illustrates them with cases from the training set (Figures 5a and 5b) and the testing set (Figures 5c and 5d). Figure 5a shows the structure of 1ahw_l. Although our prediction area is far from the interface, this structure consists of two fibronectin type III modules whose hydrophobic cores merge in the domain-domain interface, and our prediction is almost invariably symmetrical. Figure 5b shows the structure of 1wej_l. The predicted area is located near the heme propionate; this may be because the residues near the heme are more hydrophobic than the protein-protein interaction site. Figure 5c shows the structures of a bound protein complex (1noc A chain and B chain) and the unbound protein (1nos). After structural alignment of the 1noc A chain and 1nos, the contact residues between 1nos and the 1noc B chain are unfortunately fewer than in the bound protein complex, so it is difficult for our method to identify the interaction site of 1nos. Figure 5d shows the structures of 1pco and the 1eth A chain. The surface of the 1eth A chain (colipase) can be divided into a rather hydrophilic part, interacting with 1pco (lipase), and a more hydrophobic part, formed by the tips of the fingers [60]. This suggests that the interface of 1pco is more hydrophilic than the rest of its surface, and our method is not very useful in this case.

5 Conclusion

We have developed a method for predicting protein-protein binding sites using GEM. To train GEM and to test the prediction method, we collected a dataset of 104 unbound proteins, the nonredundant benchmark for testing protein-protein docking algorithms. We successfully predicted the location of the binding site for 65.4% of the 104 proteins in the training set. In addition, we used GEM to predict the binding sites of 50 unbound proteins in the testing set, with a success rate of 46%. This performance was achieved using only 18 attributes, so the prediction results should improve when more properties that distinguish interfaces from the rest of the protein surface become available.

This method can be further improved in several respects. First, we note that hydrophilic effects may be the main force of protein-protein interaction in some cases (Figure 5d), for which our predictions are poor. This is because most interfaces in the training set are hydrophobic, and our parameters faithfully reflect this characteristic. Therefore, it may be useful to classify the interfaces of the training set as hydrophobic or hydrophilic, so that each protein has two predicted areas: a hydrophobic patch and a hydrophilic patch. Second, sequence conservation tends to be an important attribute for identifying protein-protein interfaces [35]. Third, the effect of secondary structure information is not yet clear, so we intend to investigate its role in our model. Finally, we will apply our approach to other data sets to study the behavior of our model. In the future, we will integrate protein-protein interaction site prediction into GEMDOCK, improve the scoring function of GEMDOCK for protein-protein docking, and develop soft-body protein-protein docking strategies for solving unbound-unbound protein docking problems.

References

[1] A. C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A. M. Michon, C. M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M. A. Heurtier, R. R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, and G. Superti-Furga, "Functional organization of the yeast proteome by systematic analysis of protein complexes," Nature, vol. 415, pp. 141-7, 2002.

[2] L. Giot, J. S. Bader, C. Brouwer, A. Chaudhuri, B. Kuang, Y. Li, Y. L. Hao, C. E. Ooi, B. Godwin, E. Vitols, G. Vijayadamodar, P. Pochart, H. Machineni, M. Welsh, Y. Kong, B. Zerhusen, R. Malcolm, Z. Varrone, A. Collis, M. Minto, S. Burgess, L. McDaniel, E. Stimpson, F. Spriggs, J. Williams, K. Neurath, N. Ioime, M. Agee, E. Voss, K. Furtak, R. Renzulli, N. Aanensen, S. Carrolla, E. Bickelhaupt, Y. Lazovatsky, A. DaSilva, J. Zhong, C. A. Stanyon, R. L. Finley, Jr., K. P. White, M. Braverman, T. Jarvie, S. Gold, M. Leach, J. Knight, R. A. Shimkets, M. P. McKenna, J. Chant, and J. M. Rothberg, "A protein interaction map of Drosophila melanogaster," Science, vol. 302, pp. 1727-36, 2003.

[3] Y. Ho, A. Gruhler, A. Heilbut, G. D. Bader, L. Moore, S. L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A. R. Willems, H. Sassi, P. A. Nielsen, K. J. Rasmussen, J. R. Andersen, L. E. Johansen, L. H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B. D. Sorensen, J. Matthiesen, R. C. Hendrickson, F. Gleeson, T. Pawson, M. F. Moran, D. Durocher, M. Mann, C. W. Hogue, D. Figeys, and M. Tyers, "Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry," Nature, vol. 415, pp. 180-3, 2002.

[4] S. Li, C. M. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P. O. Vidalain, J. D. Han, A. Chesneau, T. Hao, D. S. Goldberg, N. Li, M. Martinez, J. F. Rual, P. Lamesch, L. Xu, M. Tewari, S. L. Wong, L. V. Zhang, G. F. Berriz, L. Jacotot, P. Vaglio, J. Reboul, T. Hirozane-Kishikawa, Q. Li, H. W. Gabel, A. Elewa, B. Baumgartner, D. J. Rose, H. Yu, S. Bosak, R. Sequerra, A. Fraser, S. E. Mango, W. M. Saxton, S. Strome, S. Van Den Heuvel, F. Piano, J. Vandenhaute, C. Sardet, M. Gerstein, L. Doucette-Stamm, K. C. Gunsalus, J. W. Harper, M. E. Cusick, F. P. Roth, D. E. Hill, and M. Vidal, "A map of the interactome network of the metazoan C. elegans," Science, vol. 303, pp. 540-3, 2004.

[5] P. Uetz, L. Giot, G. Cagney, T. A. Mansfield, R. S. Judson, J. R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, and J. M. Rothberg, "A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae," Nature, vol. 403, pp. 623-7, 2000.

[6] S. Jones and J. M. Thornton, "Principles of protein-protein interactions," Proceedings of the National Academy of Sciences of the United States of America, vol. 93, pp. 13-20, 1996.

[7] S. Jones and J. M. Thornton, "Prediction of protein-protein interaction sites using patch analysis," Journal of Molecular Biology, vol. 272, pp. 133-143, 1997.

[8] S. Jones and J. M. Thornton, "Analysis of protein-protein interaction sites using surface patches," Journal of Molecular Biology, vol. 272, pp. 121-132, 1997.

[9] I. M. Nooren and J. M. Thornton, "Diversity of protein-protein interactions," EMBO Journal, vol. 22, pp. 3486-3492, 2003.

[10] I. A. Vakser and C. Aflalo, "Hydrophobic docking: a proposed enhancement to molecular recognition techniques," Proteins: Structure, Function and Genetics, vol. 20, pp. 320-329, 1994.

[11] L. Young, R. L. Jernigan, and D. G. Covell, "A role for surface hydrophobicity in protein-protein recognition," Protein Science, vol. 3, pp. 717-729, 1994.

[12] J. Fernandez-Recio, M. Totrov, and R. Abagyan, "Identification of protein-protein interaction sites from docking energy landscapes," Journal of Molecular Biology, vol. 335, pp. 843-865, 2004.

[13] J. Fernandez-Recio, M. Totrov, C. Skorodumov, and R. Abagyan, "Optimal docking area: a new method for predicting protein-protein interaction sites," Proteins: Structure, Function, and Bioinformatics, vol. 58, pp. 134-143, 2005.

[14] P. Fariselli, F. Pazos, A. Valencia, and R. Casadio, "Prediction of protein-protein interaction sites in heterocomplexes with neural networks," European Journal of Biochemistry, vol. 269, pp. 1356-1361, 2002.

[15] M. Keil, T. E. Exner, and J. Brickmann, "Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network," Journal of Computational Chemistry, vol. 25, pp. 779-789, 2004.

[16] H. Neuvirth, R. Raz, and G. Schreiber, "ProMate: a structure based prediction program to identify the location of protein-protein binding sites," Journal of Molecular Biology, vol. 338, pp. 181-199, 2004.

[17] H. X. Zhou and Y. Shan, "Prediction of protein interaction sites from sequence profile and residue neighbor list," Proteins: Structure, Function and Genetics, vol. 44, pp. 336-343, 2001.

[18] J. M. Yang, "Development and evaluation of a generic evolutionary method for protein-ligand docking," Journal of Computational Chemistry, vol. 25, pp. 843-857, 2004.

[19] J. M. Yang and C. C. Chen, "GEMDOCK: a generic evolutionary method for molecular docking," Proteins: Structure, Function, and Bioinformatics, vol. 55, pp. 288-304, 2004.

[20] J. M. Yang, J. T. Horng, and C. Y. Kao, "A genetic algorithm with adaptive mutations and family competition for training neural networks," International Journal of Neural Systems, vol. 10, pp. 333-352, 2000.

[21] J. M. Yang and C. Y. Kao, "A family competition evolutionary algorithm for automated docking of flexible ligands to proteins," IEEE Transactions on Information Technology in Biomedicine, vol. 4, pp. 225-237, 2000.

[22] J. M. Yang and T. W. Shen, "A pharmacophore-based evolutionary approach for screening selective estrogen receptor modulators," Proteins: Structure, Function, and Bioinformatics, vol. 59, pp. 205-220, 2005.

[23] J. M. Yang, C. H. Tsai, M. J. Hwang, H. K. Tsai, J. K. Hwang, and C. Y. Kao, "GEM: a Gaussian Evolutionary Method for predicting protein side-chain conformations," Protein Science, vol. 11, pp. 1897-1907, 2002.

[24] W. Kabsch and C. Sander, "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features," Biopolymers, vol. 22, pp. 2577-2637, 1983.

[25] Q. Dong, X. Wang, L. Lin, and Y. Guan, "Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins," BMC Bioinformatics, vol. 8, pp. 147, 2007.

[26] F. Glaser, D. M. Steinberg, I. A. Vakser, and N. Ben-Tal, "Residue frequencies and pairing preferences at protein-protein interfaces," Proteins: Structure, Function and Genetics, vol. 43, pp. 89-102, 2001.

[27] T. A. Larsen, A. J. Olson, and D. S. Goodsell, "Morphology of protein-protein interfaces," Structure, vol. 6, pp. 421-7, 1998.

[28] X. Gallet, B. Charloteaux, A. Thomas, and R. Brasseur, "A fast method to predict protein interaction sites from sequences," Journal of Molecular Biology, vol. 302, pp. 917-26, 2000.

[29] A. Koike and T. Takagi, "Prediction of protein-protein interaction sites using support vector machines," Protein Engineering Design and Selection, vol. 17, pp. 165-73, 2004.

[30] A. P. Korn and R. M. Burnett, "Distribution and complementarity of hydropathy in multisubunit proteins," Proteins: Structure, Function, and Bioinformatics, vol. 9, pp. 37-55, 1991.

[31] L. Lo Conte, C. Chothia, and J. Janin, "The atomic structure of protein-protein recognition sites," Journal of Molecular Biology, vol. 285, pp. 2177-98, 1999.

[32] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, pp. 3389-402, 1997.

[33] W. S. Valdar, "Scoring residue conservation," Proteins: Structure, Function, and Bioinformatics, vol. 48, pp. 227-41, 2002.

[34] J. R. Bradford and D. R. Westhead, "Asymmetric mutation rates at enzyme-inhibitor interfaces: implications for the protein-protein docking problem," Protein Science, vol. 12, pp. 2099-103, 2003.

[35] D. R. Caffrey, S. Somaroo, J. D. Hughes, J. Mintseris, and E. S. Huang, "Are protein-protein interfaces more conserved in sequence than the rest of the protein surface?" Protein Science, vol. 13, pp. 190-202, 2004.

[36] N. V. Grishin and M. A. Phillips, "The subunit interfaces of oligomeric enzymes are conserved to a similar extent to the overall protein sequences," Protein Science, vol. 3, pp. 2455-8, 1994.

[37] M. Guharoy and P. Chakrabarti, "Conservation and relative importance of residues across protein-protein interfaces," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, pp. 15447-52, 2005.

[38] A. Porollo and J. Meller, "Prediction-based fingerprints of protein-protein interactions," Proteins: Structure, Function, and Bioinformatics, vol. 66, pp. 630-45, 2007.

[39] C. Cole and J. Warwicker, "Side-chain conformational entropy at protein-protein interfaces," Protein Science, vol. 11, pp. 2860-70, 2002.

[40] J. L. Chung, W. Wang, and P. E. Bourne, "Exploiting sequence and structure homologs to identify protein-protein binding sites," Proteins: Structure, Function, and Bioinformatics, vol. 62, pp. 630-40, 2006.

[41] M. C. Lawrence and P. M. Colman, "Shape complementarity at protein/protein interfaces," Journal of Molecular Biology, vol. 234, pp. 946-50, 1993.

[42] A. J. McCoy, V. Chandana Epa, and P. M. Colman, "Electrostatic complementarity at protein/protein interfaces," Journal of Molecular Biology, vol. 268, pp. 570-84, 1997.

[43] F. B. Sheinerman, R. Norel, and B. Honig, "Electrostatic aspects of protein-protein interactions," Current Opinion in Structural Biology, vol. 10, pp. 153-9, 2000.

[44] D. Xu, S. L. Lin, and R. Nussinov, "Protein binding versus protein folding: the role of hydrophilic bridges in protein associations," Journal of Molecular Biology, vol. 265, pp. 68-84, 1997.

[45] J. R. Bradford and D. R. Westhead, "Improved prediction of protein-protein binding sites using a support vector machines approach," Bioinformatics, vol. 21, pp. 1487-1494, 2005.

[46] P. Aloy, E. Querol, F. X. Aviles, and M. J. Sternberg, "Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking," Journal of Molecular Biology, vol. 311, pp. 395-408, 2001.

[47] G. Casari, C. Sander, and A. Valencia, "A method to predict functional residues in proteins," Nature Structural Biology, vol. 2, pp. 171-8, 1995.

[48] S. Madabushi, H. Yao, M. Marsh, D. M. Kristensen, A. Philippi, M. E. Sowa, and O. Lichtarge, "Structural clusters of evolutionary trace residues are statistically significant and common in proteins," Journal of Molecular Biology, vol. 316, pp. 139-54, 2002.

[49] Y. Ofran and B. Rost, "Predicted protein-protein interaction sites from local sequence information," FEBS Letters, vol. 544, pp. 236-239, 2003.

[50] F. Pazos, M. Helmer-Citterich, G. Ausiello, and A. Valencia, "Correlated mutations contain information about protein-protein interaction," Journal of Molecular Biology, vol. 271, pp. 511-23, 1997.

[51] B. Wang, P. Chen, D. S. Huang, J. J. Li, T. M. Lok, and M. R. Lyu, "Predicting protein interaction sites from residue spatial sequence profile and evolution rate," FEBS Letters, vol. 580, pp. 380-4, 2006.

[52] D. K. Gehlhaar, G. M. Verkhivker, P. A. Rejto, C. J. Sherman, D. B. Fogel, L. J. Fogel, and S. T. Freer, "Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming," Chemistry and Biology, vol. 2, pp. 317-24, 1995.

[53] R. Chen, J. Mintseris, J. Janin, and Z. Weng, "A protein-protein docking benchmark," Proteins: Structure, Function and Genetics, vol. 52, pp. 88-91, 2003.

[54] C. Yan, D. Dobbs, and V. Honavar, "A two-stage classifier for identification of protein-protein interface residues," Bioinformatics, vol. 20 Suppl 1, pp. I371-I378, 2004.

[55] B. Rost and C. Sander, "Conservation and prediction of solvent accessibility in protein families," Proteins: Structure, Function, and Genetics, vol. 20, pp. 216-226, 1994.

[56] S. Ansari and V. Helms, "Statistical analysis of predominantly transient protein-protein interfaces," Proteins: Structure, Function, and Bioinformatics, vol. 61, pp. 344-355, 2005.

[57] M. J. Sternberg, H. A. Gabb, and R. M. Jackson, "Predictive docking of protein-protein and protein-DNA complexes," Current Opinion in Structural Biology, vol. 8, pp. 250-256, 1998.

[58] D. Eisenberg and A. D. McLachlan, "Solvation energy in protein folding and binding," Nature, vol. 319, pp. 199-203, 1986.

[59] L. Wesson and D. Eisenberg, "Atomic solvation parameters applied to molecular dynamics of proteins in solution," Protein Science, vol. 1, pp. 227-235, 1992.

[60] M. P. Egloff, L. Sarda, R. Verger, C. Cambillau, and H. van Tilbeurgh, "Crystallographic study of the structure of colipase and of the interaction with pancreatic lipase," Protein Science, vol. 4, pp. 44-57, 1995.
