(PS)2: protein structure prediction server version 3.0
Tsun-Tsao Huang1,2, Jenn-Kang Hwang1,2,3,4, Chu-Huang Chen5,6,7,8, Chih-Sheng Chu6,9, Chi-Wen Lee2,3 and Chih-Chieh Chen6,10,11,*
1Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 30068, Taiwan, 2Center for Bioinformatics Research, National Chiao Tung University, Hsinchu 30068, Taiwan, 3Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 30068, Taiwan, 4Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan, 5Department of Vascular and Medicinal Research, Texas Heart Institute, Houston, TX 77030, USA, 6Center for Lipid Biosciences, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 80708, Taiwan, 7L5 Research Center, China Medical University Hospital, China Medical University, Taichung 40402, Taiwan, 8Section of Cardiovascular Research, Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA, 9Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 80708, Taiwan, 10Center for Lipid and Glycomedicine Research, Kaohsiung Medical University, Kaohsiung 80708, Taiwan and 11Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
Received February 25, 2015; Revised April 10, 2015; Accepted April 24, 2015
ABSTRACT
Protein complexes are involved in many biological processes. Examining the coupling between subunits of a complex is useful for understanding the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, the server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure can be displayed in Java-based 3D graphics viewers, and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, the structural profile is computed with and without the packing contribution of the other subunits; the resulting difference in similarity to the evolutionary profile suggests which form, complex or monomeric, is preferred under biological conditions. We believe that the (PS)2 server will be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 server is freely available at http://ps2v3.life.nctu.edu.tw/.
INTRODUCTION
Proteins are important molecules involved in almost all biological processes. The analysis of protein three-dimensional (3D) structures plays an important role in understanding the molecular basis of their functions. Comparative modeling is one of the most important computational approaches for predicting structures from amino acid sequences. Our previous comparative modeling web tools, (PS)2 (1,2), have been widely used in proteomic research and have been cited in journals such as The American Journal of Human Genetics (3), Human Mutation (4,5), BBA Reviews on Cancer (6) and Journal of Clinical Microbiology (7). Because many fundamental cellular processes are mediated by protein–protein interactions, the prediction of protein complex structures is increasingly important in proteomic studies. Indeed, many comparative modeling tools have been published for building complex structures (8–15). However, predicted complex structures raise a relevant question: is an individual subunit sufficient to perform the biological function, or is the whole complex necessary under biological conditions? Recently, Chang et al. addressed this issue solely by comparing packing density and sequence conservation profiles, which require only structure coordinates and sequence homologs (16).
Previous studies have shown that packing density is correlated with sequence conservation (17–20). Furthermore, Shih et al. showed that the reciprocal of packing density is proportional to sequence conservation at the residue level (20), and this linear relationship can be explained by a mechanistic model based on statistical physics (21). Using various methods to calculate packing density and sequence conservation, Yeh et al. found that the highest correlation between these two properties occurs when packing density is estimated by the weighted contact number (WCN) (22) and sequence conservation is estimated by ConSurf (23,24). To obtain a positive correlation between packing density and ConSurf profiles, the reciprocal WCN (rWCN) is used in these studies (16,18,20,25). For a given 3D structure of a protein complex, there are two ways to compute the rWCN profile of each subunit: one ignores the packing contribution of the other subunits of the complex (denoted rWCN I) and the other includes it (denoted rWCN II). Chang et al. showed that rWCN I agrees better with sequence conservation for enzymes whose active sites are located within individual subunits; in contrast, rWCN II agrees better with sequence conservation for enzymes whose active sites comprise catalytic residues from multiple subunits (16). These findings imply that an individual subunit may be sufficient to perform the biological function when its rWCN I profile agrees better with the sequence conservation profile; conversely, the complex form is preferred under biological conditions when the rWCN II profile agrees better with the sequence conservation profile. Consequently, Chang et al. suggested that comparing packing density and sequence conservation profiles is a novel way of looking at the coupling between subunits of a complex (16).
*To whom correspondence should be addressed. Tel: +886 3 5712121 (Ext 56921); Fax: +886 3 5729288; Email: [email protected]
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
at UNIVERSITY OF ARIZONA on July 10, 2015
http://nar.oxfordjournals.org/
Here, we report the updated (PS)2 server, which automatically predicts protein complex structures from query sequences and shows the coupling between subunits of the predicted complex based on the findings of Chang et al. The updated (PS)2 accepts protein sequences in FASTA format and builds homologous 3D structures using the (PS)2 methodology (1,2), which was previously developed by one of the authors and uses an effective consensus strategy for both template selection and target–template alignment. The updated (PS)2 then provides the packing density profiles calculated as rWCN I and rWCN II (16) for the predicted structure, as well as the sequence conservation profile calculated by ConSurf (23,24). The correlation coefficients between these structure-derived and sequence-derived profiles give a clue about the degree of coupling among the subunits of the predicted complex. Finally, the predicted tertiary structure and the structure–sequence profile comparisons can be visualized in Java-based 3D graphics viewers. This web server is freely available at http://ps2v3.life.nctu.edu.tw.
MATERIALS AND METHODS
The workflow of (PS)2 is shown schematically in Figure 1. One of the new features of the updated (PS)2 server is 3D structure prediction for protein complexes. Users enter two query protein sequences, and the server models the 3D complex structure using the (PS)2 homology modeling strategies (1,2). The complex template dataset consists of 56547 3D structures and contains all available protein complexes in the Protein Data Bank. Given the predicted 3D structure and the query sequences, the packing density and the sequence conservation of each residue are calculated by WCN (22) and ConSurf (24), respectively.
Figure 1. The schematic workflow of (PS)2 version 3.0.
For a monomeric structure, the WCN of the $i$th residue is $\mathrm{WCN}_i = \sum_{j \neq i}^{N} 1/d_{ij}^{2}$, the sum of the inverse squared Cα–Cα distances to the other residues (22). To allow comparison with the conservation score calculated by ConSurf (24), the reciprocal of WCN (rWCN) is used. For a dimeric structure, rWCN can be calculated in two ways (16): rWCN I considers only residues in the same subunit (chain), whereas rWCN II considers residues in the whole complex. On the result page, the rWCN I, rWCN II and ConSurf profiles are z-score normalized by $z = (x - \bar{x})/\sigma_x$, where $\bar{x}$ and $\sigma_x$ are the mean and the standard deviation of $x$, respectively.
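As a rough illustration of the definitions above, the following Python sketch computes WCN, the two reciprocal profiles and the z-score normalization from Cα coordinates. It is not the server's code: the function names (`wcn`, `rwcn_profiles`, `zscore`), the input layout (a list of (x, y, z) tuples with a parallel list of chain IDs), the toy coordinates and the use of the population standard deviation are all assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def wcn(coords):
    """Return WCN_i = sum over j != i of 1 / d_ij**2 for C-alpha coordinates."""
    profile = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(a, b) ** 2
        profile.append(total)
    return profile


def rwcn_profiles(coords, chain_ids):
    """Return (rWCN I, rWCN II) profiles, one value per residue.

    rWCN I uses only residues of the same chain; rWCN II uses every residue
    of the complex. Each chain needs at least two residues, otherwise its
    intra-chain WCN is zero and the reciprocal is undefined.
    """
    rwcn_ii = [1.0 / w for w in wcn(coords)]
    rwcn_i = [0.0] * len(coords)
    for chain in set(chain_ids):
        idx = [k for k, c in enumerate(chain_ids) if c == chain]
        sub_coords = [coords[k] for k in idx]
        for k, w in zip(idx, wcn(sub_coords)):
            rwcn_i[k] = 1.0 / w
    return rwcn_i, rwcn_ii


def zscore(values):
    """Normalize a profile as z = (x - mean) / std (population std assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Toy dimer (hypothetical coordinates in angstroms): two residues per chain.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]
chains = ["A", "A", "B", "B"]
rwcn_i, rwcn_ii = rwcn_profiles(coords, chains)
```

Because rWCN II adds the contacts from the partner chain to each residue's WCN, every rWCN II value is no larger than the corresponding rWCN I value; residues at the interface change the most.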
The new (PS)2 server is built on the WCN, ConSurf and original (PS)2 systems; MODELLER has been upgraded to version 9v8 (26), and Perl scripts integrate the prediction pipeline. The web pages are built with HTML and PHP. The web server runs on a Linux system with 2.40 GHz Intel Xeon processors (24 cores). OpenAstexViewer (http://openastexviewer.net) is used to visualize the predicted models.
WEB SERVER
Figure 2. The features of the (PS)2 web server: (A) Input: users can enter or upload protein sequences in FASTA format. (B) Output: users can view the predicted structure in the 3D graphics viewer (top-right panel), get the structural and sequence properties for each residue (top-left panel) and compare the sequence conservation and packing density profiles chain-by-chain (bottom panel).
Input format
The (PS)2 web server accepts two types of input (Figure 2A): users can paste protein sequences or upload a sequence file in FASTA format. There are three ways to select templates: (i) the web server automatically selects templates based on the E-values from sequence alignments; (ii) users assign a PDB structure and choose which chains are used as the template; (iii) users upload their own 3D structure as the template. Note that for the ‘prediction for complex’ mode, a user-specified template must be a complex structure, and the input chain IDs, ‘Chain1’ and ‘Chain2’, must be different. For a protein of about 400 amino acids, template-based structure modeling and the packing density calculations (with and without the other subunits) take about 10 min, whereas the sequence conservation calculation takes about 30 min. For longer sequences, the run time may exceed one hour because the sequence conservation calculation is time-consuming. Users are therefore encouraged to enter an e-mail address so that a notification can be sent when the submitted job is finished.
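The input constraints described above can be checked before submission with a short script. The sketch below is only an illustration: the helper names (`parse_fasta`, `check_complex_job`) are hypothetical, and it assumes that both query sequences are supplied in one FASTA text, which the description above does not specify. The only rules taken from the text are FASTA input, two query sequences for a complex and distinct chain IDs.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_complex_job(fasta_text, chain1, chain2):
    """Validate a hypothetical complex-prediction submission.

    Assumes one FASTA text holding exactly two query sequences and two
    template chain IDs, which must differ.
    """
    records = parse_fasta(fasta_text)
    if len(records) != 2:
        raise ValueError("complex prediction needs exactly two query sequences")
    if any(not seq for _, seq in records):
        raise ValueError("every FASTA record needs a non-empty sequence")
    if chain1 == chain2:
        raise ValueError("Chain1 and Chain2 must be different")
    return records


# Made-up sequences, for illustration only.
example = ">query_1\nMKTAYIAK\nQRQISFVK\n>query_2\nGSHMLEDP\n"
records = check_complex_job(example, "A", "B")
```

Running the same check locally avoids waiting for a long job only to discover that the two chain IDs were identical or that a sequence was missing.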
Output format
After the prediction, the updated (PS)2 server returns two types of results (Figure 2B): (i) if the 3D structure is built successfully, the predicted complex structure (Display Structure) is shown in the right region of the top panel; (ii) the sequence conservation and packing density of each residue, comprising rWCN I, rWCN II and ConSurf, are shown in the left region of the top panel. The server integrates these two types of results so that users can easily analyze and view the 3D structures and manipulate their orientation in space. When the check box in the (Label) column of the top-left panel is ticked, the selected residues are labeled in the 3D graphics viewer of the top-right panel. The different structure display modes (e.g. Cartoon, Lines, Spheres and Surface) can be shown together or individually in the 3D graphics viewer. The (PS)2 server also allows users to download the predicted structure coordinates in PDB format.
Figure 3. The predicted complex structure of alanine racemase (ALR_BACPS) from the (PS)2 server. The structure model of the ALR_BACPS complex was built using a sequence homolog (PDB ID: 1BD0, chains A and B) as the template. The active-site residues Lys41 and Tyr266 are shown in spheres mode.
EXAMPLE ANALYSIS
The alanine racemase (gene name: ALR_BACPS, UniProt ID: Q9S5V6) from Bacillus psychrosaccharolyticus catalyzes the racemization of alanine between the L and D forms. Based on the phenylhydrazine method of Wada and Snell (27), the study of Inagaki et al. indicated that alanine racemases are dimeric complexes (28). To build the ALR_BACPS complex model, the (PS)2 server automatically selects the alanine racemase from Geobacillus stearothermophilus (PDB ID: 1BD0, chains A and B) (29) as the template, because its sequence is the most similar to the query sequence (about 58%) in the whole PDB. The complex structure predicted by the (PS)2 server is shown in the top-right panel of Figure 2B, while the packing density (both rWCN I and rWCN II) and the sequence conservation (ConSurf) of each residue are shown in the top-left panel. To compare the structural and sequence profiles of each subunit, the rWCN I, rWCN II and ConSurf profiles are shown chain-by-chain in the bottom panel. For each chain, the Pearson correlation coefficients for rWCN I-ConSurf and rWCN II-ConSurf are shown at the bottom of the profile comparison figure. Taking chain A as an example, rWCN II agrees better with ConSurf (0.769) than rWCN I does (0.498), and the same holds for chain B (0.759 > 0.486). Based on the findings of Chang et al. (16) and these profile comparisons, we infer that alanine racemases are dimers under biological conditions. This inference is consistent with the experimental results of Inagaki et al. (28).
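The comparison used in this example can be sketched in a few lines of Python. The function names (`pearson`, `preferred_form`), the toy profiles and the tie-breaking choice (a tie is reported as "monomer") are assumptions made for illustration; the sketch follows the rule of Chang et al. as described in the text and is not the server's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def preferred_form(rwcn_i, rwcn_ii, consurf):
    """Compare both packing profiles of one chain with its ConSurf profile.

    Returns (label, r_I, r_II). The label is "complex" when rWCN II
    correlates better with ConSurf, and "monomer" otherwise (ties are
    assigned to "monomer" here, which is an arbitrary choice).
    """
    r_i = pearson(rwcn_i, consurf)
    r_ii = pearson(rwcn_ii, consurf)
    label = "complex" if r_ii > r_i else "monomer"
    return label, r_i, r_ii


# Toy per-residue profiles for one chain (made-up numbers).
consurf = [1.0, 2.0, 3.0, 4.0]
rwcn_i_profile = [1.0, 3.0, 2.0, 4.0]
rwcn_ii_profile = [1.0, 2.0, 3.0, 4.0]
label, r_i, r_ii = preferred_form(rwcn_i_profile, rwcn_ii_profile, consurf)
```

Applied to the coefficients reported for ALR_BACPS, the same rule gives "complex" for both chains, since 0.769 > 0.498 for chain A and 0.759 > 0.486 for chain B.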
Figure 3 shows the predicted complex structure of ALR_BACPS from the (PS)2 server. The left active site consists of Lys41 from chain A and Tyr266 from chain B, and the right active site is likewise formed by residues from both chains. Because each active site of alanine racemase consists of residues from different chains, the dimeric form is necessary for function. This is consistent with the inference, based on the method proposed by Chang et al. (16), that alanine racemase is a dimer. In summary, this example shows that the (PS)2 server not only correctly predicts the protein complex structure but also has the potential to predict the biological functional unit.
CONCLUSION
Here we present an updated (PS)2 web server for predicting complex structures and analyzing these structures by integrating the ConSurf, WCN and (PS)2 systems. A unique feature of (PS)2 is the integration of packing density and sequence conservation analysis into complex structure prediction: the server predicts the structures of protein complexes and indicates whether the biological functional unit is more likely the monomeric or the complex form. The example demonstrates that the updated (PS)2 server is effective for complex structure prediction and provides a new way to look at the coupling between subunits. We believe that the (PS)2 server will be useful to general biologists.
ACKNOWLEDGEMENT
We are grateful for the hardware and software support provided by the Center for Bioinformatics Research, National Chiao Tung University, Taiwan; the Center for Lipid Biosciences, Kaohsiung Medical University Hospital, Taiwan; and the Center for Lipid and Glycomedicine Research, Kaohsiung Medical University, Taiwan.
FUNDING
Academic Summit Program of Ministry of Science and Technology [MOST-103-2321-B-009-002]; ‘Center for Bioinformatics Research of Aiming for the Top University Program’ of the National Chiao Tung University and Ministry of Education, Taiwan; NSYSU-KMU joint research project [NSYSUKMU 104-P027]. Funding for open access charge: ‘Center for Bioinformatics Research of Aiming for the Top University Program’ of the National Chiao Tung University and Ministry of Education, Taiwan.
Conflict of interest statement. None declared.
REFERENCES
1. Chen,C.C., Hwang,J.K. and Yang,J.M. (2006) (PS)2: protein structure prediction server. Nucleic Acids Res., 34, W152–W157.
2. Chen,C.C., Hwang,J.K. and Yang,J.M. (2009) (PS)2-v2: template-based protein structure prediction server. BMC Bioinformatics, 10, 366.
3. Weber,S., Thiele,H., Mir,S., Toliat,M.R., Sozeri,B., Reutter,H., Draaken,M., Ludwig,M., Altmuller,J., Frommolt,P. et al. (2011) Muscarinic acetylcholine receptor M3 mutation causes urinary bladder disease and a prune-belly-like syndrome. Am. J. Hum. Genet., 89, 668–674.
4. Soltys,D.T., Rocha,C.R., Lerner,L.K., de Souza,T.A., Munford,V., Cabral,F., Nardo,T., Stefanini,M., Sarasin,A., Cabral-Neto,J.B. et al. (2013) Novel XPG (ERCC5) mutations affect DNA repair and cell survival after ultraviolet but not oxidative stress. Hum. Mutat., 34, 481–489.
5. Kuo,Y.C., Lin,Y.H., Chen,H.I., Wang,Y.Y., Chiou,Y.W., Lin,H.H., Pan,H.A., Wu,C.M., Su,S.M., Hsu,C.C. et al. (2012) SEPT12 mutations cause male infertility with defective sperm annulus. Hum. Mutat., 33, 710–719.
6. Friedman,R., Boye,K. and Flatmark,K. (2013) Molecular modelling and simulations in cancer research. Biochim. Biophys. Acta, 1836, 1–14.
7. Huang,S.W., Hsu,Y.W., Smith,D.J., Kiang,D., Tsai,H.P., Lin,K.H., Wang,S.M., Liu,C.C., Su,I.J. and Wang,J.R. (2009) Reemergence of enterovirus 71 in 2008 in Taiwan: dynamics of genetic and antigenic evolution from 1998 to 2008. J. Clin. Microbiol., 47, 3653–3662.
8. Mosca,R., Ceol,A. and Aloy,P. (2013) Interactome3D: adding structural details to protein networks. Nat. Methods, 10, 47–53.
9. Mukherjee,S. and Zhang,Y. (2011) Protein-protein complex structure predictions by multimeric threading and template recombination. Structure, 19, 955–966.
10. Guerler,A., Govindarajoo,B. and Zhang,Y. (2013) Mapping monomeric threading to protein-protein structure prediction. J. Chem. Inf. Model., 53, 717–725.
11. Szilagyi,A. and Zhang,Y. (2014) Template-based structure modeling of protein-protein interactions. Curr. Opin. Struct. Biol., 24, 10–23.
12. Lu,L., Lu,H. and Skolnick,J. (2002) MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins, 49, 350–364.
13. Baspinar,A., Cukuroglu,E., Nussinov,R., Keskin,O. and Gursoy,A. (2014) PRISM: a web server and repository for prediction of protein-protein interactions and modeling their 3D complexes. Nucleic Acids Res., 42, W285–W289.
14. Kundrotas,P.J., Lensink,M.F. and Alexov,E. (2008) Homology-based modeling of 3D structures of protein-protein complexes using alignments of modified sequence profiles. Int. J. Biol. Macromol., 43, 198–208.
15. Fukuhara,N. and Kawabata,T. (2008) HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures. Nucleic Acids Res., 36, W185–W189.
16. Chang,C.M., Huang,Y.W., Shih,C.H. and Hwang,J.K. (2013) On the relationship between the sequence conservation and the packing density profiles of the protein complexes. Proteins, 81, 1192–1199.
17. Franzosa,E.A. and Xia,Y. (2009) Structural determinants of protein evolution are context-sensitive at the residue level. Mol. Biol. Evol., 26, 2387–2395.
18. Yeh,S.W., Liu,J.W., Yu,S.H., Shih,C.H., Hwang,J.K. and Echave,J. (2014) Site-specific structural constraints on protein sequence evolutionary divergence: local packing density versus solvent exposure. Mol. Biol. Evol., 31, 135–139.
19. Liao,H., Yeh,W., Chiang,D., Jernigan,R.L. and Lustig,B. (2005) Protein sequence entropy is closely related to packing density and hydrophobicity. Protein Eng. Des. Sel., 18, 59–64.
20. Shih,C.H., Chang,C.M., Lin,Y.S., Lo,W.C. and Hwang,J.K. (2012) Evolutionary information hidden in a single protein structure. Proteins, 80, 1647–1657.
21. Huang,T.T., del Valle Marcos,M.L., Hwang,J.K. and Echave,J. (2014) A mechanistic stress model of protein evolution accounts for site-specific evolutionary rates and their relationship with packing density and flexibility. BMC Evol. Biol., 14, 78.
22. Lin,C.P., Huang,S.W., Lai,Y.L., Yen,S.C., Shih,C.H., Lu,C.H., Huang,C.C. and Hwang,J.K. (2008) Deriving protein dynamical properties from weighted protein contact number. Proteins, 72, 929–935.
23. Goldenberg,O., Erez,E., Nimrod,G. and Ben-Tal,N. (2009) The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures. Nucleic Acids Res., 37, D323–D327.
24. Ashkenazy,H., Erez,E., Martz,E., Pupko,T. and Ben-Tal,N. (2010) ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res., 38, W529–W533.
25. Yeh,S.W., Huang,T.T., Liu,J.W., Yu,S.H., Shih,C.H., Hwang,J.K. and Echave,J. (2014) Local packing density is the main structural determinant of the rate of protein sequence evolution at site level. Biomed. Res. Int., 2014, 572409.
26. Webb,B. and Sali,A. (2014) Comparative protein structure modeling using MODELLER. Curr. Protoc. Bioinformatics, 47, 5.6.1–5.6.32.
27. Wada,H. and Snell,E.E. (1961) The enzymatic oxidation of pyridoxine and pyridoxamine phosphates. J. Biol. Chem., 236, 2089–2095.
28. Inagaki,K., Tanizawa,K., Badet,B., Walsh,C.T., Tanaka,H. and Soda,K. (1986) Thermostable alanine racemase from Bacillus stearothermophilus: molecular cloning of the gene, enzyme purification, and characterization. Biochemistry, 25, 3268–3274.
29. Stamper,G.F., Morollo,A.A. and Ringe,D. (1998) Reaction of alanine racemase with 1-aminoethylphosphonic acid forms a stable external aldimine. Biochemistry, 37, 10438–10445.