Author(s): Chang, WC (Chang, Wen-Chi); Lee, TY (Lee, Tzong-Yi); Shien, DM (Shien, Dray-Ming); Hsu, JBK (Hsu, Justin Bo-Kai); Horng, JT (Horng, Jorng-Tzong); Hsu, PC (Hsu, Po-Chiang); Wang, TY (Wang, Ting-Yuan); Huang, HD (Huang, Hsien-Da); Pan, RL (Pan, Rong-Long)
Title: Incorporating Support Vector Machine for Identifying Protein Tyrosine Sulfation Sites
Source: JOURNAL OF COMPUTATIONAL CHEMISTRY, 30 (15): 2526-2537 NOV 30 2009
Language: English
Document Type: Article
Author Keywords: protein; sulfation; prediction
KeyWords Plus: O-SULFATION; ELECTROSTATIC POTENTIALS; MASS-SPECTROMETRY; PREDICTION; PHOSPHORYLATION; SEQUENCES; BINDING; RECOGNITION; RECEPTORS; PEPTIDES
Abstract: Tyrosine sulfation is a post-translational modification of many secreted and membrane-bound proteins. It governs protein-protein interactions that are involved in leukocyte adhesion, hemostasis, and chemokine signaling. However, the intrinsic features of sulfated proteins remain elusive. This investigation presents SulfoSite, a computational method based on a support vector machine (SVM) for predicting protein sulfotyrosine sites. The approach was developed to consider structural information, such as the secondary structure and solvent accessibility of the amino acids that surround the sulfotyrosine sites. One hundred sixty-two experimentally verified tyrosine sulfation sites were identified using UniProtKB/SwissProt release 53.0. The results of a five-fold cross-validation evaluation suggest that solvent accessibility around the sulfotyrosine sites contributes substantially to predictive accuracy. The SVM classifier can achieve an accuracy of 94.2% in five-fold cross-validation when a sequence positional weighted matrix (PWM) is coupled with values of the accessible surface area (ASA). The proposed method significantly outperforms previous methods for accurately predicting the location of tyrosine sulfation sites. (C) 2009 Wiley Periodicals, Inc. J Comput Chem 30: 2526-2537, 2009
Addresses: [Chang, Wen-Chi; Huang, Hsien-Da] Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu, Taiwan; [Chang, Wen-Chi; Lee, Tzong-Yi; Hsu, Justin Bo-Kai; Hsu, Po-Chiang; Wang, Ting-Yuan; Huang, Hsien-Da] Natl Chiao Tung Univ, Inst Bioinformat & Syst Biol, Hsinchu, Taiwan; [Pan, Rong-Long] Natl Tsing Hua Univ, Coll Life Sci, Inst Bioinformat & Struct Biol, Hsinchu, Taiwan; [Shien, Dray-Ming; Horng, Jorng-Tzong] Natl Cent Univ, Dept Comp Sci & Informat Engn, Chungli 320, Taiwan; [Horng, Jorng-Tzong] Asia Univ, Dept Bioinformat, Taichung, Taiwan; [Shien, Dray-Ming] Chin Min Inst Technol, Dept Elect Engn, Miaoli, Taiwan
Reprint Address: Huang, HD, Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu, Taiwan.
E-mail Address: bryan@mail.nctu.edu.tw; rlpan@life.nthu.edu.tw
Funding Acknowledgement:
Funding Agency: National Science Council of the Republic of China; Grant Number: NSC 97-2811-B-009-001
Funding Agency: National Research Program for Genomic Medicine (NRPGM), Taiwan
Contract/grant sponsor: National Science Council of the Republic of China; contract/grant number: NSC 97-2811-B-009-001
Contract/grant sponsor: National Research Program for Genomic Medicine (NRPGM), Taiwan
Cited References:
AHMAD S, 2003, BIOINFORMATICS, V19, P1849, DOI 10.1093/bioinformatics/btg249.
AHMAD S, 2003, PROTEINS, V50, P629, DOI 10.1002/prot.10328.
ALTSCHUL SF, 1997, NUCLEIC ACIDS RES, V25, P3389.
BEISSWANGER R, 1998, P NATL ACAD SCI USA, V95, P11134.
BERNIMOULIN MP, 2003, J BIOL CHEM, V278, P37, DOI 10.1074/jbc.M204360200.
BOECKMANN B, 2003, NUCLEIC ACIDS RES, V31, P365, DOI 10.1093/nar/gkg095.
BURGES CJC, 1998, DATA MIN KNOWL DISC, V2, P121.
BRYSON K, 2005, NUCLEIC ACIDS RES S2, V33, W36, DOI 10.1093/nar/gki410.
BUNDGAARD JR, 1997, J BIOL CHEM, V272, P21700.
CHOE H, 2003, CELL, V114, P161.
DANAN LM, 2008, J AM SOC MASS SPECTR, V19, P1459, DOI 10.1016/j.jasms.2008.06.021.
DESHPANDE N, 2005, NUCLEIC ACIDS RES, V33, D233.
GAO JM, 2003, J BIOL CHEM, V278, P37902, DOI 10.1074/jbc.M306061200.
GONZALEZDIAZ H, 2005, FEBS LETT, V579, P4297, DOI 10.1016/j.febslet.2005.06.065.
GONZALEZDIAZ H, 2007, CURR TOP MED CHEM, V7, P15.
GONZALEZDIAZ H, 2007, J COMPUT CHEM, V28, P1049, DOI 10.1002/jcc.20576.
GONZALEZDIAZ H, 2007, J COMPUT CHEM, V28, P1990, DOI 10.1002/jcc.20700.
GONZALEZDIAZ H, 2008, PROTEOMICS, V8, P750.
HUANG HD, 2005, J COMPUT CHEM, V26, P1032, DOI 10.1002/jcc.20235.
HUNTER T, 1998, PHILOS T ROY SOC B, V353, P583.
KEHOE JW, 2000, CHEM BIOL, V7, R57.
LEE TY, 2006, NUCLEIC ACIDS RES, V34, D622, DOI 10.1093/nar/gkj083.
LIN HC, 2003, BIOCHEM BIOPH RES CO, V312, P1154, DOI 10.1016/j.bbrc.2003.11.047.
LIU J, 2008, AM J RESP CELL MOL, V38, P738, DOI 10.1165/rcmb.2007-0118OC.
MCGUFFIN LJ, 2000, BIOINFORMATICS, V16, P404.
MONIGATTI F, 2002, BIOINFORMATICS, V18, P769.
MONIGATTI F, 2006, BBA-PROTEINS PROTEOM, V1764, P1904, DOI 10.1016/j.bbapap.2006.07.002.
MOORE KL, 2003, J BIOL CHEM, V278, P24243, DOI 10.1074/jbc.R300008200.
ONNERFJORD P, 2004, J BIOL CHEM, V279, P26, DOI 10.1074/jbc.M308689200.
OUYANG YB, 1998, J BIOL CHEM, V273, P24770.
ROSENQUIST GL, 1993, PROTEIN SCI, V2, P215.
SCHNEIDER TD, 1990, NUCLEIC ACIDS RES, V18, P6097.
SEIBERT C, 2008, BIOPOLYMERS, V90, P459, DOI 10.1002/bip.20821.
VAPNIK V, 1995, NATURE STAT LEARNING.
VILAR S, 2008, J COMPUT CHEM, V29, P2613, DOI 10.1002/jcc.21016.
WILKINS PP, 1995, J BIOL CHEM, V270, P22677.
YU KM, 2002, ENDOCRINE, V19, P333.
YU YH, 2007, NAT METHODS, V4, P583, DOI 10.1038/NMETH1056.
ZHANG Y, 2006, J AM SOC MASS SPECTR, V17, P1282, DOI 10.1016/j.jasms.2006.05.013.
Cited Reference Count: 39
Times Cited: 0
Publisher: JOHN WILEY & SONS INC
Publisher Address: 111 RIVER ST, HOBOKEN, NJ 07030 USA
ISSN: 0192-8651
DOI: 10.1002/jcc.21258
29-char Source Abbrev.: J COMPUT CHEM
ISO Source Abbrev.: J. Comput. Chem.
Source Item Page Count: 12
Subject Category: Chemistry, Multidisciplinary
ISI Document Delivery No.: 507QN