Incorporating Support Vector Machine for Identifying Protein Tyrosine Sulfation Sites

Academic year: 2022

Author(s): Chang, WC (Chang, Wen-Chi); Lee, TY (Lee, Tzong-Yi); Shien, DM (Shien, Dray-Ming); Hsu, JBK (Hsu, Justin Bo-Kai); Horng, JT (Horng, Jorng-Tzong); Hsu, PC (Hsu, Po-Chiang); Wang, TY (Wang, Ting-Yuan); Huang, HD (Huang, Hsien-Da); Pan, RL (Pan, Rong-Long)

Title: Incorporating Support Vector Machine for Identifying Protein Tyrosine Sulfation Sites

Source: JOURNAL OF COMPUTATIONAL CHEMISTRY, 30 (15): 2526-2537 NOV 30 2009

Language: English

Document Type: Article

Author Keywords: protein; sulfation; prediction

KeyWords Plus: O-SULFATION; ELECTROSTATIC POTENTIALS; MASS-SPECTROMETRY; PREDICTION; PHOSPHORYLATION; SEQUENCES; BINDING; RECOGNITION; RECEPTORS; PEPTIDES

Abstract: Tyrosine sulfation is a post-translational modification of many secreted and membrane-bound proteins. It governs protein-protein interactions that are involved in leukocyte adhesion, hemostasis, and chemokine signaling. However, the intrinsic features of sulfated proteins remain elusive and have yet to be delineated. This investigation presents SulfoSite, a computational method based on a support vector machine (SVM) for predicting protein sulfotyrosine sites. The approach was developed to consider structural information, such as the secondary structure and solvent accessibility of the amino acids that surround the sulfotyrosine sites. One hundred sixty-two experimentally verified tyrosine sulfation sites were identified using UniProtKB/SwissProt release 53.0. The results of a five-fold cross-validation evaluation suggest that solvent accessibility around the sulfotyrosine sites contributes substantially to predictive accuracy. The SVM classifier can achieve an accuracy of 94.2% in five-fold cross-validation when the sequence positional weighted matrix (PWM) is coupled with values of the accessible surface area (ASA). The proposed method significantly outperforms previous methods for accurately predicting the location of tyrosine sulfation sites. (C) 2009 Wiley Periodicals, Inc. J Comput Chem 30: 2526-2537, 2009

Addresses: [Chang, Wen-Chi; Huang, Hsien-Da] Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu, Taiwan; [Chang, Wen-Chi; Lee, Tzong-Yi; Hsu, Justin Bo-Kai; Hsu, Po-Chiang; Wang, Ting-Yuan; Huang, Hsien-Da] Natl Chiao Tung Univ, Inst Bioinformat & Syst Biol, Hsinchu, Taiwan; [Pan, Rong-Long] Natl Tsing Hua Univ, Coll Life Sci, Inst Bioinformat & Struct Biol, Hsinchu, Taiwan; [Shien, Dray-Ming; Horng, Jorng-Tzong] Natl Cent Univ, Dept Comp Sci & Informat Engn, Chungli 320, Taiwan; [Horng, Jorng-Tzong] Asia Univ, Dept Bioinformat, Taichung, Taiwan; [Shien, Dray-Ming] Chin Min Inst Technol, Dept Elect Engn, Miaoli, Taiwan
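Illustrative note: the abstract describes coupling a sequence positional weighted matrix (PWM) with accessible surface area (ASA) values and evaluating by five-fold cross-validation. The sketch below shows one way such feature vectors could be assembled. It is not the SulfoSite implementation: the window size, the uniform background, the pseudocount, the interleaved fold split, and every function name are assumptions made for illustration, and the toy windows and ASA values are invented placeholders, not data from the paper.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tyrosine_windows(sequence, flank=4, pad="-"):
    """Yield (position, window) for every tyrosine, padding at the termini.

    The flank size is an assumption; the record does not state the window used.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "Y":
            yield i, padded[i:i + 2 * flank + 1]


def build_pwm(windows, pseudocount=1.0):
    """Column-wise log-odds scores against a uniform background (assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for col in range(len(windows[0])):
        counts = Counter(w[col] for w in windows if w[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def encode(window, pwm, asa_values):
    """Feature vector: one PWM score per window position, then the ASA values."""
    scores = [pwm[i].get(aa, 0.0) for i, aa in enumerate(window)]
    return scores + list(asa_values)


def five_fold_indices(n_samples, k=5):
    """Split sample indices into k interleaved folds for cross-validation."""
    return [list(range(start, n_samples, k)) for start in range(k)]


# Toy, hypothetical windows centred on a tyrosine (not real sulfation sites).
positives = ["DDEEYDDEE", "EDDAYEEDD", "DEEDYGDEE"]
pwm = build_pwm(positives)
# Placeholder ASA values, one per window position.
features = encode("DDEEYDDEE", pwm, [0.5] * 9)
folds = five_fold_indices(len(positives) + 4)
```

Feature vectors of this shape could then be passed to any SVM implementation, training on four folds and testing on the fifth in turn.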

Reprint Address: Huang, HD, Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu, Taiwan.

E-mail Address: bryan@mail.nctu.edu.tw; rlpan@life.nthu.edu.tw

Funding Acknowledgement:

Funding Agency: National Science Council of the Republic of China; Grant Number: NSC 97-2811-B-009-001

Funding Agency: National Research Program for Genomic Medicine (NRPGM), Taiwan; Grant Number: (none given)

Contract/grant sponsor: National Science Council of the Republic of China; contract/grant number: NSC 97-2811-B-009-001

Contract/grant sponsor: National Research Program for Genomic Medicine (NRPGM), Taiwan

Cited References:

AHMAD S, 2003, BIOINFORMATICS, V19, P1849, DOI 10.1093/bioinformatics/btg249.

AHMAD S, 2003, PROTEINS, V50, P629, DOI 10.1002/prot.10328.

ALTSCHUL SF, 1997, NUCLEIC ACIDS RES, V25, P3389.

BEISSWANGER R, 1998, P NATL ACAD SCI USA, V95, P11134.

BERNIMOULIN MP, 2003, J BIOL CHEM, V278, P37, DOI 10.1074/jbc.M204360200.

BOECKMANN B, 2003, NUCLEIC ACIDS RES, V31, P365, DOI 10.1093/nar/gkg095.

BURGES CJC, 1998, DATA MIN KNOWL DISC, V2, P121.

BRYSON K, 2005, NUCLEIC ACIDS RES S2, V33, W36, DOI 10.1093/nar/gki410.

BUNDGAARD JR, 1997, J BIOL CHEM, V272, P21700.

CHOE H, 2003, CELL, V114, P161.

DANAN LM, 2008, J AM SOC MASS SPECTR, V19, P1459, DOI 10.1016/j.jasms.2008.06.021.

DESHPANDE N, 2005, NUCLEIC ACIDS RES, V33, D233.

GAO JM, 2003, J BIOL CHEM, V278, P37902, DOI 10.1074/jbc.M306061200.

GONZALEZDIAZ H, 2005, FEBS LETT, V579, P4297, DOI 10.1016/j.febslet.2005.06.065.

GONZALEZDIAZ H, 2007, CURR TOP MED CHEM, V7, P15.

GONZALEZDIAZ H, 2007, J COMPUT CHEM, V28, P1049, DOI 10.1002/jcc.20576.

GONZALEZDIAZ H, 2007, J COMPUT CHEM, V28, P1990, DOI 10.1002/jcc.20700.

GONZALEZDIAZ H, 2008, PROTEOMICS, V8, P750.

HUANG HD, 2005, J COMPUT CHEM, V26, P1032, DOI 10.1002/jcc.20235.

HUNTER T, 1998, PHILOS T ROY SOC B, V353, P583.

KEHOE JW, 2000, CHEM BIOL, V7, R57.

LEE TY, 2006, NUCLEIC ACIDS RES, V34, D622, DOI 10.1093/nar/gkj083.

LIN HC, 2003, BIOCHEM BIOPH RES CO, V312, P1154, DOI 10.1016/j.bbrc.2003.11.047.

LIU J, 2008, AM J RESP CELL MOL, V38, P738, DOI 10.1165/rcmb.2007-0118OC.

MCGUFFIN LJ, 2000, BIOINFORMATICS, V16, P404.

MONIGATTI F, 2002, BIOINFORMATICS, V18, P769.

MONIGATTI F, 2006, BBA-PROTEINS PROTEOM, V1764, P1904, DOI 10.1016/j.bbapap.2006.07.002.

MOORE KL, 2003, J BIOL CHEM, V278, P24243, DOI 10.1074/jbc.R300008200.

ONNERFJORD P, 2004, J BIOL CHEM, V279, P26, DOI 10.1074/jbc.M308689200.

OUYANG YB, 1998, J BIOL CHEM, V273, P24770.

ROSENQUIST GL, 1993, PROTEIN SCI, V2, P215.

SCHNEIDER TD, 1990, NUCLEIC ACIDS RES, V18, P6097.

SEIBERT C, 2008, BIOPOLYMERS, V90, P459, DOI 10.1002/bip.20821.

VAPNIK V, 1995, NATURE STAT LEARNING.

VILAR S, 2008, J COMPUT CHEM, V29, P2613, DOI 10.1002/jcc.21016.

WILKINS PP, 1995, J BIOL CHEM, V270, P22677.

YU KM, 2002, ENDOCRINE, V19, P333.

YU YH, 2007, NAT METHODS, V4, P583, DOI 10.1038/NMETH1056.

ZHANG Y, 2006, J AM SOC MASS SPECTR, V17, P1282, DOI 10.1016/j.jasms.2006.05.013.

Cited Reference Count: 39

Times Cited: 0

Publisher: JOHN WILEY & SONS INC

Publisher Address: 111 RIVER ST, HOBOKEN, NJ 07030 USA

ISSN: 0192-8651

DOI: 10.1002/jcc.21258

29-char Source Abbrev.: J COMPUT CHEM

ISO Source Abbrev.: J. Comput. Chem.

Source Item Page Count: 12

Subject Category: Chemistry, Multidisciplinary

ISI Document Delivery No.: 507QN
