In this study, we proposed a hybrid method, SVM-PSSM, that combines a support vector machine (SVM) with position-specific scoring matrix (PSSM) features to predict DNA-binding sites in proteins from amino acid sequences, and it achieves high accuracy on novel proteins. Using the same PSSM features, simulation results show that SVM-PSSM outperforms the fuzzy k-NN method and substantially outperforms the existing neural-network-based method in terms of net prediction (NP) accuracy, raising the NP values for training and test accuracy by up to 13.45% and 16.53%, respectively. Although previous studies have suggested that physico-chemical properties of amino acids, such as ASA, electric charge, and hydropathy, are related to DNA-protein interactions, adding these properties to the PSSM features did not improve on the performance obtained with PSSM alone. A better-designed scheme for combining PSSM and physico-chemical features appears to be needed to enhance performance further.
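As an illustration of how per-residue PSSM features are commonly built for this kind of predictor, the sketch below concatenates the PSSM rows in a sliding window centred on each residue and zero-pads at the sequence termini. The function name `window_features`, the `half_window` parameter, and the zero-padding rule are assumptions made for this example; they are not taken from the method described above.

```python
from typing import List


def window_features(pssm: List[List[float]], half_window: int) -> List[List[float]]:
    """Build one feature vector per residue from a PSSM.

    Hypothetical helper: `pssm` holds one row of scores per residue.
    Each feature vector concatenates the rows from position
    i - half_window to i + half_window; positions that fall outside
    the sequence are filled with zeros (an assumed padding rule).
    """
    n = len(pssm)
    width = len(pssm[0]) if n else 0
    zero_row = [0.0] * width
    features = []
    for i in range(n):
        vector: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            vector.extend(pssm[j] if 0 <= j < n else zero_row)
        features.append(vector)
    return features


if __name__ == "__main__":
    # Toy 3-residue "PSSM" with 2 columns instead of the usual 20.
    toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for vec in window_features(toy_pssm, half_window=1):
        print(vec)
```

Each resulting vector has (2 × half_window + 1) × width entries and could be passed to any SVM implementation as the input for the corresponding residue.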

To the best of our knowledge, the proposed method is currently the most effective method for recognizing DNA-binding residues in proteins from sequence alone, without using 3D structural information such as hydrogen bonds, hydrophobic and hydrophilic interactions, or ionic interactions. By adjusting the cut-off value of the SVM classifier, biologists can use the proposed method to screen novel proteins that lack significant homology with known proteins and to identify potential binding regions.
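The effect of adjusting the cut-off can be sketched as follows. The example thresholds a list of SVM decision values at several cut-offs and reports sensitivity, specificity, and NP for each. It assumes NP is the mean of sensitivity and specificity; the function names and the toy decision values are hypothetical and serve only to illustrate the trade-off.

```python
from typing import List, Sequence, Tuple


def classify(decision_values: Sequence[float], cutoff: float) -> List[int]:
    """Label a residue as binding (1) when its decision value >= cutoff."""
    return [1 if v >= cutoff else 0 for v in decision_values]


def sens_spec_np(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, NP).

    NP is assumed here to be the mean of sensitivity and specificity.
    """
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec, (sens + spec) / 2.0


def sweep_cutoffs(labels: Sequence[int], decision_values: Sequence[float],
                  cutoffs: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Evaluate every cut-off; each row is (cutoff, sensitivity, specificity, NP)."""
    return [(c, *sens_spec_np(labels, classify(decision_values, c))) for c in cutoffs]


if __name__ == "__main__":
    # Toy data: 1 = binding residue, 0 = non-binding residue.
    labels = [1, 1, 0, 0, 0]
    values = [0.9, -0.2, 0.1, -0.5, -1.0]  # hypothetical SVM decision values
    for cutoff, sens, spec, np_score in sweep_cutoffs(labels, values, [0.0, -0.3]):
        print(f"cutoff={cutoff:+.1f} sens={sens:.2f} spec={spec:.2f} NP={np_score:.2f}")
```

Lowering the cut-off labels more residues as binding, which raises sensitivity at the cost of specificity; a user screening novel proteins can choose the cut-off that matches the tolerated false-positive rate.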

