In this work, we presented a structural analysis of cellulase binding sites using a nonredundant dataset of 219 enzymes chosen from NCBI and CSA. From this dataset we selected four endo-glucanases and four exo-glucanases, eight enzymes in total, whose catalytic residues have been determined experimentally and reported in the literature. Our analysis shows that the WCN z-score threshold for endo-glucanases (-0.5) is larger than that for exo-glucanases (-0.8), which is reasonable because endo-glucanases need more space to hydrolyze cellulose. In addition, the prediction performance shows that combining WCN with RSA increases specificity, from 63-66% to 90-94% for the endo-glucanases and from 78-79% to 96-98% for the exo-glucanases examined. This indicates that most binding site residues are more rigid than other residues and are exposed even though they are hydrophobic.
Taken together, these characteristics of binding sites may help clarify structure-function relationships. They should also be helpful for predicting binding sites from protein structures in cellulases of unknown function, and in future work binding sites might be used to distinguish endo-glucanases from exo-glucanases.
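The thresholding rule summarized above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names, the use of a population standard deviation for the z-score, and the toy profile are assumptions. Only the cutoffs (-0.5 and 0.05 for endo-glucanases, -0.8 and 0.08 for exo-glucanases) are taken from Tables 6 to 12.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a per-residue profile to zero mean and unit variance.

    Assumption: population standard deviation; the source does not say
    which estimator was used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def predict_binding_sites(wcn_z, rsa, wcn_cutoff, rsa_cutoff):
    """Return indices of residues with WCN z-score < wcn_cutoff and RSA >= rsa_cutoff."""
    return [
        i
        for i, (z, a) in enumerate(zip(wcn_z, rsa))
        if z < wcn_cutoff and a >= rsa_cutoff
    ]


# Cutoffs (WCN z-score, RSA) as they appear in Tables 6 to 12.
CUTOFFS = {"endo": (-0.5, 0.05), "exo": (-0.8, 0.08)}

# Toy profile with made-up numbers, for illustration only.
wcn = [1.0, 2.0, 3.0, 4.0, 5.0]
rsa = [0.30, 0.01, 0.20, 0.50, 0.10]

z = z_scores(wcn)
predicted = predict_binding_sites(z, rsa, *CUTOFFS["endo"])
print(predicted)
```

In this toy profile the first two residues fall below the WCN cutoff, but the second is buried (RSA 0.01), so the RSA filter removes it. This is the way the combined criterion trades sensitivity for specificity.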
REFERENCES
1. Richmond T. Higher plant cellulose synthases. Genome Biol 2000;1(4):REVIEWS3001.
2. Sulzenbacher G, Mackenzie LF, Wilson KS, Withers SG, Dupont C, Davies GJ. The crystal structure of a 2-fluorocellotriosyl complex of the Streptomyces lividans endoglucanase CelB2 at 1.2 Å resolution. Biochemistry 1999;38(15):4826-4833.
3. Zhang YH, Lynd LR. Toward an aggregated understanding of enzymatic hydrolysis of cellulose: noncomplexed cellulase systems. Biotechnol Bioeng 2004;88(7):797-824.
4. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 2009;37(Database issue):D233-238.
5. Lynd LR, Weimer PJ, van Zyl WH, Pretorius IS. Microbial cellulose utilization: fundamentals and biotechnology. Microbiol Mol Biol Rev 2002;66(3):506-577.
6. Lu CH, Huang SW, Lai YL, Lin CP, Shih CH, Huang CC, Hsu WL, Hwang JK. On the relationship between the protein structure and protein dynamics. Proteins-Structure Function and Bioinformatics 2008;72(2):625-634.
7. Shih CH, Huang SW, Yen SC, Lai YL, Yu SH, Hwang JK. A simple way to compute protein dynamics without a mechanical model. Proteins-Structure Function and Bioinformatics 2007;68(1):34-38.
8. Lin CP, Huang SW, Lai YL, Yen SC, Shih CH, Lu CH, Huang CC, Hwang JK. Deriving protein dynamical properties from weighted protein contact number. Proteins-Structure Function and Bioinformatics 2008;72(3):929-935.
9. Zou JY, Kleywegt GJ, Stahlberg J, Driguez H, Nerinckx W, Claeyssens M, Koivula A, Teeri TT, Jones TA. Crystallographic evidence for substrate ring distortion and protein conformational changes during catalysis in cellobiohydrolase Cel6A from Trichoderma reesei. Structure with Folding & Design 1999;7(9):1035-1045.
10. Russell RB. Detection of protein three-dimensional side-chain patterns: New examples of convergent evolution. Journal of Molecular Biology 1998;279(5):1211-1227.
11. Notenboom V, Birsan C, Nitz M, Rose DR, Warren RAJ, Withers SG. Insights into transition state stabilization of the beta-1,4-glycosidase Cex by covalent intermediate accumulation in active site mutants. Nature Structural Biology 1998;5(9):812-818.
12. Sakon J, Irwin D, Wilson DB, Karplus PA. Structure and mechanism of endo/exocellulase E4 from Thermomonospora fusca. Nature Structural Biology 1997;4(10):810-818.
13. Davies GJ, Tolley SP, Henrissat B, Hjort C, Schulein M. Structures of oligosaccharide-bound forms of the endoglucanase V from Humicola insolens at 1.9 angstrom resolution. Biochemistry 1995;34(49):16210-16220.
14. Spezio M, Wilson DB, Karplus PA. Crystal-Structure of the Catalytic Domain of a Thermophilic Endocellulase. Biochemistry 1993;32(38):9906-9916.
15. Grassick A, Murray PG, Thompson R, Collins CM, Byrnes L, Birrane G, Higgins TM, Tuohy MG. Three-dimensional structure of a thermostable native cellobiohydrolase, CBHIB, and molecular characterization of the cel7 gene from the filamentous fungus, Talaromyces emersonii. European Journal of Biochemistry 2004;271(22):4495-4506.
16. Divne C, Stahlberg J, Teeri TT, Jones TA. High-resolution crystal structures reveal how a cellulose chain is bound in the 50 angstrom long tunnel of cellobiohydrolase I from Trichoderma reesei. Journal of Molecular Biology 1998;275(2):309-325.
17. Beguin P, Gilkes NR, Kilburn DG, Miller RC, O'Neill GP, Warren RAJ. Cloning of Cellulase Genes. CRC Critical Reviews in Biotechnology 1987;6(2):129-162.
18. Halle B. Flexibility and packing in proteins. Proceedings of the National Academy of Sciences of the United States of America 2002;99(3):1274-1279.
19. Samanta U, Bahadur RP, Chakrabarti P. Quantifying the accessible surface area of protein residues in their local environment. Protein Engineering 2002;15(8):659-667.
20. Rost B, Sander C. Conservation and prediction of solvent accessibility in protein families. Proteins 1994;20(3):216-226.
21. Miller S, Lesk AM, Janin J, Chothia C. The Accessible Surface-Area and Stability of Oligomeric Proteins. Nature 1987;328(6133):834-836.
TABLES
Table 1. The dataset of exo-glucanases from NCBI and CSA
PDB   GH  Catalytic residues
1BVW  6   Y174 R179 D180 D226 (aligned to 1QK2)
Table 2. The dataset of endo-glucanases from NCBI and CSA
PDB   GH  Catalytic residues
1A3H  5   N138 E139 H200 Y202 E228 (aligned to 1BQC)
1IA6  9   D56 D59 E410 (aligned to 1JS4)
Table 3. Proteins with catalytic residue data from the literature
PDB   GH  Catalytic residues
1JS4  9   D55 D58
Table 4. Binding site residues of each protein from the literature
PDB   Type            Ref.  Binding residues
1JS4  Endo-glucanase  9     H125 W128 F205 W209 W256 D261 W313 R317 R378 Y429
1TML  Endo-glucanase  10    W41 Y73 D117 D155 H159 W162 S189 W231 E263 D265 A271
2ENG  Endo-glucanase  11    T6 R7 Y8 D10 K13 W18 A19 K21 S45 E82 S110 H119 D121 G127 G128 V129 Y147 G148 D178 N179
2NLR  Endo-glucanase  12    F8 N22 W24 H65 Y66 N100 D104 W106 E120 M122 N155 S157 Q199 E203
1CEL  Exo-glucanase   15    N141 Y145 E217 H228 W367
Table 5.1. Performance measures for the whole dataset
Threshold  TPR  FPR
TP: true positive, TN: true negative, FN: false negative, FP: false positive.
Table 5.2. Performance measures for the endo-glucanases
Threshold  TPR  FPR
Table 5.3. Performance measures for the exo-glucanases
Threshold  TPR  FPR
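Tables 5.1 to 5.3 report TPR and FPR as a function of the threshold. The sketch below shows one way such a sweep could be computed. It assumes that a residue is predicted as a binding site when its WCN z-score falls below the cutoff, in line with the "WCN (< threshold)" notation of Tables 6 to 12. The function name, the scores and the labels are hypothetical and serve only as an illustration.

```python
def tpr_fpr(scores, is_binding, cutoff):
    """Return (TPR, FPR) when residues with score < cutoff are predicted positive."""
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, is_binding):
        predicted = score < cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up z-scores and binding labels for eight residues.
scores = [-1.5, -0.9, -0.6, -0.2, 0.1, 0.4, 0.8, 1.3]
is_binding = [True, True, False, True, False, False, False, False]

for cutoff in (-1.0, -0.8, -0.5, 0.0):
    tpr, fpr = tpr_fpr(scores, is_binding, cutoff)
    print(f"{cutoff:5.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Raising the cutoff can only add predicted positives, so both TPR and FPR are non-decreasing along the sweep. Plotting the resulting pairs gives a curve of the kind shown in Figure 3.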
Table 6. Endo-glucanase 1TML. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
1TML                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.5)                 7   94   4   181  64               66
WCN (< -0.5) & RSA (≥ 0.05)  5   24   6   251  45               91
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
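As a worked check of the definitions in the note above, the counts reported for 1TML reproduce the tabulated percentages. This is only an illustrative sketch; the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Sensitivity in percent: TP / (TP + FN)."""
    return 100.0 * tp / (tp + fn)


def specificity(fp, tn):
    """Specificity in percent: 1 - FP / (FP + TN)."""
    return 100.0 * (1.0 - fp / (fp + tn))


# Counts for 1TML as listed in Table 6.
rows = {
    "WCN": dict(tp=7, fp=94, fn=4, tn=181),
    "WCN & RSA": dict(tp=5, fp=24, fn=6, tn=251),
}

for label, c in rows.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["fp"], c["tn"])
    print(f"{label:10s} sensitivity={sens:.0f}%  specificity={spec:.0f}%")
```

For WCN alone, 7/11 gives about 64% sensitivity and 181/275 gives about 66% specificity. Adding the RSA filter gives 5/11, about 45%, and 251/275, about 91%.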
Table 7. Endo-glucanase 2ENG. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
2ENG                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.5)                 10  63   10  122  50               66
WCN (< -0.5) & RSA (≥ 0.05)  9   19   11  167  47               90
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
Table 8. Endo-glucanase 1JS4. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
1JS4                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.5)                 4   220  6   375  40               63
WCN (< -0.5) & RSA (≥ 0.05)  3   52   7   543  30               91
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
Table 9. Endo-glucanase 2NLR. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
2NLR                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.5)                 9   71   5   137  64               66
WCN (< -0.5) & RSA (≥ 0.05)  5   12   9   196  36               94
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
Table 10. Exo-glucanase 1CEL. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
1CEL                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.8)                 5   96   0   333  100              78
WCN (< -0.8) & RSA (≥ 0.08)  3   19   2   410  60               96
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
Table 11. Exo-glucanase 1QK2. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
1QK2                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.8)                 9   74   5   275  64               79
WCN (< -0.8) & RSA (≥ 0.08)  8   8    6   314  57               97
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
Table 12. Exo-glucanase 1EXP. Comparison of prediction by WCN alone with prediction by WCN combined with RSA.
1EXP                         TP  FP   FN  TN   Sensitivity (%)  Specificity (%)
WCN (< -0.8)                 5   67   0   240  100              78
WCN (< -0.8) & RSA (≥ 0.08)  4   6    1   301  80               98
TP: true positive; FP: false positive; FN: false negative; TN: true negative. Sensitivity = TP/(TP+FN); specificity = 1 - FP/(FP+TN). Both are given as percentages.
Table 13. Comparison of WCN alone with WCN combined with RSA
                         Sensitivity (%)  Specificity (%)
Endo-glucanase (< -0.5)  79               78
FIGURES
Figure 1. The processive synergy mechanism of cellulose hydrolysis. (A) Cellulose consists of crystalline and amorphous regions. (B) Endo-glucanases cut at internal amorphous sites. (C) Exo-glucanases act on the reducing or nonreducing ends of chains. β-Glucosidases hydrolyze soluble cellodextrins and cellobiose to glucose.
Figure 2. WCN z-score distribution of binding site residues reported in the literature. Frequency of (A) endo-glucanase binding residues, colored in black, compared with (B) exo-glucanase binding residues, colored in white.
Figure 3. Relationship between TPR and FPR. From top to bottom: (A) the whole selected cellulase dataset, (B) the endo-glucanase group, (C) the exo-glucanase group.
Figure 4. (A) WCN model of protein 1TML in putty representation. (B) WCN z-score distribution of protein 1TML.
Figure 4. Proteins shown in surface representation. (C) Experimental binding site residues of 1TML, colored in red. (D) Residues below the WCN threshold (< -0.5), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 4. Proteins shown in cartoon representation. (F) Experimental binding site residues of 1TML, colored in red. (G) Residues below the WCN threshold (< -0.5), colored in orange. (H) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 5. (A) WCN model of protein 2ENG in putty representation. (B) WCN z-score distribution of protein 2ENG.
Figure 5. Proteins shown in surface representation. (C) Experimental binding site residues of 2ENG, colored in red. (D) Residues below the WCN threshold (< -0.5), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 5. Proteins shown in cartoon representation. (F) Experimental binding site residues of 2ENG, colored in red. (G) Residues below the WCN threshold (< -0.5), colored in orange. (H) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 6. (A) WCN model of protein 1JS4 in putty representation. (B) WCN z-score distribution of protein 1JS4.
Figure 6. Proteins shown in surface representation. (C) Experimental binding site residues of 1JS4, colored in red. (D) Residues below the WCN threshold (< -0.5), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 6. Proteins shown in cartoon representation. (F) Experimental binding site residues of 1JS4, colored in red. (G) Residues below the WCN threshold (< -0.5), colored in orange. (H) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 7. (A) WCN model of protein 2NLR in putty representation. (B) WCN z-score distribution of protein 2NLR.
Figure 7. Proteins shown in surface representation. (C) Experimental binding site residues of 2NLR, colored in red. (D) Residues below the WCN threshold (< -0.5), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 7. Proteins shown in cartoon representation. (F) Experimental binding site residues of 2NLR, colored in red. (G) Residues below the WCN threshold (< -0.5), colored in orange. (H) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 8. (A) WCN model of protein 1CEL in putty representation. (B) WCN z-score distribution of protein 1CEL.
Figure 8. Proteins shown in cartoon representation. (C) Experimental binding site residues of 1CEL, colored in red. (D) Residues below the WCN threshold (< -0.8), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 9. (A) WCN model of protein 1QK2 in putty representation. (B) WCN z-score distribution of protein 1QK2.
Figure 9. Proteins shown in cartoon representation. (C) Experimental binding site residues of 1QK2, colored in red. (D) Residues below the WCN threshold (< -0.8), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 10. (A) WCN model of protein 1EXP in putty representation. (B) WCN z-score distribution of protein 1EXP.
Figure 10. Proteins shown in surface representation. (C) Experimental binding site residues of 1EXP, colored in red. (D) Residues below the WCN threshold (< -0.8), colored in orange. (E) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 10. Proteins shown in cartoon representation. (F) Experimental binding site residues of 1EXP, colored in red. (G) Residues below the WCN threshold (< -0.8), colored in orange. (H) Residues selected by both the WCN and RSA thresholds, also colored in orange.
Figure 11. Frequency of amino acid types in experimental cellulase binding sites compared with binding sites predicted by our method based on WCN and RSA. (A) Endo-glucanases and (B) exo-glucanases, experimental binding sites; (C) endo-glucanases and (D) exo-glucanases, binding sites predicted by our method.
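Amino acid frequencies of the kind plotted in Figure 11 can be tallied directly from residue labels. The sketch below uses the 1TML binding residues listed in Table 4 as input. The function name is hypothetical, and the code assumes each label starts with a one-letter amino acid code followed by the residue number.

```python
from collections import Counter


def residue_type_frequency(labels):
    """Map each one-letter amino acid code to its fraction among the labels."""
    counts = Counter(label[0] for label in labels)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Binding residues of 1TML as listed in Table 4.
tml = "W41 Y73 D117 D155 H159 W162 S189 W231 E263 D265 A271".split()

freq = residue_type_frequency(tml)
for aa, fraction in sorted(freq.items(), key=lambda item: -item[1]):
    print(f"{aa}  {fraction:.2f}")
```

For this single protein, tryptophan and aspartate each account for 3 of the 11 binding residues.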