Materials and Methods - Structural Bioinformatics: Construction of a Disulfide Bond Database and Development of Analysis Tools (II)

Non-redundant structural database

A non-redundant structure database, NRPDB, has been used throughout this work.

NRPDB is a subset of the PDB containing non-redundant proteins, with homologues filtered out by sequence comparison using BLAST (Altschul et al., 1990). NRPDB is compiled by the National Center for Biotechnology Information (NCBI) and can be found at http://www.ncbi.nlm.nih.gov/Structure/VAST/nrpdb.html. The size of NRPDB varies with the cut-off value; the smallest set contains 2288 PDB entries, whereas the largest contains 10212 entries. These statistics are taken from the Oct. 3, 2001 release of NRPDB. Using databases of different sizes made no significant difference to the results.

Entropy calculation

For each residue $x_i$ in a specific amino acid sequence $x$ of given length $l$, there is an associated distribution of structural classes $p = \{p_1, p_2, \ldots, p_n\}$, where $p_j$ is the probability of finding the $j$-th structure class in the set composed of these amino acid sequences, and $n$ is the total number of structure classes. The structural information content, or entropy, of $x_i$ is calculated using the following equation (Shannon, 1948):

$$S(x_i) = -\sum_{j=1}^{n} p_j \ln p_j$$

If there is only one structure available for the amino acid sequence $x$, the value of $S(x)$ is zero. On the other hand, if the structures of the sequence are evenly distributed over all classes, the entropy reaches its maximum value, $\ln n$. The average of the entropies over all residue positions gives an estimate of $S(x)$. We define the relative entropy of a sequence by

$$\Delta S(x) = S(x) - S^0(x),$$

where $S^0(x)$ is the reference entropy of $x$. The reference entropy is calculated using the probabilities of the structural classes in the database queried. For example, the reference entropy for secondary structure elements is calculated using the probabilities of finding each of the secondary structural classes in NRPDB. The value of $\Delta S(x)$ gives a relative measure of the structural information content, or structure conservation, of sequence $x$. Using $\Delta S(x)$, we compute the entropy profile by

$$g(\Delta S) = \sum_{x} \delta\bigl(\Delta S - \Delta S(x)\bigr),$$

where $\delta(y) = 1$ for $y = 0$ and $\delta(y) = 0$ for $y \neq 0$. The function $g(\Delta S)$ gives the distribution of structure entropies of the amino acid sequences in a data set.
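The entropy calculation described above can be illustrated with a minimal Python sketch. It is not the code used in this work. It assumes that the structures observed for one sequence are given as equal-length strings of class labels (one string per occurrence in the database), that the reference entropy is the Shannon entropy of the background class probabilities, and that the entropy profile is approximated by a histogram with a chosen bin width. All function names and the `bin_width` parameter are hypothetical.

```python
import math
from collections import Counter


def position_entropy(classes):
    """Shannon entropy -sum_j p_j ln p_j of the class labels seen at one position."""
    counts = Counter(classes)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def sequence_entropy(observations):
    """Average of the per-position entropies over the sequence.

    `observations` holds one string of class labels per occurrence of the
    sequence, e.g. ["EEEE", "EEHH"]; all strings must have the same length.
    """
    length = len(observations[0])
    per_position = [
        position_entropy([obs[i] for obs in observations]) for i in range(length)
    ]
    return sum(per_position) / length


def reference_entropy(background):
    """Entropy of the background class probabilities (assumed form of S^0)."""
    return sum(-p * math.log(p) for p in background.values() if p > 0)


def relative_entropy(observations, background):
    """Relative entropy: sequence entropy minus reference entropy."""
    return sequence_entropy(observations) - reference_entropy(background)


def entropy_profile(delta_s_values, bin_width=0.1):
    """Histogram of relative entropies; binning is an assumption of this sketch."""
    profile = Counter()
    for ds in delta_s_values:
        profile[math.floor(ds / bin_width)] += 1
    return dict(profile)
```

A sequence that is always observed in the same class at every position gives an entropy of zero, and two equally likely classes at a position give ln 2, in line with the limits stated above. The exact delta function of the text is replaced here by bins because computed entropies are real numbers that rarely coincide exactly.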

Structure representations

Since there is no unique way to describe the local conformations of the sequences, we tried three structure representations in this work: the secondary structural elements, the backbone torsional angles, and the virtual angles, referred to as κ angles, defined by three alternate Cα atoms. In the secondary structure representation, we use the DSSP method (Kabsch & Sander, 1983) to assign secondary structural elements. DSSP defines eight secondary structural elements, seven of which are based on hydrogen bonding patterns between amino acid residues, and one of which covers the structures that fit none of the other types. In the backbone torsional angle representation, we use the usual φ and ψ torsional angles used in the Ramachandran plot. As a third structure representation, we employ the κ angles to represent the backbone conformation. The κ angle of residue i is a virtual angle defined by the three alternate Cα atoms of residues i−2, i and i+2. The κ angle has been used in structure search and comparison (D. Chang and J.-K. Hwang, unpublished results). The structure entropies of these representations are referred to as $S_{ss}$ (secondary structure), $S_{\kappa}$ (virtual angles) and $S_{\phi\psi}$ (torsional angles).
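As an illustration of the virtual-angle representation, the following Python sketch computes an angle from the Cα coordinates of residues i−2, i and i+2. It assumes that the angle is the interior angle at the Cα atom of residue i, in degrees; some conventions report the supplementary angle (180° minus this value) instead, and the text does not state which one was used. The function names are hypothetical, and coordinates are plain (x, y, z) tuples.

```python
import math


def virtual_angle(ca_before, ca_center, ca_after):
    """Interior angle (degrees) at ca_center between the two flanking C-alpha atoms."""
    u = [a - b for a, b in zip(ca_before, ca_center)]
    v = [a - b for a, b in zip(ca_after, ca_center)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def virtual_angle_profile(ca_coords):
    """Angle for each residue i from residues i-2, i, i+2; None near the chain ends."""
    n = len(ca_coords)
    return [
        virtual_angle(ca_coords[i - 2], ca_coords[i], ca_coords[i + 2])
        if 2 <= i < n - 2
        else None
        for i in range(n)
    ]
```

The first two and last two residues of a chain have no defined angle in this sketch because one of the flanking Cα atoms is missing. To use the angles as structure classes for an entropy calculation they would still need to be binned, and the bin size is a choice not specified here.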

Structure entropy along a protein chain

For a given protein chain, we may calculate the structure entropy of the fragments along the sequence. We applied a fixed-width sliding window to the sequence and computed the structure entropy of each fragment scanned by this window. The computed entropy values are then assigned to the central residues of the fragments.

For a protein chain, we thus obtain a profile of the structure entropy along the chain. Since the structure entropy can be used to indicate the intrinsic structure variation of the local conformations formed by peptide fragments, this profile may point to regions in the sequence where structure forms early in the folding process, or to regions that have a high tolerance to mutagenesis.
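The sliding-window procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the window width is odd so that every fragment has a central residue, and `fragment_entropy` is a hypothetical callable that returns the structure entropy of a fragment (for example, from a database lookup). Neither the function name nor the handling of the chain ends is taken from this work.

```python
def window_profile(sequence, width, fragment_entropy):
    """Assign the entropy of each fixed-width fragment to its central residue.

    Positions too close to either end to be the center of a full window
    are left as None.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    profile = [None] * len(sequence)
    for center in range(half, len(sequence) - half):
        fragment = sequence[center - half : center + half + 1]
        profile[center] = fragment_entropy(fragment)
    return profile
```

For a chain of length L and window width w, the sketch yields L − w + 1 entropy values, one per full fragment, and leaves the remaining end positions undefined. A chain shorter than the window gives a profile with no defined values.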

Acknowledgement

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science 181, 223-230.

Argos, P. (1987). Analysis of Sequence-similar Pentapeptides in Unrelated Protein Tertiary Structures: Strategies for Protein Folding and a Guide for Site-directed Mutagenesis. J. Mol. Biol. 197, 331-348.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235-242.

Blanco, F. J. & Serrano, L. (1995). Folding of protein G B1 domain studied by the conformational characterization of fragments comprising its secondary structure elements. Eur. J. Biochem. 230, 634-649.

Bonvin, A. M., Vis, H., Breg, J. N., Burgering, M. J., Boelens, R. & Kaptein, R. (1994). Nuclear magnetic resonance solution structure of the Arc repressor using relaxation matrix calculations. J. Mol. Biol. 236, 328-341.

Breg, J. N., van Opheusden, J. H., Burgering, M. J., Boelens, R. & Kaptein, R. (1990). Structure of Arc repressor in solution: evidence for a family of beta-sheet DNA-binding proteins. Nature 346, 586-589.

Brown, B. M. & Sauer, R. T. (1999). Tolerance of Arc repressor to multiple-alanine substitutions. Proc. Natl. Acad. Sci. U.S.A. 96, 1983-1988.

Burgering, M. J., Hald, M., Boelens, R., Breg, J. N. & Kaptein, R. (1995). Hydrogen exchange studies of the Arc repressor: evidence for a monomeric folding intermediate. Biochemistry 35, 217-226.

Cordes, M. H. J., Burton, R. E., Walsh, N. P., McKnight, C. J. & Sauer, R. T. (2000). An evolutionary bridge to a new protein fold. Nat. Struct. Biol. 7, 1129-1132.

Cordes, M. H. J., Walsh, N. P., McKnight, C. J. & Sauer, R. T. (1999). Evolution of a Protein Fold in Vitro. Science 284, 325-327.

Cregut, D., Civera, C., Macias, M. J., Wallon, G. & Serrano, L. (1999). A tale of two secondary structure elements: when a beta-hairpin becomes an alpha-helix. J. Mol. Biol. 292, 389-401.

Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C. J. A., Hofmann, K. & Bairoch, A. (2002). The PROSITE database, its status in 2002. Nucleic Acids Res. 30, 235-238.

Fersht, A. R., Matouschek, A., Sancho, J., Serrano, L. & Vuilleumier, S. (1992). Pathways of Protein Folding. Faraday Discuss. 93, 183-193.

Gegg, C. V., Bowers, K. E. & Matthews, C. R. (1997). Probing minimal independent folding units in dihydrofolate reductase by molecular dissection. Protein Sci. 6, 1885-1892.

Hamada, D., Kuroda, Y., Tanaka, T. & Goto, Y. (1995). High Helical Propensity of the Peptide Fragments Derived from β-Lactoglobulin, a Predominantly β-sheet Protein. J. Mol. Biol. 254, 737-746.

Horovitz, A. & Fersht, A. R. (1992). Co-operative Interactions during Protein Folding. J. Mol. Biol. 224, 733-740.

Itzhaki, L. S., Neira, J. L. & Fersht, A. R. (1997). Hydrogen Exchange in Chymotrypsin Inhibitor 2 Probed by Denaturants and Temperature. J. Mol. Biol. 270, 89-90.

Jackson, S. E. & Fersht, A. R. (1991). Folding of chymotrypsin inhibitor 2. 1. Evidence for a two-state transition. Biochemistry 30, 10428-10435.

Kabsch, W. & Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22, 2577-2637.

Kabsch, W. & Sander, C. (1984). On the use of sequence homologies to predict protein structure: Identical pentapeptides can have completely different conformations. Proc. Natl. Acad. Sci. U.S.A. 81, 1075-1078.

Kasuya, A. & Thornton, J. M. (1999). Three-dimensional Structure Analysis of PROSITE Patterns. J. Mol. Biol. 286, 1673-1691.

Kuhlman, B., O'Neill, J. W., Kim, D. E., Zhang, K. Y. J. & Baker, D. (2002). Accurate Computer-based Design of a New Backbone Conformation in the Second Turn of Protein L. J. Mol. Biol. 315, 471-477.

Kuroda, Y. & Kim, P. S. (2000). Folding of Bovine Pancreatic Trypsin Inhibitor (BPTI) Variants in which Almost Half the Residues are Alanine. J. Mol. Biol. 298, 493-501.

Matouschek, A., Serrano, L., Meiering, E. M., Bycroft, M. & Fersht, A. R. (1992). The Folding of an Enzyme V. H/2H Exchange-Nuclear Magnetic Resonance Studies on the Folding Pathway of Barnase: Complementarity to and Agreement with Protein Engineering Studies. J. Mol. Biol. 224, 837-845.

Minor, D. L. J. & Kim, P. S. (1996). Context-dependent secondary structure formation of a designed protein sequence. Nature 380, 730-734.

Neira, J. L., Itzhaki, L. S., Otzen, D. E., Davis, B. & Fersht, A. R. (1997). Hydrogen Exchange in Chymotrypsin Inhibitor 2 Probed by Mutagenesis. J. Mol. Biol. 270, 99-100.

Raschke, T. M. & Marqusee, S. (1998). Hydrogen exchange studies of protein structure. Curr. Opin. Biotech. 9, 80-86.

Raumann, B. E., Rould, M. A., Pabo, C. O. & Sauer, R. T. (1994). DNA recognition by beta-sheets in the Arc repressor-operator crystal structure. Nature 367, 754-757.

Reddy, B. V. B., Datta, S. & Tiwari, S. (1998). Use of propensities of amino acids to the local structural environments to understand effect of substitution mutations on protein stability. Protein Eng. 11, 1137-1145.

Reymond, M. T., Merutka, G., Dyson, H. J. & Wright, P. E. (1997). Folding propensities of peptide fragments of myoglobin. Protein Sci. 6, 706-716.

Serrano, L., Kellis Jr, J. T., Cann, P., Matouschek, A. & Fersht, A. R. (1992). The Folding of an Enzyme II. Substructure of Barnase and the Contribution of Different Interactions to Protein Stability. J. Mol. Biol. 224, 783-804.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Tech. J. 27, 379-423, 623-656.

Sivaraman, T., Kumar, T. K. S., Chang, D. K., Lin, W. Y. & Yu, C. (1998). Events in the Kinetic Folding Pathway of a Small, All β-Sheet Protein. J. Biol. Chem. 273, 10181-10189.

Solis, A. D. & Rackovsky, S. (2000). Optimized Representation and Maximal Information in Proteins. Proteins 38, 149-164.

Tsuji, T., Yoshida, K., Satoh, A., Kohno, T., Kobayashi, K. & Yanagawa, H. (1999). Foldability of Barnase Mutants Obtained by Permutation of Modules or Secondary Structure Units. J. Mol. Biol. 286, 1581-1596.

Yanagawa, H., Yoshida, K., Torigoe, C., Park, J.-S., Sato, K., Shirai, T. & Go, M. (1993). Protein Anatomy: Functional Roles of Barnase Module. J. Biol. Chem. 268, 5861-5865.

Figure Captions

Figure 1

Distributions of structural entropy for NRPDB-5 (solid lines) and PROSITE patterns (dashed lines). The structural entropies are calculated using the (a) secondary structure, (b) virtual κ-angle, and (c) torsional angle representations. Pentapeptides in NRPDB-5 have different structure entropy values, implying different intrinsic structure variations. PROSITE patterns, on the other hand, are located mostly at the low-entropy end, suggesting that most PROSITE patterns are structurally conserved.

Figure 2

Superimposed structures of patterns with low and high structural entropies. The patterns are illustrated as (a) the low entropy patterns, and (b) the high entropy patterns.

Figure 3

Local conformation changes among the wild-type Arc repressor and the N11L and N11L/L12N mutants. The structures are shown in ribbon representation. Secondary structures affected by the mutations are colored red. The side chains of the fragments FNLR and FLNR are drawn. The PDB ID of the wild type is 1ARR and that of the N11L/L12N double mutant is 1QTG. No structure is available for the N11L mutant. The diagram illustrates that the local conformation of the fragment FLLR in N11L is a mixture of β-sheet (wild type) and α-helix (N11L/L12N double mutant).

Figure 4

Statistics of the secondary structure distributions of the three peptide fragments FNLR, FLLR, and FLNR from the wild type, the N11L mutant, and the N11L/L12N double mutant, respectively. The secondary structure assignment follows DSSP (Kabsch & Sander, 1983); E denotes extended β-sheet, whereas H denotes α-helix. The wild-type fragment FNLR is predominantly in the β-sheet conformation, and the double mutant fragment FLNR is predominantly in the α-helix conformation. The fragment from N11L is found in either β-sheet or α-helix. The statistics are computed for each residue position; underlined bold text marks the secondary structure fractions that are higher than average.

Figure 5

Schematic of the entropy values and local conformations of the wild-type and mutant Arc repressors. Each horizontal bar is placed at the level of the computed entropy value of the peptide fragment. The arrows indicate the corresponding mutations, and the text below the bars describes the conformation of each state. The transition from the sheet conformation of the wild type to the helix of the N11L/L12N double mutant could be bridged by the N11L mutant, which adopts various conformations under different solvent conditions.

Figure 6

Structure entropy distributions along the sequences of several proteins. The secondary structure elements of each protein are marked with horizontal lines. For consistency, all the secondary structure elements are numbered sequentially; for example, the first α-helix is termed α1, the second α2, and so on. The proteins used for the structure entropy computations are (a) Arc repressor, (b) barnase, (c) chymotrypsin inhibitor 2 (CI2), and (d) cardiotoxin analogue III (CTX III). The entropy value at each residue position is calculated using the local peptide fragment around that residue.

Tables

Table 1

Summaries of some selected PROSITE patterns with low and high structural entropies.

Accession number   ID               S_ss      S_κ       S_φψ      RMSD (Å)

Low entropies
PS00068            MDH              -1.6844   -2.3646   -1.7411   0.35
PS00155            CUTINASE_1       -1.6528   -2.5895   -1.6988   0.10
PS00271            THIONIN          -1.6416   -2.4570   -1.7068   0.23
PS00540            FERRITIN_1       -1.6645   -2.6062   -1.7730   0.19

High entropies
PS00022            EGF_1            -0.7905   -0.9334   -0.7849   2.19
PS00030            RNP_1            -0.6737   -0.9130   -0.5134   2.18
PS00215            MITOCH_CARRIER   -0.5753   -1.6548   -0.6839   3.64
PS00678            WD_REPEATS       -0.8367   -1.2915   -0.7468   3.59
