
Chapter 4   Conclusions

4.2  Future work

Our scoring matrices take into account the contact information of both the steric and the specific energies, but they are still not perfect. Several extensions might improve our method:

1. The weights of the terms in our proposed scoring method could be optimized with a machine learning approach such as a genetic algorithm (GA), a neural network (NN), or a support vector machine (SVM).

2. For detecting possible transcription factor binding sites, more transcription factors with crystal structures of protein-DNA complexes will be used as templates. The high-scoring regions that our scoring method predicts in promoter regions will be further verified.

3. The number of occurrences of each interaction pair could be considered, since multiple hydrogen bonds within a single pair have been observed in several cases. In a rough test of this idea on the ProNIT free energy data set, the correlation improved from -0.498 to -0.525.
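Item 3 can be illustrated with a minimal sketch. It assumes, hypothetically, that the score of a complex is a weighted sum of log-odds entries over its contact pairs; the function name `complex_score`, its parameters, and the default weight of 1.0 are illustrative and are not taken from our implementation. The two log-odds values are the Arg-G entries of the Vss and Sss tables in Figure 4.

```python
from collections import Counter


def complex_score(contacts, log_odds, weights=None, count_occurrences=True):
    """Sum log-odds entries over contact pairs (illustrative assumption).

    contacts: list of (interaction_type, residue, base) tuples, one per
        observed atomic contact; duplicates are allowed.
    log_odds: dict mapping interaction_type -> {(residue, base): score}.
    weights: optional dict mapping interaction_type -> weight (default 1.0).
    count_occurrences: if False, each distinct pair contributes only once.
    """
    weights = weights or {}
    total = 0.0
    for (itype, residue, base), n in Counter(contacts).items():
        multiplicity = n if count_occurrences else 1
        total += weights.get(itype, 1.0) * multiplicity * log_odds[itype][(residue, base)]
    return total


# Arg-G entries copied from the Vss and Sss log-odds tables.
LOG_ODDS = {"Sss": {("Arg", "G"): 1.78}, "Vss": {("Arg", "G"): 0.79}}

# Hypothetical contact list: one arginine making two Sss contacts and
# one Vss contact with the same guanine.
contacts = [("Sss", "Arg", "G"), ("Sss", "Arg", "G"), ("Vss", "Arg", "G")]

once = complex_score(contacts, LOG_ODDS, count_occurrences=False)
counted = complex_score(contacts, LOG_ODDS, count_occurrences=True)
```

With `count_occurrences=False` the doubled Sss contact is scored once; with `True` it is scored twice, which is the change proposed in item 3. The `weights` argument marks where parameters fitted as in item 1 would enter.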

REFERENCES

1. Hannenhalli S, Levy S: Predicting transcription factor synergism. Nucleic Acids Res 2002, 30(19):4278-4284.

2. Smith AD, Sumazin P, Das D, Zhang MQ: Mining ChIP-chip data for transcription factor and cofactor binding sites. Bioinformatics 2005, 21 Suppl 1:i403-412.

3. Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, Volkert TL, Wyrick JJ, Zeitlinger J, Gifford DK, Jaakkola TS et al: Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 2001, 106(6):697-708.

4. Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004, 5(4):276-287.

5. Tsai HK, Huang GT, Chou MY, Lu HH, Li WH: Method for identifying transcription factor binding sites in yeast. Bioinformatics 2006, 22(14):1675-1681.

6. Liu XS, Brutlag DL, Liu JS: An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol 2002, 20(8):835-839.

7. Stormo GD: DNA binding sites: representation and discovery. Bioinformatics 2000, 16(1):16-23.

8. Hoglund A, Kohlbacher O: From sequence to structure and back again: approaches for predicting protein-DNA binding. Proteome Sci 2004, 2(1):3.

9. Pabo CO, Sauer RT: Protein-DNA recognition. Annu Rev Biochem 1984, 53:293-321.

10. Matthews BW: Protein-DNA interaction. No code for recognition. Nature 1988, 335(6188):294-295.

11. Mandel-Gutfreund Y, Schueler O, Margalit H: Comprehensive analysis of hydrogen bonds in regulatory protein DNA-complexes: in search of common principles. J Mol Biol 1995, 253(2):370-382.

12. Brooker RJ: Genetics: analysis and principles. 2005.

13. Wang D, Meier TI, Chan CL, Feng G, Lee DN, Landick R: Discontinuous movements of DNA and RNA in RNA polymerase accompany formation of a paused transcription complex. Cell 1995, 81(3):341-350.

14. Guo F, Gopaul DN, van Duyne GD: Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse. Nature 1997, 389(6646):40-46.

15. Buck MJ, Lieb JD: ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics 2004, 83(3):349-360.

16. Harbison CT, Thompson CM, Simon I et al: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799-804.

17. Kono H, Sarai A: Structure-based prediction of DNA target sites by regulatory proteins. Proteins 1999, 35(1):114-131.

18. Havranek JJ, Duarte CM, Baker D: A simple physical model for the prediction and design of protein-DNA interactions. J Mol Biol 2004, 344(1):59-70.

19. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28(1):235-242.

20. Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A: ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res 2006, 34(Database issue):D204-206.

21. Gama-Castro S, Jimenez-Jacinto V, Peralta-Gil M, Santos-Zavaleta A, Penaloza-Spinola MI, Contreras-Moreira B, Segura-Salazar J, Muniz-Rascado L, Martinez-Flores I, Salgado H et al: RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res 2008, 36(Database issue):D120-124.

22. Mandel-Gutfreund Y, Margalit H: Quantitative parameters for amino acid-base interaction: implications for prediction of protein-DNA binding sites. Nucleic Acids Res 1998, 26(10):2306-2312.

23. McDonald IK, Thornton JM: Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994, 238(5):777-793.

24. Napoli AA, Lawson CL, Ebright RH, Berman HM: Indirect readout of DNA sequence at the primary-kink site in the CAP-DNA complex: recognition of pyrimidine-purine and purine-purine steps. J Mol Biol 2006, 357(1):173-183.

25. Revzin A: The Biology of nonspecific DNA-protein interactions. 1990.

26. Luscombe NM, Laskowski RA, Thornton JM: Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res 2001, 29(13):2860-2874.

27. Lilley DMJ: Understanding DNA - the Molecule and How It Works - Calladine,Cr, Drew,Hr. Nature 1994, 367(6461):330-331.

Table 1. The representative set of protein-DNA complex structures. Each entry gives the four-character PDB code, the protein chain identifier, the chain identifiers of the dsDNA to which the protein is bound, the contact number, and the description of the protein.

PDB Code  Protein Chain  DNA Chains  Contact number  Protein Description
1a0a A CD 27 PHOSPHATE SYSTEM POSITIVE REGULATORY
1ais B CE 19 TRANSCRIPTION INITIATION FACTOR IIB
1am9 A EFGH 36 STEROL REGULATORY ELEMENT BINDING PR
1an4 A CD 32 UPSTREAM STIMULATORY FACTOR
1apl C AB 45 MAT-ALPHA2 HOMEODOMAIN
1azp A BC 38 HYPERTHERMOPHILE CHROMOSOMAL PROTEIN
1b3t A CD 70 NUCLEAR PROTEIN EBNA1
1bdh A B 20 PURINE REPRESSOR
1bdt B EF 27 GENE-REGULATING PROTEIN ARC
1bf5 A BC 42 SIGNAL TRANSDUCER AND ACTIVATOR OF TRANSCRIPT
1brn L A 25 BARNASE (E.C.3.1.27.-)
1bvo A D 16 TRANSCRIPTION FACTOR GAMBIF1
1c7y A BCDEFGHI 104 HOLLIDAY JUNCTION DNA HELICASE RUVA
1c9b M CDGHKLOP 38 GENERAL TRANSCRIPTION FACTOR IIB
1cdw A BC 74 TATA BINDING TBP
1cf7 A CD 35 TRANSCRIPTION FACTOR E2F-4
1cf7 B CD 22 TRANSCRIPTION FACTOR DP-2
1cgp A CDEF 42 CATABOLITE GENE ACTIVATOR C
1dc1 A WC 68 BSOBI RESTRICTION ENDONUCLEASE
1de9 B UVW 47 MAJOR APURINIC/APYRIMIDINIC ENDONUCLEASE
1dfm A CD 76 ENDONUCLEASE BGLII
1dh3 A BD 21 TRANSCRIPTION FACTOR CREB


1emh A BC 23 URACIL-DNA GLYCOSYLASE
1eoo A CD 66 TYPE II RESTRICTION ENZYME ECORV
1eyg C Q 82 SINGLE-STRAND DNA-BINDING PROTEIN
1f0v B N 11 RIBONUCLEASE A
1f2i H AB 42 FUSION OF N-TERMINAL 17-MER PEPTIDE EXTENSION
1f4k B DE 43 REPLICATION TERMINATION PROTEIN
1f6o A DE 38 3-METHYL-ADENINE DNA GLYCOSYLASE
1fiu D GKHL 54 TYPE II RESTRICTION ENZYME NGOMI
1fok A BC 107 FOKI RESTRICTION ENDONUCLEAS
1fos F AB 22 C-JUN PROTO-ONCOGENE PROTEIN
1fzp B WK 8 STAPHYLOCOCCAL ACCESSORY REGULATOR A
1gd2 E AB 33 TRANSCRIPTION FACTOR PAP1
1gm5 A XYZ 46 RECG
1gt0 D AB 61 TRANSCRIPTION FACTOR SOX-2
1gxp F GH 47 PHOSPHATE REGULON TRANSCRIPTIONAL REGULATORY
1h88 B DE 29 CCAAT/ENHANCER BINDING PROTEIN BETA
1h88 C DE 50 MYB PROTO-ONCOGENE PROTEIN
1h9d A EF 35 CORE-BINDING FACTOR ALPHA SUBUNIT1
1h9t A XY 50 FATTY ACID METABOLISM REGULATOR PROTEIN
1hbx G CW 46 ETS-DOMAIN PROTEIN ELK-4
1hdd C AB 38 ENGRAILED HOMEODOMAIN
1hf0 A MN 69 OCTAMER-BINDING TRANSCRIPTION FACTOR 1


1ign A CD 124 RAP1
1iu3 F ABDE 42 SeqA protein
1ixy A CE 37 DNA beta-glucosyltransferase
1j1v A BC 50 Chromosomal replication initiator protein dna
1jb7 A D 60 telomere-binding protein alpha subunit
1jt0 A EF 37 HYPOTHETICAL TRANSCRIPTIONAL REGULATOR IN QAC
1k3w A BC 34 Endonuclease VIII
1k78 I CDGH 41 Paired Box Protein Pax5
1k82 A FJ 8 formamidopyrimidine-DNA glycosylase
1keg H A 8 Anti-(6-4) photoproduct antibody 64M-2 Fab
1m07 A CD 24 Ribonuclease
1m18 H IJ 28 Histone H2B.1
1m3q A BC 46 8-oxoguanine DNA glycosylase
1m6x B EIFJGH 92 Flp recombinase
1mdm A CD 97 PAIRED BOX PROTEIN PAX-5


1mje A C 40 breast cancer 2
1mjq H KL 25 METHIONINE REPRESSOR
1mm8 A BC 76 Tn5 Transposase
1n6j B CD 31 Myocyte-specific enhancer factor 2B
1nkp B FG 28 Max protein
1nlw A FG 25 MAD PROTEIN
1noy B S 20 DNA POLYMERASE (E.C.2.7.7.7)
1odh A CD 42 MGCM1
1oe6 B EF 11 SINGLE-STRAND SELECTIVE MONOFUNCTIONAL URACIL
1oh6 A EF 63 DNA MISMATCH REPAIR PROTEIN MUTS
1pgz A B 40 Heterogeneous nuclear ribonucleoprotein A1
1pp8 F EIYKTRJG 50 39 kDa initiator binding protein
1qpi A M 9 TETRACYCLINE REPRESSOR
1qrv A CD 42 HIGH MOBILITY GROUP PROTEIN D
1qum A BCD 54 ENDONUCLEASE IV


1qzh D J 40 Protection of telomeres protein 1
1r0o A CD 35 Ultraspiracle protein
1r71 A EIFJ 70 Transcriptional repressor protein korB
1r8e A B 20 multidrug-efflux transporter regulator
1rb8 F X 5 Capsid protein
1rzr A EB 45 Glucose-resistance amylase regulator
1s9k D AB 26 Proto-oncogene protein c-fos
1sax B CD 34 Methicillin resistance regulatory protein mec
1seu A BCD 80 DNA topoisomerase I
1sfu A CD 16 34L protein
1skn P AB 37 DNA-BINDING DOMAIN OF SKN-1
1svc P D 29 NUCLEAR FACTOR KAPPA-B (NF-KB)
1t2k D EF 25 Cyclic-AMP-dependent transcription factor ATF
1t39 A CD 33 Methylated-DNA--protein-cysteine methyltransf
1t39 A EF 9 Methylated-DNA--protein-cysteine methyltransf
1tc3 C AB 51 TC3 TRANSPOSASE
1tez C M 15 Deoxyribodipyrimidine photolyase
1tez C N 11 Deoxyribodipyrimidine photolyase
1trr A CI 43 TRP REPRESSOR
1ttu A BC 52 lin-12 And Glp-1 transcriptional regulator
1tx3 C EFGH 79 Type II restriction enzyme HindII
1u1l A B 40 Heterogeneous nuclear ribonucleoprotein A1
1u3e M ABC 128 HNH homing endonuclease
1u78 A BC 99 transposable element tc3 transposase
1u8b A BC 35 Ada polyprotein
1u8r B EF 36 Iron-dependent repressor ideR
1v14 C IJ 34 COLICIN E9


1w0t A CD 47 TELOMERIC REPEAT BINDING FACTOR 1
1w36 B Y 47 EXODEOXYRIBONUCLEASE V BETA CHAIN
1w36 C Y 27 EXODEOXYRIBONUCLEASE V GAMMA CHAIN
1wte A XY 77 EcoO109IR
1x9n A BCD 123 DNA ligase I
1x9w A CD 78 DNA polymerase
1xbr B CD 46 T PROTEIN
1xc8 A BC 42 Formamidopyrimidine-DNA glycosylase
1xf2 L T 7 antibody light chain Fab
1xjv A B 50 Protection of telomeres 1
1xpx A DC 19 Protein prospero
1ya6 B CD 27 DNA alpha-glucosyltransferase
1yfi B EF 62 Type II restriction enzyme MspI
1z9c F KL 56 Organic hydroperoxide resistance transcriptio
1zaa C AB 67 ZIF268
1zg1 B CD 34 Nitrate/nitrite response regulator protein na
1zlk A CD 36 Dormancy Survival Regulator
1zme C AB 24 PROLINE UTILIZATION TRANSCRIPTION ACTIVATOR
1zqk A TP 34 DNA POLYMERASE BETA (E.C.2.7.7.7)
1zr4 B JIK 74 Transposon gamma-delta resolvase
1zrc A WXYZ 36 Catabolite gene activator
1zs4 A UT 32 Regulatory protein CII
1zzj B D 31 Heterogeneous nuclear ribonucleoprotein K
2a3v D GH 101 site-specific recombinase IntI4
2ac0 D GH 26 Cellular tumor antigen p53


2bnw A EFGH 21 ORF OMEGA
2bop A B 14 E2
2bqu A PT 72 DNA POLYMERASE IV
2bsq G IJ 18 TRAFFICKING PROTEIN A
2bzf A BC 29 BARRIER-TO-AUTOINTEGRATION FACTOR
2c5r F YZ 9 EARLY PROTEIN P16.7
2c62 A C 32 ACTIVATED RNA POLYMERASE II TRANSCRIPTIONAL C
2c9l Y AB 25 BZLF1 TRANS-ACTIVATOR PROTEIN
2ccz A C 18 PRIMOSOMAL REPLICATION PROTEIN N
2d5v A CD 71 Hepatocyte nuclear factor 6
2dem A CD 45 uracil-DNA glycosylase
2dgc A B 14 GCN4
2dpj A PT 48 DNA polymerase iota
2drp A BC 50 TRAMTRACK DNA-BINDING DOMAIN
2dwl C F 13 Primosomal protein N
2e1c A BD 26 Putative HTH-type transcriptional regulator P
2e52 C FH 86 Type II restriction enzyme HindIII
2h8c B WX 15 Crossover junction endodeoxyribonuclease rusA
2h8r B EF 58 Hepatocyte nuclear factor 1-beta


2hvr B CD 12 T4 RNA ligase 2
2hzv A IJ 22 Nickel-responsive regulator
2hzv H KL 30 Nickel-responsive regulator
2i06 A BC 135 DNA replication terminus site-binding protein
2i9k A CD 89 Modification methylase HhaI
2o61 A EF 149 Transcription factor p65/Interferon regulator
2o6m A CD 58 Intron-encoded endonuclease I-PpoI
2o8c A EF 13 DNA mismatch repair protein Msh2
2o8f B EF 59 DNA mismatch repair protein MSH6
2oa8 B C 29 Three prime repair exonuclease 1
2oaa A CDEF 116 R.MvaI
2ofi A CB 34 3-methyladenine DNA glycosylase I, constituti
2oh2 A SQ 74 DNA polymerase kappa


2pfj A ZY 43 Endodeoxyribonuclease 1
2pi0 D EF 54 Interferon regulatory factor 3
2q2k A F 15 Hypothetical protein
2q2u A EF 94 Chlorella virus DNA ligase
2qby B CD 63 Cell division control protein 6 homolog 3
2qfj A C 7 FBP-interacting repressor
2qhb A EF 37 Telomere binding protein TBP1
2ql2 A EF 23 Transcription factor E2-alpha
2ql2 B EF 20 Neurogenic differentiation factor 1
2qnf B EF 23 Recombination endonuclease VII
2qsg A WY 69 DNA repair protein RAD4
2v6e A CDEF 138 PROTELEMORASE
2ve9 D IJKL 30 DNA TRANSLOCASE FTSK
2w42 A PQ 48 PUTATIVE UNCHARACTERIZED PROTEIN
2w7n A EFGH 75 TRFB TRANSCRIPTIONAL REPRESSOR PROTEIN
2wb2 A CD 31 PHOTOLYASE
2yvh B GH 31 Transcriptional regulator


2z70 A B 27 Ribonuclease I
2z9o B CD 48 Replication initiation protein
2zhg A B 13 Redox-sensitive transcriptional activator sox
3b39 A C 17 DNA primase
3bam B CDE 50 RESTRICTION ENDONUCLEASE BAMHI
3bdn A CD 35 Lambda Repressor
3bep A CD 14 DNA polymerase III subunit beta
3bm3 A CD 68 PspGI restriction endonuclease
3bs1 A BC 35 Accessory gene regulator protein A
3btx A BC 54 Alpha-ketoglutarate-dependent dioxygenase alk
3c0x A BCD 113 Intron-encoded endonuclease I-SceI
3c25 A CD 87 NotI restriction endonuclease
3cvs C GH 22 DNA-3-methyladenine glycosylase 2
3d0p A B 16 Ribonuclease H
3d2w A B 24 TAR DNA-binding protein 43
3d70 A B 21 BMR promoter DNA
3dfv C YZ 55 Trans-acting T-cell-specific transcription fa
3dlh A X 108 Argonaute
3dnv B T 17 HTH-type transcriptional regulator hipB
3dsc A B 26 DNA double-strand break repair protein mre11
3dvo B EF 72 SgraIR restriction enzyme
3dzy D CF 49 Peroxisome proliferator-activated receptor ga
3e00 A CF 37 Retinoic acid receptor RXR-alpha
3e54 A CDEF 93 RRNA intron-encoded endonuclease
3e6c C BA 36 Cyclic nucleotide-binding protein
3eh8 A BC 125 Intron-encoded DNA endonuclease I-AniI
3ei1 B GH 27 DNA damage-binding protein 2
3eyi A CD 17 Z-DNA-binding protein 1
3f2c A PT 95 GEOBACILLUS KAUSTOPHILUS DNA POLC


3f8i B FG 40 E3 ubiquitin-protein ligase UHRF1
3fc3 A CD 58 Restriction endonuclease Hpy99I
3fhz A GHKL 40 Arginine repressor
3g73 A CD 35 Forkhead box protein M1
3hts B A 16 HEAT SHOCK TRANSCRIPTION FACTOR
3orc A RS 22 CRO REPRESSOR

Table 2. Thermodynamic data for single-residue mutations. Each entry gives the four-character PDB code, the protein chain identifier, the wild-type amino acid, the position of the amino acid, the mutated amino acid, and the free energy change ∆∆G, calculated as ∆G(mutant) - ∆G(wild type).

PDB Code  Protein Chain  Wild-type  Position  Mutate  ∆∆G
1ais A E 12 A 0.00
1emh A R 276 C 0.58
1run A E 181 D 0.40
1tro A A 77 V 0.00
2bpf A R 283 A 0.84
2bpf A R 283 K 0.45
2bpg A Y 271 A 0.28
2bpg A Y 271 F 0.06
2bpg A Y 271 S 0.22
2hmi A W 153 A 0.30
2hmi A W 153 F -0.40
2hmi A W 153 Y 0.10

Figure 1. An example of a residue–nucleotide interaction pair in 1zrc. A guanine base forms hydrogen bonds with an arginine residue. Two hydrogen atoms on the arginine contact oxygen or nitrogen atoms on the major-groove edge of the guanine ring.


Figure 2. An example of constructing a contact profile. (A) The 3-D structure of the CRP protein bound to DNA. The HTH motif of CRP chain A is colored red; the two strands of the DNA double helix are colored blue and green. (B) The contact profile of the 1ZRC HTH motif. Each contact pair is written in the form “CP: NP R + CD: ND T”, where CP is the ID of the protein chain, NP is the residue number, R is the residue symbol, CD is the ID of the DNA chain, ND is the nucleotide number, and T is the nucleotide symbol.
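The contact-pair notation in panel (B) can be read mechanically. The following is a minimal sketch of a parser for it; the names `ContactPair` and `parse_contact_pair`, the tolerance for extra whitespace, and the example string are assumptions made for illustration, since the exact spacing and symbol conventions of the profile are not specified here.

```python
import re
from typing import NamedTuple


class ContactPair(NamedTuple):
    protein_chain: str       # CP
    residue_number: int      # NP
    residue: str             # R (one- or three-letter symbol accepted)
    dna_chain: str           # CD
    nucleotide_number: int   # ND
    nucleotide: str          # T (A, C, G, or T)


# Pattern for "CP: NP R + CD: ND T", tolerant of extra whitespace.
_PAIR = re.compile(
    r"^\s*(\w)\s*:\s*(-?\d+)\s+([A-Za-z]{1,3})"
    r"\s*\+\s*(\w)\s*:\s*(-?\d+)\s+([ACGT])\s*$"
)


def parse_contact_pair(text: str) -> ContactPair:
    """Parse one contact-pair string into its six fields."""
    match = _PAIR.match(text)
    if match is None:
        raise ValueError(f"not a contact pair: {text!r}")
    cp, np_, r, cd, nd, t = match.groups()
    return ContactPair(cp, int(np_), r, cd, int(nd), t)


# Hypothetical example string, not copied from the actual 1ZRC profile.
pair = parse_contact_pair("A: 180 R + W: 7 G")
```

Keeping the six fields in a named tuple makes it straightforward to group contacts by residue or by nucleotide position afterwards.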

Figure 3. Frequency tables of the eight interaction types (Vss, Vsb, Vms, Vmb, Sss, Ssb, Sms, and Smb).

Figure 4. Log-odds scores derived from the frequency tables (Vss, Vsb, Vms, Vmb, Sss, Ssb, Sms, and Smb).

Vss A C G T Vsb A C G T Vms A C G T Vmb A C G T

Gly ‐inf ‐inf ‐inf ‐inf Gly ‐inf ‐inf ‐inf ‐inf Gly 0.93 1.20 0.99 1.31 Gly 0.63 0.86 0.72 0.80

Ala ‐0.66 ‐0.72 ‐0.68 ‐0.28 Ala ‐0.37 ‐0.17 ‐0.17 ‐0.11 Ala 0.05 0.48 ‐0.03 1.24 Ala 0.26 0.30 0.35 0.60

Val ‐0.32 ‐0.46 ‐0.55 ‐0.38 Val ‐0.30 ‐0.18 0.04 ‐0.28 Val ‐0.64 ‐0.79 ‐0.30 ‐0.18 Val 0.02 0.18 0.36 0.10

Ile ‐0.40 ‐0.47 ‐0.66 ‐0.09 Ile ‐0.16 ‐0.06 ‐0.14 ‐0.24 Ile ‐0.48 ‐0.12 ‐0.37 0.26 Ile ‐0.01 0.20 0.22 ‐0.05

Leu ‐0.51 ‐0.61 ‐0.38 0.08 Leu ‐0.23 ‐0.09 0.05 ‐0.07 Leu ‐1.21 0.11 ‐0.42 ‐0.12 Leu ‐0.14 0.30 0.06 0.10

Pro ‐0.43 ‐1.28 ‐0.79 0.09 Pro ‐0.06 ‐0.30 ‐0.14 ‐0.11 Pro 0.36 ‐0.47 ‐1.34 0.72 Pro 0.02 0.06 0.17 0.27

Cys ‐0.77 0.04 ‐0.23 ‐0.16 Cys ‐0.33 ‐0.25 ‐0.36 ‐0.33 Cys 0.29 ‐0.81 ‐inf 0.20 Cys 0.17 0.26 0.55 ‐0.61

Met 0.26 ‐0.40 0.03 0.51 Met 0.07 ‐0.12 0.02 0.21 Met ‐0.08 ‐1.69 0.08 0.71 Met ‐0.45 ‐0.19 0.18 0.45

Phe ‐0.09 ‐0.35 0.02 0.34 Phe 0.13 0.02 0.09 0.24 Phe ‐0.13 ‐0.04 ‐0.38 0.12 Phe ‐0.30 ‐0.05 ‐0.23 ‐0.42

Tyr ‐0.10 0.29 0.06 0.52 Tyr 0.16 0.38 0.27 0.32 Tyr ‐0.44 ‐0.55 ‐0.98 ‐0.06 Tyr ‐0.55 ‐0.39 ‐0.80 ‐0.90

Trp ‐0.41 ‐0.06 ‐0.19 0.08 Trp ‐0.10 0.45 0.13 0.19 Trp ‐inf 0.08 ‐0.63 ‐0.41 Trp ‐1.65 ‐0.26 ‐0.49 ‐1.18

Ser ‐0.53 ‐0.56 ‐0.28 0.27 Ser ‐0.01 0.03 0.06 0.19 Ser ‐0.56 ‐0.06 0.30 0.63 Ser 0.19 0.29 0.35 0.35

Thr ‐0.48 ‐0.16 ‐0.34 0.13 Thr 0.02 0.07 0.16 0.25 Thr ‐0.07 ‐0.20 ‐0.79 0.33 Thr 0.02 0.03 0.27 0.18

Asn 0.32 0.16 0.22 0.33 Asn ‐0.04 ‐0.05 0.03 0.11 Asn ‐0.30 0.39 0.23 0.50 Asn ‐0.28 ‐0.20 ‐0.08 ‐0.11

Gln 0.30 0.25 0.29 0.32 Gln 0.08 ‐0.01 0.08 ‐0.05 Gln ‐0.59 ‐0.48 ‐0.61 ‐0.79 Gln ‐0.18 ‐0.28 ‐0.09 ‐0.25

Asp ‐0.66 0.56 ‐0.03 ‐0.51 Asp ‐0.31 ‐0.08 0.07 ‐0.57 Asp ‐0.05 0.12 ‐0.39 0.03 Asp ‐0.34 ‐0.28 0.17 ‐0.46

Glu ‐0.26 0.60 ‐0.21 0.16 Glu ‐0.43 ‐0.19 ‐0.10 ‐0.52 Glu ‐1.17 ‐0.07 ‐1.19 ‐0.16 Glu ‐0.45 ‐0.36 ‐0.03 ‐0.53

His ‐0.02 0.09 0.33 0.43 His 0.05 ‐0.11 0.05 0.34 His ‐1.09 0.16 ‐0.20 0.43 His ‐0.40 ‐0.51 ‐0.07 ‐0.18

Arg 0.42 0.55 0.79 0.58 Arg 0.18 0.21 0.22 0.34 Arg ‐0.76 ‐0.80 ‐0.78 ‐0.40 Arg ‐0.42 ‐0.49 ‐0.45 ‐0.37

Lys ‐0.26 ‐0.34 0.10 ‐0.33 Lys 0.18 0.27 0.08 0.13 Lys ‐1.25 ‐0.10 ‐0.62 ‐0.12 Lys ‐0.29 ‐0.05 ‐0.37 ‐0.29

Sss A C G T Ssb A C G T Sms A C G T Smb A C G T

Gly ‐inf ‐inf ‐inf ‐inf Gly ‐inf ‐inf ‐inf ‐inf Gly ‐0.01 1.09 1.76 0.42 Gly 0.41 1.13 0.88 0.82

Ala ‐inf ‐inf ‐inf ‐inf Ala ‐inf ‐inf ‐inf ‐inf Ala ‐0.72 0.66 0.64 0.58 Ala 0.42 0.56 0.72 0.47

Val ‐inf ‐inf ‐inf ‐inf Val ‐inf ‐inf ‐inf ‐inf Val 0.57 ‐inf 0.14 ‐inf Val ‐0.56 0.13 ‐0.03 0.45

Ile ‐inf ‐inf ‐inf ‐inf Ile ‐inf ‐inf ‐inf ‐inf Ile ‐inf 0.33 ‐0.39 ‐0.45 Ile ‐0.40 ‐1.79 0.13 ‐0.78

Leu ‐inf ‐inf ‐inf ‐inf Leu ‐inf ‐inf ‐inf ‐inf Leu ‐inf ‐0.54 ‐0.56 ‐0.62 Leu ‐0.17 ‐0.02 0.41 0.15

Pro ‐inf ‐inf ‐inf ‐inf Pro ‐inf ‐inf ‐inf ‐inf Pro ‐0.36 ‐0.36 ‐0.38 ‐inf Pro ‐inf ‐inf ‐inf ‐inf

Cys ‐inf ‐inf ‐inf 0.27 Cys ‐0.53 ‐0.53 ‐0.15 ‐1.31 Cys ‐inf ‐inf ‐inf ‐inf Cys 0.52 0.92 ‐inf ‐inf

Met ‐inf ‐inf ‐inf ‐inf Met ‐inf ‐inf ‐inf ‐inf Met ‐inf ‐inf 1.03 0.97 Met ‐1.06 0.33 ‐0.39 0.65

Phe ‐inf ‐inf ‐inf ‐inf Phe ‐inf ‐inf ‐inf ‐inf Phe ‐inf 0.72 0.29 0.23 Phe ‐0.70 ‐0.01 ‐0.04 ‐0.50

Tyr ‐0.33 ‐inf ‐0.35 ‐0.57 Tyr 0.21 0.28 0.15 0.05 Tyr 0.00 0.70 ‐0.71 ‐inf Tyr ‐0.17 ‐0.73 ‐0.75 ‐1.10

Trp ‐inf ‐inf ‐1.26 ‐1.32 Trp ‐1.02 ‐0.73 ‐0.35 ‐0.13 Trp ‐inf 0.35 ‐inf ‐inf Trp ‐1.07 ‐0.38 ‐1.10 ‐1.16

Ser ‐0.52 ‐0.97 0.19 ‐0.61 Ser 0.09 ‐0.22 0.23 0.42 Ser ‐0.64 0.05 0.59 ‐0.73 Ser ‐0.05 ‐0.05 0.35 0.45

Thr ‐0.83 0.06 ‐0.40 ‐0.67 Thr 0.03 ‐0.05 0.31 0.32 Thr 0.20 ‐0.09 ‐0.52 ‐0.58 Thr 0.52 0.16 0.19 0.63

Asn 0.73 0.17 0.74 0.68 Asn ‐0.44 ‐0.65 ‐0.02 ‐0.14 Asn ‐0.33 1.18 0.57 ‐1.11 Asn ‐0.36 ‐0.50 0.02 0.41

Gln 1.15 ‐0.11 0.50 0.09 Gln ‐0.20 ‐0.20 ‐0.13 ‐0.39 Gln ‐inf 0.37 0.35 ‐inf Gln 0.34 ‐0.76 0.22 0.25

Asp ‐0.59 1.55 0.49 ‐1.37 Asp ‐inf ‐inf ‐inf ‐inf Asp 0.30 1.00 ‐0.41 0.22 Asp ‐0.71 ‐0.72 0.11 ‐0.51

Glu ‐0.70 1.55 ‐0.32 ‐1.07 Glu ‐inf ‐2.97 ‐inf ‐inf Glu ‐inf 0.20 ‐0.52 ‐0.58 Glu ‐1.92 ‐inf 0.00 ‐0.62

His ‐1.31 ‐0.40 1.30 ‐0.15 His 0.16 0.25 0.45 0.74 His ‐inf 0.96 ‐inf 0.19 His ‐0.46 ‐0.75 ‐0.08 ‐0.32

Arg ‐0.14 ‐0.01 1.78 0.66 Arg 0.78 0.82 0.94 0.79 Arg 0.28 ‐0.53 ‐0.84 ‐0.40 Arg ‐1.04 ‐0.35 ‐0.66 ‐0.54

Lys ‐0.57 ‐1.56 1.12 0.12 Lys 0.67 0.88 0.76 0.77 Lys ‐inf 0.80 ‐0.69 ‐1.85 Lys 0.25 0.18 ‐0.32 ‐0.09
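As a hedged illustration of how a frequency table can be turned into log-odds scores of this kind, the sketch below uses one common definition: the natural logarithm of the observed count divided by the count expected if residue and base were independent. This definition, the function name `log_odds_table`, and the toy counts are assumptions for illustration; the exact formula and background model behind the tables above are not restated in this section. Pairs that are never observed map to -inf, consistent with the -inf entries in the tables.

```python
import math


def log_odds_table(counts):
    """Convert {(residue, base): count} into {(residue, base): log-odds}.

    Assumed definition: ln(observed / expected), where expected is
    row_total * column_total / grand_total.
    """
    total = sum(counts.values())
    row_totals, col_totals = {}, {}
    for (residue, base), n in counts.items():
        row_totals[residue] = row_totals.get(residue, 0) + n
        col_totals[base] = col_totals.get(base, 0) + n

    scores = {}
    for (residue, base), n in counts.items():
        expected = row_totals[residue] * col_totals[base] / total
        if n == 0 or expected == 0:
            scores[(residue, base)] = float("-inf")  # pair never observed
        else:
            scores[(residue, base)] = math.log(n / expected)
    return scores


# Toy counts, invented for illustration only.
toy_counts = {
    ("Arg", "G"): 6, ("Arg", "A"): 2,
    ("Gly", "G"): 0, ("Gly", "A"): 2,
}
toy_scores = log_odds_table(toy_counts)
```

In the toy example, Arg-G is observed more often than expected and receives a positive score, while the unobserved Gly-G pair receives -inf.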

Figure 5. A flowchart for calculating the score of a protein-DNA complex.

Figure 6. The flowchart for scanning CRP binding sites. We use the protein–DNA complex of CRP (PDB entry: 1zrc) to test the capacity of our model to discriminate targets within real CRP binding sequences.
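The scanning step can be sketched as a sliding window over a promoter sequence. This is a minimal sketch under stated assumptions: `scan_sequence` and `toy_score` are hypothetical names, only the given strand is scanned, and the toy scoring function (matches to an arbitrary 5-mer) merely stands in for the template-based score of a candidate site, which is not reproduced here.

```python
def scan_sequence(sequence, window, score_window):
    """Score every window of the given length and rank the windows.

    Returns a list of (score, start, site) tuples, highest score first.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        site = sequence[start:start + window]
        hits.append((score_window(site), start, site))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


def toy_score(site, motif="TGTGA"):
    """Placeholder score: number of positions matching an arbitrary motif."""
    return sum(a == b for a, b in zip(site, motif))


# Invented sequence for illustration only.
hits = scan_sequence("TTGTGATT", window=5, score_window=toy_score)
best_score, best_start, best_site = hits[0]
```

Ranking all windows in this way is what allows statements such as "ranked in the top 1%" to be made about known binding sites.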


Figure 7. The propensity of the 20 amino acids in protein-DNA interactions. (A) Classified by interaction group. (B) Classified by interaction force.

[Figure 7 plots. x-axis of both panels: Gly, Ala, Val, Ile, Leu, Pro, Cys, Met, Phe, Tyr, Trp, Ser, Thr, Asn, Gln, Asp, Glu, His, Arg, Lys. Surviving legend labels: Side; vdW, Special.]


Figure 8. Distribution of interaction types in protein-DNA interactions. (A) van der Waals forces. (B) Special forces.

van der Waals forces: Vsm 44%, Vss 25%, Vms 7%, Vmm 24%.

Special forces: Ssm 54%, Sss 22%, Sms 5%, Smb 19%.

Figure 9. Evaluation of the scoring function in binding affinity prediction. The plot shows the correlation between the scores from the scoring matrices and the experimental free energy change (ΔΔG). Our scoring matrices are shown as orange diamonds and DBD-Hunter as green squares. The correlation is -0.498 for our method and -0.471 for DBD-Hunter.

[Scatter plot. Legend: DBD-Hunter, This study.]
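A correlation such as the one reported for Figure 9 can be computed along the following lines. The sketch assumes the Pearson correlation coefficient, which is an assumption because the type of correlation is not stated here; `pearson_r` is an illustrative name and the numbers are invented, not taken from the ProNIT data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Invented values: predicted scores and measured free energy changes.
scores = [1.0, 2.0, 3.0]
ddg = [3.0, 2.0, 1.0]
r = pearson_r(scores, ddg)
```

A negative coefficient means that the score and ΔΔG move in opposite directions across the data set, and a value further from zero indicates a stronger linear relationship.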

Figure 10. The distribution of distances from TFBSs to the transcription start site (TSS).


Figure 11. Sequence logos of the three kinds of CRP binding sites. (A) 197 activator binding sites, (B) 60 repressor binding sites, and (C) 16 dual binding sites.

6 7 8 9 11 12 13 13 14 14 15 15 16 16 17 18

R180 ss+sb ss/ss ss ss ss

E181 ss ss ss ss/ss ss


Figure 13. Sequence logos of the results of scanning CRP binding sites. (A) 97 TFBSs ranked in the top 1%. (B) 9 TFBSs ranked in the bottom 50%.
