
This study combined a support vector machine (SVM) with regulatory features of core promoters (statistically significant 6-mer patterns, nucleotide composition, and DNA stability) to identify transcriptional start sites in mammalian genomes. When the constructed SVM models were evaluated on the benchmark we built, our method achieved high prediction sensitivity and specificity, with an accuracy greater than 70%. Furthermore, the SVM model that combines all three feature types performs better than models that use any single feature type or any pair of them.
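The three feature groups named above can be sketched as one feature-extraction routine. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, the composition features (A/C/G/T and CG frequencies) and the 15-bp non-overlapping stability windows are placeholder choices, and DNA stability is approximated as the mean nearest-neighbor free energy per window using SantaLucia's 1998 unified parameters.

```python
# Hypothetical feature extractor (assumption): the thesis's exact k-mer set,
# window sizes and composition features are not specified in this section.

# Nearest-neighbour free energies (kcal/mol, 37 C) per 5'->3' dinucleotide
# step, from SantaLucia's 1998 unified parameter set. Each step shares its
# value with its reverse complement (e.g. CA and TG).
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def kmer_features(seq, patterns):
    """Overlapping occurrence count of each selected k-mer in seq."""
    k = len(patterns[0])
    counts = dict.fromkeys(patterns, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    return [counts[p] for p in patterns]


def composition_features(seq):
    """Frequencies of A, C, G, T plus the frequency of the CG dinucleotide."""
    n = len(seq)
    mono = [seq.count(base) / n for base in "ACGT"]
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG") / (n - 1)
    return mono + [cpg]


def stability_features(seq, window=15, step=15):
    """Mean nearest-neighbour free energy of each window (more negative = more stable)."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        energies = [NN_DG37[w[i:i + 2]] for i in range(window - 1)]
        profile.append(sum(energies) / len(energies))
    return profile


def combined_features(seq, patterns):
    """Concatenate all three feature groups into one SVM input vector."""
    seq = seq.upper()  # assumes only A/C/G/T; an N would raise KeyError
    return (kmer_features(seq, patterns)
            + composition_features(seq)
            + stability_features(seq))
```

Dropping one or two of the three groups from `combined_features` gives the single-feature and paired-feature variants that the combined model was compared against. The stability profile has one entry per window, so all input sequences should share the same length; as the LIBSVM practical guide recommends, features would normally be scaled to a common range before training.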

Compared with other previously proposed gene promoter prediction methods, our method also performed better. Building on it, we implemented an efficient and effective web interface to help biologists analyze gene promoters and transcriptional start sites.
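Comparisons between promoter predictors depend on how sensitivity, specificity and accuracy are defined, and the literature does not always use the same definitions. The following is a minimal sketch of the usual confusion-matrix definitions; the class and method names are illustrative assumptions, not part of the described web interface.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary promoter/TSS prediction (1 = promoter)."""
    tp: int
    fp: int
    tn: int
    fn: int

    @classmethod
    def from_labels(cls, labels, preds):
        """Tally true/false positives/negatives from paired 0/1 labels and predictions."""
        pairs = list(zip(labels, preds))
        return cls(
            tp=sum(1 for y, p in pairs if y == 1 and p == 1),
            fp=sum(1 for y, p in pairs if y == 0 and p == 1),
            tn=sum(1 for y, p in pairs if y == 0 and p == 0),
            fn=sum(1 for y, p in pairs if y == 1 and p == 0),
        )

    def sensitivity(self):
        """Fraction of true promoters that were predicted: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        """Fraction of non-promoters correctly rejected: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        """Precision, TP / (TP + FP); some gene-finding papers call this specificity."""
        return self.tp / (self.tp + self.fp)

    def accuracy(self):
        """Fraction of all predictions that are correct."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)
```

For example, `Confusion.from_labels([1, 1, 0, 0], [1, 0, 0, 1])` places one prediction in each cell, so sensitivity, specificity and accuracy are all 0.5.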

Appendix A

Table A.1 Top 100 statistically significant 6-mer patterns in group 1 (all).

SN  Pattern Whole    Whole pro.  80num  Pos pro.   OC ratio
1   CGCGCG  93380    0.0000164   1354   0.001396   85.20718
2   GCGCGC  142810   0.0000251   1676   0.001729   68.96482
3   CGCCGC  165495   0.0000290   1916   0.001976   68.03347
4   GCGGCG  165495   0.0000290   1916   0.001976   68.03347
5   CGGCGC  119385   0.0000210   1347   0.001389   66.30266
6   GCGCCG  119385   0.0000210   1347   0.001389   66.30266
7   CGCGGC  123365   0.0000217   1388   0.001432   66.11665
8   GCCGCG  123365   0.0000217   1388   0.001432   66.11665
9   CCGCCG  156119   0.0000274   1558   0.001607   58.64409
10  CGGCGG  156119   0.0000274   1558   0.001607   58.64409
11  CCGGCG  111276   0.0000195   1080   0.001114   57.03421
12  CGCCGG  111276   0.0000195   1080   0.001114   57.03421
13  AGCGCG  72155    0.0000127   683    0.000704   55.62466
14  CGCGCT  72155    0.0000127   683    0.000704   55.62466
15  CCGCGG  129318   0.0000227   1206   0.001244   54.80265
16  CGCGCA  75745    0.0000133   697    0.000719   54.07431
17  TGCGCG  75745    0.0000133   697    0.000719   54.07431
18  CCCGCG  127926   0.0000225   1163   0.001199   53.42372
19  CGCGGG  127926   0.0000225   1163   0.001199   53.42372
20  CGCGAG  63936    0.0000112   562    0.00058    51.65401
21  CTCGCG  63936    0.0000112   562    0.00058    51.65401
22  CGCGGA  61387    0.0000108   525    0.000541   50.25664
23  TCCGCG  61387    0.0000108   525    0.000541   50.25664
24  TCGCGA  29124    0.0000051   238    0.000245   48.0218
25  CGCGAC  35935    0.0000063   289    0.000298   47.25988
26  GTCGCG  35935    0.0000063   289    0.000298   47.25988
27  CGTCGC  44589    0.0000078   329    0.000339   43.35912
28  GCGACG  44589    0.0000078   329    0.000339   43.35912
29  CCGCGC  210901   0.0000370   1521   0.001569   42.38025
30  GCGCGG  210901   0.0000370   1521   0.001569   42.38025
31  CGCCGA  51860    0.0000091   371    0.000383   42.03916
32  TCGGCG  51860    0.0000091   371    0.000383   42.03916
33  CGGCGA  56495    0.0000099   396    0.000408   41.19057
34  TCGCCG  56495    0.0000099   396    0.000408   41.19057
35  CGAGCG  65811    0.0000116   453    0.000467   40.44943
36  CGCTCG  65811    0.0000116   453    0.000467   40.44943
37  CCGCGA  55823    0.0000098   367    0.000379   38.63365
38  TCGCGG  55823    0.0000098   367    0.000379   38.63365
39  CGCGTC  59353    0.0000104   390    0.000402   38.61302
40  GACGCG  59353    0.0000104   390    0.000402   38.61302
41  ACGCGC  66140    0.0000116   428    0.000441   38.02714
42  GCGCGT  66140    0.0000116   428    0.000441   38.02714
43  CGACGC  41872    0.0000074   270    0.000278   37.89237
44  GCGTCG  41872    0.0000074   270    0.000278   37.89237
45  CGACCG  33411    0.0000059   208    0.000215   36.58352
46  CGGTCG  33411    0.0000059   208    0.000215   36.58352
47  CGGCCG  176492   0.0000310   1056   0.001089   35.16022
48  CGCCCC  362006   0.0000635   2130   0.002197   34.57615
49  GGGGCG  362006   0.0000635   2130   0.002197   34.57615
50  ACGCCG  64787    0.0000114   373    0.000385   33.83243
51  CGGCGT  64787    0.0000114   373    0.000385   33.83243
52  GCCGCC  344938   0.0000605   1890   0.001949   32.19831
53  GGCGGC  344938   0.0000605   1890   0.001949   32.19831
54  CCCCGC  426331   0.0000748   2201   0.00227    30.33793
55  GCGGGG  426331   0.0000748   2201   0.00227    30.33793
56  CCGCCC  518352   0.0000910   2647   0.00273    30.00832
57  GGGCGG  518352   0.0000910   2647   0.00273    30.00832
58  GCGGCC  261440   0.0000459   1330   0.001372   29.89456
59  GGCCGC  261440   0.0000459   1330   0.001372   29.89456
60  CCGTCG  44482    0.0000078   226    0.000233   29.85637
61  CGACGG  44482    0.0000078   226    0.000233   29.85637
62  CCGGAA  199323   0.0000350   1000   0.001031   29.48189
63  TTCCGG  199323   0.0000350   1000   0.001031   29.48189
64  CGCACG  77899    0.0000137   381    0.000393   28.74135
65  CGTGCG  77899    0.0000137   381    0.000393   28.74135
66  CGGGGC  371940   0.0000653   1806   0.001863   28.53367
67  GCCCCG  371940   0.0000653   1806   0.001863   28.53367
68  CGCGAA  30243    0.0000053   145    0.00015    28.17444
69  TTCGCG  30243    0.0000053   145    0.00015    28.17444
70  ACGCGG  69430    0.0000122   326    0.000336   27.59189
71  CCGCGT  69430    0.0000122   326    0.000336   27.59189
72  ACGGCG  68155    0.0000120   312    0.000322   26.90103
73  CGCCGT  68155    0.0000120   312    0.000322   26.90103
74  CGCGCC  291748   0.0000512   1310   0.001351   26.38613
75  GGCGCG  291748   0.0000512   1310   0.001351   26.38613
76  CCGACG  45809    0.0000080   200    0.000206   25.6562
77  CGTCGG  45809    0.0000080   200    0.000206   25.6562
78  GCCGGC  246470   0.0000433   1066   0.001099   25.41593
79  CGGAAG  262733   0.0000461   1119   0.001154   25.02812
80  CTTCCG  262733   0.0000461   1119   0.001154   25.02812
81  AACGCG  38488    0.0000068   161    0.000166   24.58179
82  CGCGTT  38488    0.0000068   161    0.000166   24.58179
83  CGGACG  65539    0.0000115   261    0.000269   23.40193
84  CGTCCG  65539    0.0000115   261    0.000269   23.40193
85  GCCGGA  184107   0.0000323   721    0.000744   23.01324
86  TCCGGC  184107   0.0000323   721    0.000744   23.01324
87  CTGCGC  288998   0.0000507   1098   0.001132   22.32648
88  GCGCAG  288998   0.0000507   1098   0.001132   22.32648
89  CACGCG  89405    0.0000157   331    0.000341   21.75602
90  CGCGTG  89405    0.0000157   331    0.000341   21.75602
91  GCGCGA  113177   0.0000199   414    0.000427   21.49582
92  TCGCGC  113177   0.0000199   414    0.000427   21.49582
93  ACGTCG  36991    0.0000065   135    0.000139   21.44626
94  CGACGT  36991    0.0000065   135    0.000139   21.44626
95  GAGCGC  183691   0.0000322   656    0.000677   20.98595
96  GCGCTC  183691   0.0000322   656    0.000677   20.98595
97  ACGCGA  32037    0.0000056   114    0.000118   20.91052
98  TCGCGT  32037    0.0000056   114    0.000118   20.91052
99  TGCGCA  174690   0.0000307   616    0.000635   20.72174
100 CGCGTA  22145    0.0000039   78     8.04E-05   20.69811
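The OC ratio column in these tables tracks Pos pro. divided by Whole pro.: for CGCGCG in Table A.1, 0.001396 / 0.0000164 ≈ 85.1, close to the tabulated 85.20718, with the small gap presumably due to the probabilities being rounded for display. The arithmetic is also consistent with Pos pro. being 80num divided by a fixed total (1354 / 0.001396 and 2130 / 0.002197 both come to roughly 970,000). A minimal sketch of that ratio and a top-N ranking follows; the function names and argument layout are hypothetical.

```python
def oc_ratio(pos_count, pos_total, whole_count, whole_total):
    """Relative frequency of a k-mer in promoter windows divided by its
    relative frequency in the whole genome (> 1 means over-represented)."""
    pos_prob = pos_count / pos_total
    whole_prob = whole_count / whole_total
    return pos_prob / whole_prob


def top_patterns(pos_counts, whole_counts, pos_total, whole_total, n=100):
    """Rank k-mers by OC ratio, highest first; k-mers absent genome-wide are skipped."""
    scored = [
        (kmer, oc_ratio(pos_counts.get(kmer, 0), pos_total, count, whole_total))
        for kmer, count in whole_counts.items()
        if count > 0
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

The sketch ranks by ratio alone and leaves out whatever significance test was used to select the patterns listed here.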

Table A.2 Top 100 statistically significant 6-mer patterns in group 2 (non-CpG island).

SN  Pattern Whole    Whole pro.  80num  Pos pro.   OC ratio
1   CGCGAA  30243    0.0000053   6      0.0000255  4.810317
2   TTCGCG  30243    0.0000053   6      0.0000255  4.810317
3   CGCCCC  362006   0.0000635   71     0.0003023  4.759941
4   GGGGCG  362006   0.0000635   71     0.0003023  4.759941
5   CCGCCC  518352   0.0000910   100    0.0004257  4.678166
6   GGGCGG  518352   0.0000910   100    0.0004257  4.678166
7   CCCCGC  426331   0.0000748   82     0.0003491  4.666908
8   GCGGGG  426331   0.0000748   82     0.0003491  4.666908
9   ATCCGG  135765   0.0000238   26     0.0001107  4.650647
10  CCGGAT  135765   0.0000238   26     0.0001107  4.650647
11  GACGTC  138788   0.0000244   26     0.0001107  4.536287
12  CCGCAG  305519   0.0000536   54     0.0002299  4.2889
13  CTGCGG  305519   0.0000536   54     0.0002299  4.2889
14  CGCGAG  63936    0.0000112   11     0.0000468  4.181111
15  CTCGCG  63936    0.0000112   11     0.0000468  4.181111
16  CGGAAG  262733   0.0000461   45     0.0001916  4.155551
17  CTTCCG  262733   0.0000461   45     0.0001916  4.155551
18  ACGTCA  227674   0.0000400   39     0.0001660  4.150702
19  TGACGT  227674   0.0000400   39     0.0001660  4.150702
20  CGTCGA  23424    0.0000041   4      0.0000170  4.143193
21  TCGACG  23424    0.0000041   4      0.0000170  4.143193
22  CCGGAA  199323   0.0000350   34     0.0001447  4.135498
23  TTCCGG  199323   0.0000350   34     0.0001447  4.135498
24  GCCGGC  246470   0.0000433   42     0.0001788  4.129318
25  CGGCAG  310633   0.0000545   52     0.0002214  4.061849
26  CTGCCG  310633   0.0000545   52     0.0002214  4.061849
27  ACTCGC  134642   0.0000236   22     0.0000937  3.968512
28  GCGAGT  134642   0.0000236   22     0.0000937  3.968512
29  CGCGGA  61387    0.0000108   10     0.0000426  3.941788
30  TCCGCG  61387    0.0000108   10     0.0000426  3.941788
31  CCGCGA  55823    0.0000098   9      0.0000383  3.90961
32  TCGCGG  55823    0.0000098   9      0.0000383  3.90961
33  CGGCGA  56495    0.0000099   9      0.0000383  3.862316
34  TCGCCG  56495    0.0000099   9      0.0000383  3.862316
35  CGACTC  145121   0.0000255   23     0.0000979  3.839765
36  GAGTCG  145121   0.0000255   23     0.0000979  3.839765
37  GCGGCA  221533   0.0000389   35     0.0001490  3.830323
38  TGCCGC  221533   0.0000389   35     0.0001490  3.830323
39  CGGCCC  309351   0.0000543   48     0.0002043  3.763209
40  GGGCCG  309351   0.0000543   48     0.0002043  3.763209
41  TCCGCA  176001   0.0000309   27     0.0001149  3.719823
42  TGCGGA  176001   0.0000309   27     0.0001149  3.719823
43  CGGGGC  371940   0.0000653   57     0.0002427  3.716025
44  GCCCCG  371940   0.0000653   57     0.0002427  3.716025
45  GCCGGA  184107   0.0000323   28     0.0001192  3.690392
46  TCCGGC  184107   0.0000323   28     0.0001192  3.690392
47  GCCCGA  172187   0.0000302   26     0.0001107  3.665079
48  TCGGGC  172187   0.0000302   26     0.0001107  3.665079
49  AGGGCG  258763   0.0000454   39     0.0001660  3.657007
50  CGCCCT  258763   0.0000454   39     0.0001660  3.657007
51  CGTCAC  192503   0.0000338   29     0.0001235  3.652568
52  GTGACG  192503   0.0000338   29     0.0001235  3.652568
53  ACCGGA  127794   0.0000224   19     0.0000809  3.610959
54  TCCGGT  127794   0.0000224   19     0.0000809  3.610959
55  GGGCCC  909212   0.0001596   134    0.0005705  3.574887
56  ACGGCC  186522   0.0000327   27     0.0001149  3.515062
57  GGCCGT  186522   0.0000327   27     0.0001149  3.515062
58  GCCGAC  118157   0.0000207   17     0.0000724  3.496194
59  GTCGGC  118157   0.0000207   17     0.0000724  3.496194
60  CGGCAC  202327   0.0000355   29     0.0001235  3.477656
61  GTGCCG  202327   0.0000355   29     0.0001235  3.477656
62  CGCAGA  233353   0.0000410   33     0.0001405  3.426471
63  TCTGCG  233353   0.0000410   33     0.0001405  3.426471
64  TCGGCA  177866   0.0000312   25     0.0001064  3.411162
65  TGCCGA  177866   0.0000312   25     0.0001064  3.411162
66  CCCGGA  256531   0.0000450   36     0.0001533  3.405705
67  TCCGGG  256531   0.0000450   36     0.0001533  3.405705
68  CCGGTC  136422   0.0000239   19     0.0000809  3.38433
69  GACCGG  136422   0.0000239   19     0.0000809  3.38433
70  CCCGCC  746914   0.0001311   104    0.0004427  3.377412
71  GGCGGG  746914   0.0001311   104    0.0004427  3.377412
72  CGGAAC  129310   0.0000227   18     0.0000766  3.375698
73  GTTCCG  129310   0.0000227   18     0.0000766  3.375698
74  CCGGCA  230262   0.0000404   32     0.0001362  3.371985
75  TGCCGG  230262   0.0000404   32     0.0001362  3.371985
76  GCGAAC  79344    0.0000139   11     0.0000468  3.368952
77  GTTCGC  79344    0.0000139   11     0.0000468  3.368952
78  TCGCGA  29124    0.0000051   4      0.0000170  3.332392
79  GGCCCC  1025386  0.0001800   140    0.0005960  3.311782
80  GGGGCC  1025386  0.0001800   140    0.0005960  3.311782
81  GCGGAA  168690   0.0000296   23     0.0000979  3.307906
82  TTCCGC  168690   0.0000296   23     0.0000979  3.307906
83  CGAACG  29492    0.0000052   4      0.0000170  3.28736
84  CGTTCG  29492    0.0000052   4      0.0000170  3.28736
85  CGCGTA  22145    0.0000039   3      0.0000128  3.283134
86  TACGCG  22145    0.0000039   3      0.0000128  3.283134
87  ACGTCG  36991    0.0000065   5      0.0000213  3.279762
88  CGACGT  36991    0.0000065   5      0.0000213  3.279762
89  GCGGAC  110933   0.0000195   15     0.0000639  3.274716
90  GTCCGC  110933   0.0000195   15     0.0000639  3.274716
91  CACCGG  200517   0.0000352   27     0.0001149  3.265413
92  CCGGTG  200517   0.0000352   27     0.0001149  3.265413
93  AGCGTC  171060   0.0000300   23     0.0000979  3.2638
94  GACGCT  171060   0.0000300   23     0.0000979  3.2638
95  AGTCGG  157158   0.0000276   21     0.0000894  3.239121
96  CCGACT  157158   0.0000276   21     0.0000894  3.239121
97  CTCCGA  203551   0.0000357   27     0.0001149  3.219679
98  TCGGAG  203551   0.0000357   27     0.0001149  3.219679
99  ACGCTG  294212   0.0000516   39     0.0001660  3.217599
100 CAGCGT  294212   0.0000516   39     0.0001660  3.217599
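In all three tables, most patterns come in reverse-complement pairs with identical counts (for example CGCCGC/GCGGCG and ATCCGG/CCGGAT), while reverse-palindromic 6-mers such as GACGTC, GCCGGC and GGGCCC appear as single rows. That pattern is consistent with counting 6-mers on both DNA strands. A minimal sketch of both-strand counting, with hypothetical function names:

```python
from collections import Counter

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Reverse complement of an upper-case DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_both_strands(seq, k=6):
    """Count every k-mer on the given strand and on the opposite strand."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        counts[word] += 1                      # forward strand
        counts[reverse_complement(word)] += 1  # same site read on the reverse strand
    return counts
```

Counting this way gives a 6-mer and its reverse complement the same total, and a reverse-palindromic 6-mer, being its own reverse complement, receives two counts per site and appears as a single row.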

Table A.3 Top 100 statistically significant 6-mer patterns in group 3 (CpG island).

SN  Pattern Whole    Whole pro.  80num  Pos pro.   OC ratio
1   CGCGCG  93380    0.0000164   1350   0.001837   112.0417
2   GCGCGC  142810   0.0000251   1674   0.002278   90.77614
3   CGCCGC  165495   0.0000290   1903   0.00259    89.31631
4   GCGGCG  165495   0.0000290   1903   0.00259    89.31631
5   CGGCGC  119385   0.0000210   1338   0.001821   86.7215
6   GCGCCG  119385   0.0000210   1338   0.001821   86.7215
7   CGCGGC  123365   0.0000217   1377   0.001874   86.37025
8   GCCGCG  123365   0.0000217   1377   0.001874   86.37025
9   CCGCCG  156119   0.0000274   1544   0.002102   76.69847
10  CGGCGG  156119   0.0000274   1544   0.002102   76.69847
11  CCGGCG  111276   0.0000195   1067   0.001452   74.47659
12  CGCCGG  111276   0.0000195   1067   0.001452   74.47659
13  AGCGCG  72155    0.0000127   678    0.000923   72.66344
14  CGCGCT  72155    0.0000127   678    0.000923   72.66344
15  CCGCGG  129318   0.0000227   1198   0.001631   71.83249
16  CGCGCA  75745    0.0000133   689    0.000938   70.51111
17  TGCGCG  75745    0.0000133   689    0.000938   70.51111
18  CCCGCG  127926   0.0000225   1155   0.001572   69.86979
19  CGCGGG  127926   0.0000225   1155   0.001572   69.86979
20  CGCGAG  63936    0.0000112   551    0.00075    66.96125
21  CTCGCG  63936    0.0000112   551    0.00075    66.96125
22  CGCGGA  61387    0.0000108   515    0.000701   64.90429
23  TCCGCG  61387    0.0000108   515    0.000701   64.90429
24  TCGCGA  29124    0.0000051   234    0.000318   62.32825
25  CGCGAC  35935    0.0000063   288    0.000392   62.1231
26  GTCGCG  35935    0.0000063   288    0.000392   62.1231
27  CGTCGC  44589    0.0000078   325    0.000442   56.4952
28  GCGACG  44589    0.0000078   325    0.000442   56.4952
29  CCGCGC  210901   0.0000370   1509   0.002054   55.5108
30  GCGCGG  210901   0.0000370   1509   0.002054   55.5108
31  CGCCGA  51860    0.0000091   365    0.000497   54.59356
32  TCGGCG  51860    0.0000091   365    0.000497   54.59356
33  CGGCGA  56495    0.0000099   387    0.000527   53.09936
34  TCGCCG  56495    0.0000099   387    0.000527   53.09936
35  CGAGCG  65811    0.0000116   447    0.000608   52.44928
36  CGCTCG  65811    0.0000116   447    0.000608   52.44928
37  CGCGTC  59353    0.0000104   386    0.000525   50.51774
38  GACGCG  59353    0.0000104   386    0.000525   50.51774
39  ACGCGC  66140    0.0000116   425    0.000578   49.86788
40  GCGCGT  66140    0.0000116   425    0.000578   49.86788
41  CCGCGA  55823    0.0000098   358    0.000487   49.72181
42  TCGCGG  55823    0.0000098   358    0.000487   49.72181
43  CGACGC  41872    0.0000074   266    0.000362   49.25885
44  GCGTCG  41872    0.0000074   266    0.000362   49.25885
45  CGACCG  33411    0.0000059   204    0.000278   47.383
46  CGGTCG  33411    0.0000059   204    0.000278   47.383
47  CGGCCG  176492   0.0000310   1048   0.001426   46.01395
48  CGCCCC  362006   0.0000635   2059   0.002803   44.13393
49  GGGGCG  362006   0.0000635   2059   0.002803   44.13393
50  ACGCCG  64787    0.0000114   368    0.000501   43.93726
51  CGGCGT  64787    0.0000114   368    0.000501   43.93726
52  GCCGCC  344938   0.0000605   1854   0.002523   41.7104
53  GGCGGC  344938   0.0000605   1854   0.002523   41.7104
54  GCGGCC  261440   0.0000459   1306   0.001778   38.72759
55  GGCCGC  261440   0.0000459   1306   0.001778   38.72759
56  CCCCGC  426331   0.0000748   2119   0.002884   38.55843
57  GCGGGG  426331   0.0000748   2119   0.002884   38.55843
58  CCGTCG  44482    0.0000078   221    0.000301   38.51512
59  CGACGG  44482    0.0000078   221    0.000301   38.51512
60  CCGCCC  518352   0.0000910   2547   0.003467   38.09584
61  GGGCGG  518352   0.0000910   2547   0.003467   38.09584
62  CCGGAA  199323   0.0000350   966    0.001315   37.56635
63  TTCCGG  199323   0.0000350   966    0.001315   37.56635
64  CGCACG  77899    0.0000137   376    0.000512   37.35573
65  CGTGCG  77899    0.0000137   376    0.000512   37.35573
66  CGGGGC  371940   0.0000653   1749   0.002381   36.4558
67  GCCCCG  371940   0.0000653   1749   0.002381   36.4558
68  ACGCGG  69430    0.0000122   321    0.000437   35.81254
69  CCGCGT  69430    0.0000122   321    0.000437   35.81254
70  CGCGAA  30243    0.0000053   139    0.000189   35.62954
71  TTCGCG  30243    0.0000053   139    0.000189   35.62954
72  ACGGCG  68155    0.0000120   306    0.000416   34.70804
73  CGCCGT  68155    0.0000120   306    0.000416   34.70804
74  CGCGCC  291748   0.0000512   1303   0.001774   34.63893
75  GGCGCG  291748   0.0000512   1303   0.001774   34.63893
76  CCGACG  45809    0.0000080   194    0.000264   32.84246
77  CGTCGG  45809    0.0000080   194    0.000264   32.84246
78  GCCGGC  246470   0.0000433   1024   0.001394   32.18859
79  AACGCG  38488    0.0000068   159    0.000216   32.06146
80  CGCGTT  38488    0.0000068   159    0.000216   32.06146
81  CGGAAG  262733   0.0000461   1074   0.001462   31.70979
82  CTTCCG  262733   0.0000461   1074   0.001462   31.70979
83  CGGACG  65539    0.0000115   258    0.000351   30.53598
84  CGTCCG  65539    0.0000115   258    0.000351   30.53598
85  GCCGGA  184107   0.0000323   693    0.000943   29.20254
86  TCCGGC  184107   0.0000323   693    0.000943   29.20254
87  CTGCGC  288998   0.0000507   1074   0.001462   28.83276
88  GCGCAG  288998   0.0000507   1074   0.001462   28.83276
89  CACGCG  89405    0.0000157   329    0.000448   28.52241
90  CGCGTG  89405    0.0000157   329    0.000448   28.52241
91  GCGCGA  113177   0.0000199   410    0.000558   28.04276
92  TCGCGC  113177   0.0000199   410    0.000558   28.04276
93  ACGTCG  36991    0.0000065   130    0.000177   27.26394
94  CGACGT  36991    0.0000065   130    0.000177   27.26394
95  GAGCGC  183691   0.0000322   642    0.000874   27.13746
96  GCGCTC  183691   0.0000322   642    0.000874   27.13746
97  ACGCGA  32037    0.0000056   111    0.000151   26.88293
98  TCGCGT  32037    0.0000056   111    0.000151   26.88293
99  TGCGCA  174690   0.0000307   604    0.000822   26.77864
100 CGTACG  22916    0.0000040   78     0.000106   26.4094
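Tables A.2 and A.3 split promoters into non-CpG-island and CpG-island groups, but the classification rule is not restated in this appendix. One widely used criterion, from Gardiner-Garden and Frommer (1987), calls a region a CpG island if it is at least 200 bp long, has G+C content above 50%, and has an observed/expected CpG ratio above 0.6. The sketch below applies that rule to a single sequence; treat it as an assumed stand-in, not necessarily the grouping used here, and the function names as hypothetical.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were independent is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Gardiner-Garden & Frommer style test on a whole sequence (assumed criterion)."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

Genome-scale island finders typically slide such a test along the sequence rather than applying it to one fixed region.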
