
The CpG O/E ratio was defined as

$$\text{CpG O/E} = \frac{P_{CpG}}{P_C \times P_G} = \frac{\text{number of observed CpG} \times \text{exon length}}{\text{number of C} \times \text{number of G}},$$

where P_CpG, P_C, and P_G represent the frequencies of CpG dinucleotides, C nucleotides, and G nucleotides, respectively.
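A minimal Python sketch of the formula above, assuming that "exon length" is the length of the exon sequence and that an exon lacking C or G returns NaN (the source does not say how such exons were handled). The function name `cpg_oe` is hypothetical.

```python
def cpg_oe(seq: str) -> float:
    """CpG observed/expected ratio: (#CpG x length) / (#C x #G)."""
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    n_cpg = s.count("CG")
    if n_c == 0 or n_g == 0:
        return float("nan")  # assumption: ratio undefined without C or G
    return n_cpg * len(s) / (n_c * n_g)
```

For example, `cpg_oe("ACGTCGGA")` has 2 CpGs, 2 Cs, 3 Gs, and length 8, giving 2 × 8 / (2 × 3) ≈ 2.67.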

Classification of coding exons

The A. thaliana gene annotations and the corresponding coding sequences (CDSs) were downloaded from the Ensembl genome browser at http://www.ensembl.org/. CDSs that overlapped non-coding RNAs or pseudogenes were excluded, as were single-exon genes. According to their relative positions within the Ensembl-annotated genes, the retrieved coding exons were divided into three groups: first, internal, and last exons. Briefly, all of the transcript isoforms of a gene were collated (except for those that overlapped non-coding RNAs or pseudogenes), and the coordinates of their exons were compared. The coding exons closest to the most downstream 5’UTR and to the most upstream 3’UTR were classified as the first and last coding exons, respectively. However, when a stand-alone 5’UTR exon was followed by a second 5’UTR region juxtaposed to a coding exon, that coding exon was excluded, because in this case the first coding exon is not part of the most upstream exonic region. The same rule was applied to last exons. The remaining exons, which were neither first nor last coding exons, were considered internal exons. The retrieved exons were also classified into constitutively and alternatively spliced exons (CSEs and ASEs, respectively) according to whether they were present in all transcript isoforms of a gene.
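The classification rules above can be approximated with a simplified Python sketch. It assumes each isoform is given as a list of its coding exons ordered 5'→3' along the transcript, labels an exon by its position only when it plays the same role (first, last, or internal) in every isoform that contains it, and otherwise marks it ambiguous (None). It does not implement the stand-alone 5’UTR exclusion rule. The names `classify_exons` and `Exon` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Exon = Tuple[int, int]  # hypothetical representation: (start, end) of a coding exon


def classify_exons(isoforms: List[List[Exon]]) -> Dict[Exon, Tuple[Optional[str], str]]:
    """Simplified positional (first/last/internal) and splicing (CSE/ASE) labels.

    Each isoform is a list of its coding exons ordered 5'->3' along the transcript.
    Returns {exon: (position, splicing)}; position is None when ambiguous.
    """
    roles: Dict[Exon, Set[str]] = {}
    occurrences: Dict[Exon, int] = {}
    for exons in isoforms:
        last = len(exons) - 1
        for i, exon in enumerate(exons):
            if i == 0 and i == last:
                role = "single"  # one-exon isoform: both first and last
            elif i == 0:
                role = "first"
            elif i == last:
                role = "last"
            else:
                role = "internal"
            roles.setdefault(exon, set()).add(role)
            occurrences[exon] = occurrences.get(exon, 0) + 1

    result: Dict[Exon, Tuple[Optional[str], str]] = {}
    for exon, seen in roles.items():
        # Keep a position only if the exon has the same role in every isoform.
        position = next(iter(seen)) if len(seen) == 1 and "single" not in seen else None
        # CSE if the exon occurs in every isoform of the gene, otherwise ASE.
        splicing = "CSE" if occurrences[exon] == len(isoforms) else "ASE"
        result[exon] = (position, splicing)
    return result
```

For two isoforms `[(100, 200), (300, 400), (500, 600)]` and `[(100, 200), (500, 600)]`, the middle exon is labeled internal and ASE, while the outer exons are first/last and CSE.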

Measurement of exonic expression level

The transcriptome data for A. thaliana pollen derived from a recent study [49] were retrieved from the Gene Expression Omnibus database under accession number SRP022162. The sequencing reads were mapped to the A. thaliana genome by using TopHat 2 [50], and then analyzed by using eXpress [51] to obtain exonic expression levels.

Predictions of intrinsically disordered regions and repetitive elements

The genomic and peptide sequences of A. thaliana retrieved from the Ensembl Plants website were submitted to RepeatMasker [52] and DISOPRED [53], respectively, to predict repetitive elements and intrinsically disordered regions. Both prediction tools were run with default parameters. The proportions of each exonic region that overlapped repetitive elements and disordered regions were then calculated separately.
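A minimal Python sketch of the overlap proportion, assuming exons and predicted features are given as 1-based, inclusive coordinates on the same sequence and that overlapping features are merged so no base is counted twice. For disordered regions, residue positions would first have to be converted to CDS coordinates (residue k spans nucleotides 3k−2 to 3k); that conversion is not shown. The name `overlap_fraction` is hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates


def overlap_fraction(exon: Interval, features: Iterable[Interval]) -> float:
    """Fraction of exon bases covered by the union of the feature intervals."""
    start, end = exon
    # Clip overlapping features to the exon boundaries and sort by start.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in features if s <= end and e >= start
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end + 1:
            # Disjoint from the current merged block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / (end - start + 1)
```

For an exon spanning 101–200 with repeats at 50–120, 110–130, and 190–250, the merged coverage is 101–130 plus 190–200 (41 bp), so the proportion is 0.41.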


Calculation of evolutionary rates

The one-to-one gene orthology between A. thaliana and A. lyrata was retrieved from ENSEMBL Plants (Version 18). The protein sequences of the orthologous genes were aligned using MUSCLE [54] and then back-translated to nucleotide sequences.

The aligned sequences were then split exon by exon according to the Ensembl annotations. Each exonic sequence alignment was checked for reading-frame correctness before being submitted to the CodeML program of PAML4 [55] to calculate dN, dS, and the dN/dS ratio.
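The exon-wise splitting and frame check can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes a pairwise back-translated (codon) alignment in which gaps occur only as whole codons and no terminal stop codon remains, takes exon boundaries as 0-based, half-open coordinates in the ungapped A. thaliana CDS, trims codons split by exon boundaries, and treats "correct reading frame" as codon-sized segments without in-frame stop codons. The name `extract_exon_alignment` is hypothetical.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_exon_alignment(
    ref_aln: str, other_aln: str, exon: Tuple[int, int]
) -> Optional[Tuple[str, str]]:
    """Slice a pairwise codon alignment down to one exon of the reference.

    exon: 0-based, half-open (start, end) in the ungapped reference CDS.
    Returns the two aligned exon segments, or None if the frame check fails.
    """
    # cols[i] is the alignment column holding the i-th ungapped reference base.
    cols: List[int] = [c for c, ch in enumerate(ref_aln) if ch != "-"]
    start, end = exon
    # Assumption: keep only whole codons inside the exon boundaries.
    codon_start = -(-start // 3) * 3  # round up to a codon boundary
    codon_end = (end // 3) * 3         # round down to a codon boundary
    if codon_end <= codon_start:
        return None
    col_start = cols[codon_start]
    col_end = cols[codon_end - 1] + 1
    # In a codon alignment every codon occupies three consecutive columns.
    if col_start % 3 or (col_end - col_start) % 3:
        return None
    segments = (ref_aln[col_start:col_end], other_aln[col_start:col_end])
    for seg in segments:
        bases = seg.replace("-", "").upper()
        if len(bases) % 3:
            return None  # gaps are not codon-sized, so the frame is broken
        if any(bases[i:i + 3] in STOP_CODONS for i in range(0, len(bases), 3)):
            return None  # in-frame stop codon
    return segments
```

Segments that pass the check could then be written out (for example, in PHYLIP format) as CodeML input.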

List of abbreviations

ASE: alternatively spliced exon
CDS: coding sequence
CpG methylation: the level of DNA methylation at CpG dinucleotides
CpG O/E ratio: the observed-to-expected ratio of the number of CpG dinucleotides
CSE: constitutively spliced exon
dN: nonsynonymous substitution rate
dS: synonymous substitution rate
mCG: methylated CpG dinucleotide
TFBS: transcription factor binding site
UTR: untranslated region

Competing interests

The authors declare that they have no competing interests.

Author contributions

Conceived the study: FCC; designed the research: FCC; conducted data collection and analysis: HYL and MKH; data interpretation: FCC and TJC; wrote the manuscript: FCC and TJC.

Acknowledgements

We thank Dr. Pao-Yang Chen and Mr. Wen-Wei Liao at Academia Sinica, and Dr. Wen Wang and Xin Li at the Kunming Institute of Zoology, for technical assistance in processing the methylome data. We are also grateful to Dr. Ben-Yang Liao for constructive comments. This study was supported by the Ministry of Science and Technology under contract numbers 102-2311-B-400-003 (to FCC) and NSC-102-2621-B-001-003 (to TJC).


References

1. Kelemen O, Convertini P, Zhang Z, Wen Y, Shen M, Falaleeva M, Stamm S: Function of alternative splicing. Gene 2013, 514(1):1-30.

2. Singh RK, Cooper TA: Pre-mRNA splicing in disease and therapeutics. Trends in molecular medicine 2012, 18(8):472-482.

3. Carvalho RF, Feijao CV, Duque P: On the physiological significance of alternative splicing events in higher plants. Protoplasma 2012, [Epub ahead of print].

4. Kalsotra A, Cooper TA: Functional consequences of developmentally regulated alternative splicing. Nature reviews Genetics 2011, 12(10):715-729.

5. Keren H, Lev-Maor G, Ast G: Alternative splicing and evolution: diversification, exon definition and function. Nature reviews Genetics 2010, 11(5):345-355.

6. Chen FC, Liao BY, Pan CL, Lin HY, Chang AY: Assessing determinants of exonic evolutionary rates in mammals. Molecular biology and evolution 2012, 29(10):3121-3129.

7. Wu GC, Chen FC: Determinants of exon-level evolutionary rates in Arabidopsis species. Evolutionary bioinformatics online 2012, 8:389-415.

8. Chen FC, Chaw SM, Tzeng YH, Wang SS, Chuang TJ: Opposite evolutionary effects between different alternative splicing patterns. Molecular biology and evolution 2007, 24(7):1443-1446.

9. Chen FC, Chen CJ, Ho JY, Chuang TJ: Identification and evolutionary analysis of novel exons and alternative splicing events using cross-species EST-to-genome comparisons in human, mouse and rat. BMC bioinformatics 2006, 7:136.

10. Chen FC, Chuang TJ: Different alternative splicing patterns are subject to opposite selection pressure for protein reading frame preservation. BMC evolutionary biology 2007, 7(1):179.

11. Gelly JC, Lin HY, de Brevern AG, Chuang TJ, Chen FC: Selective constraint on human pre-mRNA splicing by protein structural properties. Genome biology and evolution 2012, 4(9):966-975.

12. Chuang TJ, Chen FC, Chen YZ: Position-dependent correlations between DNA methylation and the evolutionary rates of mammalian coding exons. Proceedings of the National Academy of Sciences of the United States of America 2012, 109(39):15841-15846.

13. Mouse Genome Sequence Consortium: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420(6915):520-562.


14. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C et al: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860-921.

15. Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408(6814):796-815.

16. Gossmann TI, Woolfit M, Eyre-Walker A: Quantifying the variation in the effective population size within a genome. Genetics 2011, 189(4):1389-1402.

17. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE: Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 2008, 452(7184):215-219.

18. Chen FC, Chuang TJ: The effects of multiple features of alternatively spliced exons on the K(A)/K(S) ratio test. BMC bioinformatics 2006, 7:259.

19. Chen FC, Wang SS, Chen CJ, Li WH, Chuang TJ: Alternatively and constitutively spliced exons are subject to different evolutionary forces. Molecular biology and evolution 2006, 23(3):675-682.

20. Graur D, Li W-H: Fundamentals of molecular evolution, 2nd edn. Sunderland, Massachusetts: Sinauer Associates; 2000.

21. Chen FC, Pan CL, Lin HY: Independent effects of alternative splicing and structural constraint on the evolution of mammalian coding exons. Molecular biology and evolution 2011, 29(1):187-193.

22. Brown CJ, Johnson AK, Daughdrill GW: Comparing models of evolution for ordered and disordered proteins. Molecular biology and evolution 2010, 27(3):609-621.

23. Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK: Evolutionary rate heterogeneity in proteins with long disordered regions. Journal of molecular evolution 2002, 55(1):104-110.

24. Cheng C, Bhardwaj N, Gerstein M: The relationship between the evolution of microRNA targets and the length of their UTRs. BMC genomics 2009, 10:431.

25. Chen SC, Chuang TJ, Li WH: The relationships among microRNA regulation, intrinsically disordered regions, and other indicators of protein evolutionary rate. Molecular biology and evolution 2011, 28(9):2513-2520.

26. Chen YC, Cheng JH, Tsai ZT, Tsai HK, Chuang TJ: The impact of trans-regulation on the evolutionary rates of metazoan proteins. Nucleic acids research 2013, 41(13):6371-6380.

27. Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M: Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome research 2011, 21(11):1916-1928.

28. Ding J, Li D, Ohler U, Guan J, Zhou S: Genome-wide search for miRNA-target interactions in Arabidopsis thaliana with an integrated approach. BMC genomics 2012, 13 Suppl 3:S3.

29. Dweep H, Sticht C, Pandey P, Gretz N: miRWalk--database: prediction of possible miRNA binding sites by "walking" the genes of three genomes. Journal of biomedical informatics 2011, 44(5):839-847.

30. Wu L, Zhou H, Zhang Q, Zhang J, Ni F, Liu C, Qi Y: DNA methylation mediated by a microRNA pathway. Molecular cell 2010, 38(3):465-475.

31. Hu W, Wang T, Xu J, Li H: MicroRNA mediates DNA methylation of target genes. Biochemical and biophysical research communications 2014, 444(4):676-681.

32. Chuang TJ, Chiang TW: Pre-transcriptional DNA Methylation, Transcriptional Transcription Factor and Post-transcriptional microRNA Regulations on Protein Evolutionary Rate. Genome biology and evolution 2014, In press.

33. Fang Z, Rajewsky N: The impact of miRNA target sites in coding sequences and in 3'UTRs. PLOS ONE 2011, 6(3):e18067.

34. Chen SC, Chen FC, Li WH: Phosphorylated and nonphosphorylated serine and threonine residues evolve at different rates in mammals. Molecular biology and evolution 2010, 27(11):2548-2554.

35. Aivaliotis M, Macek B, Gnad F, Reichelt P, Mann M, Oesterhelt D: Ser/Thr/Tyr protein phosphorylation in the archaeon Halobacterium salinarum--a representative of the third domain of life. PLOS ONE 2009, 4(3):e4777.

36. Levy ED, Michnick SW, Landry CR: Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philosophical transactions of the Royal Society of London Series B, Biological sciences 2012, 367(1602):2594-2606.

37. Freschi L, Osseni M, Landry CR: Functional divergence and evolutionary turnover in mammalian phosphoproteomes. PLoS genetics 2014, 10(1):e1004062.

38. Villen J, Beausoleil SA, Gerber SA, Gygi SP: Large-scale phosphorylation analysis of mouse liver. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(5):1488-1493.

39. Wang X, Bian Y, Cheng K, Gu LF, Ye M, Zou H, Sun SS, He JX: A large-scale phosphoregulatory networks in Arabidopsis. Journal of proteomics 2013, 78:486-498.

40. Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, Montaner D, Dopazo J: FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic acids research 2007, 35(Web Server issue):W91-96.

41. Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S, LeProust EM, Akey JM, Stamatoyannopoulos JA: Exonic transcription factor binding directs codon choice and affects protein evolution. Science 2013, 342(6164):1367-1372.

42. Li X, Zhu J, Hu F, Ge S, Ye M, Xiang H, Zhang G, Zheng X, Zhang H, Zhang S, Li Q, Luo R, Yu C, Yu J, Sun J, Zou X, Cao X, Xie X, Wang J, Wang W: Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression. BMC genomics 2012, 13:300.

43. Chodavarapu RK, Feng S, Ding B, Simon SA, Lopez D, Jia Y, Wang GL, Meyers BC, Jacobsen SE, Pellegrini M: Transcriptome and methylome interactions in rice hybrids. Proceedings of the National Academy of Sciences of the United States of America 2012, 109(30):12040-12045.

44. Ibarra CA, Feng X, Schoft VK, Hsieh TF, Uzawa R, Rodrigues JA, Zemach A, Chumak N, Machlicova A, Nishimura T, Rojas D, Fischer RL, Tamaru H, Zilberman D: Active DNA demethylation in plant companion cells reinforces transposon methylation in gametes. Science 2012, 337(6100):1360-1364.

45. Chen PY, Cokus SJ, Pellegrini M: BS Seeker: precise mapping for bisulfite sequencing. BMC bioinformatics 2010, 11:203.

46. Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB, Gnirke A, Jaenisch R, Lander ES: Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 2008, 454(7205):766-770.

47. Laurent L, Wong E, Li G, Huynh T, Tsirigos A, Ong CT, Low HM, Kin Sung KW, Rigoutsos I, Loring J, Wei CL: Dynamic changes in the human methylome during differentiation. Genome research 2010, 20(3):320-331.

48. Nekrutenko A, Makova KD, Li WH: The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome research 2002, 12(1):198-202.

49. Loraine AE, McCormick S, Estrada A, Patel K, Qin P: RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing. Plant physiology 2013, 162(2):1092-1109.

50. Xi L, Feber A, Gupta V, Wu M, Bergemann AD, Landreneau RJ, Litle VR, Pennathur A, Luketich JD, Godfrey TE: Whole genome exon arrays identify differential expression of alternatively spliced, cancer-related genes in lung cancer. Nucleic acids research 2008, 36(20):6535-6547.

51. Roberts A, Pachter L: Streaming fragment assignment for real-time analysis of sequencing experiments. Nature methods 2013, 10(1):71-73.

52. Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 1996-2010.

53. Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT: The DISOPRED server for the prediction of protein disorder. Bioinformatics 2004, 20(13):2138-2139.

54. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 2004, 32(5):1792-1797.

55. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology and evolution 2007, 24(8):1586-1591.


Figure legends

Figure 1.

Pearson’s coefficients of correlation between mCG density and the CpG O/E ratio in (A) different methylome datasets (S1~S4); (B) first, last, and internal coding exons in different methylome datasets. ***: p < 0.001.

Figure 2.

The evolutionary rates (dN/dS ratio, dN, and dS) of first, last, and internal coding exons in different methylome datasets. Curves marked with stars indicate statistically significant differences. ***: p < 0.001, by Wilcoxon rank-sum test.

Figure 3.

The Spearman’s coefficients of correlation between mCG density and the dN/dS ratio, dN, and dS based on different methylome datasets. *: p < 0.05; **: p < 0.01; ***: p < 0.001.

Figure 4.

Pearson’s coefficients of correlation between mCG density and the CpG O/E ratio of first, last, and internal coding exons in five length subgroups (Subgroups 1~5) in the four analyzed sperm methylome datasets. Subgroup 1 includes the shortest exons and Subgroup 5 the longest.


Tables

Table 1. The methylome datasets and the background exome dataset analyzed in this study. Note that the background dataset was not filtered for methylation data, but was filtered for the definitions of first, last, and internal exons and for the length threshold used in calculating evolutionary rates (see Methods). NA: not applicable. The “Bisulfite Seq. read depth” is defined as the total length of the bisulfite sequencing reads divided by the size of the A. thaliana genome.

| Arabidopsis | Symbol | # Gene | # Exon | Bisulfite Seq. read depth | Average mCG density (per 100 sampled CpG) | Average CpG density (per 100 bp) | # First/Last/Internal | Average length (bp) |
|---|---|---|---|---|---|---|---|---|
| Col_wt | S1 | 10152 | 12649 | 14 | 17.3 | 4.8 | 6132/3182/3335 | 765.4 |
| Col_wt | S2 | 14409 | 21230 | 97 | 16.3 | 4.7 | 9243/5316/6671 | 615.8 |
| Col_wt | S3 | 9666 | 11758 | 17 | 17.1 | 4.8 | 5848/2944/2966 | 783.5 |
| Col_wt | S4 | 14002 | 20199 | 65 | 16.7 | 4.7 | 8870/5085/6244 | 630.0 |
| Col_0 | Background | 19500 | 79730 | NA | NA | 3.2 | 13933/12570/53227 | 282.9 ± 325.0 |


Table 2. The Spearman’s coefficient of correlation (ρ) between exon length and the dN/dS ratio, dN, and dS before (upper row; “Original”) and after (lower row; “Control”) controlling for four potential confounding factors (ASE/CSE exon type, proportion of repetitive elements, proportion of disordered regions, and exonic expression level). Note that this table is based on the background dataset.

| Measure | | First Exon ρ | First Exon p-value | Last Exon ρ | Last Exon p-value | Internal Exon ρ | Internal Exon p-value |
|---|---|---|---|---|---|---|---|
| dN/dS | Original | -0.004 | 0.6126 | 0.090 | < 2.2e-16 | 0.156 | < 2.2e-16 |
| dN/dS | Control | -0.004 | 0.41741 | 0.090 | 1.34E-68 | 0.146 | 0 |
| dN | Original | 0.085 | < 2.2e-16 | 0.139 | < 2.2e-16 | 0.161 | < 2.2e-16 |
| dN | Control | 0.085 | 1.23E-68 | 0.140 | 9.64E-166 | 0.153 | 0 |
| dS | Original | 0.187 | < 2.2e-16 | 0.112 | < 2.2e-16 | 0.039 | < 2.2e-16 |
| dS | Control | 0.188 | 0 | 0.113 | 1.32E-108 | 0.043 | 9.23E-67 |
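The table does not state how the confounders were controlled. One common approach, shown here only as an assumption, is a rank-based partial correlation: rank every variable, regress the ranks of exon length and of the evolutionary rate on the ranks of the control variables (with an intercept), and correlate the two sets of residuals. The function name `partial_spearman` is hypothetical, the sketch requires NumPy and SciPy, and it computes ρ only (no p-value).

```python
from typing import Sequence

import numpy as np
from scipy.stats import rankdata


def partial_spearman(x: Sequence[float], y: Sequence[float],
                     controls: Sequence[Sequence[float]]) -> float:
    """Rank-based partial correlation of x and y given the control variables."""
    rx = rankdata(x)
    ry = rankdata(y)
    # Design matrix: an intercept column plus the ranks of each control variable.
    design = np.column_stack([np.ones(len(rx))] + [rankdata(c) for c in controls])
    # Residuals of each ranked variable after least-squares regression on the controls.
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

With no controls the function reduces to the ordinary Spearman ρ; a binary covariate such as ASE/CSE status can be passed as a 0/1 vector.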


Additional Files

Additional File 1.

The median mCG densities of first, last, and internal coding exons in different methylation datasets. All pairwise differences between exon groups in each dataset are statistically significant (p < 0.001, by Wilcoxon rank-sum test).

Additional File 2.

The Spearman’s coefficients of correlation between mCG density and the dN/dS ratio, dN, and dS based on S1~S4 datasets after controlling for four potential confounding factors (CpG density, G+C content, exon length, ASE/CSE exon type). *: p < 0.05; **: p < 0.01; ***: p < 0.001.

Additional File 3.

The Spearman’s coefficients of correlation and p values between exon length and dN/dS ratio, dN, and dS for first, last, and internal exons before (“Original”) and after (“Control”) controlling for five potential confounding factors (the ASE/CSE exon type, the proportion of repetitive elements/disordered regions, exonic expression level, and mCG density).

Additional File 4.

Pairwise comparison between length subgroups of internal coding exons in terms of Gene Ontology functional categories. Note that here the “function” of an exon is the function of the gene in which it resides. The Y axis indicates the percentage of each length subgroup in a specific functional category. The table at the bottom shows whether the differences between subgroups are statistically significant. Lighter grey shading indicates that the former subgroup is relatively enriched; darker grey shading indicates the contrary. ***: p < 0.001. The numbers on top of the table indicate the percentages of genes in a specific functional category over all of the analyzed genes.

Additional File 5. The Spearman’s coefficients of correlation between mCG density and the dN/dS ratio, dN, and dS for different Gene Ontology functional categories (Level 3 of Molecular Function). *: p < 0.05; **: p<0.01; ***: p<0.001.

Additional File 6. The methylome datasets of rice (panicles: P1 and P2; leaves: L).

Additional File 7. Pearson’s coefficients of correlation between mCG density and the CpG O/E ratio in (A) different methylome datasets (panicles: P1 and P2; leaves: L); (B) first, last, and internal coding exons in different methylome datasets of rice. ***: p < 0.001.

Additional File 8. The evolutionary rates (dN/dS ratio, dN, and dS) of first, last, and internal coding exons in different methylome datasets of rice (panicles: P1 and P2; leaves: L). Curves marked with stars indicate statistically significant differences. **: p < 0.01; ***: p < 0.001, by Wilcoxon rank-sum test.

Additional File 9. The Spearman’s coefficients of correlation between mCG density and the dN/dS ratio, dN, and dS based on different methylome datasets of rice (panicles: P1 and P2; leaves: L). *: p < 0.05; **: p < 0.01; ***: p < 0.001.

