Characterizing the strand-specific distribution of
non-CpG methylation in human pluripotent cells
Weilong Guo1, Wen-Yu Chung2, Minping Qian3,4, Matteo Pellegrini5,* and Michael Q. Zhang1,2,*
1Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China, 2Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX 75080, USA, 3LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, China, 4Center for Theoretical Biology, Peking University, Beijing 100871, China and 5Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA
Received September 19, 2012; Revised October 25, 2013; Accepted November 22, 2013
ABSTRACT
DNA methylation is an important defense and regulatory mechanism. In mammals, most DNA methylation occurs at CpG sites, and asymmetric non-CpG methylation has only been detected at appreciable levels in a few cell types. We are the first to systematically study the strand-specific distribution of non-CpG methylation. With the divide-and-compare strategy, we show that CHG and CHH methylation are not intrinsically different in human embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs). We also find that non-CpG methylation is skewed between the two strands in introns, especially at intron boundaries and in highly expressed genes. Controlling for the proximal sequences of non-CpG sites, we show that the skew of non-CpG methylation in introns is mainly guided by sequence skew. By studying subgroups of transposable elements, we also found that non-CpG methylation is distributed in a strand-specific manner in both short interspersed nuclear elements (SINE) and long interspersed nuclear elements (LINE), but not in long terminal repeats (LTR). Finally, we show that on the antisense strand of Alus, a non-CpG site just downstream of the A-box is highly methylated. Together, the divide-and-compare strategy leads us to identify regions with strand-specific distributions of non-CpG methylation in humans.
INTRODUCTION
DNA methylation is a stable epigenetic mark that is important for gene expression regulation, transposon silencing, imprinting, X chromosome inactivation and
other diverse biological processes (1–4). Several techniques
have been developed to profile DNA methylomes (5).
Using genomic sequencing after bisulfite treatment, such
as MethylC-seq (6,7), methylated cytosines can be
detected at base pair resolution in a strand-specific manner. Currently, human methylomes generated by whole-genome bisulfite sequencing are available for
multiple cell types (8–14).
Mammalian DNA methylation occurs predominantly at CpG dinucleotides. By contrast, DNA methylation in plants is found frequently in both CpG and non-CpG
(CHG and CHH, where H is A, C or T) contexts (15,16).
Recent studies have revealed substantial non-CpG methylation in a few mammalian cell types, including ESCs (8,9,17–19), iPSCs (13,18), oocytes (20,21) and brain cells
(22,23). A comparative study among different human
ESC lines showed that the highly methylated non-CpG
sites were conserved at the TACAG motif (17).
In Arabidopsis, CHG and CHH methylations are
maintained by CMT3 and DRM2, respectively (24). In
mammals, knockdown studies have shown that
non-CpGs may be methylated by DNMT3a/3b (25), but the
details of the establishment, maintenance and biological
function of non-CpG methylation are still unclear (3,26).
We compared the surrounding DNA motifs of CHG and CHH methylation patterns in humans and showed that they were highly correlated in sequence context, indicating that, unlike in Arabidopsis, the two methylation patterns are not intrinsically different.
*To whom correspondence should be addressed. Tel: +1 972 883 2523; Fax: +1 972 883 5710; Email: michael.zhang@utdallas.edu Correspondence may also be addressed to Matteo Pellegrini. Tel: +1 310 825 0012; Fax: +1 310 206 3987; Email: matteop@mcdb.ucla.edu Present address:
Wen-Yu Chung, Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan.
Nucleic Acids Research, 2013, 1–8 doi:10.1093/nar/gkt1306
© The Author(s) 2013. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Nucleic Acids Research Advance Access published December 16, 2013
by guest on December 11, 2014
http://nar.oxfordjournals.org/
CpG sites are thought to be symmetrically methylated
on the two strands (9). In contrast, whether asymmetric
non-CpG sites can be strand-specifically methylated has
not been systematically studied. Lister et al. (8) have
shown that non-CpG sites on the antisense strand of a coding gene body appeared to be more highly methylated than those on the sense strand. To gain more detailed insights, we decomposed gene bodies into different functional regions. We found strand-specific non-CpG methylation in introns but not in exons. Specifically, intron boundaries showed a more significant skew of non-CpG methylation than intron interiors. We also show that highly transcribed genes tend to have higher skew scores. Using our sequence-decomposing method, we show that the methylation-prone pattern ACA is more enriched on the antisense strand of introns, indicating that the skewed non-CpG methylation is mainly guided by skewed sequences. Further, we examined strand-specific non-CpG methylation in different groups of transposons. We found that both short interspersed nuclear elements (SINEs) and long interspersed nuclear elements (LINEs) have strand-specific non-CpG methylation. Using our sequence-decomposing method, we show that the strand specificity of both LINE and mammalian interspersed repetitive (MIR) elements can be explained by the skew of ACA sequences. We also find that the TACAG site on the antisense strand of Alu, which lies immediately after the A-box of Alu elements, contributes the most to the skew of non-CpG methylation.
MATERIALS AND METHODS
Divide-and-compare strategy
When investigating any property of a data set, we are interested in whether the overall property is represented by only some of the elements, or whether different subgroups have different properties. Here we propose a ‘divide-and-compare’ strategy to evaluate whether a division is useful or redundant when evaluating certain properties of a data set. The strategy comprises two steps: the ‘divide’ step and the ‘compare’ step. In the ‘divide’ step, the full data set is divided into several subgroups according to user-defined criteria. The properties of the different subgroups are then compared in the ‘compare’ step. If the subgroups have similar properties, the division is redundant for understanding the property. In contrast, if the subgroups have different properties, the division is useful to gain a clearer understanding of the properties of the subgroups (Figure 1A).
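The two steps can be illustrated with a minimal Python sketch. The function names `divide` and `compare`, the toy records and the use of the mean as the compared property are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
from collections import defaultdict
from statistics import mean


def divide(records, key):
    """'Divide' step: split records into subgroups by a user-defined criterion."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


def compare(groups, prop):
    """'Compare' step: summarize the property of interest within each subgroup."""
    return {name: mean(prop(r) for r in members)
            for name, members in groups.items()}


# Hypothetical toy data: (region, methylation level) pairs.
sites = [("intron", 0.30), ("intron", 0.10), ("exon", 0.05), ("exon", 0.15)]
groups = divide(sites, key=lambda s: s[0])
summary = compare(groups, prop=lambda s: s[1])
print(summary)
```

In this toy case the subgroup means differ, so the division by region would be considered useful; if they were similar, the division would be redundant for that property. How large a difference counts as "different" is left to the analyst and is not specified here.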
Sequence-decomposing strategy for studying correlations between DNA methylation and sequences
The sequence-decomposing strategy is a specific case of the divide-and-compare strategy. We define the average methylation level as follows:

$$M = \frac{1}{N} \sum_{i=1}^{N} M_i,$$

where $M_i$ is the methylation level for the $i$th site measured across $N$ sites.

When decomposing sites into different words (k-mers), we have $M = \sum_{w} M^{c}_{w}$, where $M^{c}_{w}$, the contribution of the word $w$ to the average methylation level, is defined as follows:

$$M^{c}_{w} = M_{w} \times \frac{N_{w}}{N_{\mathrm{all}}} = \underbrace{M_{w}}_{\text{methylation propensity}} \times \underbrace{F_{w}}_{\text{sequence frequency}}.$$

Using this decomposing analysis, we can separate the average methylation levels into two parts: the methylation propensity ($M_w$) and the sequence frequency ($F_w$) for each specific word $w$. The methylation propensity of a word $w$ is the average methylation level of the specific word. Different words have different methylation propensities.
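The decomposition can be checked numerically with a short Python sketch. The function name `decompose` and the toy site list are hypothetical, and each site is assumed to be given as a (word, methylation level) pair.

```python
from collections import defaultdict


def decompose(sites):
    """Split the average methylation level into per-word contributions.

    `sites` is a list of (word, methylation_level) pairs, one per cytosine.
    Returns {word: (M_w, F_w, Mc_w)} with Mc_w = M_w * F_w.
    """
    levels = defaultdict(list)
    for word, level in sites:
        levels[word].append(level)
    n_all = len(sites)
    result = {}
    for word, values in levels.items():
        m_w = sum(values) / len(values)  # methylation propensity of the word
        f_w = len(values) / n_all        # sequence frequency N_w / N_all
        result[word] = (m_w, f_w, m_w * f_w)
    return result


# Hypothetical toy input: 3-mer context and methylation level per site.
toy_sites = [("ACA", 0.4), ("ACA", 0.2), ("TCT", 0.0), ("CCA", 0.1)]
parts = decompose(toy_sites)
overall = sum(level for _, level in toy_sites) / len(toy_sites)
recomposed = sum(mc for _, _, mc in parts.values())
print(abs(overall - recomposed) < 1e-12)
```

Summing the per-word contributions recovers the overall average, which is the identity the decomposition relies on.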
Average methylation levels of each 3-mer pattern ($w$ in the NCH context, where $N \in \{A, C, G, T\}$ and $H \in \{A, C, T\}$) were calculated for both strands, denoted as $M_{w|\mathrm{sense}}$ and $M_{w|\mathrm{antisense}}$, and the sequence frequencies are $F_{w|\mathrm{sense}}$ and $F_{w|\mathrm{antisense}}$, respectively. As a result, the average methylation levels of non-CpG sites on one strand can be expressed as $M_{\mathrm{strand}} = \sum_{w} M_{w|\mathrm{strand}} F_{w|\mathrm{strand}}$, where $\mathrm{strand} \in \{\mathrm{sense}, \mathrm{antisense}\}$. The contribution of pattern $w$ to the average methylation level was defined as $M^{c}_{w|\mathrm{strand}} = M_{w|\mathrm{strand}} F_{w|\mathrm{strand}}$. For each pattern $w$, the contribution to the difference of methylation levels between the two strands was defined as

$$C_{w} = \frac{M^{c}_{w|\mathrm{antisense}} - M^{c}_{w|\mathrm{sense}}}{M_{\mathrm{antisense}} - M_{\mathrm{sense}}}.$$
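A minimal Python sketch of the per-pattern contribution to the strand difference follows. The function names and the toy propensities and frequencies are hypothetical; the toy values deliberately use identical propensities on both strands so that the whole difference comes from sequence frequency.

```python
def strand_average(propensity, frequency):
    """M_strand: sum over words of M_{w|strand} * F_{w|strand}."""
    return sum(propensity[w] * frequency.get(w, 0.0) for w in propensity)


def skew_contributions(m_sense, f_sense, m_anti, f_anti):
    """C_w: share of the antisense-minus-sense difference due to each word."""
    total_diff = (strand_average(m_anti, f_anti)
                  - strand_average(m_sense, f_sense))
    if total_diff == 0:
        raise ValueError("equal strand averages; C_w is undefined")
    words = set(m_sense) | set(m_anti)
    return {
        w: (m_anti.get(w, 0.0) * f_anti.get(w, 0.0)
            - m_sense.get(w, 0.0) * f_sense.get(w, 0.0)) / total_diff
        for w in words
    }


# Hypothetical toy values: same propensities, different word frequencies.
m_sense = {"ACA": 0.30, "TCT": 0.05}
f_sense = {"ACA": 0.4, "TCT": 0.6}
m_anti = {"ACA": 0.30, "TCT": 0.05}
f_anti = {"ACA": 0.6, "TCT": 0.4}

contrib = skew_contributions(m_sense, f_sense, m_anti, f_anti)
print(contrib)
```

By construction the contributions sum to 1 over all patterns, so a value above 1 for one pattern is offset by negative values for others.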
Sequence preference of CHG and CHH methylation
A comparison between CHG and CHH contexts was carried out on 5-mers. A corresponding pair in both CHG and CHH contexts was defined as (xyChG, xyChH), where x, y can be A, C, G or T, h can be A, C or T and H indicates that A, C and T were considered collectively. For example, the pairs would be (AACAG, AACAH), (AACCG, AACCH), (AACTG, AACTH) and so forth. Average methylation levels of the 48 xyChG patterns and 48 xyChH patterns were calculated and ranked from high to low. Then Spearman’s rho and P-value were calculated based on the two lists.
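The pairing and the rank correlation can be sketched in plain Python as follows. The helper names are hypothetical, and the P-value is not computed in this sketch; a statistics library would normally supply it.

```python
from itertools import product


def pattern_pairs():
    """List the 48 (xyChG, xyChH) pairs; 'H' pools A, C and T at the last base."""
    return [(f"{x}{y}C{h}G", f"{x}{y}C{h}H")
            for x, y, h in product("ACGT", "ACGT", "ACT")]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


pairs = pattern_pairs()
print(len(pairs), pairs[:3])
```

Given two lists of average methylation levels ordered like `pairs`, `spearman_rho(chg_levels, chh_levels)` returns the correlation; ranking both lists from low to high instead of high to low leaves rho unchanged. The function is undefined when one list is constant.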
Estimate of DNA methylation levels at single sites and gene regions
We selected two representative DNA methylomes in our study, H1 and ADS-iPSC. DNA methylomes of the
two cell lines were downloaded from Lister et al. (13)
(http://neomorph.salk.edu/ips_methylomes). The DNA methylome of Arabidopsis was obtained from Lister
et al. (6). To estimate reliable methylation levels, we
only used cytosines with coverage ≥10X. The
methylation level of each cytosine was calculated as
eukaryotes (31). The core non-CpG methylation pattern ACA enriched on the antisense strand in human introns (Figure 2e) corresponds to the AT-skew phenomenon
(32,33). In eukaryotes, AT-skew is thought to be
coupled with gene transcription (34) and splicing (33).
As the bias is stronger at the extremities of introns than
their interior, Zhang et al. (32) attributed such DNA
strand asymmetry to the selection pressure on splicing enhancers or silencers. This study suggests that there is a potential epigenetic pressure on the asymmetric sequence as well, especially in introns.
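The link between AT-skew and ACA enrichment on the antisense strand can be illustrated with a small Python sketch. The sequence is an invented toy fragment, the helper names are hypothetical, and the skew is computed as (A − T)/(A + T), which is one common convention rather than necessarily the one used in the cited studies. Only uppercase A, C, G and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    return sum(seq.startswith(word, i)
               for i in range(len(seq) - len(word) + 1))


def at_skew(seq):
    """(A - T) / (A + T) on the given strand; sign conventions vary."""
    a, t = seq.count("A"), seq.count("T")
    return (a - t) / (a + t) if a + t else 0.0


# Hypothetical T-rich intron fragment, written 5'->3' on the sense strand.
sense = "GTGTTTGTCTTGTACA"
antisense = reverse_complement(sense)

print(count_overlapping(sense, "ACA"), count_overlapping(antisense, "ACA"))
print(at_skew(sense), at_skew(antisense))
```

Every ACA on the antisense strand corresponds to a TGT on the sense strand, so a T-rich sense strand tends to carry more ACA sites on its antisense partner.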
That the skewed non-CpG methylation in introns is correlated with skewed sequences and transcriptional levels could be the result of their coevolution, and these retained characters may be biologically favorable. Non-CpG methylation is known to be enriched in germ cells
(20) and ES cells (8), and it is possible that mutations in
these cell lines are increased as a result of non-CpG methylation. That the skew of sequences and non-CpG methylation are conserved and correlated could be the result of reciprocal benefit. Because the non-CpG methylation is associated with Dnmt3a/Dnmt3b and Dnmt3L, and is independent of Dnmt1, the lack of non-CpG methylation in somatic cells may be partly caused by much lower levels of
Dnmt3L compared with ESCs (25). The sequence
preferences and strand-specific distribution of non-CpG methylation in ESCs and iPSCs could be dependent on the properties of Dnmt3L. Also, a recent study showed that the GC-skew in promoters leads to the formation of R-loops that protected the region from being methylated
(35), providing evidence that asymmetric sequences could
influence the regulation of DNA methylation.
Finally, we found that the 25th site on the antisense
strand of Alus from the 5′ end is within a non-CpG
context and prone to be methylated, and the position is right after the A-box of Alu elements. As the A-box and B-box of Alus are promoters of RNA polymerase III for
the transcription of Alus (27), the methylation of the 25th
site potentially affects the transcription of Alu elements. As non-CpG methylation is found to be specifically enriched in embryonic cell lines and oocytes, which are germ line cell types, the high methylation levels of the 25th sites of Alus could be responsible for the silencing of the activities of Alu elements.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
The authors thank Juntao Gao for help in editing the
manuscript and for discussions. They thank Monica
Sleumer, Xiaowo Wang and other members of the Center for Synthetic and Systems Biology at Tsinghua University for helpful discussions. They thank Bing Ren and Wei Xie for discussion on some of the results. They also thank Davide Carnevali for help with the analysis of Alu elements.
FUNDING
National Basic Research Program of China
[2012CB316503]; NSFC [91010016]; National Institutes of Health (NIH) [ES017166 to M.Q.Z.]. Funding for open access charge: National Basic Research Program of China [2013CB316503].
Conflict of interest statement. None declared.
REFERENCES
1. Bird,A. (2002) DNA methylation patterns and epigenetic memory. Genes Dev., 16, 6–21.
2. Cedar,H. and Bergman,Y. (2012) Programming of DNA methylation patterns. Annu. Rev. Biochem., 81, 97–117.
3. Jones,P.A. (2012) Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet., 13, 484–492.
4. Smallwood,S.A. and Kelsey,G. (2012) De novo DNA methylation: a germ cell perspective. Trends Genet., 28, 33–42.
5. Laird,P.W. (2010) Principles and challenges of genomewide DNA methylation analysis. Nat. Rev. Genet., 11, 191–203.
6. Lister,R., O’Malley,R.C., Tonti-Filippini,J., Gregory,B.D., Berry,C.C., Millar,A.H. and Ecker,J.R. (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell, 133, 523–536.
7. Cokus,S.J., Feng,S., Zhang,X., Chen,Z., Merriman,B., Haudenschild,C.D., Pradhan,S., Nelson,S.F., Pellegrini,M. and Jacobsen,S.E. (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature, 452, 215–219.
8. Lister,R., Pelizzola,M., Dowen,R.H., Hawkins,R.D., Hon,G., Tonti-Filippini,J., Nery,J.R., Lee,L., Ye,Z., Ngo,Q.M. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 462, 315–322.
9. Laurent,L., Wong,E., Li,G., Huynh,T., Tsirigos,A., Ong,C.T., Low,H.M., Kin Sung,K.W., Rigoutsos,I., Loring,J. et al. (2010) Dynamic changes in the human methylome during differentiation. Genome Res., 20, 320–331.
10. Chodavarapu,R.K., Feng,S., Bernatavichute,Y.V., Chen,P.Y., Stroud,H., Yu,Y., Hetzel,J.A., Kuo,F., Kim,J., Cokus,S.J. et al. (2010) Relationship between nucleosome positioning and DNA methylation. Nature, 466, 388–392.
11. Molaro,A., Hodges,E., Fang,F., Song,Q., McCombie,W.R., Hannon,G.J. and Smith,A.D. (2011) Sperm methylation profiles reveal features of epigenetic inheritance and evolution in primates. Cell, 146, 1029–1041.
12. Hodges,E., Molaro,A., Dos Santos,C.O., Thekkat,P., Song,Q., Uren,P.J., Park,J., Butler,J., Rafii,S., McCombie,W.R. et al. (2011) Directional DNA methylation changes and complex intermediate states accompany lineage specificity in the adult hematopoietic compartment. Mol. Cell, 44, 17–28.
13. Lister,R., Pelizzola,M., Kida,Y.S., Hawkins,R.D., Nery,J.R., Hon,G., Antosiewicz-Bourget,J., O’Malley,R., Castanon,R., Klugman,S. et al. (2011) Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature, 471, 68–73.
14. Ziller,M.J., Gu,H., Muller,F., Donaghey,J., Tsai,L.T., Kohlbacher,O., De Jager,P.L., Rosen,E.D., Bennett,D.A., Bernstein,B.E. et al. (2013) Charting a dynamic DNA methylation landscape of the human genome. Nature, 500, 477–481.
15. Zemach,A., McDaniel,I.E., Silva,P. and Zilberman,D. (2010) Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science, 328, 916–919.
16. Feng,S., Cokus,S.J., Zhang,X., Chen,P.Y., Bostick,M., Goll,M.G., Hetzel,J., Jain,J., Strauss,S.H., Halpern,M.E. et al. (2010) Conservation and divergence of methylation patterning in plants and animals. Proc. Natl Acad. Sci. USA, 107, 8689–8694.
17. Chen,P.Y., Feng,S., Joo,J.W., Jacobsen,S.E. and Pellegrini,M. (2011) A comparative analysis of DNA methylation across human embryonic stem cell lines. Genome Biol., 12, R62.
18. Ziller,M.J., Muller,F., Liao,J., Zhang,Y., Gu,H., Bock,C., Boyle,P., Epstein,C.B., Bernstein,B.E., Lengauer,T. et al. (2011) Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genet., 7, e1002389.
19. Ramsahoye,B.H., Biniszkiewicz,D., Lyko,F., Clark,V., Bird,A.P. and Jaenisch,R. (2000) Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl Acad. Sci. USA, 97, 5237–5242.
20. Tomizawa,S., Kobayashi,H., Watanabe,T., Andrews,S., Hata,K., Kelsey,G. and Sasaki,H. (2011) Dynamic stage-specific changes in imprinted differentially methylated regions during early mammalian development and prevalence of non-CpG methylation in oocytes. Development, 138, 811–820.
21. Kobayashi,H., Sakurai,T., Imai,M., Takahashi,N., Fukuda,A., Yayoi,O., Sato,S., Nakabayashi,K., Hata,K., Sotomaru,Y. et al. (2012) Contribution of intragenic DNA methylation in mouse gametic DNA methylomes to establish oocyte-specific heritable marks. PLoS Genet., 8, e1002440.
22. Xie,W., Barr,C.L., Kim,A., Yue,F., Lee,A.Y., Eubanks,J., Dempster,E.L. and Ren,B. (2012) Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell, 148, 816–831.
23. Lister,R., Mukamel,E.A., Nery,J.R., Urich,M., Puddifoot,C.A., Johnson,N.D., Lucero,J., Huang,Y., Dwork,A.J., Schultz,M.D. et al. (2013) Global epigenomic reconfiguration during mammalian brain development. Science, 341, 1237905.
24. Law,J.A. and Jacobsen,S.E. (2010) Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet., 11, 204–220.
25. Arand,J., Spieler,D., Karius,T., Branco,M.R., Meilinger,D., Meissner,A., Jenuwein,T., Xu,G., Leonhardt,H., Wolf,V. et al. (2012) In Vivo control of CpG and Non-CpG DNA methylation by DNA methyltransferases. PLoS Genet., 8, e1002750.
26. Dyachenko,O.V., Schevchuk,T.V., Kretzner,L., Buryanov,Y.I. and Smith,S.S. (2010) Human non-CG methylation: are human stem cells plant-like? Epigenetics, 5, 569–572.
27. Sela,N., Kim,E. and Ast,G. (2010) The role of transposable elements in the evolution of non-mammalian vertebrates and invertebrates. Genome Biol., 11, R59.
28. Stadler,M.B., Murr,R., Burger,L., Ivanek,R., Lienert,F., Scholer,A., van Nimwegen,E., Wirbelauer,C., Oakeley,E.J., Gaidatzis,D. et al. (2011) DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature, 480, 490–495.
29. Nestor,C., Ruzov,A., Meehan,R. and Dunican,D. (2010) Enzymatic approaches and bisulfite sequencing cannot distinguish between 5-methylcytosine and 5-hydroxymethylcytosine in DNA. Biotechniques, 48, 317–319.
30. Yu,M., Hon,G.C., Szulwach,K.E., Song,C.X., Zhang,L., Kim,A., Li,X., Dai,Q., Shen,Y., Park,B. et al. (2012) Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell, 149, 1368–1380.
31. Green,P., Ewing,B., Miller,W., Thomas,P.J. and Green,E.D. (2003) Transcription-associated mutational asymmetry in mammalian evolution. Nat. Genet., 33, 514–517.
32. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.
33. Touchon,M., Arneodo,A., d’Aubenton-Carafa,Y. and Thermes,C. (2004) Transcription-coupled and splicing-coupled strand asymmetries in eukaryotic genomes. Nucleic Acids Res., 32, 4969–4978.
34. Touchon,M., Nicolay,S., Arneodo,A., d’Aubenton-Carafa,Y. and Thermes,C. (2003) Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett., 555, 579–582.
35. Ginno,P.A., Lott,P.L., Christensen,H.C., Korf,I. and Chedin,F. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell, 45, 814–825.