
Characterizing the strand-specific distribution of non-CpG methylation in human pluripotent cells

Weilong Guo1, Wen-Yu Chung2, Minping Qian3,4, Matteo Pellegrini5,* and Michael Q. Zhang1,2,*

1Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China, 2Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX 75080, USA, 3LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, China, 4Center for Theoretical Biology, Peking University, Beijing 100871, China and 5Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA

Received September 19, 2012; Revised October 25, 2013; Accepted November 22, 2013

ABSTRACT

DNA methylation is an important defense and regulatory mechanism. In mammals, most DNA methylation occurs at CpG sites, and asymmetric non-CpG methylation has only been detected at appreciable levels in a few cell types. We are the first to systematically study the strand-specific distribution of non-CpG methylation. With the divide-and-compare strategy, we show that CHG and CHH methylation are not intrinsically different in human embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs). We also find that non-CpG methylation is skewed between the two strands in introns, especially at intron boundaries and in highly expressed genes. Controlling for the proximal sequences of non-CpG sites, we show that the skew of non-CpG methylation in introns is mainly guided by sequence skew. By studying subgroups of transposable elements, we also found that non-CpG methylation is distributed in a strand-specific manner in both short interspersed nuclear elements (SINE) and long interspersed nuclear elements (LINE), but not in long terminal repeats (LTR). Finally, we show that on the antisense strand of Alus, a non-CpG site just downstream of the A-box is highly methylated. Together, the divide-and-compare strategy leads us to identify regions with strand-specific distributions of non-CpG methylation in humans.

INTRODUCTION

DNA methylation is a stable epigenetic mark that is important for gene expression regulation, transposon silencing, imprinting, X chromosome inactivation and other diverse biological processes (1–4). Several techniques have been developed to profile DNA methylomes (5). Using genomic sequencing after bisulfite treatment, such as MethylC-seq (6,7), methylated cytosines can be detected at base pair resolution in a strand-specific manner. Currently, human methylomes generated by whole-genome bisulfite sequencing are available for multiple cell types (8–14).

Mammalian DNA methylation occurs predominantly at CpG dinucleotides. By contrast, DNA methylation in plants is found frequently in both CpG and non-CpG (CHG and CHH, where H is A, C or T) contexts (15,16). Recent studies have revealed substantial non-CpG methylation in a few mammalian cell types, including ESCs (8,9,17–19), iPSCs (13,18), oocytes (20,21) and brain cells (22,23). A comparative study among different human ESC lines showed that the highly methylated non-CpG sites were conserved at the TACAG motif (17).

In Arabidopsis, CHG and CHH methylation are maintained by CMT3 and DRM2, respectively (24). In mammals, knockdown studies have shown that non-CpGs may be methylated by DNMT3a/3b (25), but the details of the establishment, maintenance and biological function of non-CpG methylation are still unclear (3,26). We compared the surrounding DNA motifs of CHG and CHH methylation patterns in humans and showed that they were highly correlated in sequence context, indicating that, unlike in Arabidopsis, the two methylation patterns are not intrinsically different.

*To whom correspondence should be addressed. Tel: +1 972 883 2523; Fax: +1 972 883 5710; Email: michael.zhang@utdallas.edu

Correspondence may also be addressed to Matteo Pellegrini. Tel: +1 310 825 0012; Fax: +1 310 206 3987; Email: matteop@mcdb.ucla.edu

Present address: Wen-Yu Chung, Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan.

Nucleic Acids Research, 2013, 1–8 doi:10.1093/nar/gkt1306

© The Author(s) 2013. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Nucleic Acids Research Advance Access published December 16, 2013


CpG sites are thought to be symmetrically methylated on the two strands (9). In contrast, whether asymmetric non-CpG sites can be strand-specifically methylated has not been systematically studied. Lister et al. (8) have shown that non-CpG sites on the antisense strand of a coding gene body appeared to be more highly methylated than those on the sense strand. To gain more detailed insights, we decomposed the gene bodies into different functional regions. We found strand-specific non-CpG methylation in introns but not in exons. Specifically, intron boundaries showed a more significant skew of non-CpG methylation than intron interiors. We also show that highly transcribed genes tend to have higher skew scores. Using our sequence-decomposing method, we show that the methylation-prone pattern ACA is more enriched on the antisense strand of introns, indicating that the skewed non-CpG methylation is mainly guided by skewed sequences. Further, we examine strand-specific non-CpG methylation in different groups of transposons. We found that both short interspersed nuclear elements (SINEs) and long interspersed nuclear elements (LINEs) have strand-specific non-CpG methylation. Using our sequence-decomposing method, we show that the strand specificity of both LINE and mammalian interspersed repetitive (MIR) elements can be explained by the skew of ACA sequences. We also find that the TACAG site on the antisense strand of Alu, which is right after the A-box of Alu elements, contributes the most to the skew of non-CpG methylation.

MATERIALS AND METHODS

Divide-and-compare strategy

When investigating any property of a data set, we are interested in whether the overall property is driven by only a subset of the elements, or whether different subgroups have different properties. Here we propose a ‘divide-and-compare’ strategy to evaluate whether a division is useful or redundant when evaluating certain properties of a data set. The ‘divide-and-compare’ strategy comprises two steps: the ‘divide’ step and the ‘compare’ step. In the ‘divide’ step, the full data set is divided into several subgroups according to user-defined criteria. Then the properties of different subgroups are compared in the ‘compare’ step. If the subgroups have similar properties, the division is redundant for understanding the property. In contrast, if the subgroups have different properties, the division is useful to gain a clearer understanding of the properties of the subgroups (Figure 1A).
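The two steps can be illustrated with a minimal Python sketch. The function name `divide_and_compare`, the choice of the mean as the compared property and the toy data are assumptions made for illustration only; they are not taken from the study.

```python
from collections import defaultdict
from statistics import mean

def divide_and_compare(items, key, value):
    """'Divide' items into subgroups by key(item), then 'compare' them
    through the mean of value(item) in each subgroup."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(value(item))      # divide step
    return {name: mean(vals) for name, vals in groups.items()}  # compare step

# Hypothetical toy data: (region label, methylation level) pairs.
sites = [("intron", 0.30), ("intron", 0.20), ("exon", 0.10), ("exon", 0.12)]
summary = divide_and_compare(sites, key=lambda s: s[0], value=lambda s: s[1])

# A large spread between subgroup means suggests the division is useful;
# a spread near zero suggests it is redundant for this property.
spread = max(summary.values()) - min(summary.values())
print(summary, spread)
```

In practice the comparison would rest on a statistical test rather than a raw difference of means; the spread is used here only to keep the sketch short.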

Sequence-decomposing strategy for studying correlations between DNA methylation and sequences

The sequence-decomposing strategy is a specific case of the divide-and-compare strategy. We define the average methylation level as follows:

$$\bar{M} = \frac{1}{N} \sum_{i=1}^{N} M_i,$$

where $M_i$ is the methylation level for the $i$th site measured across $N$ sites.

When decomposing sites into different words (k-mers), we have $\bar{M} = \sum_{w} M^{c}_{w}$, where $M^{c}_{w}$, the contribution of the word $w$ to the average methylation level, is defined as follows:

$$M^{c}_{w} = \bar{M}_{w} \times \frac{N_{w}}{N_{\mathrm{all}}} = \underbrace{\bar{M}_{w}}_{\text{methylation propensity}} \times \underbrace{F_{w}}_{\text{sequence frequency}}.$$

Using this decomposing analysis, we can separate the average methylation levels into two parts: the methylation propensity ($\bar{M}_{w}$) and the sequence frequency ($F_{w}$) for each specific word $w$. The methylation propensity of a word $w$ is the average methylation level of the specific word. Different words have different methylation propensities.
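As a minimal sketch of this decomposition, the Python snippet below groups sites by word, computes each word's methylation propensity and sequence frequency, and checks that their products sum back to the overall average. The function name `decompose` and the toy site list are hypothetical and serve only to illustrate the identity.

```python
from collections import defaultdict

def decompose(sites):
    """sites: list of (word, methylation_level) pairs.
    Returns {word: (propensity, frequency, contribution)}."""
    levels = defaultdict(list)
    for word, level in sites:
        levels[word].append(level)
    n_all = len(sites)
    parts = {}
    for word, vals in levels.items():
        propensity = sum(vals) / len(vals)   # average level of this word
        frequency = len(vals) / n_all        # N_w / N_all
        parts[word] = (propensity, frequency, propensity * frequency)
    return parts

# Hypothetical toy data: four non-CpG sites labelled by their 3-mer.
toy_sites = [("ACA", 0.4), ("ACA", 0.2), ("TCT", 0.0), ("GCC", 0.1)]
parts = decompose(toy_sites)

overall = sum(level for _, level in toy_sites) / len(toy_sites)
recovered = sum(contribution for _, _, contribution in parts.values())
print(overall, recovered)  # the two values agree up to rounding error
```

Separating the two factors makes it possible to ask whether a difference in average methylation comes from how readily a word is methylated or from how often the word occurs.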

Average methylation levels of each 3-mer pattern ($w$ in the NCH context, where $N \in \{A,C,G,T\}$ and $H \in \{A,C,T\}$) were calculated for both strands, denoted as $\bar{M}_{w|\mathrm{sense}}$ and $\bar{M}_{w|\mathrm{antisense}}$, and the sequence frequencies are $F_{w|\mathrm{sense}}$ and $F_{w|\mathrm{antisense}}$, respectively. As a result, the average methylation level of non-CpG sites on one strand can be expressed as

$$\bar{M}_{\mathrm{strand}} = \sum_{w} \left( \bar{M}_{w|\mathrm{strand}} \times F_{w|\mathrm{strand}} \right),$$

where $\mathrm{strand} \in \{\mathrm{sense}, \mathrm{antisense}\}$. The contribution of pattern $w$ to the average methylation level was defined as $M^{c}_{w|\mathrm{strand}} = \bar{M}_{w|\mathrm{strand}} \times F_{w|\mathrm{strand}}$. For each pattern $w$, the contribution to the difference of methylation levels between the two strands was defined as

$$C_{w} = \frac{M^{c}_{w|\mathrm{antisense}} - M^{c}_{w|\mathrm{sense}}}{\bar{M}_{\mathrm{antisense}} - \bar{M}_{\mathrm{sense}}}.$$
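The per-pattern contribution to the strand difference can be sketched in Python as follows. The function name `strand_contributions` and the propensity and frequency values are invented for illustration; in this toy case both strands share the same propensities and differ only in how often each pattern occurs, so the whole difference is a sequence effect.

```python
def strand_contributions(sense, antisense):
    """sense / antisense: {pattern: (propensity, frequency)} for one strand.
    Returns {pattern: C_w}, the share of the antisense-minus-sense
    difference in average methylation attributable to each pattern."""
    def contribution(table, w):
        propensity, frequency = table.get(w, (0.0, 0.0))
        return propensity * frequency

    patterns = set(sense) | set(antisense)
    m_sense = sum(contribution(sense, w) for w in patterns)
    m_antisense = sum(contribution(antisense, w) for w in patterns)
    diff = m_antisense - m_sense
    if diff == 0:
        raise ValueError("equal strand averages: C_w is undefined")
    return {w: (contribution(antisense, w) - contribution(sense, w)) / diff
            for w in patterns}

# Hypothetical values: same propensities, different pattern frequencies.
sense = {"ACA": (0.30, 0.10), "TCT": (0.02, 0.90)}
antisense = {"ACA": (0.30, 0.20), "TCT": (0.02, 0.80)}
c = strand_contributions(sense, antisense)
print(c)
```

By construction the contributions sum to 1, so a value above 1 for one pattern means that other patterns partly offset it.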

Sequence preference of CHG and CHH methylation

A comparison between CHG and CHH contexts was carried out on 5-mers. A corresponding pair in both CHG and CHH contexts was defined as (xyChG, xyChH), where x, y can be A, C, G or T, h can be A, C or T and H indicates that A, C and T were considered collectively. For example, the pairs would be (AACAG, AACAH), (AACCG, AACCH), (AACTG, AACTH) and so forth. Average methylation levels of the 48 xyChG patterns and 48 xyChH patterns were calculated and ranked from high to low. Then Spearman’s rho and P-value were calculated based on the two lists.
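A short Python sketch of this comparison is given below. It enumerates the 48 pattern pairs and computes Spearman's rho as the Pearson correlation of average ranks, using only the standard library. The helper names are hypothetical, the P-value is not computed, and no methylation values from the study are reproduced.

```python
from itertools import product

BASES = "ACGT"
H_BASES = "ACT"

def paired_patterns():
    """Enumerate the 48 (xyChG, xyChH) pairs, e.g. ('AACAG', 'AACAH')."""
    return [(f"{x}{y}C{h}G", f"{x}{y}C{h}H")
            for x, y, h in product(BASES, BASES, H_BASES)]

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Pearson correlation of the rank vectors (undefined for constant input)."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

pairs = paired_patterns()
print(len(pairs), pairs[:3])
```

Given two lists of average methylation levels ordered like `pairs`, `spearman_rho(chg_levels, chh_levels)` would return the rank correlation; both list names are placeholders.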

Estimate of DNA methylation levels at single sites and gene regions

We selected two representative DNA methylomes in our study, H1 and ADS-iPSC. DNA methylomes of the two cell lines were downloaded from Lister et al. (13) (http://neomorph.salk.edu/ips_methylomes). The DNA methylome of Arabidopsis was obtained from Lister et al. (6). To estimate reliable methylation levels, we only used cytosines with coverage ≥10X. The methylation level of each cytosine was calculated as


eukaryotes (31). The core non-CpG methylation pattern ACA enriched on the antisense strand in human introns (Figure 2e) corresponds to the AT-skew phenomenon (32,33). In eukaryotes, AT-skew is thought to be coupled with gene transcription (34) and splicing (33). As the bias is stronger at the extremities of introns than in their interior, Zhang et al. (32) attributed such DNA strand asymmetry to the selection pressure on splicing enhancers or silencers. This study suggests that there is a potential epigenetic pressure on the asymmetric sequence as well, especially in introns.
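To make the link between sequence skew and the ACA pattern concrete, the following Python sketch counts ACA on both strands of a made-up sequence and computes an AT-skew value. The sequence is invented, and the sign convention (A − T)/(A + T) is an assumption chosen for illustration; other conventions exist in the literature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def at_skew(seq):
    """(A - T) / (A + T); 0.0 when the sequence has no A or T."""
    a, t = seq.count("A"), seq.count("T")
    return (a - t) / (a + t) if a + t else 0.0

def count_overlapping(seq, word):
    """Number of occurrences of word in seq, overlaps included."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

# Hypothetical T-rich sense-strand fragment.
sense = "TGTTTGTATTGTACA"
antisense = reverse_complement(sense)

print(at_skew(sense))                       # negative: more T than A
print(count_overlapping(sense, "ACA"))      # ACA sites on the sense strand
print(count_overlapping(antisense, "ACA"))  # ACA sites on the antisense strand
```

Every TGT on the sense strand is an ACA on the antisense strand, which is why a T-rich sense strand yields more ACA sites on the antisense strand in this toy case.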

That the skewed non-CpG methylation in introns is correlated with skewed sequences and transcriptional levels could be the result of their coevolution, and these retained characters may be biologically favorable. Non-CpG methylation is known to be enriched in germ cells (20) and ES cells (8), and it is possible that mutations in these cell lines are increased as a result of non-CpG methylation. That the skew of sequences and non-CpG methylation are conserved and correlated could be the result of reciprocal benefit. Because non-CpG methylation is associated with Dnmt3a/Dnmt3b and Dnmt3L, and is independent of Dnmt1, the lack of non-CpG methylation in somatic cells may be partly caused by much lower levels of Dnmt3L compared with ESCs (25). The sequence preferences and strand-specific distribution of non-CpG methylation in ESCs and iPSCs could be dependent on the properties of Dnmt3L. Also, a recent study showed that the GC-skew in promoters leads to the formation of R-loops that protect the region from being methylated (35), providing evidence that asymmetric sequences could influence the regulation of DNA methylation.

Finally, we found that the 25th site from the 5′ end on the antisense strand of Alus is within a non-CpG context and prone to be methylated, and the position is right after the A-box of Alu elements. As the A-box and B-box of Alus are promoters of RNA polymerase III for the transcription of Alus (27), the methylation of the 25th site potentially affects the transcription of Alu elements. As non-CpG methylation is found to be specifically enriched in embryonic cell lines and oocytes, which are germ line cell types, the high methylation levels of the 25th sites of Alus could be responsible for silencing the activity of Alu elements.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors thank Juntao Gao for help in editing the manuscript and for discussions. They thank Monica Sleumer, Xiaowo Wang and other members of the Center for Synthetic and Systems Biology at Tsinghua University for helpful discussions. They thank Bing Ren and Wei Xie for discussion on some of the results. They also thank Davide Carnevali for help with the analysis of Alu elements.

FUNDING

National Basic Research Program of China [2012CB316503]; NSFC [91010016]; National Institutes of Health (NIH) [ES017166 to M.Q.Z.]. Funding for open access charge: National Basic Research Program of China [2013CB316503].

Conflict of interest statement. None declared.

REFERENCES

1. Bird,A. (2002) DNA methylation patterns and epigenetic memory. Genes Dev., 16, 6–21.

2. Cedar,H. and Bergman,Y. (2012) Programming of DNA methylation patterns. Annu. Rev. Biochem., 81, 97–117.

3. Jones,P.A. (2012) Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet., 13, 484–492.

4. Smallwood,S.A. and Kelsey,G. (2012) De novo DNA methylation: a germ cell perspective. Trends Genet., 28, 33–42.

5. Laird,P.W. (2010) Principles and challenges of genomewide DNA methylation analysis. Nat. Rev. Genet., 11, 191–203.

6. Lister,R., O’Malley,R.C., Tonti-Filippini,J., Gregory,B.D., Berry,C.C., Millar,A.H. and Ecker,J.R. (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell, 133, 523–536.

7. Cokus,S.J., Feng,S., Zhang,X., Chen,Z., Merriman,B., Haudenschild,C.D., Pradhan,S., Nelson,S.F., Pellegrini,M. and Jacobsen,S.E. (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature, 452, 215–219.

8. Lister,R., Pelizzola,M., Dowen,R.H., Hawkins,R.D., Hon,G., Tonti-Filippini,J., Nery,J.R., Lee,L., Ye,Z., Ngo,Q.M. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 462, 315–322.

9. Laurent,L., Wong,E., Li,G., Huynh,T., Tsirigos,A., Ong,C.T., Low,H.M., Kin Sung,K.W., Rigoutsos,I., Loring,J. et al. (2010) Dynamic changes in the human methylome during differentiation. Genome Res., 20, 320–331.

10. Chodavarapu,R.K., Feng,S., Bernatavichute,Y.V., Chen,P.Y., Stroud,H., Yu,Y., Hetzel,J.A., Kuo,F., Kim,J., Cokus,S.J. et al. (2010) Relationship between nucleosome positioning and DNA methylation. Nature, 466, 388–392.

11. Molaro,A., Hodges,E., Fang,F., Song,Q., McCombie,W.R., Hannon,G.J. and Smith,A.D. (2011) Sperm methylation profiles reveal features of epigenetic inheritance and evolution in primates. Cell, 146, 1029–1041.

12. Hodges,E., Molaro,A., Dos Santos,C.O., Thekkat,P., Song,Q., Uren,P.J., Park,J., Butler,J., Rafii,S., McCombie,W.R. et al. (2011) Directional DNA methylation changes and complex intermediate states accompany lineage specificity in the adult hematopoietic compartment. Mol. Cell, 44, 17–28.

13. Lister,R., Pelizzola,M., Kida,Y.S., Hawkins,R.D., Nery,J.R., Hon,G., Antosiewicz-Bourget,J., O’Malley,R., Castanon,R., Klugman,S. et al. (2011) Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature, 471, 68–73.

14. Ziller,M.J., Gu,H., Muller,F., Donaghey,J., Tsai,L.T., Kohlbacher,O., De Jager,P.L., Rosen,E.D., Bennett,D.A., Bernstein,B.E. et al. (2013) Charting a dynamic DNA methylation landscape of the human genome. Nature, 500, 477–481.

15. Zemach,A., McDaniel,I.E., Silva,P. and Zilberman,D. (2010) Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science, 328, 916–919.

16. Feng,S., Cokus,S.J., Zhang,X., Chen,P.Y., Bostick,M., Goll,M.G., Hetzel,J., Jain,J., Strauss,S.H., Halpern,M.E. et al. (2010) Conservation and divergence of methylation patterning in plants and animals. Proc. Natl Acad. Sci. USA, 107, 8689–8694.

17. Chen,P.Y., Feng,S., Joo,J.W., Jacobsen,S.E. and Pellegrini,M. (2011) A comparative analysis of DNA methylation across human embryonic stem cell lines. Genome Biol., 12, R62.


18. Ziller,M.J., Muller,F., Liao,J., Zhang,Y., Gu,H., Bock,C., Boyle,P., Epstein,C.B., Bernstein,B.E., Lengauer,T. et al. (2011) Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genet., 7, e1002389.

19. Ramsahoye,B.H., Biniszkiewicz,D., Lyko,F., Clark,V., Bird,A.P. and Jaenisch,R. (2000) Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl Acad. Sci. USA, 97, 5237–5242.

20. Tomizawa,S., Kobayashi,H., Watanabe,T., Andrews,S., Hata,K., Kelsey,G. and Sasaki,H. (2011) Dynamic stage-specific changes in imprinted differentially methylated regions during early mammalian development and prevalence of non-CpG methylation in oocytes. Development, 138, 811–820.

21. Kobayashi,H., Sakurai,T., Imai,M., Takahashi,N., Fukuda,A., Yayoi,O., Sato,S., Nakabayashi,K., Hata,K., Sotomaru,Y. et al. (2012) Contribution of intragenic DNA methylation in mouse gametic DNA methylomes to establish oocyte-specific heritable marks. PLoS Genet., 8, e1002440.

22. Xie,W., Barr,C.L., Kim,A., Yue,F., Lee,A.Y., Eubanks,J., Dempster,E.L. and Ren,B. (2012) Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell, 148, 816–831.

23. Lister,R., Mukamel,E.A., Nery,J.R., Urich,M., Puddifoot,C.A., Johnson,N.D., Lucero,J., Huang,Y., Dwork,A.J., Schultz,M.D. et al. (2013) Global epigenomic reconfiguration during mammalian brain development. Science, 341, 1237905.

24. Law,J.A. and Jacobsen,S.E. (2010) Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet., 11, 204–220.

25. Arand,J., Spieler,D., Karius,T., Branco,M.R., Meilinger,D., Meissner,A., Jenuwein,T., Xu,G., Leonhardt,H., Wolf,V. et al. (2012) In vivo control of CpG and non-CpG DNA methylation by DNA methyltransferases. PLoS Genet., 8, e1002750.

26. Dyachenko,O.V., Schevchuk,T.V., Kretzner,L., Buryanov,Y.I. and Smith,S.S. (2010) Human non-CG methylation: are human stem cells plant-like? Epigenetics, 5, 569–572.

27. Sela,N., Kim,E. and Ast,G. (2010) The role of transposable elements in the evolution of non-mammalian vertebrates and invertebrates. Genome Biol., 11, R59.

28. Stadler,M.B., Murr,R., Burger,L., Ivanek,R., Lienert,F., Scholer,A., van Nimwegen,E., Wirbelauer,C., Oakeley,E.J., Gaidatzis,D. et al. (2011) DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature, 480, 490–495.

29. Nestor,C., Ruzov,A., Meehan,R. and Dunican,D. (2010) Enzymatic approaches and bisulfite sequencing cannot distinguish between 5-methylcytosine and 5-hydroxymethylcytosine in DNA. Biotechniques, 48, 317–319.

30. Yu,M., Hon,G.C., Szulwach,K.E., Song,C.X., Zhang,L., Kim,A., Li,X., Dai,Q., Shen,Y., Park,B. et al. (2012) Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell, 149, 1368–1380.

31. Green,P., Ewing,B., Miller,W., Thomas,P.J. and Green,E.D. (2003) Transcription-associated mutational asymmetry in mammalian evolution. Nat. Genet., 33, 514–517.

32. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.

33. Touchon,M., Arneodo,A., d’Aubenton-Carafa,Y. and Thermes,C. (2004) Transcription-coupled and splicing-coupled strand asymmetries in eukaryotic genomes. Nucleic Acids Res., 32, 4969–4978.

34. Touchon,M., Nicolay,S., Arneodo,A., d’Aubenton-Carafa,Y. and Thermes,C. (2003) Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett., 555, 579–582.

35. Ginno,P.A., Lott,P.L., Christensen,H.C., Korf,I. and Chedin,F. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell, 45, 814–825.

