
CHAPTER 1. Background and Significance

1.10 Research Purposes

As mentioned above, the plastomes of cupressophytes are highly rearranged. Therefore, they are ideal materials for the following two studies.

First, we aimed to propose a new strategy for the identification and evolutionary study of nupts in yews. Previously, most studies on nupts were based on comparisons between nuclear and plastid genomes. This approach, however, is impractical for cupressophytes because their nuclear genomes are huge (ranging from 6.3 to 20 Gb/1C; Murray, 1998). In this study, we propose an approach to studying nupts based on comparative analysis of plastomes. A proof-of-concept study using yew plastomes is described in Chapter 2.

Second, we aimed to understand the evolutionary effects of plastomic rearrangements. Two questions were asked: (1) Do plastomic rearrangements alter the transcription, translation, or end products of plastomes? (2) Can these rearrangements disrupt functional operons and create new gene clusters? To date, the consequences of genomic rearrangements remain poorly understood in gymnosperms. Sciadopityaceae provides a unique opportunity to address these questions because its plastome is highly rearranged. Here we used experimental data to address these two questions (Chapter 3).

CHAPTER 2. Ancient Nuclear Plastid DNA in the Yew Family

2.1 Introduction

Previous comparative genomic studies indicated that on average about 14% of the nuclear-encoded proteins were acquired from the cyanobacterial ancestor of plastids (Deusch et al., 2008). Transgenic experiments also demonstrated a high frequency of plastid-to-nucleus transfers with one event per 11,000 pollen grains or per 273,000 ovules (Sheppard et al., 2008).

Nupts have been discovered in a large number of plant species (Smith et al., 2011). Nupts can contribute to nuclear exonic sequences (Noutsos et al., 2007) and play an important role in plant evolution. Nupts may be initially inserted close to centromeres and then fragmented and distributed by transposable elements (Michalovova et al., 2013). The amount of nupts in plants is associated with the nuclear genome size and the number of plastids per cell (Smith et al., 2011; Yoshida et al., 2014).

Studies on nupts remain limited to plant species for which complete sequences of both the nuclear and plastid genomes are available. In nuclear genomes, nupts may resemble plastomes in their arrangement or consist of mosaic DNA derived from both plastids and mitochondria (Leister, 2005; Noutsos et al., 2005). Notably, a 131-kb nupt of rice was found to harbor a 12.4-kb inversion, which was likely an ancestral character of the plastome before the transfer (Huang et al., 2005). Recently, Rousseau-Gueutin et al. (2011) proposed a PCR-based method to amplify nupts containing a specific ancestral sequence that was deleted from the plastomes of viable offspring. Hence, ancestral plastomic characteristics, such as unique indels and gene orders of specific fragments, may be retained in nupts. Reconstruction of an ancestral plastomic organization should therefore yield valuable clues for retrieving nupts. If a plastomic inversion distinguishes an ancestral plastome from its current counterpart, appropriate primers based on the ancestral plastomic organization should be able to amplify the corresponding nupts that were transferred to the nucleus before the inversion (Figure 3).
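The primer-design logic described above can be illustrated with a small sketch. It assumes that each plastome is written as a linear list of signed block numbers (sign = orientation) and that a primer pair spans the junction between two neighbouring blocks; the block numbers, function names, and the linear (non-circular) treatment are illustrative assumptions, not part of the original study.

```python
def adjacencies(order):
    """Return the signed adjacencies of a linear block order.

    Each junction (a, b) is stored together with its reverse-strand
    reading (-b, -a), so the strand on which the molecule is read
    does not matter.
    """
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def junction_is_diagnostic(junction, ancestral, extant):
    """True if the junction exists in the ancestral order only.

    Primers spanning such a junction cannot amplify the extant plastome,
    so a product of the expected size points to an older arrangement
    (for example, a nupt transferred before the inversion).
    """
    return junction in adjacencies(ancestral) and junction not in adjacencies(extant)


# Hypothetical example: blocks 2 and 3 were inverted in the extant plastome.
ancestral = [1, 2, 3, 4, 5]
extant = [1, -3, -2, 4, 5]

for junction in [(1, 2), (2, 3), (3, 4)]:
    print(junction, junction_is_diagnostic(junction, ancestral, extant))
```

Junctions at the inversion boundaries, (1, 2) and (3, 4), are diagnostic, whereas the internal junction (2, 3) is preserved on the opposite strand and is not.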

Although the first known nupt was identified more than three decades ago (Timmis and Scott, 1983), nupts of gymnosperms remain poorly studied. Conifers, the most diverse gymnosperm group, possess huge nuclear genomes ranging from 8.3 to 64.3 pg (2C) (reviewed in Wang and Ran, 2014) and may have integrated many nupts. The plastomes of conifers are highly rearranged, possibly due to their common loss of a pair of large inverted repeats (Wicke et al., 2011). Numerous plastomic rearrangements have been identified and are useful in reconstructing phylogenetic relationships among taxa and inferring intermediate ancestral plastomes (Wu and Chaw, 2014). Therefore, conifer plastomes are well suited for evaluating the feasibility of retrieving nupts and surveying their evolution.

In this study, we aimed to demonstrate our approach (Figure 3) for mining nupts in yews and to further the understanding of plastome evolution in conifers. To better reconstruct the ancestral plastomes of yews, we sequenced two complete plastomes, one from each of the yew genera Amentotaxus and Taxus. Primers based on the recovered ancestral plastomic organization were used to amplify potential nupts. The origins of the obtained nupt candidates were then examined by phylogenetic analyses and mutation preferences to ensure that they were indeed plastomic DNA transferred to the nucleus. Here, for the first time, we demonstrate that conifer nupts can be PCR-amplified using our approach and that ancestral plastomic characteristics retained in nupts can be compared with extant ones, providing valuable information for understanding plastome evolution in conifers.

2.2 Materials and Methods

2.2.1 DNA Extraction, Sequencing, and Genome Assembly

Young leaves of Amentotaxus formosana and Taxus mairei were harvested in the greenhouse of Academia Sinica and in Taipei Botanical Garden, respectively. Total DNA was extracted with a modified CTAB method with 2% polyvinylpyrrolidone (Stewart and Via, 1993). DNA that met the quality thresholds of 260/280 = 1.8–2.0 and 260/230 > 1.7 was used for next-generation sequencing on an Illumina GAII instrument at Yourgene Bioscience (New Taipei City, Taiwan). For each species, approximately 4 GB of 73-bp paired-end reads were obtained. These short reads were trimmed with a threshold of error probability < 0.05 and then assembled de novo with CLC Genomics Workbench 4.9 (CLC Bio, Aarhus, Denmark). Contigs with a sequencing depth greater than 50X were searched with BLAST against the nr database of the National Center for Biotechnology Information (NCBI). Contigs with hits to plastome sequences at E-value < 10^-10 were retained for subsequent analyses. Gaps between contigs were closed by PCR with specific primers. PCR amplicons were sequenced on an ABI 3730xl DNA Sequencer (Life Technologies).
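The two contig-selection criteria above (depth greater than 50X and a plastome hit with E-value < 10^-10) can be expressed as a simple filter. The following is a minimal sketch only: the `Contig` record, its field names, and the example values are hypothetical and do not correspond to the output format of any particular assembler or BLAST tool.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_DEPTH = 50.0     # contigs must exceed this sequencing depth (X)
MAX_EVALUE = 1e-10   # plastome hits must fall below this E-value


@dataclass
class Contig:
    name: str
    depth: float                   # mean sequencing depth of the contig
    best_evalue: Optional[float]   # best E-value of a plastome hit, None if no hit


def is_plastome_contig(contig: Contig) -> bool:
    """Apply both thresholds described in the text."""
    return (
        contig.depth > MIN_DEPTH
        and contig.best_evalue is not None
        and contig.best_evalue < MAX_EVALUE
    )


def filter_contigs(contigs: List[Contig]) -> List[Contig]:
    return [c for c in contigs if is_plastome_contig(c)]


# Hypothetical contigs for illustration only.
contigs = [
    Contig("contig_1", depth=310.0, best_evalue=1e-80),  # kept
    Contig("contig_2", depth=12.0, best_evalue=1e-60),   # depth too low
    Contig("contig_3", depth=95.0, best_evalue=1e-5),    # hit too weak
    Contig("contig_4", depth=140.0, best_evalue=None),   # no plastome hit
]
print([c.name for c in filter_contigs(contigs)])
```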

2.2.2 Genome Annotation and Sequence Alignment

Genomes were annotated using DOGMA with default options (Wyman et al., 2004). Transfer RNA genes were identified using tRNAscan-SE 1.21 (Schattner et al., 2005). For each species, we aligned the annotated genes with their orthologs from other known conifer plastomes to confirm gene boundaries. Sequences were aligned using MUSCLE (Edgar, 2004) implemented in MEGA 5.0 (Tamura et al., 2011).

2.2.3 Exploration of Single-Nucleotide Polymorphisms (SNPs), Indels, and Simple Sequence Repeat (SSR) Sequences

To estimate the distribution of both SNPs and indels between our newly sequenced plastome of T. mairei and the T. mairei voucher NN014 (NC_020321), the two genomes were aligned by using VISTA (Frazer et al., 2004). The alignment was then manually divided into non-overlapping bins of 200 bp according to the position of our newly sequenced T. mairei plastome. Both SNPs and indels in each bin were estimated by using DnaSP 5.10 (Librado and Rozas, 2009). SSRs of the T. mairei plastome were explored using SSRIT (Temnykh et al., 2001) with a threshold of repeat units > 3.
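The binning step can be sketched as follows. The sketch assumes that variant positions are already available as 1-based coordinates on the reference plastome; the function names and the example positions are illustrative assumptions, and the actual counting in the study was done with DnaSP.

```python
from collections import Counter

BIN_SIZE = 200  # non-overlapping bins of 200 bp, as in the text


def bin_index(position, bin_size=BIN_SIZE):
    """Map a 1-based plastome position to a 0-based bin index."""
    return (position - 1) // bin_size


def bin_range(index, bin_size=BIN_SIZE):
    """Return the 1-based (start, end) coordinates covered by a bin."""
    return index * bin_size + 1, (index + 1) * bin_size


def count_per_bin(positions, bin_size=BIN_SIZE):
    """Count how many variant positions fall into each bin."""
    return Counter(bin_index(p, bin_size) for p in positions)


# Hypothetical SNP positions for illustration only.
snp_positions = [55_010, 55_150, 55_200, 55_201, 124_250]
for index, count in sorted(count_per_bin(snp_positions).items()):
    print(bin_range(index), count)
```

With this convention, position 55,001 opens the bin 55,001–55,200, which matches the bin coordinates reported in the Results.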

2.2.4 Construction of Ancestral Plastomic Organization

We performed whole-plastomic alignments between the two yews under study and other conifers, Calocedrus formosana (NC_023121), Cephalotaxus wilsoniana (NC_016063), Cryptomeria japonica (NC_010548), Cunninghamia lanceolata (NC_021437), and Taiwania cryptomerioides (NC_016065), to detect locally collinear blocks (LCBs) using Mauve 2.3.1 (Darling et al., 2010). The resulting matrix of LCBs was used to reconstruct the putative ancestral plastomic organizations with MGR 2.03 (Bourque and Pevzner, 2002), which seeks the minimal number of genomic rearrangements over all edges of a most parsimonious tree.
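To make the notion of a rearrangement on an LCB matrix concrete, the sketch below shows the elementary operation involved, an inversion acting on a signed block order. It is not a reimplementation of MGR, which searches over many genomes and a tree; the block numbers and function names are illustrative assumptions.

```python
def invert(order, start, end):
    """Invert blocks order[start..end] (inclusive).

    An inversion reverses the order of the affected blocks and flips
    the orientation (sign) of each of them.
    """
    segment = [-block for block in reversed(order[start:end + 1])]
    return order[:start] + segment + order[end + 1:]


def single_inversions_between(a, b):
    """List every (start, end) whose inversion turns order a into order b."""
    n = len(a)
    return [(i, j) for i in range(n) for j in range(i, n) if invert(a, i, j) == b]


# Hypothetical signed LCB orders for illustration only.
ancestral = [1, 2, 3, 4, 5]
extant = [1, -3, -2, 4, 5]

print(invert(ancestral, 1, 2))
print(single_inversions_between(ancestral, extant))
```

Applying the same inversion twice restores the original order, which is why a single inversion can be read in either direction along a branch of the tree.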

2.2.5 PCR Amplification, Cloning, and Sequencing

Ten pairs of specific primers for amplifying nupt sequences in Taxaceae were manually designed; their sequences and corresponding locations are shown in Table 2 and Figure 4. PCR amplification was performed with a long-range Taq polymerase (TaKaRa LA Taq, Takara Bio Inc.) under the following thermocycling conditions: 98℃ for 3 min, followed by 30 cycles of 98℃ for 15 s, 55℃ for 15 s, and 68℃ for 4 min, and a final extension at 72℃ for 10 min. Amplicons were checked by electrophoresis. Amplicons of the expected lengths were collected and cloned into yT&A vectors (Yeastern Biotech Co., Taipei), which were then propagated in E. coli. The cloned amplicons were sequenced with M13-F and M13-R primers on an ABI 3730xl DNA Sequencer (Life Technologies).

2.2.6 Phylogenetic Tree Analysis

Maximum likelihood trees were inferred from sequences of potential nupts, their plastomic counterparts, and their orthologs in other gymnosperms using MEGA 5.0 (Tamura et al., 2011) under a GTR + G (4 categories) model. Node support was evaluated with 1,000 bootstrap replicates.

2.2.7 Estimation of Mutations in Nuclear Plastid DNAs and Their Plastomic Counterparts

The sequence of each nupt was aligned with the homologous plastome sequences of A. formosana, C. wilsoniana, T. mairei, and C. lanceolata using MUSCLE (Edgar, 2004). To precisely calculate the mutational preference in nupts, all ambiguous sites and gaps were removed from the alignments. Nucleotide divergence between a nupt and its plastomic counterpart arises from mutations in either of the two sequences. A mutation was assigned to the nupt when the corresponding site of its plastomic counterpart was identical to that of at least two other taxa, and to the plastomic counterpart when the corresponding site of the nupt was identical to that of at least two other taxa. For example, suppose a specific aligned site has “T”, “C”, “C”, “C”, and “C” in the Cep-2 nupt, C. wilsoniana, A. formosana, T. mairei, and C. lanceolata, respectively (see aligned position 32 in Figure 5). This site would be recognized as a nonsynonymous mutation from “C” to “T” in the Cep-2 nupt because the corresponding amino acid changes from alanine to valine.
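The assignment rule described in this section can be written as a short function. This is a sketch under the assumption that each aligned site is given as single bases for the nupt, its plastomic counterpart, and the remaining taxa; the function name and return labels are illustrative and not taken from the original analysis.

```python
def assign_mutation(nupt, counterpart, others, min_support=2):
    """Decide on which sequence a mismatch arose at one aligned site.

    nupt        -- base in the nupt
    counterpart -- base in its plastomic counterpart
    others      -- bases of the other taxa in the alignment
    Returns "nupt", "plastome", or None when the site is identical
    or cannot be resolved with the required support.
    """
    if nupt == counterpart:
        return None
    if sum(base == counterpart for base in others) >= min_support:
        return "nupt"       # counterpart agrees with the other taxa
    if sum(base == nupt for base in others) >= min_support:
        return "plastome"   # nupt agrees with the other taxa
    return None


# The worked example from the text: Cep-2 nupt has T, all plastomes have C.
print(assign_mutation("T", "C", ["C", "C", "C"]))
```

With three other taxa and a support of two, the two conditions cannot both hold at a mismatched site, so the order of the checks does not affect the result.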

2.2.8 Plastome Mapping and Statistical Analyses

The plastome map of T. mairei was drawn using Circos (http://circos.ca), a flexible Perl program for visualizing relationships between objects or positions, and saved in PNG format. All statistical tests, including Pearson’s correlation test and Student’s t-test, were performed in Microsoft Excel 2010.

2.3 Results

2.3.1 Reduction and Compaction of the Plastome of T. mairei

The plastomes of A. formosana (AP014574) and T. mairei (AP014575) are circular molecules with AT contents of 64.17% and 65.32%, respectively. The T. mairei plastome (128,290 bp) has lost five genes (rps16, trnA-UGC, trnG-UCC, trnI-GAU, and trnS-GGA) relative to that of A. formosana (136,430 bp), which contributes to its smaller size. The coding regions occupy 61.27% of the plastome length in A. formosana and 64.18% in T. mairei. The gene density was estimated to be 0.88 and 0.90 genes/kb for the plastomes of A. formosana and T. mairei, respectively. In addition, the other two published plastomes of Taxaceae species, C. wilsoniana (NC_016063) and C. oliveri (NC_021110), are 136,196 bp and 134,337 bp, respectively. Altogether, these data suggest that the plastome of T. mairei has evolved towards reduction and compaction.

Dot-plot analysis (Figure 6) reveals three genomic rearrangements between the plastomes of A. formosana and T. mairei: a relocated fragment of ~18 kb from psbK to trnC-GCA, a relocated fragment of ~16 kb from trnD-GUC to trnT-UGU, and an inverted fragment of ~18 kb from 5'rps12 to infA. However, the two plastomes share a unique inverted repeat pair that contains trnQ-UUG in each copy, hereafter termed “trnQ-IR” (Figure 6).

2.3.2 Intra-species Variations in the Plastomes of T. mairei

To date, the plastomes of three T. mairei individuals (voucher NN014: NC_020321; voucher SNJ046: JN867590; and voucher WC052: JN867591) have been published. Together with our newly sequenced plastome of T. mairei, these four plastomes vary slightly in size, ranging from 127,665 to 128,290 bp. A neighbor-joining (NJ) tree inferred from the whole-plastome alignment of these four individuals and A. formosana is shown in Figure 7. The tree topology indicates that, although the plastomes of SNJ046 and WC052 are similar in size to that of NN014, they form a clade sister to our sampled T. mairei.

We also performed a pairwise genome comparison between our T. mairei plastome and that of voucher NN014 because the latter is designated as the reference sequence (RefSeq) in NCBI GenBank. We detected 858 SNPs and 218 indels between the two plastomes. Figure 8 shows that the intergenic spacers and coding regions contain nearly equal numbers of SNPs. Most of the indels were found in the intergenic spacers and accounted for the difference in plastome size between the two T. mairei individuals. We found 33 indels in the coding regions, but none caused frameshifts. Figure 9 illustrates the distribution of SNPs, indels, and SSRs in the plastome of our sampled T. mairei.

Interestingly, the abundance of SSRs was positively correlated with that of SNPs (Pearson, r = 0.52, p < 0.01). However, no correlation was detected between the abundances of SSRs and indels (Pearson, r = 0.02, p = 0.89). In legumes, the region that contains ycf4, psaI, accD, and rps16 was found to be hypermutable (Magee et al., 2010). In the plastome of T. mairei, three 200-bp bins, located in 5’clpP (positions 55,001–55,200), 5’ycf1 (positions 124,201–124,400), and the intergenic spacer between rrn16 and rrn23 (positions 96,801–97,000), contained the highest sums of SNPs, indels, and SSRs (Figure 9). Therefore, these loci can be considered intra-species mutational hotspots in T. mairei and are potential high-resolution DNA barcodes for population genetic studies of Taxus.
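The correlation reported above is a Pearson coefficient computed over per-bin counts. The study used Microsoft Excel for this; the sketch below only illustrates the underlying formula in plain Python, and the per-bin counts shown are made-up values, not data from Figure 9.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts per 200-bp bin, for illustration only.
ssr_counts = [0, 1, 3, 0, 2, 1]
snp_counts = [1, 2, 5, 0, 3, 1]
print(round(pearson_r(ssr_counts, snp_counts), 3))
```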

2.3.3 Retrieval of Ancestral Plastome Sequences in Taxaceae

A matrix with 20 locally collinear blocks (LCBs) was generated on the basis of whole-plastome alignments among the three sampled Taxaceae species and four Cupressaceae species. This matrix of LCBs was then used to reconstruct the ancestral plastomic organization. The most parsimonious tree with the corresponding ancestral plastomic organizations is shown in Figures 4 and 10. In this tree, the three Taxaceae species form a monophyletic clade, and A. formosana is closer to C. wilsoniana than to T. mairei. This topology is in good agreement with the recent molecular review of the conifer phylogeny by Leslie et al. (2012). Figure 4 shows the detailed evolutionary scenario of plastomic rearrangements, with the intermediate ancestral plastomes, in the three examined Taxaceae species. By comparing the ancestral and extant plastomes, we postulated that one, three, and two inversions might have occurred in A. formosana, C. wilsoniana, and T. mairei, respectively, after they diverged from their common ancestor. Specific primer pairs were used to amplify the corresponding ancestral fragments that differ from the extant plastomes in genomic organization (Figure 4). Five (Ame-2, Cep-2, Cep-5, Cep-6, and Tax-4) of the ten primer pairs produced amplicons, totaling 16.6 kb (see Table 3 for accession numbers).

2.3.4 Characteristics of Potential Nupt Amplicons

The obtained PCR amplicons were sequenced and annotated (Table 3). With the exception of chlB of Cep-2, all putative protein-coding genes contain no premature stop codons. The coding sequence (CDS) of each amplicon was aligned with its plastomic counterparts and orthologs of other cupressophytes, Ginkgo, and Cycas. We used maximum likelihood (ML) trees inferred from concatenated CDSs to examine the origins of these PCR amplicons, with Ginkgo and Cycas as the outgroup (Figure 11). In each tree, the plastomic sequences were divided into three groups (i.e., the Cupressaceae clade, the Taxaceae clade, and the clade comprising Araucariaceae and Podocarpaceae). Notably, the placements of our PCR amplicons are incongruent among the four trees. For example, both Ame-2 and Cep-2 were clustered with their plastomic counterparts (Figure 11A). In contrast, Cep-5, Cep-6, and Tax-4 were placed remotely from their plastomic counterparts, indicating that they originated via horizontal transfer (Figure 11B, C, and D).

The ancestral plastomic organization that we used to design primers for amplification of Ame-2 and Cep-2 was rearranged by a 34-kb inversion flanked by trnQ-IRs. These trnQ-IRs were 564 and 549 bp in size in A. formosana and C. wilsoniana, respectively. IRs of similar sizes can mediate homologous recombination in conifer plastomes (Tsumura et al., 2000; Wu et al., 2011; Yi et al., 2013; Guo et al., 2014). As a result, if a trnQ-IR-mediated isomeric plastome is present in our sampled taxa, our PCR approach should be able to amplify isomeric plastomic fragments. Ame-2 has 100% sequence identity with its plastomic counterpart in the CDS (Figure 11A), which strongly suggests that it originated from an isomeric plastome. Cep-2 differs from its plastomic counterpart by several mutations, including two premature stop codons in chlB, one of which cannot be restored by either U-to-C or C-to-U RNA editing (Figure 12). Therefore, Cep-2 originated from a horizontal transfer rather than from an isomeric plastome.

2.3.5 Evolution of Nupt Sequences in Taxaceae

The sequence identity between the four nupts and their plastomic counterparts ranges from 61.71% to 99.08% (Table 4). Differences in aligned sites between nupts and their plastomic counterparts are derived from two types of mutations: those in nupts and those in plastomes. As shown in Table 4, with the exception of Tax-4, all nupts accumulated more mutations than their plastomic counterparts. The low sequence identity between Tax-4 and its plastomic counterpart (61.71% in Table 4) may be due to unusually elevated mutations in the latter. In all nupts except Cep-5, at least one potential protein-coding gene had a ratio of nonsynonymous (dn) to synonymous (ds) mutations > 1, which reflects relaxed functional constraints in nupts. Figure 13 illustrates the nucleotide mutation classes in nupts and their corresponding plastome sequences. We excluded the plastomic counterpart of Cep-2 from the calculation because we observed only one mutation in that sequence. In all nupts, transitions comprise over 50% of the total mutations. The mutation of G to A and its complement C to T (denoted GC-to-AT in Figure 13) had the highest frequency in both nupts and plastome sequences. To examine which mutation class is statistically predominant, we compared the two most abundant classes. In nupts, the frequency of GC-to-AT mutations was higher than that of AT-to-GC mutations (t-test, p = 0.018). However, the frequencies of GC-to-AT and AT-to-GC mutations did not differ in plastome sequences (t-test, p = 0.379), suggesting different mutational environments between nupts and their corresponding plastome sequences.
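Grouping each substitution with its complement, as in the GC-to-AT class above, can be sketched as follows. Only the labels GC-to-AT and AT-to-GC appear in the text; the labels for the four transversion classes, and the function names, are assumptions introduced here for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# One representative substitution per strand-symmetric class.
CLASS_OF = {
    ("G", "A"): "GC-to-AT",  # transition (G>A and C>T)
    ("A", "G"): "AT-to-GC",  # transition (A>G and T>C)
    ("G", "T"): "GC-to-TA",  # transversion (assumed label)
    ("G", "C"): "GC-to-CG",  # transversion (assumed label)
    ("A", "C"): "AT-to-CG",  # transversion (assumed label)
    ("A", "T"): "AT-to-TA",  # transversion (assumed label)
}
TRANSITIONS = {"GC-to-AT", "AT-to-GC"}


def mutation_class(ref, alt):
    """Return the strand-symmetric class of a substitution ref>alt."""
    if ref == alt:
        raise ValueError("ref and alt are identical")
    key = (ref, alt)
    if key not in CLASS_OF:
        # Read the same change on the complementary strand.
        key = (COMPLEMENT[ref], COMPLEMENT[alt])
    return CLASS_OF[key]


def is_transition(ref, alt):
    return mutation_class(ref, alt) in TRANSITIONS


print(mutation_class("C", "T"), is_transition("C", "T"))
```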

2.3.6 Ages of Nupts in Taxaceae

Molecular dating of sequences depends strongly on mutation rates. Unfortunately, mutation rates in the nuclear genomes of Taxaceae species have not been directly measured. The nupts identified in this study were expected to evolve neutrally. The four-fold degenerate site is a useful indicator for measuring the rate of neutral evolution (Graur and Li, 2000). In nuclear genomes of conifers, the mutation rate at four-fold degenerate sites was estimated to be 0.64 × 10^-9 per site per year (Buschiazzo et al., 2012). In the nupts Cep-2, Cep-5, Cep-6, and Tax-4, we found 29, 117, 100, and 42 mutations among 2,961, 3,380, 2,207, and 1,466 sites, respectively (Table 4). Therefore, the ages of Cep-2, Cep-5, Cep-6, and Tax-4 were estimated to be approximately 15.3, 54.1, 70.8, and 44.8 million years (MY), respectively.
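The reported ages are consistent with dividing the per-site divergence of each nupt by the neutral mutation rate, that is, age = (mutations / sites) / rate. The sketch below works through this arithmetic with the numbers given above; that the estimates were derived in exactly this way, with all mutations assigned to the nupt lineage and no correction for multiple hits, is an assumption inferred from the figures.

```python
RATE = 0.64e-9  # mutations per site per year (Buschiazzo et al., 2012)

# (mutations, sites) per nupt, as given in the text (Table 4).
NUPTS = {
    "Cep-2": (29, 2961),
    "Cep-5": (117, 3380),
    "Cep-6": (100, 2207),
    "Tax-4": (42, 1466),
}


def nupt_age_my(mutations, sites, rate=RATE):
    """Estimate the age in million years from per-site divergence."""
    divergence = mutations / sites   # mutations per site
    return divergence / rate / 1e6   # years converted to million years


for name, (mutations, sites) in NUPTS.items():
    print(name, round(nupt_age_my(mutations, sites), 1))
```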

2.4 Discussion

2.4.1 Labile Plastomes of Yew Family and Their Impact on Phylogenetic Studies

The phylogenetic relationships among Amentotaxus, Cephalotaxus, and Taxus have not been resolved. Recent molecular studies placed Amentotaxus as sister to Taxus (e.g., Cheng et al., 2000; Mao et al., 2012) or to Cephalotaxus (e.g., Leslie et al., 2012). We found that a 34-kb inversion from trnT to psbK distinguished A. formosana and C. wilsoniana from T. mairei (Figure 4), which suggests that A. formosana is closer to C. wilsoniana than to T. mairei. However, the plastome of another Taxus species, T. chinensis (Zhang et al., 2014), cannot be distinguished from those of A. formosana and C. wilsoniana by this 34-kb inversion. Of note, this 34-kb inversion is flanked by a pair of trnQ-IR sequences. We found that the trnQ-IR sequence is commonly present in A. formosana (564 bp), C. wilsoniana (549 bp), T. mairei (248 bp), and T. chinensis (248 bp).

The presence of a trnQ-IR pair can generate isomeric plastomes, as in C. oliveri (Yi et al., 2013) and four Juniperus species (Guo et al., 2014). In Pinaceae, inverted repeats larger than 0.5 kb could trigger plastomic isomerization, and retention of an isomer was species- or population-specific (Tsumura et al., 2000; Wu et al., 2011). Indeed, Figure 11A revealed that Ame-2 was likely a PCR amplicon derived from the trnQ-IR-mediated isomeric plastome of A. formosana. Therefore, given the presence of an isomeric plastome, the synapomorphic character in Figure 4 (the 34-kb inversion) might be a false positive result caused by insufficient sampling. Our data therefore suggest that isomeric plastomes should be treated cautiously when genomic rearrangements are used in phylogenetic estimates.

Disruption of plastomic operons is rare in seed plants (Jansen and Ruhlman, 2012). We found that the S10 operon of T. mairei was separated into two gene clusters (rpl23-rps8 and infA-rpoA) by an 18-kb inversion (Figure 4). Because the transcriptional direction of the S10 operon is from rpl23 to rpoA (Jansen and Ruhlman, 2012), the gene cluster infA-rpoA in T. mairei likely had to acquire a novel promoter sequence for transcription. Disruption of the S10 operon was previously reported in the plastomes of Geraniaceae (Guisinger et al., 2011). However, the evolutionary consequences of plastomic operon disruption have never been studied. In the plastome of T. mairei, we detected prominently elevated mutations in the two separated gene clusters of the S10 operon as compared with their corresponding nupts (Table 4). Interestingly, two (i.e., infA and rps11 in Table 4) of the three protein-coding genes in the plastomic gene cluster infA-rpoA had dn/ds ratios larger than 1. Whether disruption of the S10 operon has resulted in positive selection on these two genes requires further investigation.

2.4.2 PCR-Based Approach in Investigating Nupts: Pros and Cons

The rapid growth in the number of sequenced nuclear genomes offers great opportunities for investigating nuclear organellar DNAs (norgs). The number of norgs detected can vary depending on the assembly software, the version of the released genome, and the search strategy (Hazkani-Covo et al., 2010). A PCR-based approach, such as that of Rousseau-Gueutin et al. (2011) and ours, is free from this problem, which arises from genome assembly. The nupts we amplified and report here represent only a few examples. However, considering that the huge nuclear genomes of conifers require high cost and effort for sequencing and assembly, our PCR-based approach provides a cost-effective way to study the evolution of nupts.

Using a threshold of >70% sequence identity, Smith et al. (2011) extracted nupts of about 50 kb from the nuclear genome of Arabidopsis. The amount of Arabidopsis nupts decreased to approximately 17.6 kb when the threshold of sequence identity was increased to 90% (Yoshida et al., 2014). It seems that identification of possible nupts is largely influenced by the thresholds. Setting high thresholds might limit the exploration of nupts to only relatively recent transfers (Yoshida et al., 2014). Clearly, the problem of setting thresholds is absent from our PCR-based approach. In this study, sequence identity between nupts and their plastomic counterparts ranged from 61.71% to 99.08% (Table 4). Thus, one or three of the four presented nupts would not be obtained if we had considered the thresholds of Smith et al. (2011) or Yoshida et al. (2014),
