
CHAPTER 2. Ancient Nuclear Plastid DNA in the Yew Family

2.3 Results

2.3.1 Reduction and Compaction of the Plastome of T. mairei

The plastomes of A. formosana (AP014574) and T. mairei (AP014575) are circular molecules with AT contents of 64.17% and 65.32%, respectively. Compared with the A. formosana plastome (136,430 bp), the T. mairei plastome (128,290 bp) has lost five genes (rps16, trnA-UGC, trnG-UCC, trnI-GAU, and trnS-GGA), which contributes to its smaller size. The coding regions occupy 61.27% of the plastome length in A. formosana and 64.18% in T. mairei, and the gene densities were estimated to be 0.88 and 0.90 genes/kb, respectively. The two other published Taxaceae plastomes, those of C. wilsoniana (NC_016063) and C. oliveri (NC_021110), are 136,196 bp and 134,337 bp, respectively, both larger than that of T. mairei. Altogether, these data suggest that the plastome of T. mairei has evolved towards reduction and compaction.
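The summary statistics above (AT content, gene density, and the percentage of the plastome occupied by coding regions) are simple ratios. The minimal Python sketch below shows how they could be computed. The function names are hypothetical, coding regions are assumed to be supplied as 1-based inclusive intervals, and genes spanning the origin of the circular molecule are not handled.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * at / (at + gc)


def gene_density(n_genes: int, length_bp: int) -> float:
    """Genes per kilobase of plastome sequence."""
    return n_genes / (length_bp / 1000.0)


def coding_fraction(cds_intervals, length_bp: int) -> float:
    """Percentage of the molecule covered by coding intervals.

    Intervals are 1-based, inclusive (start, end) pairs. Overlapping or
    adjacent intervals are merged so shared bases are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds_intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / length_bp
```

For example, gene_density(100, 125_000) gives 0.8 genes/kb; these are toy values, not data from this study.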

Dot-plot analysis (Figure 6) reveals three genomic rearrangements between the plastomes of A. formosana and T. mairei: a relocated fragment of ~18 kb from psbK to trnC-GCA, a relocated fragment of ~16 kb from trnD-GUC to trnT-UGU, and an inverted fragment of ~18 kb from 5'rps12 to infA. Despite these rearrangements, the two plastomes share a unique pair of inverted repeats, each copy of which contains trnQ-UUG, hereafter termed “trnQ-IR” (Figure 6).
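A dot plot marks every word shared between two sequences, so collinear regions appear as diagonal runs and inverted regions as anti-diagonal runs. The sketch below illustrates the idea with exact k-mer matches. It is not the software used to produce Figure 6, and the function names and the default word size k = 12 are arbitrary assumptions.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def dot_plot_hits(seq_a: str, seq_b: str, k: int = 12):
    """Return (forward, reverse) lists of 0-based (i, j) start positions.

    Forward hits are k-mers shared in the same orientation (diagonal runs).
    Reverse hits are k-mers of seq_a that match the reverse complement of a
    k-mer of seq_b; runs of these trace inverted segments (anti-diagonals).
    """
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    forward, reverse = [], []
    for i in range(len(seq_a) - k + 1):
        word = seq_a[i:i + k]
        for j in index.get(word, []):
            forward.append((i, j))
        for j in index.get(revcomp(word), []):
            reverse.append((i, j))
    return forward, reverse
```

Plotting the two hit lists in different colours gives the familiar picture in which relocations shift diagonal segments and inversions flip them.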

2.3.2 Intra-species Variations in the Plastomes of T. mairei

To date, the plastomes of three T. mairei individuals (voucher NN014: NC_020321; voucher SNJ046: JN867590; and voucher WC052: JN867591) have been published. Together with our newly sequenced plastome, these four T. mairei plastomes vary slightly in size, ranging from 127,665 to 128,290 bp. A neighbor-joining (NJ) tree inferred from the whole-plastome alignment of these four individuals and A. formosana is shown in Figure 7. The tree topology indicates that, although the plastome sizes of SNJ046 and WC052 are similar to that of NN014, SNJ046 and WC052 form a clade sister to our sampled T. mairei.
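An NJ tree is built from a matrix of pairwise distances. As a hedged illustration of that input, the sketch below computes uncorrected p-distances from a whole-plastome alignment, skipping columns where either sequence has a gap or an N. The distance model used for Figure 7 is not specified here, so the p-distance and the function names are assumptions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-') or N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def distance_matrix(alignment: dict) -> dict:
    """Symmetric pairwise p-distance matrix keyed by (name1, name2)."""
    names = list(alignment)
    dist = {(n, n): 0.0 for n in names}
    for n1, n2 in combinations(names, 2):
        d = p_distance(alignment[n1], alignment[n2])
        dist[(n1, n2)] = dist[(n2, n1)] = d
    return dist
```

The resulting matrix could then be handed to an NJ implementation, for example Biopython's DistanceTreeConstructor.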

We also performed a pairwise genome comparison between our T. mairei plastome and that of voucher NN014, because the latter is designated as the reference sequence (RefSeq) in NCBI GenBank. We detected 858 SNPs and 218 indels between the two plastomes.

Figure 8 shows that the intergenic spacers and coding regions contained nearly equal numbers of SNPs. Most of the indels were found in the intergenic spacers and accounted for the difference in plastome size between the two T. mairei individuals. We found 33 indels in the coding regions, but none caused frameshifts. Figure 9 illustrates the distribution of SNPs, indels, and SSRs in the plastome of our sampled T. mairei.
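SNP and indel counts from a pairwise comparison depend on how an indel event is defined. The sketch below assumes a gapped pairwise alignment and counts each maximal run of consecutive gap columns in one sequence as a single indel; ambiguity codes are treated as ordinary characters. The function name and these conventions are illustrative assumptions.

```python
def count_snps_and_indels(ref: str, qry: str):
    """Count SNP columns and indel events in a pairwise alignment.

    A SNP is a column where both sequences carry a base and the bases
    differ. An indel event is a maximal run of consecutive gap columns in
    the same sequence, so a 5-bp deletion counts once.
    """
    if len(ref) != len(qry):
        raise ValueError("sequences must be aligned to equal length")
    snps = indels = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for r, q in zip(ref.upper(), qry.upper()):
        if r == "-" and q == "-":
            continue  # empty column; does not interrupt a gap run
        if r == "-" or q == "-":
            gap_in = "ref" if r == "-" else "qry"
            if gap_in != prev_gap:
                indels += 1  # start of a new gap run
            prev_gap = gap_in
            continue
        prev_gap = None
        if r != q:
            snps += 1
    return snps, indels
```

A single indel inside a CDS leaves the reading frame intact when its length is a multiple of three.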

Interestingly, the abundance of SSRs was positively correlated with that of SNPs (Pearson, r = 0.52, p < 0.01), whereas no correlation was detected between SSR and indel abundance (Pearson, r = 0.02, p = 0.89). In legumes, the region containing ycf4, psaI, accD, and rps16 was found to be hypermutable (Magee et al., 2010). In the plastome of T. mairei, three 200-bp bins, located in 5'clpP (positions 55,001–55,200), 5'ycf1 (positions 124,201–124,400), and the intergenic spacer between rrn16 and rrn23 (positions 96,801–97,000), contained the highest combined numbers of SNPs, indels, and SSRs (Figure 9). These loci can therefore be considered intra-species mutational hotspots in T. mairei and are potential high-resolution DNA barcodes for population genetic studies of Taxus.
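Both the correlation analysis and the hotspot search rest on counting features in consecutive 200-bp bins. The sketch below, with hypothetical function names, bins 1-based positions, computes Pearson's r between two lists of bin counts, and ranks bins by the combined number of SNPs, indels, and SSRs. Representing each SSR by a single position (for example its start) is an assumption.

```python
import math


def bin_counts(positions, genome_length: int, bin_size: int = 200):
    """Count 1-based positions per bin; bin b spans
    b * bin_size + 1 .. (b + 1) * bin_size."""
    counts = [0] * math.ceil(genome_length / bin_size)
    for pos in positions:
        counts[(pos - 1) // bin_size] += 1
    return counts


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hotspots(snp_bins, indel_bins, ssr_bins, bin_size: int = 200, top: int = 3):
    """Return (start, end, total) for the bins with the highest combined
    number of SNPs, indels, and SSRs."""
    totals = [s + i + r for s, i, r in zip(snp_bins, indel_bins, ssr_bins)]
    ranked = sorted(range(len(totals)), key=lambda b: totals[b], reverse=True)
    return [(b * bin_size + 1, (b + 1) * bin_size, totals[b]) for b in ranked[:top]]
```

With this binning, position 55,001 falls in the bin spanning 55,001–55,200, matching the coordinates reported for 5'clpP. A p-value for r could be obtained with scipy.stats.pearsonr.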

2.3.3 Retrieval of Ancestral Plastome Sequences in Taxaceae

A matrix of 20 locally collinear blocks (LCBs) was generated from whole-plastome alignments of the three sampled Taxaceae species and four Cupressaceae species. This LCB matrix was then used to reconstruct ancestral plastomic organization. The most parsimonious tree and the corresponding ancestral plastomic organizations are shown in Figures 4 and 10. In this tree, the three Taxaceae species form a monophyletic clade, and A. formosana is closer to C. wilsoniana than to T. mairei. This topology is in good agreement with the recent molecular review of conifer phylogeny by Leslie et al. (2012). Figure 4 shows the detailed evolutionary scenario of plastomic rearrangements, including the intermediate ancestral plastomes, in the three examined Taxaceae species. By comparing the ancestral and extant plastomes, we postulated that one, three, and two inversions might have occurred in A. formosana, C. wilsoniana, and T. mairei, respectively, after they diverged from their common ancestor. Specific primer pairs were designed to amplify the ancestral fragments whose genomic organization differs from that of the extant plastomes (Figure 4). Five of the ten primer pairs (Ame-2, Cep-2, Cep-5, Cep-6, and Tax-4) produced amplicons totaling 16.6 kb (see Table 3 for accession numbers).
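The inversion scenarios inferred from the LCB matrix can be pictured as operations on an ordered list of signed blocks, where the sign records orientation. The sketch below applies an inversion to such a list and counts breakpoints relative to an ancestral order numbered 1..n. It treats the circular plastome as linear, and its function names are hypothetical; it illustrates the bookkeeping only and is not the parsimony software used to reconstruct the ancestral plastomes.

```python
def invert(order, i, j):
    """Invert blocks i..j (0-based, inclusive) of a signed block order:
    reverse the segment and flip the orientation of each block."""
    return order[:i] + [-b for b in reversed(order[i:j + 1])] + order[j + 1:]


def breakpoints(order):
    """Breakpoints of a signed permutation of 1..n relative to the identity
    (ancestral) order, framed by 0 and n + 1. A neighbouring pair (a, b) is
    conserved only if b == a + 1, which also accepts inverted runs such as
    (-3, -2)."""
    framed = [0] + list(order) + [len(order) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b != a + 1)
```

Because one inversion alters at most two adjacencies, half the number of breakpoints is a lower bound on the number of inversions separating two block orders.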

2.3.4 Characteristics of Potential Nupt Amplicons

The obtained PCR amplicons were sequenced and annotated (Table 3). Except for chlB in Cep-2, none of the putative protein-coding genes contains premature stop codons. The coding sequences (CDSs) of each amplicon were aligned with their plastomic counterparts and with orthologs from other cupressophytes, Ginkgo, and Cycas. To examine the origins of these PCR amplicons, we inferred maximum likelihood (ML) trees from the concatenated CDSs, with Ginkgo and Cycas as the outgroup (Figure 11). In each tree, the plastomic sequences fell into three groups (i.e., the Cupressaceae clade, the Taxaceae clade, and the clade comprising Araucariaceae and Podocarpaceae).
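Inferring ML trees from concatenated CDSs first requires joining the per-gene alignments into a single supermatrix. The sketch below does this for per-gene alignments stored as dictionaries, fills genes absent from a taxon with gaps, and records partition boundaries that a partitioned ML analysis could use. The data layout and function name are assumptions for illustration.

```python
def concatenate(gene_alignments, taxa):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}, in the
    order the genes should appear. A taxon missing from a gene is filled
    with gaps of that gene's alignment length. Returns the supermatrix and
    a list of (gene, start, end) partitions in 1-based coordinates.
    """
    supermatrix = {t: [] for t in taxa}
    partitions = []
    offset = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not equally long")
        length = lengths.pop()
        for t in taxa:
            supermatrix[t].append(aln.get(t, "-" * length))
        partitions.append((gene, offset + 1, offset + length))
        offset += length
    return {t: "".join(parts) for t, parts in supermatrix.items()}, partitions
```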

Notably, the placements of our PCR amplicons relative to their plastomic counterparts are inconsistent among the four trees. Ame-2 and Cep-2 clustered with their plastomic counterparts (Figure 11A), whereas Cep-5, Cep-6, and Tax-4 were placed far from theirs, indicating that they originated via horizontal transfer (Figure 11B, C, and D).

The ancestral plastomic organization that we used to design primers for amplifying Ame-2 and Cep-2 has been rearranged by a 34-kb inversion flanked by trnQ-IRs. These trnQ-IRs are 564 and 549 bp long in A. formosana and C. wilsoniana, respectively. IRs of similar sizes can mediate homologous recombination in conifer plastomes (Tsumura et al., 2000; Wu et al., 2011; Yi et al., 2013; Guo et al., 2014). Consequently, if trnQ-IR-mediated isomeric plastomes are present in our sampled taxa, our PCR approach would also amplify isomeric plastomic fragments. The CDS of Ame-2 is 100% identical to that of its plastomic counterpart (Figure 11A), which strongly suggests that Ame-2 was amplified from an isomeric plastome. In contrast, Cep-2 differs from its plastomic counterpart by several mutations, including two premature stop codons in chlB, one of which can be corrected by neither U-to-C nor C-to-U RNA editing (Figure 12). Therefore, Cep-2 originated from horizontal transfer rather than from an isomeric plastome.
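The chlB argument combines two checks: locating in-frame premature stop codons, and asking whether RNA editing could convert a stop codon into the codon found at the same position in the plastome. The sketch below is one way to express these checks, assuming gap-free, in-frame codons and the plastid stop codons TAA, TAG, and TGA; the function names are hypothetical, and this is not necessarily how Figure 12 was produced.

```python
STOPS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str):
    """0-based indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [k for k, c in enumerate(codons[:-1]) if c in STOPS]


def editing_can_restore(nupt_codon: str, plastid_codon: str) -> bool:
    """True if every difference from the nupt codon to the plastid codon is
    T->C (U-to-C editing) or C->T (C-to-U editing), i.e. if editing alone
    could in principle recreate the plastid codon."""
    allowed = {("T", "C"), ("C", "T")}
    return all(a == b or (a, b) in allowed
               for a, b in zip(nupt_codon.upper(), plastid_codon.upper()))
```

For instance, a TAA opposite a plastid CAA could be restored by U-to-C editing, whereas a TGA opposite a plastid TGG would need an A-to-G change that neither editing type provides.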

2.3.5 Evolution of Nupt Sequences in Taxaceae

The sequence identity between the four nupts and their plastomic counterparts ranges from 61.71% to 99.08% (Table 4). Differences at aligned sites between nupts and their plastomic counterparts can arise from two types of mutations: those that occurred in the nupts and those that occurred in the plastomes. As shown in Table 4, all nupts except Tax-4 accumulated more mutations than their plastomic counterparts. The low sequence identity between Tax-4 and its plastomic counterpart (61.71%; Table 4) may be due to an unusually high number of mutations in the latter. In all nupts except Cep-5, at least one potential protein-coding gene had a ratio of nonsynonymous (dn) to synonymous (ds) substitution rates greater than 1, consistent with relaxed functional constraints in nupts. Figure 13 illustrates the classes of nucleotide mutations in the nupts and their corresponding plastome sequences. We excluded the plastomic counterpart of Cep-2 from this calculation because only one mutation was observed in that sequence. In all nupts, transitions account for over 50% of the total mutations.
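Assigning a difference between a nupt and its plastomic counterpart to one lineage or the other requires a third, outgroup sequence. The sketch below assumes a gap-free three-way alignment and applies a simple parsimony rule; the outgroup choice and the function names are assumptions, and the rule is illustrative rather than the procedure behind Table 4.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def assign_mutations(nupt: str, plastid: str, outgroup: str):
    """Place each differing column on a lineage by parsimony.

    If the outgroup base matches the plastid base, the change is assigned to
    the nupt lineage; if it matches the nupt base, to the plastid lineage;
    otherwise the column is left unassigned. Gapped columns are skipped.
    """
    counts = {"nupt": 0, "plastid": 0, "unassigned": 0}
    for n, p, o in zip(nupt.upper(), plastid.upper(), outgroup.upper()):
        if "-" in (n, p, o) or n == p:
            continue
        if o == p:
            counts["nupt"] += 1
        elif o == n:
            counts["plastid"] += 1
        else:
            counts["unassigned"] += 1
    return counts
```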

The G-to-A mutation and its complement, C-to-T (together denoted GC-to-AT in Figure 13), had the highest frequency in both nupts and plastome sequences. To examine whether one mutation class is statistically predominant, we compared the two most abundant classes. In nupts, GC-to-AT mutations were more frequent than AT-to-GC mutations (t-test, p = 0.018). In contrast, the frequencies of GC-to-AT and AT-to-GC mutations did not differ significantly in plastome sequences (t-test, p = 0.379), suggesting different mutational environments for nupts and their corresponding plastome sequences.
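Because the strand on which each mutation occurred is unknown, a change and its complement are usually pooled, giving two transition classes and four transversion classes. The sketch below tallies polarized (ancestral, derived) base changes into these six classes, using the GC-to-AT style of naming from Figure 13; the input format and function names are hypothetical.

```python
TRANSITIONS = {"GC-to-AT", "AT-to-GC"}

# Each directional base change mapped to its strand-symmetric class.
CLASS_OF = {
    ("G", "A"): "GC-to-AT", ("C", "T"): "GC-to-AT",
    ("A", "G"): "AT-to-GC", ("T", "C"): "AT-to-GC",
    ("G", "T"): "GC-to-TA", ("C", "A"): "GC-to-TA",
    ("G", "C"): "GC-to-CG", ("C", "G"): "GC-to-CG",
    ("A", "C"): "AT-to-CG", ("T", "G"): "AT-to-CG",
    ("A", "T"): "AT-to-TA", ("T", "A"): "AT-to-TA",
}


def classify(changes):
    """Tally (ancestral, derived) base changes into the six classes."""
    tally = {c: 0 for c in sorted(set(CLASS_OF.values()))}
    for anc, der in changes:
        tally[CLASS_OF[(anc.upper(), der.upper())]] += 1
    return tally


def transition_fraction(tally) -> float:
    """Fraction of all tallied mutations that are transitions."""
    return sum(tally[c] for c in TRANSITIONS) / sum(tally.values())
```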

2.3.6 Ages of Nupts in Taxaceae

Molecular dating of sequences depends heavily on mutation rates. Unfortunately, mutation rates in the nuclear genomes of Taxaceae species have not been directly measured. The nupts identified in this study are expected to evolve neutrally, and four-fold degenerate sites are a useful indicator of the rate of neutral evolution (Graur and Li, 2000). In conifer nuclear genomes, the mutation rate at four-fold degenerate sites was estimated to be 0.64 × 10⁻⁹ per site per year (Buschiazzo et al., 2012). In the nupts Cep-2, Cep-5, Cep-6, and Tax-4, we found 29, 117, 100, and 42 mutations among 2,961, 3,380, 2,207, and 1,466 sites, respectively (Table 4). Dividing the proportion of mutated sites by this rate, we estimated the ages of Cep-2, Cep-5, Cep-6, and Tax-4 to be approximately 15.3, 54.1, 70.8, and 44.8 million years (MY), respectively.
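The age estimates follow from T = (mutations / sites) / μ, with no multiple-hit correction and no factor of two, treating the counts as changes on the nupt lineage only. The short script below reproduces the reported ages from the Table 4 counts and the rate of Buschiazzo et al. (2012); the variable and function names are illustrative.

```python
# Neutral rate at four-fold degenerate sites in conifer nuclear genomes
# (substitutions per site per year; Buschiazzo et al., 2012).
RATE = 0.64e-9

# (mutations, sites) for each nupt, as reported in Table 4.
NUPTS = {
    "Cep-2": (29, 2961),
    "Cep-5": (117, 3380),
    "Cep-6": (100, 2207),
    "Tax-4": (42, 1466),
}


def age_my(mutations: int, sites: int, rate: float = RATE) -> float:
    """Age in million years: per-site mutation proportion divided by rate."""
    return mutations / sites / rate / 1e6


for name, (m, s) in NUPTS.items():
    print(f"{name}: {age_my(m, s):.1f} MY")
```

Running it prints 15.3, 54.1, 70.8, and 44.8 MY for Cep-2, Cep-5, Cep-6, and Tax-4, matching the estimates given above.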
