Material and methods - 東亞特有種苦苣苔親緣地理、花發育及花對稱性基因自然選汰

Sample collection and genomic DNA extraction

We sampled nine populations of var. ramondioides and twelve populations of var. taiwanensis (197 individuals in total) almost covering its distribution range in East Asia (Figure 2-1). The detailed sample codes and collection localities are listed in (Table 2-1). Leaf samples collected in the field and dried in silica gel then grid into powder by homogenizer. DNA extraction following CTAB procedure (Doyle & Doyle, 1987). The DNA concentration was determined for each sample using the Nanodrop RNA/DNA Calculator (Thermo, Taipei, Taiwan).

Amplification of cpDNA markers, CrCYC1 and ITS

To evaluate divergence process of C. ramondioides populations across East Asia, two cpDNA markers (including trnL-F and trnH-psbA) and two nuclear markers (including CrCYC1 and ITS), were amplified from all collected individuals by Sanger sequencing. The primer pairs used for amplifying trnL-trnF region was based on Taberlet et al.(1991), and the forward primer forward

‘trnH’ (CGC GCA TGG TGG ATT CAC AAT CC) and reverse primer ‘psbA’ (GTTATGCATGAA CGT AAT GCT C) were used to amply the trnH-psbA region (Tate and Simpson, 2003). The PCR reaction was made up to total volume 25 μl, containing 13μl of 2x Ampliqon master mix Red (Ampliqon, Denmark), 0.25μl each primer at 2μM, 10.5μl ddH2O and 1μl template at 20 ng/μl. The PCR profile for cpDNA markers amplification was 35 cycles of 30s at 94℃, 30 s at 56℃, and 1min at 72℃ with final extension for 5 minutes at 72℃, after an initial denaturing for 3 min at 94℃ for both trnL-trnF, and trnH-psbA. For nuclear markers, CrCYC1 sequences were amplified using forward primer ‘SPF’ AGCAAGACATGCTTTCTG G-3') and reverse primer ‘SPR’

(5'-GACATTGGATTCAATATGGTG-3') (Xiao and Wang, 2007), while the forward primer ‘ITS 5P’

(5’-GGA AGG AGA AGT CGT AAC AAG G-3’) and reverse primer ‘ITS 8P’ (5’-CAC GCT TCT CCA GAC TAC A-3’), were used for amplifying ITS region (Möller and Cronk, 1997). The PCR

profile was 35 cycles of 30s at 94℃, 30 s at 58℃, and 50s at 72℃ with final extension for 10

minutes at 72℃, after an initial denaturing for 3 min at 94℃ for CrCYC1 and ITS amplification. The PCR reaction was made up to total volume 25 μl, containing 13μl of 2x Ampliqon master mix Red (Ampliqon, Denmark), 0.25μl each primer at 2μM, 10.5μl ddH2O and 1μl template at 20 ng/μl. All PCR products were examined by 1.0％ (w/v) agarose gel electrophoresis and then the band with right size was sent for sequencing. The PCR products were then sequenced by ABI 3700 automatic sequencer. All sequences were examined by Sequence Scanner v1.0 first and then aligned for further analysis. To examine within individual homogeneity of ITS, we amplified nine individuals (three from Japan, three from Southeast China and three from Taiwan) using ITS primer pairs with proofreading KAPA HiFi PCR kit (Massachusetts, USA). Cloning procedure of these nine PCR products was carried out using pGEM-T Easy Vector kit (Promega, USA), following the

manufacturer’s protocols. Finally, five to six clones of each PCR product were sequenced.

Reconstruct haplotypes of CrCYC1 and ITS

Aligned CrCYC1 and ITS sequences were trimmed and the chromatograms of these two loci

inspected by eye for double peaks, indicative of heterozygous sites. To reconstruct haplotypes from CrCYC1 and ITS sequences with multiple heterozygous sites, we used the Bayesian statistical method in PHASE 2.1.1 (Stephens et al. 2001; Stephens & Scheet 2005) for this purpose. PHASE was applied to reconstruct haplotypes from Japan, Taiwan-Iriomote and South-east China sequences respectively, using models that allow intragenic recombination. The analysis was performed

with10000 iterations (10 iterations per sample) following a burn-in of 1000 iterations. Haplotypes inferred by PHASE with a posterior probability of ≥0.6 were accepted (see Garrick et al. 2010).

Single copy nuclear intron marker selection and amplification

To reconstruct phylogeny of closely related taxa or intra species level, single copy nuclear intron

markers are more preferable than commonly used non coding cpDNA marker due to their high genetic variability (Krak et al., 2012). Therefore, to resolve phylogeny and infer population

divergence process of C. ramondioides, we developed specific single copy nuclear intron markers for C. ramondioides in this study. First, three (including AGT intron 1, GroES intron 1 and LFY intron 1) out of eleven potential single copy markers were chosen for amplicon library preparation due to these three markers generating single band on gel electrophoresis. Next, amplicon library preparation of these three loci was conducted following the 16S metagenomic sequencing library preparation guide which is provided by Illumina technology (San Diego, California, USA). Then, amplicon library were sequenced on the Illumina MiSeq sequencing system. Paired-end Illumina reads (2× 300 bp) were imported into CLC Bio Genomics workbench 7.0.4 (QIAGEN Bioinformatics) and then merged to get overlapping pairs. Finally, only reads with over 100 counts were kept for further analysis.

Genetic structure of populations and geographical structure analysis

Sequence of all six (including AGT intron 1, GroES intron 1 and LFY intron 1, CrCYC1, ITS and concatenated cpDNA) markers were aligned with MUSCLE algorithm (Edgar, 2004) implementing in MEGA 6 (Tamura et al., 2013) and manually edited using BioEdit v. 7.2.5 (Hall, 1999). For each marker, we estimated nucleotide diversity (π) and haplotype diversity (Hd) were calculated using DnaSP vers. 5.0 (Librado ＆ Rozas, 2009) for each population, for each region and for whole dataset. When estimating genetic diversity of populations locating at both side across ECS, we assigned Honshu and Shikoku populations as “North group”, while Taiwan, Iriomote and SE-China populations as “South group” (see Table 2-1).

East China Sea is thought to be a barrier to gene flow for shrub or herbs (Qi et al., 2014; Wang et al., 2013; Ye et al., 2017), thus Conandron populations distributed in Main Japan (Honshu and Shikoku), Taiwan and south-east China were classified into geographical corresponding groups for examining

genetic structure. Iriomote populations were assigned to Taiwan group for its close geographical distance to Taiwan. Analysis of molecular variance (AMOVA) of our genetic data was performed using program Arlequin ver 3.5 (Excoffier & Lischer, 2010). F-statistics were calculated to estimate the proportion of genetic variability found among populations (Fst), among population within groups (Fsc) and among groups (Fct) of our datasets. If ECS served as a barrier, we expected highest

genetic variation should be attributed to among group level. The significance level of all estimated fixation indices was tested with 10000 permutations (Excoffier et al., 1992).

To reconstruct the relationship among haplotypes, for each marker (including AGT intron 1, GroES intron 1 and LFY intron 1, CrCYC1, ITS and concatenated cpDNA), we constructed statistical parsimony network with 95% connection limit using R package “Haplotypes”. Indels were coded using simple indel coding method, which treated indels with different start and/or end positions are separate characters (Simmons and Ochoterona, 2000).

Infer lineage divergence of C. ramodioides populations from nuclear markers

To reconstruct divergence order of Conandron populations from our three nuclear loci, we integrate all genetic information into a locus partitioned *BEAST mutli-locus species tree analysis

implemented in the BEAST v2.4.7 software platform (Bouckaert et al., 2014; Heled & Drummond, 2009). Each sequence was assigned into group by collection site and each collection site was used as an operational taxonomic unit (OTU) in all species tree analyses. Substitution model of each locus was determined in Jmodeltest v2.1.3 (Posada, 2008) and we set HKY model for LFY and GroES1 dataset, HKY+G for AGT intron 1 dataset. Due to low rate variation expected among lineages at intraspecific level, strict clock model was applied for all three loci (Brown & Yang, 2011). For clock rate prior, we set a mean and initial value of 0.0037 per substitutions/site/million year and a standard deviation of 2.55 (95% interquartile range of 0.012 to 0.0027) under a lognormal prior distribution of LFY dataset (See supporting file 1). For the other two loci, substitution rate were estimated with a mean value and initial value of 0.0037 per substitutions/site/million year with more broadly

distributed lognormal prior distributions (AGT intron 1 and GroES intron 1: 95% interquartile range of 0.187 to 0.000074) to cover a wide substitution rate range of these two loci to show uncertainty.

For species tree priors, we used a Yule tree prior and a piecewise linear and constant root population size model. MCMC chains were performed for 50 million generations (sampling every 5,000

generations and discarding the first 10% as burn-in). Three independent MCMC chains were performed under these settings and each log files and species tree files were combined into a single log file and a single species tree file using the logCombiner program in BEAST v2.4.7. Convergence level of these three MCMC chains were checked in the program Tracer v1.6.0 (Rambaut et al., 2014).

Adopting topology identified from multi-loci species tree: [(Honshu + Shikoku populations), (Taiwan-Iriomote populations, SE-China populations)], we then further examined the population divergence process with focusing on gene flow by IM model which implementing in IMa2 (Hey, 2010a). An Infinite site model was adopted for each of the 3 nuclear markers analyzed. Truncated uniform distribution priors of all six parameters of IMa2 were set following the author’s

recommendations (Hey, 2010a). The upper bound of scaled lineage splitting time (with the flag -t), scaled effective population size of the extant and ancestral population (-q) and scaled migration rate among populations (-m) were set according to the theta estimated from Taiwan-Iriomote group, which had highest theta among 3 Conandron groups. The geometric mean of Watterson’s (1975) theta per sequence over three markers was 3.2. We set higher upper bound values than estimates of these parameters and they were –t7, -q17 and –m1 for splitting time, effective population size and migration rate respectively. The mutation rate per locus was set to 3.7×10^-9 × 393 bp for LFY marker, and mutation rate of LFY was used as reference for the other two loci to rescale the IMa2 parameters from the analysis.

In addition, we also applied IM models, with same upper bound settings mentioned above, to

pairwise dataset of the three Conandron groups. Divergence time estimated from pairwise analysis can reveal splitting order (Hey, 2010b), therefore the results can confirm the divergence order inferred from *BEAST2. Finally, 40 Metropolis-coupling MCMC chains with heating parameter (-ha0.96, -hb0.85) were set for all four group and pairwise group pairs with 10⁵ iteration discarded as burn-in followed by 10⁶ iterations for sampling (100 iterations per sample). Three independent runs were performed for accessing convergence. In addition, likelihood ratio tests (LRT) were computed to determine significant level of post splitting gene flow between groups, following Nielsen &

Wakeley (2001).

Population demographical history

Two statistics-Tajima’s D (Tajima, 1983) and Fu’s Fs (Fu, 1997) were used to assess whether molecular markers isolated from Conandron conformed to expectation of neutrality theory or departed from neutrality theory due to bottleneck or expansion for each location using DnaSP5.0 (Librado and Rozas, 2009). The examination of deviation from neutrality by both Tajima’s D and Fu’s Fs statistics was based on 1000 coalescent simulations using DnaSP. Expectations of these two statistics are nearly zero or zero in a constant-size population, significant negative values indicate a sudden expansion in population size, while significant positive values mean processes such as a population subdivision or recent population bottlenecks. As a transcription factor, selection could affect CrCYC1 at codon level. Therefore, CrCYC1 dataset was subjected to further examination of deviation of neutral evolution by McDonald-Kreitman test (McDonald & Kreitman, 1991) using McDonald– Kreitman test on line web interface (Egea et al., 2008).

Second, we estimated effective population size change over time of Conandron by assessing whether Conandron has experienced population size expansion or decline as predicted by the hypothesis that the emergences of ECS continental shelf served as corridor or filter during cold periods. Analyses were conducted with the extended Bayesian skyline plot (EBSP) implemented in BEAST v2.4.7,

which uses coalescent based models to estimate posterior probabilities of effective population size over time (Heled & Drummond, 2008). Analyses focused on the groups identified by nrDNA multi loci phylogeny respectively. For nrDNA dataset, C. ramondioides were assigned into Honshu-Shikoku group (HS), Taiwan-Iriomote group (TI) and SE-China group (C) respectively. Substitution models for each dataset and molecular clock prior were set as in the *BEAST analysis. We

reconstructed population size change over time under the ‘coalescent: constant size’ tree prior for each group respectively in three EBSP runs. Each run consisted of 50 million generations with sampling every 5000 steps of which first 10% were discarded as burn-in. We combined individual data from three log files into a single log file and the corresponding three gene trees into a single gene tree file using the logCombiner program in BEAST v2.4.7. Skyline plots were constructed in EXCEL. Then, Bayes factors were used to judge whether EBSP results of each group showing statistical support or not. To do this, we extract counts of population size change and counts of no population size change from sum (indicators.alltrees) in the combined log file of each group. The BF over 3 were be regarded as a signal of significant population size change over time.

Ecological niche modeling

To infer potential suitable habitat change of Conandron at Last glacial maximum (LGM) and present, we modeled the potential suitable distribution range of C. ramondioides using maximum entropy methods (Elith et al. 2006; Phillips et al. 2006) in the program MAXENT version 3.3.3.

Available specimen occurrence data were acquired from GBIF (http://www.gbif.org/) and 19 BIOCLIM variables, including current and Last Glacial maximum, from the WorldClim data set (Hijmans et al. 2005) were used at 2.5m spatial resolution (~5 km).

Many of the BIOCLIM variables are highly correlated and applying all BIOCLIM variables in the modeling will inflate the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model (Dormann et al., 2013). Thus, to prevent

collinearity, variance inflation factors (VIF) analysis was performed to exclude highly correlated variables (VIF>5) before modeling. VIF calculates inflation and generalizes variance-inflation factors for linear and generalized linear models. BIOCLIM variables, including Mean Diurnal Range, Isothermality, Mean Temperature of Wettest Quarter, Mean Temperature of Driest Quarter, Annual Precipitation, Precipitation of Driest Month and Precipitation Seasonality were kept for ecological modeling after VIF analysis. All chosen BIOCLIM variable layers were first clipped into proper area in the ArcGIS v10.3.

Our nr phylogeny suggested current disjunctively distributed C. ramondioides populations can be assigned into three geographical corresponding groups including Honshu-Shikoku (HS), Taiwan-Iriomote (TI) and SE-China (C). To examine whether these three groups showed niche

differentiation, we performed pairwise niche identity test among these groups by using ENMtools (Warren et al., 2010). The niche identity test aims to examine whether the pairwise suitability scores generating from MAXENT of each group showed significant ecological differences (Warren et al., 2010). To achieve this goal, we first generated potential suitability map of each group in MAXENT respectively and then performed niche identity test together with corresponding occurrence data of each group (See supporting file 2). By comparing the actual data to the null distribution, we can tell whether potential suitability map between groups showed statistically significant different or not.

To model potential suitability habitat of each group at LGM and present, all chosen BIOCLIM variables and occurrence data (HS: 202 occurrence, TI: 35occurrence, C: 8 occurrence) were used.

We first modeled potential suitability habitat of each group at present respectively with MAXENT ver. 3.3.3 (Phillips et al., 2006) using all chosen BIOCLIM variables and occurrence data of each group. Then, we projected modeled present suitability onto present and LGM climatic layers at the same resolution to generate continuous potential suitable distribution map of each group at LGM and current respectively.

Next, we try to identify suitable habitats and non-suitable habitats of these three C. ramondioides groups. To do this, we have to transform our continuous map to binary map by setting thresholds. In Liu et al, (2013) suggested that maximum sum of sensitivity and specificity should be regarded as best threshold method for presence-only ecological niche modeling. However, this threshold may be biased due to measure specificity which is based on absence data (Merow et al., 2013) that is

unknown in this analysis. Therefore, we chosen two commonly used thresholds that only consider presence data to determine potential suitable habitats and non-suitable habitats of C. ramondioides, including 10 percentile training presence threshold and minimum training presence threshold (Norris, 2014; Stephanie et al., 2013).

Results

Genetic diversity of Conandron populations

For the nrDNA markers, we obtained 208 AGT intron 1 sequences, 270 GroES intron 1 sequences and 192 LFY intron 1 sequences from our ampliqon library respectively (reads of AGT intron 1 ranging from 102 reads to 1524 reads, reads of GroES intron 1 ranging from 524 reads to 3383 reads and reads of LFY intron 1 ranging from 102 reads to 3011 reads). For CrCYC1 and ITS, we obtained 368 and 364 sequences after haplotype reconstruction by using PHASE. In total, 50 of 208 sequences of AGT intron 1 (24%), and 138 of 272 GroES intron 1 sequences (50.6%), 66 of 190 LFY intron 1 sequences (34.7%), 182 of 368 CrCYC1 (49.4%) and 158 of 364 ITS (43.4%) were heterozygotes.

The aligned lengths of 208 AGT intron 1 sequences ranged from 471 to 474 bp, 270 GroES intron 1 sequences ranged from 421 to 435 bp and 190 LFY intron 1 sequences ranged from 393 to 397 bp, which yielded 37, 43 and 28 haplotypes, respectively (See Table 2-1 for Haplotype distribution). For CrCYC1 and ITS, the aligned length are 798 bp and 734 bp respectively. Nucleotide diversity (π) was 7×10^-3 at species level (ranging from 0 to 6.67×10^-3 at population level) measuring from AGT intron 1 dataset, 9.25×10^-3 (ranging from 0 to 13×10^-3 at population level) from GroES intron 1 dataset and 7.4×10^-3 (ranging from 0 to 6.63×10^-3 at population level) from LFY1 dataset in Table 2-1. For CrCYC1 and ITS, nucleotide diversity of these two markers were 12.76×10^-3 (ranging from 0 to 8.15×10^-3 at population level) and 3.44×10^-3 (ranging from 0 to 7.52×10^-3at population level) at species level respectively. Haplotype diversity (Hd) was 0.927 at species level in AGT intron 1 (ranging from 0 to 0.933 at population level), 0.916 at species level in GroES intron 1 (ranging from 0 to 0.933 at population level) and 0.908 at species level in LFY intron 1(ranging from 0.000 to 1.000) (Table 2-1). For CrCYC1 and ITS, haplotype diversity was 0.951 (ranging from 0 to 0.933 at population level) and 0.864 (ranging from 0 to 0.889 at population level) at species level

respectively. Comparing genetic diversity indices (Hd and π) calculated from these five nrDNA markers of C. ramondioides populations aside ECS, genetic diversity indices from AGT intron 1

dataset showing North are somewhat lower than South (North: Hd = 0.736; π = 4.64×10^-3 vs. South:

Hd = 0.9; π = 5.64×10^-3). Genetic diversity indices from GroES showing North are higher than South (North: Hd = 0.925; π = 13×10^-3 vs. South: Hd = 0.818; π = 4.71×10^-3), while genetic diversity indices from LFY intron 1 dataset showing nearly same level between North and South (North: Hd = 0.823; π = 6.63×10^-3 vs. South: Hd = 0.818; π = 6.15×10^-3). For CrCYC1, genetic diversity indices estimated from North group are lower than that estimated from South group (North: Hd = 0.897; π = 4.7×10^-3 vs. South: Hd = 0.937; π = 10.32×10^-3). Last, genetic diversity indices estimated from ITS showing North are somewhat lower than that in South (North: Hd = 0.802; π = 4.43×10^-3 vs. South:

Hd = 0.839; π = 3.43×10^-3) (Table 2-1).

As for the cpDNA, the aligned lengths of 197 cpDNA (trnL-F + trnH-psbA) sequences ranged from 978 to 985 bp. Among these sequences, we identified 19 haplotypes from these sequences. High levels of haplotype and nucleotide diversity were found at the species level (Hd = 0.954; π = 6.58×10^-3), and were also found at regional level (Honshu+Shikoku : Hd = 0.633; π = 2.49×10^-3, Iriomote : Hd = 0.752; π = 1.42×10^-3, Taiwan : Hd = 0.865; π = 3.81×10^-3, SE-China: Hd = 0.821; π

= 6.34×10^-3) (Table 2-1). Genetic diversity indices (Hd and π) obtained from south populations are somewhat higher than those from north populations (North: Hd = 0.633; π = 2.49×10^-3 vs South: Hd

= 0.894; π = 3.95×10^-3).

Population structure of C. ramondioides

First, we reconstructed haplotype network from 6 loci to reveal the relationship among haplotypes and also help us to assign grouping of C. ramondioides populations. Our haplotype networks reconstructed from three nuclear markers showed that C. ramondioides in East Asia formed two clear groups corresponding to geographical region: Taiwan+Iriomote and Southeast China (Fig. 2-2a, 2-2b and 2-2c). In contrast, Honshu and Shikoku haplotypes formed multiple lineages linking to hypothetical haplotypes in each nuclear marker. In CrCYC1 haplotype network, CrCYC1 sequences

formed three distinct clade corresponding to geographical regions: Honshu + Shikoku, Southeast China and Taiwan+Iriomote (Fig. 2-2d). Unlike previous mentioned haplotype networks, multiple haplotypes connected to one broadly distributed ITS haplotype, which is situated in the center in ITS haplotype network (Fig. 2-2e). As can be seen in the unrooted nrDNA haplotype networks, the Taiwan+Iriomote and Southeast China haplotypes were separated from each other by 2 to three steps, and the Honshu and Shikoku haplotypes by 3 to 7 steps (Fig. 2-2a, 2-2b and 2-2c).

Specifically, the among group mutational steps identified from CrCYC1 haplotypes were 6 steps to 10 steps (Fig. 2-2d). The haplotype network pattern of these three nrDNA markers showed one common pattern that is no shared haplotypes among HS, C and TI groups of these three nuclear markers. In cpDNA haplotype network, HS and C + TI haplotypes were separated from each other by three steps. In addition, a SE-China and Taiwan shared cpDNA haplotype were identified and situated in the center of cpDNA haplotype network with multiple lineages connecting to it, forming a star like topology.

Significant genetic differentiation was obtained from our assigned “3group” dataset (Honshu + Shikoku group, Taiwan + Iriomote group and Southeast China group) of all four markers (Table 2). Pairwise comparisons of “3group” yielded highly significant Fst values for all markers (Table 2-2, 0.571 to 0.748 for AGT intron 1, 0.298 to 0.513 for GroES intron 1, 0.472 to 0.848 for LFY intron

在文檔中東亞特有種苦苣苔親緣地理、花發育及花對稱性基因自然選汰 (頁 24-62)