
CHAPTER 2 ASSESSMENT OF POPULATION DIFFERENTIATION AND

2.2 MATERIAL AND METHOD

2.2.1 Plant materials

All plant materials and their associated accession information were obtained from the Tomato Genetics Resource Center (TGRC; S_Tab 2.1; http://tgrc.ucdavis.edu/). A total of 12 accessions from Ecuador and 87 accessions from Peru were used in this study. According to their mating types, 43 accessions were facultative self-compatible (FSC) and 56 accessions were autogamous self-compatible (ASC). Seeds were propagated by self-pollination for two generations by single-seed descent in a greenhouse. Young leaves collected from plants grown from these single-seed-descent seeds were used for DNA extraction.

2.2.2 RAD sequencing

Total genomic DNA was extracted from young leaves using a modified CTAB method (Fulton, Chunwongse, & Tanksley, 1995) and purified with a DNeasy Blood & Tissue Kit (QIAGEN, Venlo, Netherlands) following the manufacturer’s instructions. We chose PstI to select the sequencing regions because it is a methylation-sensitive restriction enzyme and may therefore cut more frequently in euchromatic regions than in heterochromatic regions (Dobritsa & Dobritsa, 1980). PstI-digested DNA libraries were prepared following the protocol of Etter, Bassham, Hohenlohe, Johnson, and Cresko (2011). Four RADseq libraries were constructed, and each was sequenced in one lane of an Illumina HiSeq2000 flow cell (100 bp single-end reads) (Illumina Inc., San Diego, CA, USA). All RADseq sequences were submitted to the NCBI SRA database under BioProject number PRJNA358110.

2.2.3 SNP calling

Reads were analyzed with Stacks version 1.37 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) and CLC Genomics Workbench version 6.5.1 (QIAGEN, Venlo, Netherlands). First, the process_radtags command in Stacks filtered out low-quality reads with Q scores below 20. The remaining reads were mapped to the tomato reference genome SL2.50 (Fernandez-Pozo et al., 2015) using the “Map Reads to Reference” tool in CLC Genomics Workbench. Because the genetic variation between the tomato reference genome (S. lycopersicum) and S. pimpinellifolium is larger than the variation within S. lycopersicum, the mapping parameters were set to 0.5 for the length fraction and 0.9 for the similarity fraction. Reads of the same individual from different lanes were merged. In the subsequent Stacks analyses, the ref_map.pl command was run with -m (minimum read depth to create a stack) set to 10, and the populations command was run with -p (minimum number of populations in which a locus must be present) set to 75. SNPs with a minor allele frequency below 0.05 were then excluded, yielding a set of 24,330 SNP markers, which was used for the analyses of genetic variation, LD, Fst, and AMOVA.

A second SNP set without ‘redundant SNP markers’ was used for the principal component analysis (PCA) and ADMIXTURE, because the outputs of these two analyses are intended to correct for population structure in GWAS. To remove redundant SNP markers, we defined a sequencing unit as the sequenced region surrounding a PstI site, usually 186 bp long, that contains at least one SNP with a minor allele frequency greater than 0.05 in the S. pimpinellifolium population. If more than one SNP is located in a sequencing unit and the SNPs are in complete LD (r² = 1), only the first SNP is kept. This process resulted in a total of 19,993 SNP markers. The ITAG2.4 gene models from SGN were used as the reference gene annotation.
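The redundancy filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes genotypes are coded as alternate-allele counts (0, 1, 2, with None for missing), that r² is approximated by the squared correlation of those counts, and that a SNP is dropped when it is in complete LD with any SNP already kept in the same sequencing unit. The function names and the example data are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts (0/1/2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def r_squared(a, b):
    """Squared correlation of allele counts over samples called at both SNPs.

    Assumption: genotype-count correlation stands in for haplotype-based r².
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


def prune_unit(snps, maf_min=0.05, tol=1e-9):
    """Filter one sequencing unit.

    snps: list of (position, genotypes) sorted by position.
    Drops SNPs with MAF < maf_min, then drops any SNP in complete LD
    (r² = 1 within tol) with a SNP that was already kept.
    """
    kept = []
    for pos, geno in snps:
        if minor_allele_frequency(geno) < maf_min:
            continue
        if any(abs(r_squared(geno, g) - 1.0) < tol for _, g in kept):
            continue
        kept.append((pos, geno))
    return kept


if __name__ == "__main__":
    # Hypothetical sequencing unit with four SNPs and six samples.
    unit = [
        (100, [0, 0, 2, 2, 0, 2]),
        (130, [2, 2, 0, 0, 2, 0]),  # complete LD with the SNP at 100
        (160, [0, 2, 2, 0, 0, 2]),
        (170, [0, 0, 0, 0, 0, 0]),  # monomorphic, fails the MAF filter
    ]
    print([pos for pos, _ in prune_unit(unit)])
```

In this toy unit, the SNP at position 130 is removed because it carries the same information as the SNP at position 100, and the monomorphic SNP at position 170 fails the frequency filter.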

2.2.4 Population differentiation

PCA was performed in TASSEL 5.0 (Bradbury et al., 2007). ADMIXTURE was run following the manual, and the best K was determined by the cross-validation procedure described there (Alexander, Novembre, & Lange, 2009). Pairwise Fst (Weir & Cockerham, 1984) and analysis of molecular variance (AMOVA) (Excoffier, Smouse, & Quattro, 1992) were computed with the R package StAMPP (Pembleton, Cogan, & Forster, 2013).

2.2.5 Isolation by distance

Pairwise genetic distance was measured as Rogers’ distance (Rogers, 1972). Geographic distance was calculated with the R package geosphere (Hijmans, 2016). The significance of the correlation between pairwise genetic and geographic distances was tested with a Mantel test in the R package adegenet with 1,000 permutations (Jombart, 2008).
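The logic of the isolation-by-distance test can be sketched in Python as follows. This is an illustration only and does not reproduce the geosphere or adegenet code. It assumes great-circle (haversine) distances on a sphere of radius 6,371 km, which may differ from the distance function used in the study, and a one-sided permutation p-value. All function names are hypothetical.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=1000, seed=0):
    """Return (observed r, one-sided p-value) for two square distance matrices.

    Rows and columns of the genetic matrix are permuted together, which
    preserves the dependence structure among pairwise distances.
    """
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vec = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo_vec)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper_triangle(genetic, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting sample labels, rather than shuffling individual matrix entries, is what distinguishes a Mantel test from an ordinary correlation test on the pairwise values.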

2.2.6 Estimate of genetic variation and LD

Genetic variation across all accessions and within each of the seven groups was assessed from the observed heterozygosity and the within-population gene diversity (expected heterozygosity) using the R package hierfstat (Goudet & Jombart, 2015). To assess the overall extent of LD, pairwise r² values between SNP markers within a 1-Mb window were calculated with PLINK 1.9 (Gaunt, Rodríguez, & Day, 2007) and fitted by non-linear regression (Remington et al., 2001). The baseline r² value was set at 0.1 (Bauchet et al., 2017). Local LD along each chromosome was assessed as follows: for each pair of consecutive sequencing units (defined in Section 2.2.3), the average r² between SNPs in different sequencing units was calculated and plotted at the physical position of the left PstI cutting site. Heterochromatic regions were marked according to the EXPIM 2012 genetic map and the physical map of the tomato reference genome (Sim et al., 2012).
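As a minimal illustration of the two diversity statistics named above, the following Python sketch computes per-locus observed and expected heterozygosity for a bi-allelic SNP. It assumes genotypes coded as alternate-allele counts (0, 1, 2, with None for missing) and uses the uncorrected estimator 2p(1 - p) for expected heterozygosity. Packages such as hierfstat may apply sample-size corrections that this sketch omits, so its values are not expected to match the reported ones exactly. The function names are hypothetical.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(1 for g in called if g == 1) / len(called)


def expected_heterozygosity(genotypes):
    """Uncorrected gene diversity 2p(1 - p) for a bi-allelic SNP."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    p = sum(called) / (2 * len(called))  # alternate-allele frequency
    return 2 * p * (1 - p)


def mean_over_loci(per_locus_values):
    """Average a statistic across loci, skipping loci with no called genotypes."""
    values = [v for v in per_locus_values if v is not None]
    return sum(values) / len(values) if values else None
```

In a predominantly selfing population, observed heterozygosity is expected to fall well below expected heterozygosity, which is why both statistics are reported side by side.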

2.2.7 Analysis of SolCAP array data of S. pimpinellifolium

The SolCAP data of 214 S. pimpinellifolium samples were downloaded from previous studies (Blanca et al., 2012; Blanca et al., 2015; Sim et al., 2012). A set of 2,934 bi-allelic polymorphic SNPs was extracted after filtering for a minor allele frequency greater than 0.05 and a proportion of missing genotypes below 25%. We dropped 627 SNP markers with reverse-complement allele designations, resulting in a set of 2,307 SNPs with consistent allele designation among these studies.

This set of 2,307 SNPs was used for the ADMIXTURE and isolation-by-distance analyses, following the procedures described in Sections 2.2.4 and 2.2.5. Because some accessions were genotyped in more than one SolCAP study, the suffixes “_2012S,” “_2012B,” and “_2015B” were added to the sample names to indicate the original references Sim et al. 2012a, Blanca et al. 2012, and Blanca et al. 2015, respectively. In addition, the percentage of identical SNP genotypes between samples of the same accession was calculated from the 2,307 SNP genotypes, excluding missing values.
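The concordance calculation for accessions genotyped in more than one study can be sketched as follows. This is an illustrative Python example, not the script used here. It assumes genotypes are stored as two-letter strings such as "AG", that "--" marks a missing call, and that allele order within a genotype is irrelevant. The sample names and function names are hypothetical.

```python
def accession_of(sample_name):
    """Strip the study suffix, e.g. 'ACC1_2012S' -> 'ACC1' (hypothetical name)."""
    return sample_name.rsplit("_", 1)[0]


def percent_identical(geno_a, geno_b, missing="--"):
    """Percentage of identical genotypes over SNPs called in both samples.

    'AG' and 'GA' are treated as the same genotype. Returns None when no SNP
    is called in both samples.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b) if a != missing and b != missing]
    if not pairs:
        return None
    same = sum(1 for a, b in pairs if sorted(a) == sorted(b))
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    # Hypothetical duplicate samples of one accession from two studies.
    samples = {
        "ACC1_2012S": ["AA", "AG", "GG", "--", "CT"],
        "ACC1_2015B": ["AA", "GA", "GT", "CC", "--"],
    }
    names = sorted(samples)
    if accession_of(names[0]) == accession_of(names[1]):
        print(percent_identical(samples[names[0]], samples[names[1]]))
```

Restricting the comparison to SNPs called in both samples keeps missing data from being counted as either agreement or disagreement.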
