Microsatellite data analysis - Materials and methods

2. Materials and methods

2.3 Microsatellite data analysis

2.3.1 The reference genome

The TFA reference genome was sequenced and assembled by our lab, and three versions exist. The first version was assembled de novo from Illumina Solexa data (Bentley et al., 2008) derived from DNA extracted from a single haploid TFA male. The second was assembled de novo from DNA extracted from a pool of workers and sequenced on the PacBio Single Molecule, Real-Time (SMRT) sequencing platform (Eid et al., 2009). This worker pool came from a different colony and may include up to three different alleles. The third was another assembly from the raw data of the second version, but with the potential alternate-allele contigs included (C.C. Lee and J. Wang, personal communication). The third assembly is the sequence used in the analyses described below.

2.3.2 Primer design

In a RIFA study, 15 microsatellite primer pairs were used to help map the sex determination locus (Huang et al., unpublished). Of these, 11 were near the target site and the other 4 were on different chromosomes. To check whether these primers could be used in this study, their sequences were compared to the TFA draft genome using the nucleotide Basic Local Alignment Search Tool (blastn) (Altschul et al., 1990). For thirteen of the primer pairs, at least the 15 nucleotides at the 3' end were identical to the TFA sequence, suggesting that they could be used in TFA. PCR tests of these 13 primer pairs all produced amplification products. For the fourteenth primer pair (Sol20), the fifth nucleotide from the 3' terminus in TFA differs from that in RIFA, but PCR tests still yielded a product. The fifteenth primer pair, SDL6, differed at the last nucleotide of the 3' end, so it was excluded from this study (Appendix 2).
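As a rough illustration of the 3' end check described above (not the blastn search itself), the number of consecutive matching nucleotides counted from the 3' end of a primer can be computed as follows. This is a minimal Python sketch; the function name and the example sequences are hypothetical and are not the primers used in this study.

```python
def three_prime_match_length(primer, target_site):
    """Count consecutive identical bases, starting from the 3' end.

    Both sequences are written 5'->3' and are assumed to be aligned
    at their 3' ends (hypothetical helper, not part of blastn).
    """
    count = 0
    for p, t in zip(reversed(primer.upper()), reversed(target_site.upper())):
        if p != t:
            break
        count += 1
    return count


# Made-up sequences: the last six bases agree, the seventh does not.
print(three_prime_match_length("ACGTACGT", "TTGTACGT"))  # 6
```

Under this sketch, a mismatch at the last 3' nucleotide (as for SDL6) gives 0, and a mismatch at the fifth nucleotide from the 3' end (as for Sol20) gives 4.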

For these 14 useable primer pairs, a fluorescent dye was added to the 5’ end of each forward primer, while a “PIG-tail” sequence (5'-GCTTCT) was added to the 5' end of each reverse primer to facilitate accurate genotyping (Brownstein et al., 1996). The primers were divided into six groups according to their fluorescence marker, size, and PCR difficulty (e.g., GC content; see also Appendix 3).

2.3.3 DNA extraction

Since a colony headed by a presumably singly mated queen should contain two worker genotypes, five individuals were initially chosen at random from each colony to reduce the probability of obtaining only one genotype. Additional individuals were sampled if the initial genotyping results were unclear. In total, 4 to 16 individuals were successfully genotyped per colony. Each sample was placed in a tube with 300 μl of a solution containing LabTurbo buffer LTL (195 μl), PBS (75 μl) and proteinase K (30 μl of 600 mAU/ml).
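The reasoning behind sampling five individuals can be made concrete with a short calculation. The sketch below assumes that the two worker genotypes are equally frequent in a colony and that individuals are drawn independently; both are simplifying assumptions, and the function name is hypothetical.

```python
from fractions import Fraction


def prob_single_genotype(k):
    """Probability that k sampled workers all share the same genotype.

    Assumes two equally frequent worker genotypes and independent draws:
    P = 2 * (1/2)**k.
    """
    return 2 * Fraction(1, 2) ** k


print(prob_single_genotype(5))  # 1/16
```

Under these assumptions, five workers fail to reveal both genotypes in about 6% of colonies, which is consistent with sampling additional individuals when the initial results were unclear.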

Next, samples were snap-frozen in liquid nitrogen and then homogenized, either by beating with ceramic beads in a bead shaker or by grinding with a plastic pestle. The samples were then incubated overnight at 56 °C, and DNA was extracted using the LabTurbo® Genomic DNA mini Kit. The DNA was eluted in a final volume of 40 μl and stored at -20 °C.

2.3.4 Multiplex PCR

The general polymerase chain reaction (PCR) mix had a total volume of 10 μl, consisting of 1 μl of DNA (0.29 to 159 ng/μl), 1 μl of 10x Super-Thermo Gold Buffer, 0.8 μl of dNTPs (2.5 mM each dNTP), 0.1 μl of Super-Thermo Gold Taq DNA Polymerase (5 U/μl), 0.2 μl each of the forward and reverse primers of each primer set (10 mM/μl), and water for the remaining volume. For the GC-rich microsatellite set, the PCR mix was modified by replacing 2 μl of water with an equal volume of 5x Q-solution (Qiagen). The PCR program consisted of an initial denaturation at 94 °C for 10 min, followed by 10 “touchdown” cycles, 25 normal cycles, and a final extension at 72 °C for 30 min. Each touchdown cycle consisted of denaturation at 94 °C for 30 sec, annealing for 45 sec starting at 60 °C and decreasing by 0.5 °C per cycle, and extension at 72 °C for 1 min. Each normal cycle consisted of denaturation at 94 °C for 30 sec, annealing at 55 °C for 45 sec, and extension at 72 °C for 1 min. PCR reactions were carried out in an ABI 9700 thermal cycler (Applied Biosystems).
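The annealing temperatures implied by this program can be listed with a short Python sketch. It assumes that the first touchdown cycle anneals at 60 °C and that each later touchdown cycle is 0.5 °C lower; the function name and its parameters are hypothetical.

```python
def annealing_schedule(start=60.0, step=0.5, touchdown_cycles=10,
                       normal_temp=55.0, normal_cycles=25):
    """Return the annealing temperature (deg C) of every PCR cycle.

    Assumption: touchdown cycle i (0-based) anneals at start - i * step.
    """
    touchdown = [start - step * i for i in range(touchdown_cycles)]
    return touchdown + [normal_temp] * normal_cycles


schedule = annealing_schedule()
print(schedule[:10])   # touchdown phase, from 60.0 down to 55.5
print(len(schedule))   # 35 cycles in total
```

With these assumptions the last touchdown cycle anneals at 55.5 °C, so the program moves smoothly into the 25 normal cycles at 55 °C.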

All PCR reactions were checked by gel electrophoresis on 1% TBE agarose gels. The products of successful PCRs were then analyzed as described in the next section.

2.3.5 Capillary electrophoresis and microsatellite allele identification

Capillary electrophoresis of the multiplex microsatellite PCR products was carried out by Genomics BioSci & Tech. Because the primers used for PCR sometimes produced multiple peaks, microsatellite alleles were called according to the following rules, in order of importance. First, the signal had to be variable among individuals. Second, male samples had to show only one peak and female samples at most two peaks. Third, when there were “shadow” peaks, the strongest signal was chosen; for diploid individuals, the two strongest distinct peaks with the same shadow pattern were chosen, and otherwise the individual was scored as carrying two copies of the same allele. Fourth, if more than one peak had the strongest signal, the peak with the largest size was chosen.
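The third and fourth rules can be sketched in Python as below. This is only an illustration: allele calling in this study was based on inspection of the electropherograms, and the sketch assumes that “shadow” peaks have already been removed from the input. The function name and the data layout (pairs of fragment size and peak height) are hypothetical.

```python
def call_alleles(peaks, ploidy):
    """Call allele sizes from (size_bp, peak_height) pairs.

    Assumes shadow peaks were removed beforehand.
    ploidy: 1 for haploid males, 2 for diploid females.
    """
    if not peaks:
        return ()
    # Strongest signal first; ties in height go to the larger fragment.
    ranked = sorted(peaks, key=lambda p: (p[1], p[0]), reverse=True)
    sizes = []
    for size, _height in ranked:
        if size not in sizes:
            sizes.append(size)
    if ploidy == 1:
        return (sizes[0],)
    if len(sizes) == 1:
        # Only one distinct peak: scored as two copies of the same allele.
        return (sizes[0], sizes[0])
    return tuple(sorted(sizes[:2]))


print(call_alleles([(100, 500), (102, 500), (98, 50)], ploidy=1))  # (102,)
print(call_alleles([(100, 500), (110, 400), (98, 60)], ploidy=2))  # (100, 110)
```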

Microsatellite allele calls are in Appendix 4.

2.3.6 Hardy-Weinberg test

In a singly mated, single-queen ant colony, there are two potential female genotypes, which differ in the maternal allele inherited. Using all the genotype data from a colony can create two biases. First, since both worker genotypes inherit the same paternal allele from the haploid father, analyzing all worker data overweights the paternal allele by a factor of two. Second, including multiple individuals from one colony causes pseudo-replication. To avoid these problems, the data were subsampled as follows. One individual was chosen at random from each of the 35 colonies to construct a new dataset. This dataset was tested for departure from Hardy-Weinberg (HW) equilibrium using the “hw.test” function from the pegas package in R (Paradis, 2010), with an alpha threshold of 0.05. Subsampling and HW testing were repeated 10,000 times. The final P-value was defined as 1 – [(number of replicates rejecting H0)/(number of replicates)]. All tests were conducted in R version 3.3.1 (R Development Core Team, 2016).
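The subsampling scheme can be sketched as follows. The analysis in this study was done in R with pegas; the Python sketch below only illustrates the resampling loop. The function names are hypothetical, and the chi-square test for a locus with two alleles is a simplified stand-in for “hw.test”, which handles multi-allelic loci.

```python
import math
import random
from collections import Counter


def hw_chisq_pvalue(genotypes):
    """Chi-square HW test (1 df) for one locus with at most two alleles.

    Stand-in for pegas hw.test; genotypes are pairs such as ("A", "B").
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 1.0  # monomorphic sample: no departure can be detected
    if len(counts) > 2:
        raise ValueError("stand-in test handles at most two alleles")
    a, b = sorted(counts)
    n = len(genotypes)
    p = counts[a] / (2 * n)
    q = 1.0 - p
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    expected = {(a, a): n * p * p, (a, b): 2 * n * p * q, (b, b): n * q * q}
    chi2 = sum((observed[k] - e) ** 2 / e for k, e in expected.items())
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def subsampled_hw(colonies, n_reps=10_000, alpha=0.05,
                  test=hw_chisq_pvalue, seed=None):
    """Draw one worker per colony, test, repeat; return 1 - rejection rate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        sample = [rng.choice(workers) for workers in colonies.values()]
        if test(sample) < alpha:
            rejections += 1
    return 1 - rejections / n_reps
```

Here `colonies` is assumed to map a colony identifier to the list of worker genotypes at one locus. Passing a different `test` callable would allow a multi-allelic test to be used in the same loop.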

2.3.7 Simulation data for the Hardy-Weinberg test

To check the statistical power of the Hardy-Weinberg test, a simulation was created in R. Three parameters were considered in this simulation: allele number, sample size, and the exact p-value. The last was calculated as the extreme probability that the queens of all samples were heterozygous and not match-mated, with allele frequencies weighted by total allele number. In the simulation, if there are n alleles in the population, then $1/n$ was used as the frequency of each allele, because the probability of heterozygotes is highest when all alleles are equally frequent. Additionally, the simulation chooses one diploid individual per colony as the sample. The calculation of the exact p-value is described as follows:

1. Suppose the mother's genotype is heterozygous and the father's allele is a third (different) allele; then the frequency of this mated female is:

\[ F_{ij} = f_i \, f_j \, (1 - f_i - f_j) \]

Here, $f_i$ and $f_j$ are the respective allele frequencies for alleles i and j, each equal to $1/n$. In this situation, the sum of these probabilities is:

\[ \sum_i \sum_j F_{ij} = n^2 \left( \frac{1}{n} \right)^2 \left( 1 - \frac{2}{n} \right) \]

2. When the parents' genotypes include only 2 different alleles (e.g., i and j), there are 6 possible genotype combinations. In 2 combinations, the mother is a homozygote and the father has a different allele, resulting in all heterozygous progeny. In the remaining 4 combinations, the mother is a heterozygote and the father’s allele is the same as one of the mother’s alleles; in each of these, half of the diploid progeny are homozygous and develop as diploid males. These homozygous probabilities are excluded from the final sum.

ProbHo = ( ⁴

where k is the total number of alleles and each allele’s frequency is equal to $1/n$.

3. Finally, the formula of the exact p-value is:

\[ \sum_i \sum_j F_{ij} - \mathrm{ProbHo} = \left( \frac{n-2}{n} \right) \left( 1 - \frac{1}{3n} \right) \]
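The sum in step 1 can be checked numerically. The Python sketch below follows the term count of n² used in step 1, so the double sum runs over all ordered pairs (i, j); the function name is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def sum_fij(freqs):
    """Sum of F_ij = f_i * f_j * (1 - f_i - f_j) over all ordered pairs (i, j)."""
    return sum(fi * fj * (1 - fi - fj) for fi in freqs for fj in freqs)


for n in (4, 10):
    equal = [Fraction(1, n)] * n               # every allele at frequency 1/n
    print(n, sum_fij(equal), Fraction(n - 2, n))
```

For equal allele frequencies the two printed fractions agree, since n² (1/n)² (1 - 2/n) simplifies to (n - 2)/n. The same function also accepts unequal frequencies.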

2.3.8 The determination of phased data and heterozygosity power analysis

Because of the haplodiploid system, genotypes could be phased easily from the allele frequencies within a family. Generally, an allele inherited from the father is carried by all workers in a family (frequency 1), whereas an allele inherited from the mother is carried by about half of them (frequency 0.5).
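The within-family rule can be sketched for a single locus as follows. This is an illustrative Python sketch, not the R code used in this study; the function name is hypothetical, and worker genotypes are assumed to be given as pairs of allele sizes.

```python
def phase_locus(worker_genotypes):
    """Split worker genotypes at one locus into paternal and maternal alleles.

    The allele shared by every worker is taken as paternal. Returns
    (paternal_allele, [maternal allele of each worker]), or None when no
    single shared allele exists (ambiguous or inconsistent family data).
    """
    shared = set(worker_genotypes[0])
    for genotype in worker_genotypes[1:]:
        shared &= set(genotype)
    if len(shared) != 1:
        return None
    paternal = next(iter(shared))
    maternal = [b if a == paternal else a for a, b in worker_genotypes]
    return paternal, maternal


print(phase_locus([(150, 154), (150, 158), (154, 150)]))
# (150, [154, 158, 154])
```

A family in which all sampled workers share the same heterozygous genotype returns None here, because the paternal allele cannot be told apart from the maternal one without more information.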

The microsatellite genotypes from female samples were used to construct haplotypes. I considered haplotypes composed of 3, 4, and 5 adjacent loci. Since the CSD mechanism should only produce heterozygous females (and homozygous/hemizygous males), these haplotypes were examined for consistency with this model. The haplotypes were built using only phased data in the region to avoid noise such as recombination between haplotypes, at the cost of reduced sample sizes. The two haplotypes of each individual were compared with each other directly. If a locus was never homozygous, I calculated the exact p-value to express the relationship between this candidate region and the null hypothesis (Hardy-Weinberg equilibrium). As in the simulation, the exact p-value was calculated as the extreme probability that the queens of all samples were heterozygous, but the actual number of alleles observed in the samples was used.
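The comparison of the two haplotypes within each individual can be sketched as a sliding-window check. This Python sketch is an assumption about how such a check could be organized, not the R code used in this study; the function name is hypothetical, and each individual is represented as a pair of phased haplotypes (tuples of allele sizes in locus order).

```python
def heterozygous_windows(individuals, sizes=(3, 4, 5)):
    """Find windows of adjacent loci in which no individual is homozygous.

    individuals: list of (haplotype_1, haplotype_2) pairs of equal length.
    Returns (start_index, window_size) for every window in which the two
    haplotypes differ in all individuals.
    """
    n_loci = len(individuals[0][0])
    hits = []
    for size in sizes:
        for start in range(n_loci - size + 1):
            window = slice(start, start + size)
            if all(h1[window] != h2[window] for h1, h2 in individuals):
                hits.append((start, size))
    return hits
```

A window is reported only when every individual carries two different haplotypes across it, which is the pattern expected for females under the CSD model.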

Haplotype building and comparisons were performed in R.

From the document: Investigation of the sex determination mechanism of the tropical fire ant (pages 23-28)