• 沒有找到結果。

The nuclear and mitochondrial genome of Wiebesia sp3 and Wiebesia

2.1 Introduction

2.1.1 Applying genomics in fig wasp study

Fig-fig wasp mutualism is a classic example of coevolution, and had been extensively studied for decades. Studies of figs and their associated fig wasps range from natural history, stability of mutualism, community assembly, and speciation (E. A.

Herre et al., 2008). The closed system of fig syconia offers perfect opportunities for testing iconic evolutionary theories. Two such theories were sex allocation (Fellowes, Compton, & Cook, 1999) and parasite virulence evolution (E. A. Herre, 1993). In both cases, the fitness consequences, precision of adaptation and their association with fig wasp population structure are well-established. Although not affecting the power and beauty of both classic theory, clearly there are several knowledge gaps in understanding the full picture of population structure and its effects on fig wasps. For instance, exact mechanism (proximate cause) and evolutionary origin (ultimate cause) of sex allocation remained unknown. On the other hand, E. A. Herre et al., 2008 stated that differential selection pressure owing to parasite virulence in different wasp groups “provide remarkable opportunities to study the factors that shape fundamental processes in molecular evolution, …” Such statement provoked applying molecular tools on the classic system to answer fundamental evolutionary questions.

Another unsolved problem regarding fig-fig wasp coevolution is fine-scale co-speciation. Figs and fig wasps were roughly one to one in broad picture (Ficus sections vs agaonidae genus) (Cruaud et al., 2012). However, strict co-speciation rule does not apply in fine scale. There is mounting evidence that speciation of fig wasps is

generally faster than the associated fig hosts (Jackson, Machado, Robbins, & Herre, 2008; Machado et al., 2005). The pattern of gene flow in both hosts and pollinators during and after pollinator speciation, mechanism of reproductive isolation, and co-adaptation under fine-scale co-speciation are largely unknown.

Genomic study is a starting point to bridge the gap and to explore all these evolutionary questions. The only published genome belonging to agaonidae is Ceratosolen solmsi, pollinator of Ficus hispida (Xiao et al., 2013). C. solmsi genome provides first glimpse into how long term specialized mutualistic life style shaped its evolution. For example, C. solmsi has a very compact protein-coding gene set, with reduction in gene families associated with detoxification and chemosensory systems. It is also the C. solmsi partial mitogenome that elucidated a potential premating isolation system caused by Wolbachia (Xiao et al., 2012). Such studies showed the importance of genomic study on understanding the evolutionary history of organisms.

2.1.2 Study species and system

One potential system for studying speciation, local adaptation, and population structure effects on fig wasp are the pollinators associated with Ficus pumila. F. pumila is a widely distributed fig species in East Asia. There are two described, phylogenetically distinct varieties in Taiwan (H.-Y. Wang et al., 2013). The endemic F. p.

var. awkeotsang (jelly-fig) and the nominate F. p. var. pumila (creeping-fig). Jelly-fig is an economically important crop in Taiwan, for its dried drupes produce edible jelly, a popular dessert. Creeping-fig is a common gardening plant that had been widely cultivated, but it is also native to Taiwan.

Genetic study had shown that the two varieties and the three associated wasps (see chapter 1) fitted the classic scenario of which wasp speciation precedes fig speciation.

In this case, W. pumilae and W. sp1, pollinators of F. p. var. pumila came into speciation

through duplication (speciation of wasps but not their host), while there is no detectable genetic differentiation between their hosts (Yan Chen et al., 2012). Also, the estimated divergence time between F. p. var. awkeotsang and F. p. var. pumila is also much shorter than their pollinators (H.-Y. Wang et al., 2013). The different stages of recent codivergence makes them suitable model for fine-scale cospeciation and coadaption study.

The two fig varieties differed in several ecological traits, which possibly results in differential selection pressure in their pollinators during and after codivergence. These includes habitant preferences (Hsieh et al., 1993), population structure (Yong Chen et al., 2002), and floral scents (Y. Y. Chen & Wu, 2010; You-ling Chen et al., 2016). Jelly-fig occurred at altitudes around 1200-1900m, while creeping-fig occur in lowlands.

Transplantation of jelly-fig to lowlands proved viable for the plants, but collapse of pollinating fig wasp population was often noted by farmers. Phenotypic adaptation to cold environment and signatures of selection on mitochondrial COI gene in jelly-fig wasp were also noted (H.-Y. Wang et al., 2013). For population structural effects, although there are currently no ecological data about sex ratio or nematode virulence in these species, the huge gall flower number differences between two fig varieties suggest that there is a foundress number variation between species, making these species potential system for revisiting the two classic theory at molecular level. My study of fig wasp olfactory coadaptation with host floral scent is presented in chapter 3.

Here I present the draft genome of two Wiebesia species. W. sp3 (jelly-fig wasp), pollinator of F. p. var. awkeotsang, was de novo assembled with a combined approach using long read contigs as backbone and short paired-end reads for polishing. W.

pumilae (creeping-fig wasp) was assembled by a mapping-based approach using

mitochondrial genomes of both species from short reads, representing the most complete mitochondrial genomes in fig wasp. The genomes shed light on questions regarding the ecology, coevolution, and cospeciation of fig wasps.

2.2 Material and Methods

2.2.1 Biological materials for genomic DNA sequencing

Near D phase male jelly-fig syconia were collected in December 2017, from a jelly-fig farmland in Pingtung, Taiwan (altitude: 670M, 22.45 N, 120.43 E). The farmland was surrounded by natural forest, and the pollinating wasps were shared among cultivated and wild figs. Male creeping-fig syconia were collected in December 2017 from Chiayi, Taiwan (altitude: 90M, 23.33 N, 120.28 E).

The collected syconia were brought back to lab, which were then split in half by knife for the wasps to emerge. Living fig wasps were collected once they leave the galls, and were washed with double-distilled water. The fresh samples were stored in 100%

ethanol before DNA extraction. Fig wasps are haplodiploid, which means that male wasps came from unfertilized eggs, and only contains genetic information of all the foundresses from previous generation. In order to reduce the genetic diversity in samples, male individuals of jelly-fig wasp from one single syconium were pooled for DNA extraction.

Sex ratio is highly female-biased in agaonid wasps, and because creeping-fig bear much less gall flower per syconia than jelly-fig, collecting enough male individuals from one syconium to yield enough DNA was unsuccessful. As a consequence, female creeping-fig wasps from one syconium were used instead. Genomic DNA was extracted from pooled individuals using Purgene kit (Qiagen, USA) using protocol suggested by the manufacturer.

2.2.2 Biological materials for RNA sequencing

The following samples from five different life stages of each species were sequenced for gene prediction purposes: (1) adult female, (2) adult male, (3) pupal female, (4) pupal male, (5) larva. All samples were freshly taken and stored in liquid nitrogen before RNA extraction. RNA was extracted using TRIzol (ThermoFisher, USA) following the manufacturer’s protocols. Detailed information about collection site, sequencing depth, can be seen in Table 3.

2.2.3 Library construction and sequencing

Library preparation and sequencing were performed by Novogene, Beijing, China.

Separate libraries that had insert size around 300-500bp were prepared for jelly-fig wasp and creeping-fig wasp DNA. A total of 75’957’927 and 92’826’608 150bp paired-end reads were generated for each species using an Illumina Hiseq 4000 platform (Table 3).

The same extracted DNA from jelly-fig wasp was used in creating library for single molecule long read sequencing (SMRT). Single molecule sequencing was done on a PacBio Sequel platform and yielded a total of 28 Gb filtered polymerase read base.

RNA libraries with 400bp insert size were constructed and sequenced on Illumina Hiseq 4000 platform (Table 3).

2.2.4 Sequencing quality check

Raw Illumina pair-end (PE) reads from genomic sequencing of both species were quality trimmed by Trimmomatic v0.36 (Bolger, Lohse, & Usadel, 2014). The head and tail of reads were trimmed if they were below a quality score of 3. Reads with an average per base quality under 15 in a 4 base sliding window would be cut. Reads that were less than 36bp were discarded. Quality assessment was done by FastQC v0.10.1 (Andrews, 2010) before and after Trimmomatic trimming.

2.2.5 Taxonomic validation

Studies had reported that creeping-fig syconia may sometimes contain jelly-fig wasp individuals, although inviability and failure to leave syconia as adults were also noted (Jiang, 2011; H.-Y. Wang et al., 2013). The location where creeping-fig wasp were collected in this study is far from natural habitants of jelly-figs, so the chance of pooling individuals from different species was low. Several large scale molecular genetic surveys had confirmed fixed difference between the studied species in the mitochondrial cytochrome c oxidase subunit I (COI) gene (Yan Chen et al., 2012; H.-Y.

Wang et al., 2013). The high coverage of mitochondrial sequences within short reads library also made it suitable to test if individuals from different species were pooled, since every individual are highly likely to be sequenced. To validate the purity of pooled individuals, trimmed PE reads of creeping-fig wasps were mapped to the published COI sequence (National Center for Biotechnology Information (NCBI) accession:

JN184048.1) (Yan Chen et al., 2012) with BWA v0.7.10-r789 (H. Li, 2013) mem. The mapped bam file was then manually examined under the genome browser Integrative Genomics Viewer (IGV) (J. T. Robinson et al., 2011), to see if there were any foreign reads present.

2.2.6 Read decontamination and de novo assembly of jelly-fig wasp

SMRT reads of jelly-fig wasp were first assembled using wtdbg v1.1.006 (Ruan, n.d.). The parameters used were “ -k 0 -p 21 -S 4 --edge-min 4 “. SMRT read-based and PE read-based polish were applied to increase the accuracy of the initial assembly. For SMRT read-based polish, the reads were mapped on to the contigs by pbalign (“Pacific Biosciences,” n.d.) , and polished using arrow (“Pacific Biosciences,” n.d.). PE reads were aligned to arrow-polished contigs using BWA mem, and polished by pilon

v1.22 (Walker et al., 2014).

The initial draft contained contigs from symbiotic organisms that was revealed by FastQC report of reads (Figure 9) and the unusual bimodal distributed GC content of the initial assembly (Figure 11). Identification of contamination, and reads partition according to their biological source into bins were done using blobtools v1.01 (Laetsch

& Blaxter, 2017). Briefly, PE and SMRT reads were mapped to the polished contigs using BWA mem and blasr (“Pacific Biosciences,” n.d.) respectively. Polished contigs were blast (Altschul, Gish, Miller, Myers, & Lipman, 1990) against NCBI nucleotide database (“Database resources of the National Center for Biotechnology Information,”

2012), with parameter “-max_target_seqs= 10”. Given the blast result, each contig was assigned to a taxonomic position at every taxonomic rank by blobtools using the decontaminated reads. Decontaminated PE reads of jelly-fig wasp were used to estimate the genome size based on 21mer counts using jellyfish 2.0 (Marçais & Kingsford, 2011).

Plot of kmer frequency was draw using R (Figure 12).

Decontaminated jelly-fig wasp SMRT reads were assembled by wtdbg. To see if the decontamination leaded to over-cleaning and loss of genomic information, the decontaminated assembly was first polished the same way as described above. Busco v3 (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) genome-mode

polished initial and polished decontaminated assembly to compare the integrity of each assembly.

For further polishing the assembly, pilon was run in an iterative way until the number of changes in each round became saturated (total six times). According to pilon report, the pilon-polished genome still contained some level of misassembly that cannot be corrected. Thus, I used GATK v4.0.3 (Van der Auwera et al., 2013) on the alignment to perform variant calling based on the GATK Best Practices. I then update the assembly according to the called variants using GATK.

2.2.7 Mapping-based assembly of creeping-fig wasp

To characterize the amount of symbiotic DNA within the creeping-fig wasp library, the trimmed short reads were first de novo assembled with minia (-k 61). The assembly was then evaluated using blobtools. For mapping-based assembly, the jelly-fig wasp genome, which was de novo assembled, served as a backbone. PE reads of creeping-fig wasp were mapped to jelly-fig wasp genome using BWA mem. Variants from alignment file were called using samtools. Consensus sequence in fastq format was generated by bcftools v1.3 (H. Li, 2011) with a upper and a lower coverage cutoff. The minimal and maximum coverage for calling are a third and twice the mean coverage, as recommended by PSMC (H. Li & Durbin, 2011). The sequence fastq was converted to fasta by seqtk v1.2 (H. Li, n.d.). Because bcftools did not output indel information to the consensus, pilon was applied iteratively to further polish the creeping-fig wasp assembly.

To evaluate the assemblies, Quast v4.6.3 (Gurevich, Saveliev, Vyahhi, & Tesler, 2013) was used to calculate the assembly statistics; blobtools were used for detecting contamination, and busco for evaluating the completeness of orthologs found in the assembly. Contaminations detected by blobtools at this stage were excluded before

annotation.

2.2.8 Repeat annotation

Repeat searching based on homology was done by RepeatMasker v4.0 (Smit &

Hubley, n.d.-a), the database used was order Hymenoptera from Repbase (Bao, Kojima,

& Kohany, 2015). A de novo repeat library was constructed using RepeatModeler v1.0.11 (Smit & Hubley, n.d.-b), and analyzed by RepeatMasker. RepeatProteinMask provided by ReapeatMasker was used to identify transposable elements. Repeat-masked genome were created by bamtools v2.5.1 (Barnett, Garrison, Quinlan, Strömberg, &

Marth, 2011).

2.2.9 Gene prediction

The genes were predicted in several ways. For transcriptome based prediction, RNAseq reads from five life stages were mapped to the genome by hisat2 v2.1 (Kim, Langmead, & Salzberg, 2015). Each library’s alignment was assembled independently using stringtie v1.3 (Pertea et al., 2015). The assembled transcripts of each library were combined by stringtie merge command. Transdecoder v5.3 (Haas et al., 2013) was used to predict the ORF of transcripts, with the parameter “retain_blastp_hits”, which took account of blastp results of Uniprot (UniProt Consortium, 2012) database.

For homologous gene prediction, gene sets from five close hymenoptera species, C.

solmsi (Xiao et al., 2013), Nasonia vitripennis (Werren et al., 2010), Copidosoma floridanum (i5K Consortium, 2013), Trichogramma pretiosum (Lindsey et al., 2018), and Apis mellifera (Consortium, 2006) were downloaded from NCBI ftp.

Non-redundant gene set for each species was generated by taking the longest isoform as representative of each gene using a custom Perl script, which was then aligned them to the genome using genblasta v1.0.4 (She, Chu, Wang, Pei, & Chen, 2009). The aligned

region, along with their downstream and upstream 3 Kbp, was sent to genewise v2.2 (Birney, Clamp, & Durbin, 2004) for gene model prediction.

For ab initio gene prediction, augustus v3.3.1 (Stanke & Morgenstern, 2005), SNAP (Korf, 2004), geneid v1.3 (Blanco, Parra, & Guigó, 2007), fgenesh server (Solovyev, Kosarev, Seledsov, & Vorobyev, 2006), and glimmerHMM v3.0.1 (Majoros, Pertea, & Salzberg, 2004) were applied to repeat-masked assemblies.

Augustus was run under training parameters generated by the genome mode of busco using hymenoptera odb9 dataset. Intron hints generated from RNAseq data were also provided to augustus. Geneid and Fgenesh were run using built-in Nasonia parameters.

GlimmerHMM and SNAP were trained by C. solmsi gene set.

Protein alignment, RNAseq alignment, Transdecoder-predicted ORF, and ab initio predictions were sent to Evidencemodeler v1.1.1 (EVM) (Haas et al., 2008) for consensus gene calling. Different combinations of evidence weight were tested for each EVM run, and the result of each run were evaluated by the number of predicted genes, average gene length, and busco.

After getting a primary result of EVM. Genes that were identified as complete buscos from this first consensus gene set were extracted. These genes served as high confident gene models to re-train trainable ab initio predictors. A second round of EVM was done using the new re-trained predictions, along with other evidence used in the first round to generate a final EVM consensus gene set.

2.2.10 Manual gene curation

Web Apollo v2.1.0 (Lee et al., 2013) were used to manually curate the gene predictions. The gene models predicted by EVM, genewise, ab initio software, and transdecoder were loaded into Web Apollo as tracks. The manual curation mainly focused on fixing gene fragmentation or gene fusions, and eliminate those genes that

have low evidence support. The curation started from EVM predicted models, while taking transdecoder, genewise, augustus and other evidence into account. Renaming of the manually curated genes according to their coordinates was done using scripts provided by Maker2 v2.31.0 (Holt & Yandell, 2011).

2.2.11 Functional annotation

Blast2GO (Gotz et al., 2008) v5.2.1 pipeline was applied for functional annotation, which integrates blast and interproscan (Jones et al., 2014) annotation results and produces gene ontology (GO) annotation. Gene sets of the two species were blasted against the UniprotKB/Swiss-prot database. Interproscan were used to search the following databases: CDD-3.16,Coils-2.2.1 (Marchler-Bauer et al., 2015), Gene3D-4.2.0 (Lees et al., 2012), Hamap-2018_03 (Lima et al., 2009), MobiDBLite-1.5 (Necci, Piovesan, Dosztányi, & Tosatto, 2017), Pfam-31.0 (Finn et al., 2016), PIRSF-3.02 (C. H. Wu et al., 2004), PRINTS-42.0 (Attwood et al., 2012), ProDom-2006.1 (Bru et al., 2004), ProSitePatterns-2018_02 (Sigrist et al., 2012), ProSiteProfiles-2018_02, SFLD-3 (Akiva et al., 2014), SMART-7.1 (Letunic, Doerks, &

Bork, 2012), SUPERFAMILY-1.75 (de Lima Morais et al., 2011), and TIGRFAM-15.0 (Haft et al., 2012). Online server of BlastKOALA (Minoru Kanehisa, Sato, & Morishima, 2016) were used to map the gene sets to KEGG (M Kanehisa &

Goto, 2000) database.

2.2.12 Manual annotation of interested genes

Manual annotation of genes involved in chemosensory (see 2.6.1) and venoms were done using homology-based approach. The only Chalcid wasp whose venom profile had been well-studied is the model species N. vitripennis (Sim & Wheeler, 2016;

Werren et al., 2010). Blastp searches were done for the genes of jelly-fig wasp and

creeping-fig wasp using N. vitripennis venom genes as database, with an e-value cutoff at 1e-15. Additional annotation of signal peptide and transmembrane helix was done using SignalP (Petersen, Brunak, von Heijne, & Nielsen, 2011) and TMHMM (Krogh, Larsson, von Heijne, & Sonnhammer, 2001) online server.

2.2.13 Mitochondrial genome assembly and annotation

An initial attempt to retrieve the mitochondrial genome of jelly-fig wasp from de novo assembly turned out unsuccessful. Alternatively, mapping based iterative assembly with MITObim v1.9.1 (Hahn, Bachmann, & Chevreux, 2013) were used to assemble the mitochondrial genome from PE reads for both species. For computational efficiency, 5’000’000 PE reads were randomly sampled from Trimmomatic trimmed data using seqtk. The reads were then transformed into interleaved format using bash commands.

The reads and a seed sequence of COI from each species (NCBI accession of seed sequences: JN184048.1 (Yan Chen et al., 2012) for creeping-fig wasp;

HQ398112.1 (Liu et al., 2014) for jelly-fig wasp) were given to MITObim and assembled with the parameter “-paired”. To validate the assembly, the input reads were mapped back to the assembled contigs with BWA mem and manually checked under IGV. Regions near both ends where there was no coverage support were manually trimmed. The trimmed contig then served as a new seed to repeat the assembly process.

The second round assembly result was trimmed as mentioned above then polished with pilon. The assembled mitochondrial genomes were annotated using MITOS2 server (Bernt et al., 2013).

To study the evolution of mitochondrial synteny in chalcid wasps, mitochondrial genomes of all the families of chalcid wasps available on NCBI were downloaded and examined manually, which included C. solmsi (Agaonidae) (Xiao et al., 2012), N.

vitripennis (Pteromalidae) (Oliveira, Raychoudhury, Lavrov, & Werren, 2008),

Encarsia formosa (Aphelinidae) (Zhu et al., 2018), Eupelmus sp. (Eupelmidae) (Tang et al., 2019), Eurytoma sp. (Eurytomidae) (Tang et al., 2019), Torymus sp.

(Torymidae) (Tang et al., 2019), Brachymeria sp. (Chalcididae) (Tang et al., 2019), and Gonatocerus sp. (Mymaridea). The gene arrangements events were inferred based on maximum parsimony and their known phylogeny (Munro et al., 2011).

2.2.14 Orthology

Orthology between jelly-fig wasp, creeping-fig wasp, and C. solmsi were designated by best blast reciprocal hits (BBRH) of amino acid sequences implemented in proteinortho v5.16 (Lechner et al., 2011). To determine orthology in a broader evolutionary scale, the online service of OrthoDB was used to assign IDs from OrthoDB v9 to the gene sets of both species. OrthoDB online service was also used to plot the shared orthology between jelly-fig wasp, creeping-fig wasp, other Chalcidoidea species, and A. mellifera.

2.2.15 Evolutionary rates between Wiebesia species

All 9352 1:1 orthologs of creeping-fig wasp and jelly-fig wasp were aligned by codon using the aligner prank v.14 (Löytynoja, 2014), which left 9218 sequences with complete triplet ORF. The alignments were filtered by Gblocks 0.91b (Castresana, 2000) under the parameter: -t=c -e=-gb1 -b4=6 -b5=a. dN /dS ratio of filtered alignments were calculated using pairwise codeml from paml 4.9 (Z. Yang, n.d.), with parameter:

CodonFreq = 0. The two species are so closely related that many genes had no synonymous substitutions. To adjust for this matter, for any gene pair that has dS value equals to 0, an adjusted dN/ dS ratio was calculated using the genome wide median dS.

Gene pairs that had dS > 5 were excluded from downstream analysis to prevent poor alignments or saturation of mutations. A list of genes that had adjusted dN / dS > 1 were

created for GO enrichment analysis.

To detect signatures of positive selection, codeml site model was tested on all gene pairs using model M1a (neutral evolution) vs M2a (positive selection), and M7 vs M8. P values were calculated using log likelihood ratio test. Correction for multiple testing were done with Benjamini & Hochberg method (Benjamini & Hochberg, 1995) using R, with false discovery rate (FDR) set to 0.05. GO enrichment analysis were performed on the dN /dS > 1, M1a vs M2a, and M7 vs M8 significant sets using Blast2GO, with a FDR cutoff 0.05. GOs used for enrichment analysis were based on the GO annotation of jelly-fig wasp gene set.

To detect signatures of positive selection, codeml site model was tested on all gene pairs using model M1a (neutral evolution) vs M2a (positive selection), and M7 vs M8. P values were calculated using log likelihood ratio test. Correction for multiple testing were done with Benjamini & Hochberg method (Benjamini & Hochberg, 1995) using R, with false discovery rate (FDR) set to 0.05. GO enrichment analysis were performed on the dN /dS > 1, M1a vs M2a, and M7 vs M8 significant sets using Blast2GO, with a FDR cutoff 0.05. GOs used for enrichment analysis were based on the GO annotation of jelly-fig wasp gene set.

相關文件