
CHAPTER 3: Results

4.4. The role of TEs in gene duplication

Because we found both TE and CHC enzyme duplications, we considered the possibility that CNV TEs may have been involved in some of the CHC gene duplications. Such a role for TEs has been shown in previous studies (Feschotte & Pritham, 2006), for example for human L1 elements, which can mobilize flanking DNA by retrotransposition (Moran et al., 1999), and for plant Mutator-like TEs (Jiang et al., 2004). In D. melanogaster, Helitron transposons have been implicated in the duplication of desaturases (Fang et al., 2009). We found Mariner, LTR transposons, and P elements flanking some of the CHC genes (i.e., FASs, reductases, and cytochrome P450s), but most of the flanking TEs are >10 kb away. Mariner and LTR transposons have cargo sizes <10 kb (Bowen & McDonald, 2001; Munoz-Lopez & Garcia-Perez, 2010), but P elements can carry fragments up to 180 kb (P. Zhang & Spradling, 1993). Thus, P elements could plausibly be responsible for the duplications of reductases and cytochrome P450s, where we found flanking P elements (Figure 3.7).

CHAPTER 5 Conclusions

In this study we characterized genes with copy number differences between SB and Sb genomes in the fire ant, and we especially highlight the presence of genes with greater copy number in Sb.

Our results revealed two types of genes with greater copy number in Sb: TEs and non-TE genes. While TE accumulation was expected for non-recombining regions, we were surprised to find many non-TE CNV genes.

We found a weak linear correlation between the increased copy number of DNA TEs in Sb (but not of LTR and LINE retroelements) and their expression level in SB/Sb queens (i.e., overexpression in SB/Sb with respect to SB/SB virgin queens, and absolute expression in SB/Sb virgin and mature queens). Furthermore, TEs with known DE between SB/SB and SB/Sb females did not show corresponding copy number variation in qPCR analysis. This finding suggests that, while some TEs may be actively proliferating, others have been suppressed or perhaps even co-opted.

More factors (e.g., age of TE in genome) need to be considered to explain the difference in TE copy number between SB and Sb.

Besides TEs, we found 115 non-TE genes with higher copy number in Sb.

Enzymes putatively responsible for CHC synthesis were highly represented among CNV genes. We propose that some of these genes are possibly involved in the synthesis of differential CHCs in SB/SB and SB/Sb queens, thus leading to the generation of queens with different odors, an important trait underlying monogyny and polygyny. We highlight two candidate genes possibly responsible for the different CHC profiles in SB/SB and SB/Sb queens, a desaturase and an elongase, and analyze their genomic copies and expression in detail.

Other non-TE CNV genes encode FPPSs, which synthesize a JH precursor; OBPs and ORs, involved in odorant reception; zinc finger and coiled-coil containing proteins, which are putative transcription factors; and toll-like receptors, involved in innate immunity. These findings suggest that genes beneficial for polygyne ants have accumulated on the Sb supergene.

Other non-TE CNV genes encode histones, kinases, and serine proteases. At present, we do not know what advantage, if any, they may confer on Sb-carrying individuals.

Further analyses, such as gene knock-down coupled with CHC profile analysis, are necessary to confirm the role of the desaturase and elongase genes and their duplications. Recently, we were able to perform CRISPR/Cas9-mediated mutagenesis in fire ants (Chiu, Hsu, Chang, Huang, & Wang, 2019); this approach can be used to target our genes of interest and assess mutant phenotypes such as altered CHC profiles and changed behavior in the colony. In particular, we propose that the duplicated copies of desaturase and elongase, Sinv_desatA1_b and Sinv_eloB, respectively, play a role in the synthesis of differential CHCs. It is possible to distinguish these duplicated copies from the putative originals, Sinv_desatA1_a and Sinv_eloA, and to use their conserved nucleotide and protein differences to further analyze these candidate genes.


Tables

Table 2.1. Details of the 13 Illumina RNA-seq libraries used for de novo assembly of the S. invicta transcriptome.

M, monogyne; P, polygyne

‡ PE, paired-end; SE, single-end

Table 2.2. Statistics for the S. invicta transcriptome assembly.

Total Trinity 'genes': 62,189
Total Trinity transcripts: 91,409
Percent GC: 40.4%

Stats based on ALL transcript contigs:
Contig N50: 3,456
Median contig length: 758
Average contig: 1,664.6
Total assembled bases: 152,162,136

Stats based on only the LONGEST isoform per 'gene':
Contig N50: 2,198
Median contig length: 606
Average contig: 1,206.5
Total assembled bases: 75,029,484
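For readers unfamiliar with the N50 statistic reported in Table 2.2, the following is a minimal sketch of how it is commonly computed: the contig length at which the running total of lengths, sorted from longest to shortest, first reaches half of all assembled bases. The function name and the toy contig lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembled bases defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)
```

Because N50 is weighted by length, it is usually larger than the median contig length, as is the case for both sets of statistics in Table 2.2.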

Table 2.3. Results of re-mapping by bowtie2 of paired- and single-end RNA-seq reads on the assembled transcriptome of S. invicta.

                    Count          Percentage
Paired-end reads
  proper_pairs      181,865,664    97.1%
  improper_pairs      4,548,989     2.4%
  left_only             698,622     0.4%
  right_only            262,772     0.1%
  total             187,376,047
Single-end reads
  single-end        161,474,144    82.8%
  total             194,914,989

Table 2.4. Information for the DNA-seq raw reads from 8 pairs of SB and Sb haploid brothers, used to detect copy number variation between the SB and Sb alleles.

Pair    Genotype   NCBI SRA accession number                                    Total Reads      Read length
Pair1   SB         SRR620242                                                    27,970,177*2     101
Pair1   Sb         SRR620546                                                    28,303,372*2     101
Pair2   SB         SRR620547                                                    27,807,586*2     101
Pair2   Sb         SRR620564                                                    25,586,209*2     101
Pair3   SB         SRR620565                                                    22,867,198*2     101
Pair3   Sb         SRR620566                                                    21,842,603*2     101
Pair4   SB         SRR620567                                                    22,458,087*2     101
Pair4   Sb         SRR620568                                                    22,781,933*2     101
Pair5   SB         SRR620570                                                    24,404,068*2     101
Pair5   Sb         SRR620571                                                    23,304,756*2     101
Pair6   SB         SRR620572                                                    23,527,787*2     101
Pair6   Sb         SRR620573                                                    30,151,694*2     101
Pair7   SB         SRR620574                                                    26,065,492*2     101
Pair7   Sb         SRR620577                                                    23,699,662*2     101
Pair8   SB         SRR057617, SRR057620, SRR057622, SRR057624, SRR057626        78,101,566*2     50
Pair8   Sb         SRR621118                                                    343,209,818*2    50

Table 2.5. Information and primers for the 13 genes and two internal controls used to validate CNV analysis results by DNA qPCR.

CNV analysis values, where positive and negative values represent genes duplicated in Sb and SB, respectively.

Significant CNV differences indicated with * (CNV analysis, p ≤ .05 in 75% of biological replicates for at least 25% of the gene length, binomial test with false discovery rate BH).

ID  Description  Log2 ratio of Sb:SB copy number†  Forward primer  Reverse primer

Copia-12_SI-I             LTR/Copia-12                         1.19*   GCGACTGCCCATTTCAATTTAA        TTTGAGTTTCCTCGAGATGTACTCTCT
Gypsy-13_SI-I             LTR/Gypsy-13                         0.98*   GCATAGAACGTCAGCACCGTATC       AAGAACCTACCGCATTTTCTGCTA
Helitron-2_SIn            DNA/Helitron-2                       1.11*   AGGAATTGATCGACGGCGATA         AGTTCGAAAAACGGCTGCAACT
BEL-1                     LTR/Pao BEL-1                        2.38*   CGAGAACAACAAATTGCTTGGAA       TTCCCATAGTCCTCCGCAATG
TRINITY_DN32630_c2_g2_i2  LTR/Gypsy-12                         2.36*   CAGCAAAAGCGGGTACGAATT         CCGCGAAACATCCATGAGGTT
TRINITY_DN28494_c0_g1_i1  LINE/RTE-1_DF                        1.21*   CCTGCCTAACGCCCTTCAAGAT        TGCACGATGCTTCACAATGG
TRINITY_DN28494_c2_g1_i2  LINE/RTE-1_DBp                       1.78*   TTAGTTGCGCTAGTCTTGATGTTTCA    GCAAGGAAACGGACAAGAATCG
TRINITY_DN27565_c2_g1_i1  LINE/I-1_PBa                         1.86*   AAAGGTTGGTATGGTGGATGGAA       CACGACTTGCCTCTCCTGAA
TRINITY_DN32715_c1_g1_i1  DNA/CMC-Chapaev-3                    0.88*   GACAGCCGTGCAGTTGATCATC        TCCTGGCTCATAATGGGTACGA
TRINITY_DN32186_c2_g3_i1  DNA/Mariner-11_AEc                   0.82*   CGACGAACAAGTCATCCAACAAG       CACCTAGTGCTCGTGTGTGGATT
TRINITY_DN32908_c0_g1_i2  DNA/P-26_HM                          1.21*   CAATTACTAAGAGATTTGACGGGAACAC  ATCCTTACATTTTTACTCTCCAGCTGCTT
gnl|Sinv_1.0|SINV24774    TD and POZ domain-containing 3-like  1.98*   AAATTGTTGTGCGATGCCAGTAA       CATATCCGTCGCATTTAATTCCTTT
SI.MKN.04583              DNA/piggyBac                         0.38    ATCATGAGCAAGAACCGCTTTG        TGTGCTGGTCTGGTCACTGCAT
Internal Control 1        40S ribosomal protein S9             -       GAAGCCCATATGAAGCTCGATT        GTCGCTCCAAGAAGTCCTCAAT
Internal Control 2        Elongation factor 1-alpha            -       CTGGCTGGCACGGAGACAATAT        ATCCCTTGAACCAGGGCATT
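The significance criterion quoted in the notes to Tables 2.5 to 2.7 refers to a false discovery rate correction with the Benjamini-Hochberg (BH) procedure. As a minimal sketch of that correction step only (it does not implement the 75% of replicates or 25% of gene length rule, and it is not the pipeline used in this study), BH-adjusted p-values can be computed as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(example_p)
significant = [q <= 0.05 for q in adjusted]
```

In this toy example the two smallest p-values remain significant at the 0.05 level after adjustment, while the two raw values just below 0.05 do not.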

Table 2.6. Information and primers for TEs with known expression differences between SB/SB and SB/Sb queens, for which we evaluate copy number difference between SB and Sb by DNA qPCR. The two internal control primers are reported in Table 2.5.

Genes upregulated in SB/Sb versus SB/SB queens (Q) and workers (W), and SB/SB versus SB/Sb queens.

CNV analysis values, where positive and negative values represent genes duplicated in Sb and SB, respectively.

Significant CNV differences indicated with * (CNV analysis, p ≤ .05 in 75% of biological replicates for at least 25% of the gene length, binomial test with false discovery rate BH).

ID  Description  Upregulated in†  Log2 ratio of Sb:SB copy number‡  Forward primer  Reverse primer

SI.MKN.04267  BEL12_ag transposon polyprotein                         SB/Sb Q and W   0.18    TCGAGCGCGAACATCGTAGAT       GCCGGCGAATAGAGTATAATAATGC
SI.MKN.04328  Nuclease harbi1-like                                    SB/Sb Q and W   1.99*   CTGTATTGGCAAATCATGGAGGTC    ATAACCGAGACGAGCTCCAAACTC
SI.MKN.00039  Nuclease harbi1-like                                    SB/Sb Q and W   0.56    TCTTGCCACAGGCGATTTACC       CTTCTCTTACTGTGGATTCACCAACTCT
F04BEA1219    piggyBac                                                SB/Sb Q and W   6.22*   GGGACCTGTTCATCGAAAATTGTACC  CAAGAAGCTGTTCACCCACAGTAC
SI.MKN.02387  Retrotransposon ty1-copia subclass                      SB/Sb Q         0.49    ATTGATGACTACTCGCGTTTGG      CCAGACAATGACCAGTTTGATCTT
SI.MKN.04299  Nuclease harbi1-like                                    SB/Sb Q         0.29    TCACAGAACGTTGATCACACACTGT   CTCTGATTGACGAAAAACTCGTTGA
SI.MKN.04107  TE reverse transcriptase                                SB/Sb Q         0.22    CGGACTCGTTTTATGCGAAGAA      CGGTTCGTTCACATACTGTTTCATG
SI.MKN.05057  Transposon, probable                                    SB/Sb Q         -0.02   TTTGAGGTAAATTCTTGAGCGTTGT   GGTATTCATGGCTCCGATATTTTG
SI.MKN.03796  Retrovirus-related pol polyprotein from transposon 412  SB/Sb Q         -0.01   CCTATGGTCGAGAAAGCCGTTT      TGCAGCAGGATACCATCAAATCC
SI.MKN.05121  Transposase (Fragment)                                  SB/Sb Q         0.37    TCATATTGTGGCCGTTTTTCG       GGAACGGTATCGACTTCAATTGAT
SI.MKN.90403  Mariner transposase                                     SB/SB Q         0.08    CCTTCTTGACCGTTATTCCTGGTT    GTGGGAAAATGGACACTTACGACA
SI.MKN.00200  Mariner transposase                                     SB/SB Q         0.07    GGTGACGAAACATGGGTTTACG      TGTCTCCACTGGGACGATTGA
SI.MKN.04695  Mariner Mos1 transposase                                SB/SB Q         0.13    TGAACTTGGCATTGCATATGG       GTCTCATACCCAAAACATCAACCA

Table 2.7. Information and primers for the five genes and two internal controls used to validate CNV analysis results by DNA ddPCR.

CNV analysis values, where positive and negative values represent genes duplicated in Sb and SB, respectively. Significant CNV differences indicated with * (CNV analysis, p ≤ .05 in 75% of biological replicates for at least 25% of the gene length, binomial test with false discovery rate BH).

ID  Description  Log2 ratio of Sb:SB copy number†  Forward primer  Reverse primer  Probe

gi|751225391|ref|XM_011…  …                          …       …                         …                       …
SI.CL.01.cl.0146.Contig1  Elongase                   1.02*   GTGGCTTCTTTCTATCAAATACCA  …                       …
F04BEA1219                piggyBac                   6.11*   ATCCGCAACCATTTGATTTGA     TACGGATCTTCGCAAAGAATAC  ACCAAGTTGATGTTGTGCACTCCACCT
Internal Control 1        40S ribosomal protein S9   -       CGATCAGGAGTTACGCATCA      TCGGGTCCTTCTCTTCCA      CTCACGAGCGGCCTTGCGGATC
Internal Control 2        Elongation factor 1-alpha  -       GCTCCTGGACATAGAGATTTTATC  GGGTTTGTCCGTTCTTTGAA    TACTGGAACCTCGCAGGCTGACTGT

Table 2.8. Details of the 4 Illumina RNA-seq library pairs used for the gene expression and DE analysis of CNV genes.

Each library is composed of a pool of 2 dealated egg-laying virgin queens, 25 days after eclosion. Each pair is composed of virgin queens from the same polygyne colony having similar weight, with alternate genotype. All libraries were paired-end strand-specific.

Library  Genotype  Weight (mg)  No. of pooled individuals  NCBI SRA accession  Total Reads  Read length  Total Reads After Processing  Read length After Processing

Pair 1-SB/Sb   SB/Sb   8±0.85    2   SRR9045851   21,388,963*2   126   21,266,990*2   50-126
Pair 1-SB/SB   SB/SB   7±1.63    2   SRR9045852   19,018,920*2   126   18,827,849*2   50-126
Pair 2-SB/Sb   SB/Sb   9±0.07    2   SRR9045857   21,471,313*2   126   21,311,378*2   50-126
Pair 2-SB/SB   SB/SB   9±0.14    2   SRR9045850   19,690,956*2   126   19,501,211*2   50-126
Pair 3-SB/Sb   SB/Sb   8±1.34    2   SRR9045855   20,392,588*2   126   20,258,231*2   50-126
Pair 3-SB/SB   SB/SB   7±0.28    2   SRR9045856   19,281,235*2   126   19,152,680*2   50-126
Pair 4-SB/Sb   SB/Sb   10±1.20   2   SRR9045853   18,880,571*2   126   18,746,367*2   50-126
Pair 4-SB/SB   SB/SB   10±2.26   2   SRR9045854   21,575,400*2   126   21,440,748*2   50-126

Table 3.1. Number of total genes per gene category in nr-sinv-ref compared to genes with CNV between the SB and Sb genomes, when using a log2-fold threshold of ≥1 or ≤-1 (2-fold). We report the gene categories with at least 3 CNV genes. Gene categories which are over-represented among CNV genes are indicated (Fisher exact test with Bonferroni correction: *p < .05, **p < .01, ***p < .001; symbols in parenthesis are the results with a non-curated list of CNV genes).

Gene Category:                            Total genes in   Genes with CNV between Sb and SB:
                                          nr-sinv-ref:     Total             Sb > SB   SB > Sb

DNA transposon                            4,376            61*               57        4
LTR retrotransposon                       2,887            51***(***)        49        2
LINE retrotransposon                      1,815            19                18        1
Total TE                                  -                131(not tested)   124       7
Fatty Acid Synthase (FAS)                 119              13***(***)        11        2
Reductase                                 530              8                 8         0
Cytochrome P450                           278              6                 6         0
Zinc Finger                               509              6                 6         0
Kinase                                    608              3                 2         1
Farnesyl Pyrophosphate Synthase (FPPS)    44               5***(***)         4         1
Histone                                   47               4*(*)             4         0
Coiled-Coil Domain-Containing protein     110              3                 3         0
Toll-like receptor (TLR)                  6                3***(***)         3         0
Serine protease                           77               3                 2         1
Odorant-binding protein (OBP)             26               3*(*)             2         1
Others (less than 3)                      -                51(not tested)    45        6
NA                                        -                21(not tested)    19        2
Total non-TE                              -                129(not tested)   115       14
Total Genes                               27,279           260(not tested)   239       21
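To illustrate the kind of over-representation test named in the caption of Table 3.1, the sketch below computes a one-sided Fisher exact p-value as the upper tail of the hypergeometric distribution and then applies a Bonferroni correction. The counts are taken from Table 3.1, but several choices are assumptions made for illustration: the test is taken to be one-sided, the number of tests is set to the 14 categories shown as tested in the table, and the function name is hypothetical. The values it returns therefore need not match the significance levels reported in the table.

```python
from math import comb


def enrichment_p(total_genes, total_cnv, category_genes, category_cnv):
    """One-sided Fisher exact p-value, P(X >= category_cnv).

    X is the number of category genes among the CNV genes under the
    hypergeometric null of no association.
    """
    denom = comb(total_genes, total_cnv)
    upper = min(category_genes, total_cnv)
    tail = sum(
        comb(category_genes, k)
        * comb(total_genes - category_genes, total_cnv - k)
        for k in range(category_cnv, upper + 1)
    )
    # Python divides arbitrarily large integers exactly before rounding.
    return tail / denom


TOTAL_GENES = 27279  # "Total Genes" row of Table 3.1
TOTAL_CNV = 260      # total CNV genes in Table 3.1
N_TESTS = 14         # assumption: categories shown as tested in Table 3.1

# Toll-like receptors: 3 CNV genes out of 6 genes in the category.
p_tlr = enrichment_p(TOTAL_GENES, TOTAL_CNV, 6, 3)
p_tlr_bonferroni = min(1.0, p_tlr * N_TESTS)

# Reductases: 8 CNV genes out of 530 genes in the category.
p_reductase = enrichment_p(TOTAL_GENES, TOTAL_CNV, 530, 8)
p_reductase_bonferroni = min(1.0, p_reductase * N_TESTS)
```

Under these assumptions, a small category with half of its members among the CNV genes (TLR) yields a very small p-value, whereas a large category with only slightly more CNV genes than expected by chance (reductase) does not.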

Table 3.2. Number of total genes per gene category in nr-sinv-ref compared to genes with CNV between the SB and Sb genomes, when using a log2-fold threshold of ≥0.6 or ≤-0.6 (~1.51 fold). We report the gene categories with at least 3 CNV genes. Gene categories which are over-represented among CNV genes are indicated (Fisher exact test with Bonferroni correction: *p < .05, **p < .01, ***p < .001; symbols in parenthesis are the results with a non-curated list of CNV genes).

† p values for zinc finger genes was significant for the non-curated list but not for the final curated list. This was because 10 zinc fingers were re-annotated as TEs in the curated list, due to conflicts between Uniref50 and RepeatMasker annotations. We consider this category as a false

Genes with CNV between Sb and SB:

52 kDa repressor of the inhibitor of the protein kinase 40 3 3 0

Table 3.3. Number of total TEs per TE class/family in our nr-sinv-ref compared to TEs with CNV between the SB and Sb genomes, when using a log2-fold threshold of ≥1 or ≤-1 (2-fold). The results were obtained with a RepeatMasker analysis and were not manually curated, so they exclude homology information from the NCBI nr database and Uniref50. Hence the number of CNV TEs per class/family differs slightly from our curated list (Table 3.1). TE families which are over-represented among CNV genes are indicated (Fisher exact test with Bonferroni correction: *p < .05, **p < .01, ***p < .001).

TE class/family: Total TEs in nr-sinv-ref:

TEs with CNV between Sb and SB:

DIRS-1 33 (<1%)

RTAg2 31 (<1%)

SART-1 16 (<1%)

Tad1 13 (<1%)

Daphne 12 (<1%)

other LINE 484 (5%) 5 (4%) 5 0

rRNA 6 (<1%) 1 (1%) 1 0

Simple_repeat 304 (3%)

SINE 11 (<1%)

Satellite 38 (<1%)

tRNA 2 (<1%)

Retroposon 1 (<1%)

RNA 1 (<1%)

scRNA 1 (<1%)

Low_complexity 3 (<1%)

Other/DNA_virus 1 (<1%)

ARTEFACT 3 (<1%)

Unknown 105 (1%)

TOTAL: 9,554 (100%) 123 (100%) 115 8

Table 3.4. Number of total TEs per TE class/family in our nr-sinv-ref compared to TEs with CNV between the SB and Sb genomes, when using a log2-fold threshold of ≥0.6 or ≤-0.6 (~1.51 fold). The results were obtained with a RepeatMasker analysis and were not manually curated, so they exclude homology information from the NCBI nr database and Uniref50. Hence the number of CNV TEs per class/family differs slightly from our curated list (Table 3.2). Over- and under-represented TE families are indicated (Fisher exact test with Bonferroni correction: p < .1, *p < .05, **p < .01, ***p < .001).

TE class/family: Total TEs in nr-sinv-ref:

TEs with CNV between Sb and SB:

RTAg2 31 (<1%)

SART-1 16 (<1%)

Tad1 13 (<1%)

Daphne 12 (<1%)

other LINE 484 (5%) 6 (2%) 6 0

rRNA 6 (<1%) 1 (<1%) 1 0

Simple_repeat 304 (3%)

SINE 11 (<1%)

Satellite 38 (<1%)

tRNA 2 (<1%)

Retroposon 1 (<1%)

RNA 1 (<1%)

scRNA 1 (<1%)

Low_complexity 3 (<1%)

Other/DNA_virus 1 (<1%)

ARTEFACT 3 (<1%)

Unknown 105 (1%)

TOTAL: 9,554 (100%) 254 (100%) 238 16


Figures


Figure 1.1. Depiction of the two social forms of fire ants (a) and the social chromosome (b).

[Figure 1.1 labels. (a) Monogyne: SB/SB; Polygyne: SB/SB and SB/Sb. (b) SB and Sb variants of the social chromosome; Supergene; Gp-9.]

Figure 2.1. Flowchart of the strategy to obtain a collection of fire ant TEs and non-TE genes for CNV analysis.

[Figure 2.1 flowchart labels: 13 RNA-seq from various …; TEs annotated for S. invicta in Repbase v21.01.]

† blastn, percent identity ≥95% and bitscore ≥200
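The blastn cutoffs in the note to Figure 2.1 (percent identity ≥95% and bitscore ≥200) can be applied to BLAST tabular output with a few lines of code. The following is a minimal sketch, assuming the default 12-column tabular format (`-outfmt 6`), in which percent identity is the third column and bitscore the twelfth. The function name and the example hits are hypothetical and do not come from the thesis data.

```python
import csv
import io

MIN_IDENTITY = 95.0   # percent identity cutoff from the figure note
MIN_BITSCORE = 200.0  # bitscore cutoff from the figure note


def filter_hits(handle):
    """Yield (query, subject) pairs that pass both cutoffs.

    Assumes default BLAST tabular columns: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue,
    bitscore.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident = float(row[2])
        bitscore = float(row[11])
        if pident >= MIN_IDENTITY and bitscore >= MIN_BITSCORE:
            yield row[0], row[1]


# Hypothetical hits: only the first passes both cutoffs.
example = io.StringIO(
    "q1\ts1\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t520\n"
    "q2\ts2\t91.0\t300\t27\t0\t1\t300\t1\t300\t1e-90\t350\n"
    "q3\ts3\t99.0\t90\t1\t0\t1\t90\t1\t90\t1e-40\t160\n"
)
kept = list(filter_hits(example))
```

Requiring both cutoffs discards short but near-identical matches (high identity, low bitscore) as well as long but divergent ones (high bitscore, lower identity).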


Figure 2.2. Percentage of raw reads mapping against the custom list of fire ant TEs and non-TE genes (nr-sinv-ref) for each DNA-seq library. (a) Percentage of total mapped reads and properly paired reads. (b) Percentage of total mapped reads (black bar in A) divided into CenSol (the major centromeric satellite DNA repeat) or all other genes.

[Both panels: x-axis, DNA library (f1b, f1B, f2b, f2B, f3b, f3B, f4b, f4B, f5b, f5B, f6b, f6B, f7b, f7B, f8b, f8B); y-axis, % Mapped Reads (0 to 100). Panel (a) series: Mapped, Properly paired. Panel (b) series: Centromere, Others.]

Figure 3.1. Proportion of genes with copy number variation in SB or Sb, identified as transposable elements (TEs) or non-TE genes, when using a log2-fold threshold of: (a) ≥1 or ≤-1 (2-fold), and (b) ≥0.6 or ≤-0.6 (~1.51 fold).

(a) TEs - Sb: 48% (124); non-TE genes - Sb: 44% (115); non-TE genes - SB: 5% (14); TEs - SB: 3% (7)

(b) TEs - Sb: 51% (241); non-TE genes - Sb: 40% (189); non-TE genes - SB: 6% (30); TEs - SB: 3% (15)
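The two thresholds used throughout the tables and figures are expressed on a log2 scale, where a threshold of 1 corresponds to a 2-fold difference and 0.6 to roughly a 1.5-fold difference. A minimal sketch of how a gene could be assigned to a category from its copy numbers is shown below. The function name, argument names, and example copy numbers are hypothetical, and the sketch leaves out the statistical criteria that the CNV analysis also applies.

```python
from math import log2


def classify(copies_in_SB, copies_in_Sb, threshold=1.0):
    """Label a gene by its log2 ratio of Sb:SB copy number."""
    ratio = log2(copies_in_Sb / copies_in_SB)
    if ratio >= threshold:
        return "Sb > SB"
    if ratio <= -threshold:
        return "SB > Sb"
    return "no call"


# Fold change implied by the relaxed threshold of 0.6.
relaxed_fold = 2 ** 0.6

# Hypothetical copy numbers, for illustration only.
strict_call = classify(1.0, 1.6, threshold=1.0)
relaxed_call = classify(1.0, 1.6, threshold=0.6)
```

A gene with 1.6 times more copies in Sb is therefore counted only under the relaxed threshold, which is why panel (b) contains more genes than panel (a).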

Figure 3.2. Results of DNA qPCR (a) and ddPCR (b) to verify copy number variation of the in silico CNV analysis. For qPCR (a), copy number values are normalized to Mono SB which was set to one (1), but the actual haploid copy number for a gene in Mono SB could be >1. For ddPCR (b), copy numbers are absolute based on the assumption that the reference genes are single copy genes. ANOVA and post-hoc Tukey test: p < .1, *p < .05, **p < .01, ***p < .001. For each bar, sample size = 3; error bars = standard error; blue asterisks = significant copy number difference between Poly Sb and Mono SB, red asterisks = significant copy number difference between Poly Sb and Poly SB.

[Figure 3.2 axis and group labels: Higher Copy Number in Sb (CNV analysis); Negative; Absolute copy number.]

Figure 3.3. Results of DNA qPCR to detect copy number differences in 13 TEs with known DE between SB/SB and SB/Sb females. ANOVA and post-hoc Tukey test: p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001. For each bar, sample size = 3; error bars = standard error; values are normalized to Mono SB; blue asterisks = significant copy number difference between Poly Sb and Mono SB, red asterisks = significant copy number difference between Poly Sb and Poly SB. TEs that showed higher copy number in Sb in the in silico CNV analysis are indicated with a star.

Genes upregulated in SB/Sb versus SB/SB for both queens and workers (SB/Sb Q and W) or only in queens (SB/Sb Q); and genes upregulated in SB/SB vs SB/Sb queens (SB/SB Q) are shown.


[Figure 3.4 panels. x-axis: log2(ratio of Sb:SB copy number +1). y-axes: DE in SB/Sb versus SB/SB virgin queens (logFC); absolute gene expression in SB/Sb virgin queens (log2(FPKM+1)); absolute gene expression in SB/Sb reproductive queens (log2(FPKM+1)). Series: DNA TE, LINE TE, LTR TE. Annotations: All value: p = 0.01*, R2 = 0.058 (p = 0.57, R2 = 0.003); All value: p = 0.02*, R2 = 0.046 (p = 0.12, R2 = 0.020); p = 0.80, R2 = 0.001; log2(FPKM+1) of 1998 BUSCO genes (n=4): 6.15±1.99; log2(FPKM+1) of 1998 BUSCO genes (n=6): 6.15±0.042.]

Figure 3.4. Comparison of TE copy number ratio (Sb:SB) vs DE between SB/Sb and SB/SB virgin queens (a), absolute gene expression in SB/Sb virgin queens (b), and absolute gene expression in SB/Sb mature queens (c). Copy number ratio is expressed as log2 ratio of Sb:SB copy number +1, n = 8. DE is expressed as log2-FC, where positive values mean over-expression in SB/Sb and negative values mean over-expression in SB/SB, n = 4. Gene expression in SB/Sb virgin and reproductive queens is expressed as log2-FPKM+1, n = 4 and n = 6, respectively. p and R2 values (F-test, linear regression, p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001) are shown for DNA transposons, LINE, and LTR elements separately, and for all datapoints (in parentheses are the results of the linear regression after removing one outlier datapoint).
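As a minimal sketch of the kind of analysis summarized in this legend, the code below applies the log2(value + 1) transformation to both axes and fits an ordinary least-squares line, returning the slope, intercept, and R2. It does not compute the F-test p-value, which would require the F distribution from a statistics library. The function names and the toy copy number ratios and FPKM values are hypothetical and are not data from this study.

```python
from math import log2


def log2_plus_one(values):
    """Apply the log2(value + 1) transformation used on both axes."""
    return [log2(v + 1) for v in values]


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x: (slope, intercept, R2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values, for illustration only.
copy_ratio = [1.0, 3.0, 7.0, 15.0]
fpkm = [3.0, 1.0, 15.0, 7.0]
slope, intercept, r_squared = linear_fit(
    log2_plus_one(copy_ratio), log2_plus_one(fpkm)
)
```

An R2 close to zero, as reported for most comparisons in this figure, means that copy number ratio explains very little of the variation in expression.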


Figure 3.5. Chemical pathway for insect cuticular hydrocarbon synthesis (modified from Blomquist and Bagnères, 2010). For each enzyme family responsible for each step of the synthesis, we show the number of genes with greater copy number in either Sb or SB.


[Figure 3.6 panels. x-axis: log2(ratio of Sb:SB copy number +1). y-axes: DE in SB/Sb versus SB/SB virgin queens (logFC); absolute gene expression in SB/Sb virgin queens (log2(FPKM+1)); absolute gene expression in SB/Sb reproductive queens (log2(FPKM+1)). Series label: Fatty acid synthase (FAS). Annotations: log2(FPKM+1) of 1998 BUSCO genes (n=4): 6.15±1.99; log2(FPKM+1) of 1998 BUSCO genes (n=6): 6.15±0.042.]

Figure 3.6. Comparison of copy number ratio (Sb:SB) vs DE between SB/Sb and SB/SB virgin queens (a), gene expression in SB/Sb virgin queens (b), and gene expression in SB/Sb mature queens (c) for 27 genes with greater copy number in Sb encoding enzymes putatively involved in the synthesis of CHCs. Copy number ratio is expressed as log2 ratio of Sb:SB copy number +1, n = 8. DE is expressed as log2-FC, where positive values mean over-expression in SB/Sb and negative values mean over-expression in SB/SB, n = 4. Gene expression in SB/Sb virgin and reproductive queens is expressed as log2-FPKM+1, n = 4 and n = 6, respectively. p and R2 values (F-test, linear regression, p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001) are shown.


[Figure 3.7 panels, recoverable labels. Each panel compares the SB genome and the Sb genome. Gene labels include FAS 3, FAS 5, FAS 6, FAS 7, and FAS 9.

Panel (b): FAS top #4 and #8*,** (gi|751215671|ref|XM_011162450.1|, gi|751202802|ref|XR_851311.1|)

Panel (f): Cytochrome P450 #1* (gi|751240847|ref|XM_011176180.1|); Cytochrome P450 #2* (gi|751232988|ref|XM_011171843.1|)

* for DE between SB/Sb and SB/SB virgin queens

** More than one gene is represented as one. These genes are always found near each other; possibly a single gene was erroneously annotated as multiple genes.

TEs more frequently surrounding CHC genes: DNA/Unknown (TRINITY_DN30164_c1_g1_i1); DNA/Mariner (TRINITY_DN23752_c0_g1_i1); DNA/Mariner-28 (SiJWG08BDZ.scf); DNA/Harbinger (TRINITY_DN32931_c4_g4_i1); DNA/Mariner (TRINITY_DN32147_c3_g2_i3); DNA/P (TRINITY_DN31713_c5_g1_i1); LTR/Copia (TRINITY_DN33044_c0_g1_i6); LINE/Penelope (SiJWG07BAM.scf); LTR/Copia]

Figure 3.7. Genomic location of the 17 CHC genes with greater copy number in Sb and higher expression in SB/Sb relative to SB/SB virgin queens; and their most frequently flanking TEs. Grey scale boxes
