

3. Results

3.2 Balancing selection is supported in the homologous sequence in the hypervariable region

3.2.1 Description of the hypervariable region and the nearby region and genes.

After the TFA and RIFA sequences were aligned and compared, the structure of this region, which lacks homozygotes, could be described (Fig. 3). The nucleotide diversity (π) and Tajima's D were also calculated (Table 4). As in RIFA, the homologous hypervariable region (about 52.5 Kb) lies between the same flanking genes, THUMP (about 2 Kb) and EGF-like (about 1.5 Kb), and there are no other obvious genes in this range. In addition, based on two blocks of conserved sequence, the hypervariable region was divided into three sub-fragments: UFO, HVR and CoP. Following the gene orientation, the first gene is EGF-like, a coding gene with epidermal growth factor-like domains. Next is the first sub-fragment of the hypervariable region, UFO, which is about 5 Kb. It contains several alleles, each very different from the others. All sequences from TFA belong to the same allele family. A 0.8 Kb conserved non-coding region, N4R, follows. The next sub-fragment is HVR, which consists of two parts.
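The two summary statistics mentioned above can be illustrated with a minimal sketch. It uses the standard definitions of nucleotide diversity (mean pairwise differences per site) and Tajima's D (Tajima 1989), not necessarily the software or settings behind Table 4. The function names, the complete-deletion treatment of gapped columns, and the toy alignment are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt

VALID = set("ACGT")


def complete_columns(seqs):
    """Alignment columns containing only A, C, G or T (complete deletion; an assumption)."""
    return [col for col in zip(*(s.upper() for s in seqs)) if set(col) <= VALID]


def diversity_stats(seqs):
    """Return (k, pi, S, L): mean pairwise differences, pi per site,
    number of segregating sites, and number of sites used."""
    n = len(seqs)
    columns = complete_columns(seqs)
    length = len(columns)
    if n < 2 or length == 0:
        raise ValueError("need at least two sequences and one usable site")
    seg_sites = sum(1 for col in columns if len(set(col)) > 1)
    n_pairs = n * (n - 1) // 2
    total_diffs = sum(
        sum(1 for a, b in combinations(col, 2) if a != b) for col in columns
    )
    k = total_diffs / n_pairs
    return k, k / length, seg_sites, length


def tajimas_d(seqs):
    """Tajima's D; returns None when it is undefined (n < 4 or no segregating sites)."""
    n = len(seqs)
    k, _, s, _ = diversity_stats(seqs)
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (k - s / a1) / sqrt(variance)


# Toy alignment (hypothetical data, not from this study).
example = ["AAAA", "AAAT", "AATT", "ATTT"]
k, pi, s, length = diversity_stats(example)
print(f"pi = {pi:.4f}, S = {s}, D = {tajimas_d(example):.4f}")
```

A positive D indicates an excess of intermediate-frequency variants, which is the direction expected under balancing selection, whereas a negative D indicates an excess of rare variants.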

The part closest to N4R is about 3 Kb, is extremely diverse, contains some large indels (insertions/deletions of more than 100 bp), and falls into a few discrete clades spread among all samples. Notably, one of the indels is shared by one RIFA sample and one TFA sample (Fig. 4). The other part is about 2 Kb and converges into 2 conserved alleles. Again, all TFA sequences fall into one of these two alleles. The sequences then converge further into a conserved region of about 2 Kb with relatively few SNPs and few indels. After this region is the CoP sub-fragment, which has a diverse region of about 3 Kb that is less polymorphic than HVR. This is the region covered by the allele sequences that the alignment software suggests (see method 3.1). The neighboring region is an intervening region of about 37.5 Kb between the hypervariable region and the gene THUMP. These final two regions are relatively conserved and were not further investigated here.

3.2.2 Gene tree analyses support trans-species polymorphism (TSP) in the hypervariable region.

The homologous TFA hypervariable region sequence was compared to the RIFA sequence using BLAST. For the TFA draft genome, the BLAST results showed that HVR was similar to haplotype 3 of RIFA, whereas UFO and CoP, which flank HVR, were most similar to haplotype 1 of RIFA.

The phylogenetic tree can describe the relationships among the TFA and RIFA sequences and provide hints for exploring the sex locus under the CSD mechanism. If the hypervariable region is truly the only sex locus in TFA, it should be under balancing selection. Strong balancing selection is often associated with TSP because old alleles are retained, and their age can even exceed that of the species. In contrast, if this region is unimportant (i.e., TFA evolved another locus for sex determination), most alleles in TFA should have been lost by drift or by possible selective sweeps due to hitchhiking. In that case, TFA sequences should form an out-group to the RIFA alleles or be derived from one of the RIFA alleles. Thus, by examining phylogenetic trees of the candidate region, we can distinguish which hypothesis is more likely. The TSP phenomenon could also be caused by convergence, introgression or recent speciation. Therefore, in addition to the hypervariable region, I cloned 5 neutral sequences and compared them to exclude these three possibilities.
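The tree-based reasoning above can be sketched as a simple monophyly check. Under the "unimportant region" hypothesis, all TFA sequences should form a single clade (either as an out-group or nested within one RIFA allele lineage), whereas under TSP they should be scattered among RIFA alleles. This is a minimal illustration, assuming rooted trees written as nested tuples and leaf labels that start with a species prefix. The labels, the toy trees, and the function names are hypothetical and are not the trees from this study.

```python
def leaves(tree):
    """Set of leaf labels under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every clade (subtree) of a rooted tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def species_is_monophyletic(tree, prefix):
    """True if all leaves whose label starts with `prefix` form exactly one clade."""
    target = frozenset(name for name in leaves(tree) if name.startswith(prefix))
    if not target:
        raise ValueError(f"no leaves with prefix {prefix!r}")
    return any(clade == target for clade in clades(tree))


# Toy tree resembling a neutral locus: each species forms its own clade.
neutral_like = ((("TFA_1", "TFA_2"), "TFA_3"), (("RIFA_1", "RIFA_2"), "RIFA_3"))

# Toy tree resembling TSP: TFA sequences sit in different RIFA allele lineages.
tsp_like = (("TFA_1", "RIFA_1"), (("TFA_2", "RIFA_2"), "RIFA_3"))

print(species_is_monophyletic(neutral_like, "TFA"))
print(species_is_monophyletic(tsp_like, "TFA"))
```

A non-monophyletic result is only suggestive. As noted above, convergence, introgression and recent speciation can produce the same pattern, which is why neutral loci are needed as controls.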

Thus far, 5 CoP sequences, 4 HVR sequences, 3 UFO sequences, 4 EGF-like sequences and 6 to 9 samples for each neutral sequence were cloned from TFA. Additionally, the existing TFA genome assembly sequence was added as another sequence set. These sequences were analyzed and a phylogenetic tree was constructed for each region. In the phylogenetic tree for EGF-like, there were 5 clades, and 4 of the 5 TFA sequences belonged to one of them. The remaining sequence differed from the others and was placed in another clade. There were 5 clades in the UFO tree, and the distances among them were relatively large. All TFA sequences clearly belonged to the same clade. The first part of HVR was the most polymorphic, and TFA sequences were spread among 3 clades in the tree.

The other part was separated into two clades, and all TFA sequences were grouped into one of these. Although the CoP sequences could also be divided into two parts, the results were similar in that one of the 6 TFA sequences was distinct from the others (Fig. 5C to Fig. 5H). In sum, the phylogenetic trees of CoP, HVR and the gene EGF-like all exhibited the TSP phenomenon, even though an out-group was available only for EGF-like. This is consistent with our prediction. In contrast, all neutral sequences showed the TFA sequences in one cluster, with no evidence for TSP (Fig. 6A to Fig. 6E). Together, these data indicate that in TFA, the hypervariable region shares some alleles with RIFA.

In contrast, the other regions cluster as a group separate from RIFA, even for a nearby control region (LG3N) that is less than 100 Kb away.

In fact, part of the intervening region (about 3.5 Kb, with out-group sequence available) and the gene THUMP were analyzed with only one TFA sample, many RIFA samples and the out-group (Fig. 5A, 5B). The intervening region tree showed a similar result: the TFA sample is mixed within the RIFA clade. Although a single sample cannot be considered solid evidence, this tree clearly differs from the phylogenetic tree for the gene THUMP (which has the same sample size), where the TFA sample is independent of the RIFA clade. This result indicates that the TSP phenomenon is present in the hypervariable region and is attenuated toward both sides, at least by the gene THUMP and the neutral region LG3N.
