Chapter 2 Results

2.1 The Mechanism of Boundary Determination in the Internal Sequence Elimination

2.1.3 IESs are conserved within different inbred lines

We showed that IES locations were highly conserved among B strains. Next, we

wanted to know whether this conservation is shared by different inbred lines. C3

inbred strains were derived from different inbreeding programs and exhibit genetic

polymorphisms relative to the B inbred strains (67). We sequenced two C3 strains, 6a and 12a,

and observed that they contained similar numbers of IESs to the B strains; 6586 IESs

(7125 forms) and 6646 IESs (7168 forms) in 6a and 12a, respectively (Fig. 4A). Most of

these IESs were shared between these two C3 strains (95.81% in 6a and 94.94% in 12a)

(Fig. 4B), and the majority was also shared with the three B strains (at least 71.44%) (Fig.

4C), indicating that most IESs are universal among different strains. However, the IES

boundaries often differed between the inbred lines. For instance, 791 precise IESs existed

in all three B strains, but only 345 of these occurred in the two C3 strains, with 80.66% of

the remainder exhibiting small variance (within 20 bp) and 12.90% exhibiting large

variance (>100 bp) (Fig. 4D and 4E). The results suggest that genetic background can

influence IES boundary precision.
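As an illustration of how boundary variation among strains can be binned, the following minimal sketch classifies an IES by the largest spread of either junction across strains. The function names, the "intermediate" label, and the use of junction spread as the measure of variance are assumptions made for this example; only the 20-bp and 100-bp cut-offs are taken from the text.

```python
from typing import List, Tuple

# (left, right) junction coordinates of one IES as observed in one strain
Boundary = Tuple[int, int]


def boundary_spread(boundaries: List[Boundary]) -> int:
    """Largest spread, in bp, of either junction across the given strains."""
    lefts = [b[0] for b in boundaries]
    rights = [b[1] for b in boundaries]
    return max(max(lefts) - min(lefts), max(rights) - min(rights))


def classify_variance(boundaries: List[Boundary]) -> str:
    """Bin an IES using the 20-bp and 100-bp cut-offs (assumed definition)."""
    spread = boundary_spread(boundaries)
    if spread == 0:
        return "precise"
    if spread <= 20:
        return "small"
    if spread > 100:
        return "large"
    return "intermediate"  # hypothetical label for 21-100 bp
```

For example, an IES whose right junction moves by 12 bp between two strains would be placed in the "small" group.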

2.1.4 cis-acting elements near IES boundaries could be involved in IES regulation

Since a large proportion of the IES boundaries showed only small variations, we

tried to determine if there was any cis-acting element in the flanking regions of other IESs.

The nucleotide distribution near the IES junctions from the 500-bp inner sequences to the

500-bp upstream or downstream flanking regions of all IESs showed that the regions

contained a lower GC content than the average of the MIC genome (25% GC). However,

significant fluctuations could be observed within the 100-bp flanking regions, implying

that the boundary of IESs could be marked in cis (Fig. 5).
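A position-wise nucleotide distribution of this kind can be sketched as follows. This is a minimal example, assuming that all windows have been extracted at the same offset relative to the IES junction and have equal length; the function name is hypothetical and the code is not the pipeline used in this study.

```python
from typing import List


def gc_profile(windows: List[str]) -> List[float]:
    """Fraction of G or C at each aligned position across equal-length windows."""
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(length):
        column = [w[i].upper() for w in windows]
        gc = sum(1 for base in column if base in "GC")
        profile.append(gc / len(column))
    return profile
```

Plotting such a profile against the distance from the junction would show local fluctuations of the type described above.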

To group cis-acting elements with a particular feature, we used a motif finding tool,

eTFBS, to scan the flanking sequences (http://bioinfo.bime.ntu.edu.tw/c4lab/). The

100-bp IES flanking regions were used as the query dataset to find 10 overrepresented

motifs (Top 10 motifs) and the IES flanking regions that are 100-200 bp from the

junctions were used as background datasets (Fig. 6A). Most of the motifs revealed high

AT patterns, which could be due to the higher AT content within these regions (Fig. 5).

However, we could not find any common pattern amongst motifs in the IES flanking

regions, except for two (motifs 2 and 7) that displayed a consistent distance to the IES

boundary when they served as inverted repeats (IRs) but not as direct repeats (DRs). Moreover, we found that

these two motifs share the core sequence “TACCNT”. Scanning all 100-bp IES flanking regions

with this core sequence yielded a total of 1881 motifs, 199 of which

were paired on both sides of an IES as IRs and 57 of which were paired as DRs (Table 2).

The distances from these IRs to their respective IES boundaries were about 62 bp, and the

difference of these distances between the two ends of the same IES was about 11 bp,

which differed from the distributions of DR (Fig. 6B). Furthermore, 80.40% of IESs that

were flanked by these IRs belonged to the group with boundary variations within 20 bp among the B strains,

suggesting that this motif could be related to regulation of IES boundaries (Fig. 6C).
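The scan with the core sequence can be illustrated with a short sketch. It assumes that both flanking sequences are given on the same strand, that a pair counts as a DR when the core occurs in the same orientation on both sides and as an IR when the orientations are opposite; the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Lookahead patterns so that overlapping hits are also reported.
FORWARD = re.compile(r"(?=(TACC[ACGT]T))")
REVERSE = re.compile(r"(?=(A[ACGT]GGTA))")  # reverse complement of TACCNT


def find_hits(seq: str) -> List[Tuple[int, str]]:
    """Return (start, strand) for every core-motif hit on either strand."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


def pair_types(left_flank: str, right_flank: str) -> List[str]:
    """Label every left/right hit combination as IR (opposite strands) or DR."""
    labels = []
    for _, left_strand in find_hits(left_flank):
        for _, right_strand in find_hits(right_flank):
            labels.append("DR" if left_strand == right_strand else "IR")
    return labels
```

The hit positions returned by `find_hits` could then be converted into distances to the IES boundary in order to compare the two ends of the same IES.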

To search further for more candidate cis-acting elements that could regulate IES

boundaries, we focused on the IRs located at a similar distance (less than 10 bp difference)

to the boundaries on both ends of the same IES in CU427. We arbitrarily defined the IR as

a pentamer sequence and allowed no mismatch between the two repeats. IRs that had an

identical sequence at the 5’-end were clustered, and we characterized their positional

distribution by calculating the interquartile range (IQR). The IQR served as an indicator

of how tightly the IR positions were concentrated in the IES flanking regions. A larger IQR indicated that the

IR was more dispersed within the IES flanking regions, and a smaller IQR that it was more concentrated. We

identified 472 5-mer sequences serving as IRs at both ends. The “TAAAA” sequence

presented the highest frequency; however, it was widely distributed and had a very high

IQR (IQR = 56) with no apparent pattern (Fig. 7A). However, we also found 138 pentamer

IRs with low IQRs (IQR ≤ 10), indicating that the distribution of these IRs is concentrated

within a small range in each group (Table 3).
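The pentamer search and the IQR calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the left flank ends at one IES boundary, the right flank starts at the other, both are given on the same strand, and the quartiles are computed with the inclusive method of the Python standard library (the quartile convention used in the study is not specified). All function names are hypothetical.

```python
from statistics import quantiles
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pentamer_irs(left_flank: str, right_flank: str,
                 k: int = 5, max_diff: int = 10) -> List[Tuple[str, int, int]]:
    """Find exact k-mer IRs whose two copies lie at similar distances.

    Returns (k-mer, distance of left copy, distance of right copy), where
    each distance is the number of bp between the copy and its boundary.
    """
    left_flank, right_flank = left_flank.upper(), right_flank.upper()
    hits = []
    for i in range(len(left_flank) - k + 1):
        kmer = left_flank[i:i + k]
        d_left = len(left_flank) - (i + k)
        target = revcomp(kmer)  # no mismatch allowed between the two repeats
        j = right_flank.find(target)
        while j != -1:
            if abs(d_left - j) < max_diff:
                hits.append((kmer, d_left, j))
            j = right_flank.find(target, j + 1)
    return hits


def iqr(distances: List[float]) -> float:
    """Interquartile range of the distances collected for one IR group."""
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    return q3 - q1
```

Grouping the distances by pentamer sequence and applying `iqr` to each group would give one value per group, with small values marking positionally concentrated IRs.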

Interestingly, when we clustered low-IQR groups that shared the same core sequence,

we found that they remained concentrated; 4119 IRs contained the core

sequence “TATA”, which was the most frequent group with a low IQR (Fig. 7B). However,

some pentamer groups containing “TATA” were widely dispersed (Fig. 7G-7J). The

sequence consensus could be further specified by aligning four pentamer IRs with low

IQRs in the “TATA” group (Fig. 7C-7F and 7K), and the distribution of this consensus

is shown in Fig. 7L. We identified additional groups with features similar to those of the

TATA group. For example, 702 pentamer IRs contained TGT (Fig. 8A)

and 696 contained ACT (Fig. 8D). The refined consensus maintained

the concentrated patterns in these IR groups (Fig. 8B-8C and 8E-8F). Furthermore, the

low IQR IRs that shared the same sequence as the above identified “TACCNT” IR were

observed (Fig. 9A and 9B), indicating that this method could identify the consensus of

IRs that showed distinct patterns near the IES flanking regions.

Moreover, pentamer IRs composed of G or C were also highly represented, with

distribution patterns similar to IRs with the “GGG” sequence and other IRs with at least

three G residues (G-rich). The distances of “GGG” and “G-rich” IRs to the IES

boundaries were 46.59 bp and 46.76 bp on average, respectively (Fig. 9C and 9D).

However, the distribution patterns of IRs with the “CCC” sequence and the C-rich IRs

differed, implying that the C-rich IRs could be utilized by several regulatory elements

(Fig. 9E and 9F).

2.1.5 G-rich and C-rich inverted repeats are the cis-acting elements of Lia3-affected IESs

A recent study showed that the function of Lia3 is to act as a cis-acting element

binding protein that regulates the boundary of IESs containing G-rich polypurine tracts,

including the M element (59). Low progeny viability and the loss of boundary precision

of M elements and four other M element-like IESs in Lia3-deficient progeny lines indicate

that Lia3 is linked to the maintenance of IES boundaries. In our study, we observed that

G-rich IRs were highly represented in the IES flanking regions. Hence, we postulated that

this motif could be regulated by Lia3. To test our hypothesis, we sequenced three progeny

lines (3-1, 4-1 and 27-2), which were generated from Lia3 strains (Fig. 10). Note that

the Lia3 strains were generated from a mating of CU428 and BII; hence, we compared

the changes in IES distribution between the three B strains and the Lia3 progeny strains. We

noticed that the proportion of IESs with boundary variations of more than 100 bp among different

forms was 11.07%, which was higher than in the WT strains (Fig. 10). Since boundary

regulation was lost in the Lia3 strains, we suggest that the increased number of IESs

with large variations is caused by the Lia3 deficiency.

To identify the subset of IESs that is potentially regulated by Lia3, we first focused

on the 519 Lia3-affected IESs that showed more than 100-bp boundary variations when the

Lia3 progeny were compared with the WT strains (Table 4). Of these 519 IESs,

59.34% exhibited boundary variations within 20 bp in WT strains, indicating that the majority of

Lia3-affected IESs had only small variations in WT strains. Hence, we extracted the 100-bp

flanking regions of Lia3-affected IESs that were within 20-bp variation in CU427 WT strains

and calculated the nucleotide distribution of these regions. We found a distinct

enrichment of G at 40-60 bp from the IES boundary (Fig. 11), which agrees with our

expectations based on the limited known cases and further extends the potential size of this

group. We also observed a small peak of C at 25-40 bp from the IES boundary (Fig. 11),

suggesting that a C-rich motif could also play a role. By searching for significant motifs

within these regions using MEME, we identified the G-rich motif as “GAGGG” (Fig.

12A). In order to know whether this motif exists at both ends of an IES, we used this

motif and its reverse complement (CCCTC) to scan IES flanking regions (Fig. 12A and

12B). In this case, we did not limit the difference of motif distances to the IES boundary;

however, if there was more than one IR pair at an IES, the IR pair with the minimum

difference of distance was chosen. Surprisingly, if we set the threshold as 60% identity of

the position weight matrix (PWM) value, we observed that 94.48% of Lia3-affected

IESs contained a G-rich or C-rich IR. Both copies of the IRs were at similar distances from

the IES boundaries. In fact, the distances differed less within each IR pair than

they did among all IESs of this group (Table 5 and Fig. 12C and 12D). If we

further restricted the comparison to the subset of IESs that showed boundary variations of

10 bp or less among WT strains and exhibited boundary variations of more than 100 bp among

the Lia3 strains, the proportion of IESs containing a G- or C-rich IR increased

to 96.09%, indicating that almost all Lia3-affected IESs that have very limited boundary

variations were flanked by the G- or C-rich IR.
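The rule of keeping, for each IES, the IR pair with the minimum difference of distance can be written compactly. The sketch below assumes that the distances of all motif hits to the boundary have already been collected separately for the two ends; the function name is hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def closest_pair(left_distances: Sequence[int],
                 right_distances: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return the (left, right) distance pair with the smallest difference.

    Returns None when either end has no motif hit, i.e. no IR pair exists.
    """
    if not left_distances or not right_distances:
        return None
    return min(product(left_distances, right_distances),
               key=lambda pair: abs(pair[0] - pair[1]))
```

With hits at 46 and 80 bp on one side and at 20 and 49 bp on the other, the pair (46, 49) would be retained.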

However, the background proportion of IESs with these IRs across all IES flanking

regions was also high (about 73%), indicating that the PWM threshold needed to be

adjusted. To optimize the outcome, we tested different PWM identity thresholds and found

that, at 75% identity, more than 60% of Lia3-affected IESs

contained a G- or C-rich IR, but only about 14% of background IESs did (Table 5).

Hence, we set the PWM threshold as 75% identity for subsequent experiments.
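The effect of the threshold can be illustrated with a toy PWM scan. This sketch rests on an assumption that should be stated clearly: "identity of the PWM value" is interpreted here as the score of a window divided by the maximum score the matrix can give, which may differ from the exact definition used in this study. The matrix construction, the pseudocount, and all function names are hypothetical.

```python
from typing import Dict, List

Pwm = List[Dict[str, float]]  # one {base: weight} column per motif position


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> Pwm:
    """Column-wise base frequencies with a pseudocount (toy construction)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        pwm.append({base: c / total for base, c in counts.items()})
    return pwm


def relative_score(pwm: Pwm, window: str) -> float:
    """Score of a window as a fraction of the best possible score (assumed)."""
    score = sum(col.get(base, 0.0) for col, base in zip(pwm, window.upper()))
    best = sum(max(col.values()) for col in pwm)
    return score / best


def scan(pwm: Pwm, seq: str, threshold: float) -> List[int]:
    """Start positions of all windows scoring at or above the threshold."""
    k = len(pwm)
    return [i for i in range(len(seq) - k + 1)
            if relative_score(pwm, seq[i:i + k]) >= threshold]
```

Raising the threshold from 0.60 to 0.75 in such a scan removes weak matches, which is the trade-off between sensitivity and background described above. The reverse complement of each window would need to be scored as well to detect the C-rich orientation.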

Next, we suspected that the G- or C-rich IRs also existed among IESs that exhibited

larger variations among strains. We extracted IESs that showed more than 100-bp

boundary differences between WT and Lia3 strains, and scanned for the two IRs in the

IES flanking regions in all three WT strains. Significantly, 167 out of 175 IESs that had

G- or C-rich IRs were shared among all WT strains (representing 58.95% of

Lia3-affected IESs), but only 9.81% of them were present in the background IESs among

all WT strains (Table 6). Moreover, the distances of the two IR copies to the two ends of an

IES and the difference between them in Lia3-affected IESs containing G/C-rich IRs

were quite consistent among strains (Fig. 12C and 12D), and they were significantly

different from those of IRs found in the background IESs (p-value < 10⁻⁵ on average) (Fig. 12E and

12F).

We also considered the G-rich or C-rich motifs as DRs. We identified 41 G-rich DRs

and 45 C-rich DRs in the group of Lia3-affected IESs. However, the distances to the IES

boundaries varied (53.32 ± 23.24 bp for G-rich DRs and 50.12 ± 22.54 bp for C-rich DRs). In

addition, the difference between the distances of the two copies to the IES boundaries

was larger for DRs (about 25 bp for G-rich DRs and about 17 bp for C-rich DRs)

than for IRs, indicating that DRs were less likely to regulate Lia3-affected IESs. In

conclusion, our results show that Lia3-affected IESs are regulated by G-rich and C-rich

cis-acting elements.

2.1.6 piggyBac transposon families in Tetrahymena

Domesticated piggyBac transposases play an important role in IES excision (41,

68). Six putative domesticated piggyBac transposase genes can be identified in the

Tetrahymena Genome Database (http://ciliate.org/index.php/home/welcome): TPB1,

TPB2, TPB3, TPB6, TPB7 and LIA5 (Fig. 13). Inhibition of TPB2 expression prevents

IES deletion globally and affects the developmental process (68). Tpb1p and Tpb6p

probably function as a heterodimer and both are required to regulate a small subset of

IESs that contain a specific inverted terminal repeat (ITR) highly similar to the ITR of the

piggyBac transposon in Bombyx mori (69). Moreover, unlike most TPB2-dependent IESs

that show deletion boundary heterogeneity, TPB1/6-dependent IESs are precisely deleted,

leaving one copy of the duplicated TTAA at the termini in the MAC genome (41). These

features suggest that TPB1 and TPB6 act much more like a piggyBac transposase than

TPB2 does.

Since there are several domesticated piggyBac transposases in the Tetrahymena

genome, we wanted to know whether there is any active piggyBac transposon. The

Tetrahymena Genome Database (http://ciliate.org/index.php/home/welcome) reports a

piggyBac-like gene in the MIC-specific region, which we have named TPB3. We

analyzed the DNA structure of this gene, compiling the complete coding region to

investigate whether it has features of an active transposon. We searched upstream and

downstream of this sequence for ITR-like sequences and found that TPB3 indeed

possesses typical DNA structures of an active piggyBac transposon, with the TTAA

tetranucleotide DRs adjacent to a pair of ITRs, as well as a coding region (without introns)

for the putative piggyBac transposase with the conserved DDD (or DDE) domain (Fig. 14

and 15). We then compared the full-length transposon structure with the first identified

piggyBac transposon, IFP2, in Trichoplusia ni and found that their compositions are quite

similar (Fig. 16) (70). These observations suggest that TPB3 is likely an active piggyBac

transposon.
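The structural test applied here, TTAA direct repeats adjacent to a pair of ITRs, can be sketched as a simple check on a candidate sequence. This is a minimal example, assuming that the candidate is supplied with its flanking TTAA copies included and that the ITR length is known in advance; the function names, the mismatch tolerance, and the toy ITR used in the test are hypothetical and are not the TPB3 sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_piggybac_ends(seq: str, itr_len: int, max_mismatches: int = 0) -> bool:
    """True if seq reads TTAA + ITR ... inverted ITR + TTAA.

    The left ITR is compared with the reverse complement of the right ITR,
    allowing up to max_mismatches differences (ITRs can be imperfect).
    """
    s = seq.upper()
    if len(s) < 2 * (4 + itr_len):
        return False
    if not (s.startswith("TTAA") and s.endswith("TTAA")):
        return False
    left_itr = s[4:4 + itr_len]
    right_itr = s[len(s) - 4 - itr_len:len(s) - 4]
    return mismatches(left_itr, revcomp(right_itr)) <= max_mismatches
```

A candidate passing this check would still need an intact transposase coding region with the conserved DDD (or DDE) residues to be considered potentially active.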

We then used the sequence of TPB3 to search the MIC genome for other copies. We

identified an additional two copies in the MIC genome and named them TPB4 and TPB5.

TPB3 and TPB4 are located in a 14 kb and an 11 kb IES, respectively, suggesting that

they were initially inserted into pre-existing IES regions (Fig. 17). Interestingly, the copy

of the TTAA DR adjacent to the 3’-ITR of TPB5 coincides with one boundary of the IES,

and the other IES boundary is only about 200 bp away from the 5’-ITR of TPB5 (Fig. 17

and 18), raising the possibility that this IES may actually be formed from a TPB5

invasion. TPB4 shows 88% DNA sequence identity to TPB3 and all three conserved

regions can be aligned (Fig. 19). However, there are frameshifts in TPB4 that cause loss

of the first and third D in the predicted amino acid sequence, suggesting TPB4 is a

dysfunctional transposase. Although the transposase part of TPB5 is destroyed as well,

the ITRs and the untranslated regions (UTRs) are still highly conserved among TPB3,

TPB4 and TPB5 (Fig. 19). It is notable that the ITRs among these three transposons are

very similar. The sequences of the 3’-ITRs of TPB3 and TPB4 are identical (Table 7),

suggesting that they are derived from the same origin. Interestingly, the first 11 bp of the

5’-ITR of TPB5 are identical to those of TcPLE1, a piggyBac transposon identified in the

red flour beetle Tribolium castaneum (71) but not to BmPBLEs from Bombyx mori (69).

This finding indicates that TPB3, TPB4 and TPB5 could have been derived from a

common origin that differs from the other piggyBac transposons and invaded the

Tetrahymena genome at different time points. Moreover, the evolutionary relationship of

the DDD domain truncation shows that TPB1, TPB2 and TPB6 are in the same clade,

and TPB2 is more closely related to PiggyMac transposase of Paramecium (72).

However, TPB3 and TPB4 (the frameshift-corrected form) are in a different clade (Fig.

20). Further analysis shows that the TPB3 clade is more closely related to human

PGBD5 (Fig. 21).

2.1.7 Discussion

The aim of this study was to investigate the global regulatory mechanism of IES

boundaries by comparing different Tetrahymena strains. We observed that IES boundary

deletions were highly conserved among Tetrahymena strains, but the precision of the

deletion boundaries varied. When we compared IES precision between two different

inbred lines, strains B and C3, most of the deletion boundaries presented small variations,

indicating that the majority of IESs exhibit microheterogeneity at the deletion boundaries.

Nevertheless, nucleotide distributions near IES junctions showed significant

fluctuations within the 100-bp flanking regions, implying that the DNA sequences could

be linked to boundary determination. To understand whether IES elimination is regulated,

we searched for cis-acting elements among the IES flanking regions, such as the A5G5 IR

of the M element. Surprisingly, we found that all of the identified cis-acting elements are

more likely to be IRs. The same group of cis-acting candidates was located at rather

invariable distances to the IES boundaries and the distance of the two copies of IRs to

both IES ends differed only slightly, suggesting that the IRs could cooperate with each

other. Hence, we hypothesize that IRs can interact with the cis-acting element binding

proteins to set boundaries and interact with other related proteins to help Tpb2p to cut two

ends at the same time. Furthermore, after Tpb2p excision, the structure could protect the

MAC-destined region from the cleavage of endonuclease and maintain two double-strand

breaks in close proximity and thereby help to join ends by NHEJ (Fig. 22).

In addition, we found that a deficiency of Lia3 also affects IESs that have larger

boundary variations among different forms in WT strains. In the subset of the

Lia3-affected IESs with more than 20-bp boundary variations, the distance of the

cis-acting elements to the IES boundaries still varied only slightly, suggesting that

recognition of the cis-acting element was the determining factor of the IES boundary, i.e.

changing the position of the cis-acting element would change the IES boundary. This

scenario has also been described for the M element, which has 0.6- and 0.9-kb

deletion forms (73). The two forms have the same boundary at the 3’-end and a 0.3-kb

difference at the 5’-end. The two 5’-end boundaries both have the 5’-A5G5 motif about

45 bp away (58). Furthermore, when we searched for IESs that had at least one boundary with

small variations (within 20 bp) among different forms in all three B strains, 6513 of 6599

IESs (98.70%) were identified. If we hypothesize that the small boundary variations are

controlled by the nearby cis-acting element, the number of IES forms is determined by

the number of cis-acting elements near the IES boundaries. Our results suggest that the

majority of IESs contain multiple cis-acting elements in the IES flanking regions.

We observed that there are two cis-acting elements in Lia3-affected IESs, namely

G-rich and C-rich IRs. More than 90% of Lia3-affected IESs contained at least one of the

cis-acting elements under our threshold of 60% identity of the PWM value, whereas

about 60% of them contained one of the cis-acting elements under a threshold of 75%

identity. This finding indicates that almost all Lia3-affected IESs contained G-rich or

C-rich IRs, though some exhibited lower similarity that may affect the binding ability of

Lia3. Interestingly, the distance of the two copies of IRs to the IES boundaries differed

between G-rich and C-rich IRs (51 bp and 38 bp, respectively). A recent study showed that

Lia3 preferentially binds to single-stranded sequences with five guanine residues, which

form a parallel G-quadruplex in vitro (59), but its ability to bind C-rich sequences is very

poor, suggesting that Lia3 could bind to the G strand in both G-rich and C-rich IRs.

However, the representative motif in our study was “GAGGG”, which has been shown to

be the least stable form for maintaining the G-quadruplex structure (74), suggesting

that the distance variation between G-rich and C-rich IRs is less likely to be due to the

direction of the G-quadruplex. Hence, we suspect that the orientation of the G strand

toward the IESs may affect the direction of Lia3 dimerization and further alter the

conformation of the protein complex to shift the cutting site.

Although domesticated piggyBac transposases are responsible for the excision of

IESs, features of transposons (such as the ITR and the TTAA cutting site) remain only in

the TPB1-dependent IESs and have been lost in the TPB2-dependent IESs. In order to

understand more about the piggyBac transposase in Tetrahymena, we investigated an

active transposon TPB3, which is an MIC-specific sequence. The ITR sequence and

evolutionary analysis of the transposase showed that TPB3 and its variants had origins

different from those of other TPBs. A previous study showed that TPB2 has an ortholog in
