Experimental Results and Discussions - A Study on Detecting H-type Pseudoknots in RNA

A summary of the overall sensitivity and specificity for all experiments, which were run using the general class of the descriptor without an interior or bulge loop in the pseudoknot stems, is shown in Table 4.2, in which we let

$$S_{bp}\ (\text{Sensitivity}) = \frac{100 \times TP}{TP + FN}, \qquad P_{bp}\ (\text{Specificity}) = \frac{100 \times TP}{TP + FP},$$

and Π = (number of correctly predicted h-pseudoknots)/(number of predicted h-pseudoknots) (i.e., the fraction of the correctly predicted h-pseudoknots), where TP = true positive (i.e., the number of the correctly predicted base-pairs in the predicted h-pseudoknots), FN = false negative (i.e., the number of the base-pairs in the published h-pseudoknots that were not predicted) and FP = false positive (i.e., the number of the incorrectly predicted base-pairs in the predicted h-pseudoknots).
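As a minimal sketch of how these measures can be computed, the following Python snippet derives S_bp and P_bp from two sets of base pairs. The function name `base_pair_metrics`, the representation of a base pair as a tuple of positions, and the sample base pairs are assumptions made for illustration; they are not taken from the experiments reported here.

```python
from typing import Optional, Set, Tuple

BasePair = Tuple[int, int]  # (i, j) positions of two paired nucleotides


def base_pair_metrics(
    predicted: Set[BasePair], published: Set[BasePair]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (S_bp, P_bp) in percent; None where the denominator is zero."""
    tp = len(predicted & published)   # correctly predicted base pairs
    fn = len(published - predicted)   # published base pairs that were not predicted
    fp = len(predicted - published)   # predicted base pairs that are not published
    s_bp = 100.0 * tp / (tp + fn) if tp + fn else None
    p_bp = 100.0 * tp / (tp + fp) if tp + fp else None
    return s_bp, p_bp


# Hypothetical base pairs (not taken from the thesis data):
# 7 published pairs, 8 predicted pairs, 5 of them in common.
published = {(1, 20), (2, 19), (3, 18), (4, 17), (5, 16), (10, 30), (11, 29)}
predicted = {(1, 20), (2, 19), (3, 18), (4, 17), (5, 16), (6, 15), (12, 28), (13, 27)}

s_bp, p_bp = base_pair_metrics(predicted, published)
print(f"S_bp = {s_bp:.1f}, P_bp = {p_bp:.1f}")  # S_bp = 71.4, P_bp = 62.5
```

With TP = 5, FN = 2 and FP = 3, the formulas give S_bp = 500/7 ≈ 71.4 and P_bp = 500/8 = 62.5. Π is computed separately, as a plain ratio of counted h-pseudoknots.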

In this set of experiments, PKNOTS and NUPACK could not handle T2, T4, BCV-3′, MHV-3′ and SARS-TW1-3′ because they ran out of memory. For the other sequences, PKNOTS and NUPACK produced almost the same predictions: the h-pseudoknot of HIV-1-RT was identified, but the h-pseudoknots of TMV-3′-up, TYMV-3′ and HPeV1-5′ were missed². (Note that PKNOTS could predict two real h-pseudoknots of TMV-3′-down if version 1.04 was used instead of version 1.01.) Notably, most of these results improved when we ran all the experiments with pknotsRG. However, pknotsRG still missed the h-pseudoknots of T4, SARS-TW1-3′ and TMV-3′-down. This inability to detect the real h-pseudoknots indicates that, for a long RNA sequence, the MFE model might miss h-pseudoknots that are actually present in the native structure. In our experiments (as shown in Table 4.2), however, HPknotter significantly improved this situation: most of the real h-pseudoknots of TMV-3′-up, T4, TYMV-3′, SARS-TW1-3′ and TMV-3′-down were detected with high sensitivity and specificity. The key point is that HPknotter first uses RNAMotif to search for all fragments of the given RNA sequence that could fold into an h-pseudoknot and then applies

² Actually, PKNOTS and NUPACK both predicted an h-pseudoknot for HPeV1-5′, but with zero sensitivity and specificity because of incorrect base pairings.

Table 4.2: Summary of prediction results on several RNA sequences, where all experiments are run using the general class of the descriptor and the version of PKNOTS is 1.01. Each cell lists S_bp, P_bp, Π. Unnumbered rows continue the experiment above.

| Experiment | PKNOTS | NUPACK | pknotsRG | HPknotter (PKNOTS-kernel) | HPknotter (NUPACK-kernel) | HPknotter (pknotsRG-kernel) |
|---|---|---|---|---|---|---|
| 1. 5S-rRNA | –, –, 0/0 | –, –, 0/1 | –, –, 0/0 | –, –, 0/1 | –, –, 0/1 | –, –, 0/2 |
| 2. HIV-1-RT | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 |
| 3. TMV-3′-up | 0, 0, 0/0 | 0, 0, 0/0 | 71.4, 62.5, 3/3 | 100, 77.8, 2/2 | 100, 77.8, 3/3 | 71.4, 62.5, 3/3 |
| | | | 77.8, 87.5 | 0, 0 | 88.9, 100 | 77.8, 87.5 |
| | | | 88.9, 100 | 66.7, 66.7 | 88.9, 100 | 88.9, 100 |
| 4. T2 | –, –, –/– | –, –, –/– | 100, 100, 1/1 | 100, 100, 1/4 | 100, 100, 1/10 | 100, 100, 1/16 |
| 5. T4 | –, –, –/– | –, –, –/– | 0, 0, 0/1 | 100, 100, 1/3 | 100, 100, 1/17 | 100, 100, 1/17 |
| 6. TYMV-3′ | 0, 0, 0/0 | 0, 0, 0/1 | 100, 80, 1/2 | 100, 80, 1/1 | 62.5, 55.6, 1/2 | 100, 80, 1/2 |
| 7. BCV-3′ | –, –, –/– | –, –, –/– | 100, 100, 1/1 | 100, 100, 1/1 | 94.4, 100, 1/3 | 100, 100, 1/3 |
| 8. MHV-3′ | –, –, –/– | –, –, –/– | 100, 100, 1/3 | 100, 100, 1/3 | 100, 100, 1/5 | 100, 100, 1/6 |
| 9. SARS-TW1-3′ | –, –, –/– | –, –, –/– | 0, 0, 0/0 | 93.8, 100, 1/2 | 93.8, 100, 1/3 | 100, 100, 1/5 |
| 10. TMV-3′-down | 0, 0, 0/0 | 60.9, 42.4, 1/1 | 0, 0, 0/0 | 100, 100, 2/2 | 100, 100, 2/2 | 100, 100, 2/2 |
| | | | | 91.3, 91.3 | 95.7, 100 | 100, 95.7 |
| 11. HPeV1-5′ | 0, 0, 1/1 | 0, 0, 1/1 | 54.5, 54.5, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 |

Note that PKNOTS version 1.04 can successfully predict two h-pseudoknots of TMV-3′-down. HPknotter with PKNOTS-kernel missed the second h-pseudoknot of TMV-3′-up because PKNOTS is not able to fold the corresponding sequence into a pseudoknot.

PKNOTS/NUPACK/pknotsRG to these fragments to determine whether their MFE structures are indeed h-pseudoknots. Because the nucleotides outside a fragment no longer affect the folding, PKNOTS/NUPACK/pknotsRG seems to have a higher probability of recognizing the pseudoknotted structure of the fragment. This approach, of course, inevitably increases the number of incorrectly predicted h-pseudoknots, because it considers only local fragments of the input RNA sequence and ignores the global effect of all input nucleotides. In our experiments, however, the number of incorrectly predicted h-pseudoknots remained reasonable, because in its last stage HPknotter applies the concept of maximum weight independent set to select, among all predicted h-pseudoknots, mutually disjoint ones with minimum total free energy.
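The final selection step described above can be illustrated with a short sketch. It assumes that every candidate h-pseudoknot occupies one contiguous interval of the sequence and that its weight is the negated free energy, in which case the maximum weight independent set reduces to weighted interval scheduling and can be solved by dynamic programming. The names `Candidate` and `select_disjoint` and the sample intervals and energies are hypothetical; this is not HPknotter's actual implementation.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Candidate(NamedTuple):
    start: int     # first nucleotide position of the fragment (inclusive)
    end: int       # last nucleotide position of the fragment (inclusive)
    energy: float  # predicted free energy; more negative means more stable


def select_disjoint(candidates: List[Candidate]) -> List[Candidate]:
    """Pick mutually disjoint candidates with minimum total free energy."""
    # Only stabilizing candidates can lower the total energy.
    cands = sorted((c for c in candidates if c.energy < 0), key=lambda c: c.end)
    ends = [c.end for c in cands]
    n = len(cands)
    best = [0.0] * (n + 1)  # best[i]: maximum weight using the first i candidates
    prev = [0] * n          # prev[i]: how many earlier candidates end before cands[i] starts
    for i, c in enumerate(cands):
        prev[i] = bisect_right(ends, c.start - 1, 0, i)
        best[i + 1] = max(best[i], best[prev[i]] - c.energy)
    # Walk back through the table to recover the chosen candidates.
    chosen = []
    i = n
    while i > 0:
        c = cands[i - 1]
        if best[prev[i - 1]] - c.energy > best[i - 1]:
            chosen.append(c)
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return chosen


# Hypothetical candidates (positions and energies are made up for illustration).
candidates = [
    Candidate(1, 30, -8.0),
    Candidate(25, 60, -12.5),
    Candidate(55, 70, -3.0),
    Candidate(65, 90, -6.0),
]
selected = select_disjoint(candidates)
print(selected, sum(c.energy for c in selected))
```

In this example the intervals 25..60 and 65..90 are selected, with a total energy of -18.5. The overlapping candidates 1..30 and 55..70 are discarded because including either of them would force out a more stable candidate.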

Generally speaking, as shown in Table 4.2, HPknotter greatly improves sensitivity, specificity and the fraction Π of correctly predicted h-pseudoknots when compared with the original PKNOTS, NUPACK and pknotsRG. It should be noted that the numbers of incorrectly predicted h-pseudoknots with PKNOTS-kernel are not greater than those with NUPACK-kernel and pknotsRG-kernel. This seems to imply that PKNOTS itself is more accurate than NUPACK and pknotsRG, even though PKNOTS is computationally more time-consuming than both.

It is worth mentioning that, as shown in Table 4.3, the overall prediction accuracy improves further if we rerun all the RNA sequences tested above, except 5S-rRNA (which contains no pseudoknot), using the specific class to which the predicted h-pseudoknots belong instead of the general class of the descriptor. In particular, the Π values (as shown in Table 4.3) and the running time (as shown in Table 4.4) were greatly improved. These experiments indicate that HPknotter can serve as an effective tool for checking whether the tested RNA sequences have the same kind of h-pseudoknots as closely related RNA sequences whose h-pseudoknots are already known. For instance, SARS, BCV and MHV are all coronaviruses, and the h-pseudoknots of BCV-3′ and MHV-3′, both of which belong to class 2 of h-pseudoknots, are already known and have been confirmed by previous experiments [41].

Table 4.3: Summary of prediction results on several RNA sequences, where experiments 1–4, 5–9 and 10–11 are run using the descriptors of classes 1, 2 and 3, respectively. Notice that TMV-3′-down contains two h-pseudoknots with one in class 2 (that was tested in experiment 9) and the other in class 3 (that was tested in experiment 10). Each cell lists S_bp, P_bp, Π. Unnumbered rows continue the experiment above.

| Experiment | PKNOTS | NUPACK | pknotsRG | HPknotter (PKNOTS-kernel) | HPknotter (NUPACK-kernel) | HPknotter (pknotsRG-kernel) |
|---|---|---|---|---|---|---|
| 1. HIV-1-RT | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 |
| 2. TMV-3′-up | 0, 0, 0/0 | 0, 0, 0/0 | 71.4, 62.5, 3/3 | 100, 87.5, 2/2 | 0, 0, 2/2 | 0, 0, 2/2 |
| | | | 77.8, 87.5 | 0, 0 | 88.9, 100 | 77.8, 87.5 |
| | | | 88.9, 100 | 66.7, 66.7 | 88.9, 100 | 88.9, 100 |
| 3. T2 | –, –, –/– | –, –, –/– | 100, 100, 1/1 | 100, 100, 1/3 | 100, 100, 1/6 | 100, 100, 1/14 |
| 4. T4 | –, –, –/– | –, –, –/– | 0, 0, 0/1 | 100, 100, 1/3 | 100, 100, 1/11 | 100, 100, 1/11 |
| 5. TYMV-3′ | 0, 0, 0/0 | 0, 0, 0/1 | 100, 80, 1/2 | 100, 80, 1/1 | 62.5, 62.5, 1/1 | 100, 80, 1/1 |
| 6. BCV-3′ | –, –, –/– | –, –, –/– | 100, 100, 1/1 | 100, 100, 1/1 | 94.4, 100, 1/2 | 100, 100, 1/1 |
| 7. MHV-3′ | –, –, –/– | –, –, –/– | 100, 100, 1/3 | 100, 100, 1/1 | 100, 100, 1/3 | 100, 100, 1/4 |
| 8. SARS-TW1-3′ | –, –, –/– | –, –, –/– | 0, 0, 0/0 | 93.8, 100, 1/1 | 93.8, 100, 1/3 | 100, 100, 1/3 |
| 9. TMV-3′-down | 0, 0, 0/0 | 0, 0, 0/0 | 0, 0, 0/0 | 100, 100, 1/1 | 100, 100, 1/3 | 100, 100, 1/1 |
| 10. TMV-3′-down | 0, 0, 0/0 | 60.9, 42.4, 1/1 | 0, 0, 0/0 | 91.3, 91.3, 1/1 | 95.7, 100, 1/1 | 100, 95.7, 1/1 |
| 11. HPeV1-5′ | 0, 0, 1/1 | 0, 0, 1/1 | 54.5, 54.5, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 | 100, 100, 1/1 |

The first h-pseudoknot of TMV-3′-up was missed by HPknotter with NUPACK-kernel and pknotsRG-kernel because it was filtered out due to the incorrect class.

It is reasonable to expect that SARS-TW1-3′ may contain an h-pseudoknot of class 2. Therefore, we can apply HPknotter to SARS-TW1-3′ with the descriptor set to class 2 and quickly obtain the same result as with the general descriptor.
