CPU Time Usage - A Study on Detecting RNA H-Type Pseudoknots

In fact, HPknotter is not CPU intensive, because in our experiments the hit filter removed the great majority of the hit sequences produced by RNAMotif. Take the experiment with SARS-TW1-3′ in Table 4.2 as an example. In the first phase, RNAMotif found a total of 2,132 hits that conform to the descriptor of the general class. If we applied PKNOTS directly to all of these unfiltered hits to check whether they fold into stable h-pseudoknots, the program would need about 51 hours to finish the job. After running the hit filter, however, only 43 distinct hit sequences remained, and PKNOTS then needed only about 5.2 minutes to determine whether they are stable pseudoknots. The third phase, pseudoknot prediction with PKNOTS, left only 11 candidates that could fold into stable pseudoknots. Only 7 candidates remained after the h-pseudoknot filter in the fourth phase. Some of these filtered h-pseudoknots may overlap in their ranges in the sequence, which suggests that they cannot exist simultaneously in a stable pseudoknotted structure in SARS-TW1-3′. Finally, only 2 h-pseudoknots with minimum free energy were selected in the phase that computes the maximum weight independent set. Table 4.4 lists the CPU time for PKNOTS, NUPACK, pknotsRG and HPknotter; all tests were run on an IBM PC with a 3.06 GHz processor and 2 GB of RAM under Linux.
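The final selection phase described above can be illustrated with a minimal sketch. It assumes that each candidate is given as a (start, end, free_energy) tuple with inclusive sequence positions, and it uses the negated free energy as the weight. It applies the textbook weighted-interval-scheduling dynamic program, which is not necessarily the algorithm used inside HPknotter. The function name and the sample candidates are hypothetical.

```python
from bisect import bisect_left


def select_non_overlapping(candidates):
    """Pick non-overlapping candidates with maximum total weight.

    Each candidate is (start, end, free_energy) with inclusive positions.
    The weight is -free_energy, so more stable folds score higher.
    """
    # Sort by end position so that earlier-ending intervals come first.
    items = sorted(candidates, key=lambda c: c[1])
    ends = [c[1] for c in items]
    # best[i] = (total weight, chosen candidates) using only the first i items.
    best = [(0.0, [])]
    for i, (start, end, energy) in enumerate(items):
        # k = number of earlier items that end strictly before this one starts.
        k = bisect_left(ends, start, 0, i)
        take_weight = best[k][0] - energy
        if take_weight > best[i][0]:
            best.append((take_weight, best[k][1] + [(start, end, energy)]))
        else:
            best.append(best[i])
    return best[-1][1]


# Hypothetical candidates (positions and energies are made up for illustration).
sample = [(10, 40, -8.0), (30, 70, -12.0), (75, 110, -9.5), (100, 130, -3.0)]
selected = select_non_overlapping(sample)
print(selected)
```

In this sketch, two candidates are treated as compatible only when one ends strictly before the other starts, which matches the requirement that the selected h-pseudoknots do not overlap in the sequence.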

Table 4.4: CPU time for PKNOTS, NUPACK, pknotsRG and HPknotter. In our test environment, PKNOTS and NUPACK could not handle sequences longer than 220 bp and 180 bp, respectively, because they ran out of memory; these cases are marked with a dash (–).

                                          HPknotter (General Class)                          HPknotter (Specific Class)
Length (bp)  PKNOTS   NUPACK    pknotsRG  PKNOTS-kernel    NUPACK-kernel    pknotsRG-kernel  PKNOTS-kernel    NUPACK-kernel    pknotsRG-kernel
84           7.3 min  13.1 sec  0.05 sec  31 sec           27 sec           26 sec           9 sec            7 sec            6 sec
105          35 min   44.7 sec  0.1 sec   2.2 min          35 sec           29 sec           38 sec           10 sec           8 sec
200          72 hr    –         0.8 sec   5.2 min          1.8 min          1.5 min          1.6 min          33 sec           30 sec
341          –        –         7.4 sec   7.1 min          2.4 min          2.3 min          2.2 min          46 sec           45 sec
946          –        –         10.1 min  13.8 min         7.5 min          6.9 min          4.1 min          2.2 min          2.1 min
1340         –        –         43.5 min  35.3 min         11.6 min         10.9 min         11.6 min         3.1 min          2.5 min
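To make the comparison in Table 4.4 concrete, the following sketch converts a few of the reported timings to seconds and computes the corresponding speed-up ratios. The timing values are copied from the table. The helper name and the choice of rows are illustrative assumptions.

```python
UNIT_SECONDS = {"sec": 1.0, "min": 60.0, "hr": 3600.0}


def to_seconds(text):
    """Convert a timing string such as '5.2 min' to seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]


# 200 bp: stand-alone PKNOTS vs. HPknotter (General Class) with the PKNOTS kernel.
speedup_200 = to_seconds("72 hr") / to_seconds("5.2 min")
# 1340 bp: stand-alone pknotsRG vs. HPknotter (Specific Class) with the pknotsRG kernel.
speedup_1340 = to_seconds("43.5 min") / to_seconds("2.5 min")

print(f"200 bp, PKNOTS: about {speedup_200:.0f}x faster")
print(f"1340 bp, pknotsRG: about {speedup_1340:.1f}x faster")
```

The ratios only hold for the rows chosen here. For short sequences the table shows the opposite trend for pknotsRG: at 84 bp, stand-alone pknotsRG takes 0.05 sec, whereas HPknotter with the pknotsRG kernel takes 6 sec in the specific class.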

Chapter 5 Conclusion and Future Work

In this thesis, we designed a heuristic approach for efficiently and accurately detecting RNA h-pseudoknots, which are ubiquitous in naturally occurring RNAs.

Existing thermodynamics-based programs, such as PKNOTS, NUPACK and pknotsRG, are useful for finding stable h-pseudoknots. However, most of them are very time- and memory-consuming, which limits them to short sequences of a couple of hundred bases. Another main weakness of these programs is that, as our experiments show, they may fail to detect h-pseudoknots that actually exist within a long RNA sequence. Based on the heuristic approach described in this thesis, we implemented a novel program, called HPknotter, that efficiently and accurately detects the h-pseudoknots of a given RNA sequence by incorporating four existing programs: RNAMotif, PKNOTS, NUPACK and pknotsRG.

In summary, we demonstrated the practicability and effectiveness of HPknotter by testing it on several RNA sequences, most of which have been shown to contain h-pseudoknotted structures. These experiments show that HPknotter is practical for detecting h-pseudoknots in RNA sequences, because it is not computationally expensive and has much better sensitivity and specificity than PKNOTS, NUPACK and pknotsRG.

In the following, we describe several interesting problems for future research.

The first problem is how to reduce the number of sequence fragments hit by RNAMotif, for example by considering the GC ratio of the pseudoknot stems or the conserved sequence patterns in the structural motifs of h-pseudoknots, or even by designing a new and more efficient algorithm for identifying the sequence fragments. The second is how to develop a more efficient program for detecting the h-pseudoknots of RNA sequences, so that it can replace the kernel programs used by HPknotter, such as PKNOTS, NUPACK and pknotsRG. The last is how to extend our heuristic approach to detect more general classes of pseudoknots in a given RNA sequence.
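As an illustration of the first problem, a GC-ratio check on the stems of a hit could look like the sketch below. This is not part of HPknotter. The function names, the input format (a list of stem sequences per hit) and the threshold of 0.5 are hypothetical choices made only for this example.

```python
def gc_ratio(stem):
    """Return the fraction of G and C bases in a stem sequence."""
    stem = stem.upper()
    if not stem:
        return 0.0
    return sum(base in "GC" for base in stem) / len(stem)


def passes_gc_filter(stems, min_ratio=0.5):
    """Keep a hit only if every stem reaches the GC threshold.

    min_ratio is a hypothetical value, not one taken from the thesis.
    """
    return all(gc_ratio(stem) >= min_ratio for stem in stems)


# Hypothetical stems of two hits.
print(passes_gc_filter(["GGCC", "GCAU"]))
print(passes_gc_filter(["GGCAU", "AUUA"]))
```

A suitable threshold would have to be determined experimentally, for example from known h-pseudoknots.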

