Chapter 5 Conclusions

5.2 Future work

For genome assembly, sequencing technologies can be expected to keep improving in read length and throughput. In addition, the assembly of large genomes, such as those of mammalian scale, will receive much more attention in biological research. JR-Assembler should therefore further improve its execution efficiency. A practical solution is to exploit the parallelism of current multi-core CPUs. Assembly can be sped up by assigning each seed extension to its own thread, so that multiple regions of the target genome are assembled in parallel. One challenge is that each thread must check whether the region it is currently assembling is already being assembled by another thread. Another issue is to incorporate sequencing data from multiple platforms. JR-Assembler currently focuses on assembling SRS data; however, other platforms such as Roche/454 (http://my454.com/) and PacBio (http://www.pacificbiosciences.com/) can generate reads of ~1 Kb in length. Although longer reads can resolve longer repeats, the mixture of platform-specific errors (e.g., homopolymer errors for Roche/454 and the high error rate of PacBio) could pose several new challenges.
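The thread-coordination challenge described above can be illustrated with a minimal sketch. It is not JR-Assembler's implementation: the `ReadRegistry` class, the `extend_seed` and `assemble_parallel` functions, the toy reads, and the fixed 4-base overlap are all hypothetical names and values chosen for illustration. The sketch assumes that a "region" can be approximated by the set of reads an extension has consumed, so a thread stops extending as soon as the next read it needs has already been claimed by another thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReadRegistry:
    """Thread-safe record of which seed extension owns each read (hypothetical helper)."""

    def __init__(self):
        self._owner = {}
        self._lock = threading.Lock()

    def try_claim(self, read_id, seed_id):
        """Claim a read for a seed; return False if the read is already claimed."""
        with self._lock:
            if read_id in self._owner:
                return False
            self._owner[read_id] = seed_id
            return True

    def owners(self):
        """Return a snapshot mapping read id -> owning seed id."""
        with self._lock:
            return dict(self._owner)


def extend_seed(seed_id, reads, registry, min_overlap=4):
    """Greedily extend one seed to the right using only reads this thread can claim."""
    if not registry.try_claim(seed_id, seed_id):
        return None  # the seed read was already absorbed by another extension
    contig = reads[seed_id]
    extended = True
    while extended:
        extended = False
        suffix = contig[-min_overlap:]
        for read_id, read in enumerate(reads):
            # Toy overlap rule (an assumption): exact prefix/suffix match of min_overlap bases.
            if len(read) > min_overlap and read.startswith(suffix):
                if registry.try_claim(read_id, seed_id):
                    contig += read[min_overlap:]
                    extended = True
                    break
    return contig


def assemble_parallel(reads, seed_ids, workers=4):
    """Run one seed extension per task; all tasks share a single registry."""
    registry = ReadRegistry()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {s: pool.submit(extend_seed, s, reads, registry) for s in seed_ids}
        contigs = {s: f.result() for s, f in futures.items()}
    return contigs, registry.owners()


# Toy data: four 8-base reads tiling a 20-base sequence with 4-base overlaps.
GENOME = "ACGTTGCATGCCGATAGGCT"
READS = [GENOME[i:i + 8] for i in range(0, 13, 4)]

if __name__ == "__main__":
    contigs, owners = assemble_parallel(READS, seed_ids=[0, 2])
    print(contigs)
    print(owners)
```

Because every claim goes through one lock, no read is ever used by two extensions, but which seed wins a contested read depends on thread scheduling. A real assembler would also need to decide what to do when two extensions meet, for example merging the two contigs, which this sketch does not attempt.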

For SV detection, we may extend our current algorithm to account for other complex rearrangement events, such as inverted transpositions or arbitrary combinations of rearrangements. However, the sequences adjacent to regions that have undergone multiple rearrangement events might be too divergent to align well. Therefore, more effort will be needed to un-shuffle the sequences.



List of Publications

Journal papers

Te-Chin Chu, Tsunglin Liu, D.T. Lee, Greg C. Lee, and Arthur Chun-Chieh Shih, "GR-Aligner: an algorithm for aligning pairwise genomic sequences containing rearrangement events," Bioinformatics, volume 25, number 17, pages 2188-2193, 2009.

Tzi-Yuan Wang, Hsin-Liang Chen, Mei-Yeh J Lu, Yo-Chia Chen, Huang-Mo Sung, Chi-Tang Mao, Hsing-Yi Cho, Huei-Mien Ke, Teh-Yang Hwa, Sz-Kai Ruan, Kuo-Yen Hung, Chih-Kuan Chen, Jeng-Yi Li, Yueh-Chin Wu, Yu-Hsiang Chen, Shao-Pei Chou, Ya-Wen Tsai, Te-Chin Chu, Chun-Chieh A Shih, Wen-Hsiung Li and Ming-Che Shih, "Functional characterization of cellulases identified from the cow rumen fungus Neocallimastix patriciarum W5 by transcriptomic and secretomic analyses," Biotechnology for Biofuels, volume 4, number 24, 2011.

Posters

Te-Chin Chu, Tsunglin Liu, D.T. Lee, Greg C. Lee, and Arthur Chun-Chieh Shih, "Analyzing the Breakpoint Regions of Genomic Rearrangement Events at Nucleotide Level by Sequence Alignment," Poster sessions of 10th International Conference on Systems Biology, August 2009.
